From b695f60bc68caa7881f594ea552f5016189d93e0 Mon Sep 17 00:00:00 2001
From: philipp
Date: Thu, 9 Apr 2026 14:52:56 +0200
Subject: [PATCH] Strengthen quote lifecycle proof guardrails

Proof: The active quote-lifecycle turn now explicitly requires semantic truth, forbidden overclaim labels, negative invariant tests, and naming cleanup so submission evidence cannot be presented as trade completion.

Assumptions: The repo can derive truthful first-pass quote lifecycle states from existing decision and execution records, and prevention should be enforced by code and tests rather than reviewer memory.

Still fake: This commit only tightens the live planning docs; the implementation work to rename legacy fields and derive lifecycle-backed summaries is still outstanding.
---
 IMPLEMENTATION.md | 61 ++++++++++++++++++++++++++++++++++++++++++++---
 PROOF.md          | 46 +++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/IMPLEMENTATION.md b/IMPLEMENTATION.md
index 0b8a851..ec2546e 100644
--- a/IMPLEMENTATION.md
+++ b/IMPLEMENTATION.md
@@ -15,6 +15,8 @@ Replace ambiguous quote and decision wording with a truthful per-quote lifecycle
 - Prefer one explicit lifecycle derivation path shared by backend and dashboard over ad hoc page-specific wording.
 - Do not invent downstream certainty where durable evidence is absent.
 - Remove `Actionable` completely from operator-facing copy.
+- Do not use stronger operator words than the durable evidence supports.
+- Fix semantic bugs by changing both the code and the tests that encoded the wrong assumption.

 ## Problem statement for this turn
 The current dashboard still forces operators to infer too much:
@@ -29,6 +31,11 @@ The repo already stores enough of the real lifecycle to do better:
 - emitted command id
 - execution result status and result code

+The recent submission-versus-trade bug showed the broader prevention gap:
+- wrong semantics were encoded in backend query names
+- the dashboard rendered stronger claims than the evidence supported
+- tests asserted the wrong meaning instead of protecting the truth
+
 The turn therefore needs to improve:
 - lifecycle derivation
 - durable reason mapping
@@ -53,6 +60,8 @@ The first mandatory states are:
 - `Awaiting outcome`
 - `Completed`

+These states must become the repo-owned evidence vocabulary for operator surfaces and summaries.
+
 Suggested meanings:
 - `Filtered`
   quote never entered the active trade path or was excluded before strategy decision
@@ -94,13 +103,34 @@ If the exact reason is missing:
 - expose `reason_unknown`
 - keep the row truthful instead of synthesizing an explanation

+## Semantic guardrails for this turn
+
+### 1. Ban overloaded certainty words unless evidence justifies them
+Review and remove or rename operator-facing and backend terms such as:
+- `successfulTradeCount`
+- `lastSuccessfulTradeAt`
+- `loadSuccessfulTradesPage`
+- `trade_asset_changes`
+- any UI label implying trade completion, realized asset movement, or PnL attribution from mere submission evidence
+
+Allowed wording must be tied to the strongest durable evidence actually present.
+
+### 2. Encode semantic invariants in code and tests
+Add explicit checks and regression coverage for:
+- `submitted != completed`
+- `submitted != realized asset delta`
+- executor blocking != strategy rejection
+- no UI label may claim trade completion from submission-only evidence
+
+Negative tests are required, not just positive-path tests.
+
 ## Backend changes

 ### 1. Add a lifecycle derivation helper
 Create or extend a backend module that derives quote lifecycle from:
 - recent trade decisions
 - recent execution results
-- successful trade records
+- later terminal records only where they are real
 - any available quote-status or venue result surfaces

 It should emit a normalized row object with:
@@ -121,9 +151,11 @@ The backend should no longer leave the frontend to infer execution from isolated

 For each recent quote/decision row:
 - attach the matching execution result by `command_id`, `decision_id`, or `quote_id`
-- attach successful-trade or later terminal evidence where available
+- attach terminal completion or non-fill evidence only where it is genuinely available
 - expose whether the row is strategy-only, strategy-plus-command, or strategy-plus-execution

+As part of this phase, rename misleading backend aggregation helpers and payload fields where practical so code meaning matches evidence meaning.
+
 ### 3. Preserve operator drilldown identifiers
 Ensure the bootstrap payload exposes:
 - full quote id
@@ -142,6 +174,8 @@ Remove `Actionable` from:

 Replace it with explicit state labels driven by lifecycle derivation.

+Also remove or rename any remaining wording that presents submission evidence as trade completion or realized asset movement.
+
 ### 5. Make recent rows self-explanatory
 For each row, render:
 - primary lifecycle state
@@ -179,6 +213,7 @@ If a strategy-only summary remains, it must be visually separate from per-quote
 Inspect quote and system surfaces for similar ambiguity and align the wording if they expose the same concepts.

 Do not let one page say `Submitted` while another page still says `Actionable` for the same row.
+Do not let one page say `trade` while another page only has `submitted` evidence for the same row.

 ## Data and state edge cases
 - Strategy decision exists, no command emitted:
@@ -193,6 +228,8 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
   render as `Submitted` or `Awaiting outcome`
 - Successful trade summary exists but no explicit per-quote completion event:
   only promote to `Completed` where the durable linkage is real
+- Submission evidence appears in profitability or summary widgets:
+  rename and constrain those widgets so they do not imply realized trade truth

 ## Concrete implementation order

@@ -200,11 +237,14 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
 - inspect current durable decision and execution payloads
 - write the normalized lifecycle state mapping
 - define forbidden and allowed operator labels
+- list misleading backend and UI names that must be changed
+- define the semantic invariant tests up front

 ### Phase 2. Implement backend aggregation
 - derive unified recent lifecycle rows
 - expose full identifiers and reason codes
 - keep old consumers working until the frontend is switched
+- rename misleading submission-as-trade helpers and summary fields where touched

 ### Phase 3. Update Strategy page rendering
 - replace verdict column with lifecycle state
@@ -216,11 +256,13 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
 - remove `Actionable`
 - align supporting labels
 - ensure blocked vs rejected vs submitted are clearly distinct
+- ensure submitted vs completed vs realized asset movement are clearly distinct

 ### Phase 5. Validate with live recent rows
 - verify a row rejected due to executor disarmed renders as blocked with reason
 - verify a submitted row renders as submitted
 - verify quote ids can be copied and used for tracing
+- verify no submission-only row is rendered as a trade, completion, or realized asset delta

 ## Test plan
 - unit tests for lifecycle derivation from:
@@ -228,14 +270,18 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
   - executor-disarmed rows
   - submission-failed rows
   - submitted rows
+- unit tests for semantic invariants:
+  - submitted rows must not be counted as completed trades
+  - submission-only rows must not render as asset deltas
 - dashboard bootstrap tests for:
   - forbidden `Actionable` removal
   - explicit lifecycle labels
   - reason text rendering
   - identifier exposure
+- dashboard summary tests for renamed or narrowed submission metrics
 - frontend component tests if needed for copy affordance or row rendering logic

-No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording cannot return.
+No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording or overclaim cannot return.

 ## Validation checklist against the proof
 - `Actionable` no longer appears
@@ -243,9 +289,18 @@ No lifecycle ambiguity fix is complete without a regression test proving the old
 - recent blocked rows explain why they did not trade
 - recent submitted rows show that they were submitted
 - quote ids are directly usable from the dashboard
+- submission-only evidence is no longer rendered as trade completion or asset delta truth

 ## Failure modes to plan for
 - the backend joins rows incorrectly and attributes the wrong execution result
 - the UI uses softer wording than the backend lifecycle state
 - older rows lack enough evidence and the UI pretends certainty
 - ids are still truncated without a copy or expand path
+- misleading legacy names remain in place and create new semantic drift later
+
+## Truth review checklist for this turn
+For every operator-facing label, metric, table, or badge touched in this proof:
+- what exact durable table or event backs it?
+- what is the strongest claim the evidence supports?
+- what wording would overclaim certainty?
+- what negative regression test locks that boundary in?
diff --git a/PROOF.md b/PROOF.md
index d72565c..cf173b8 100644
--- a/PROOF.md
+++ b/PROOF.md
@@ -12,6 +12,7 @@ The concrete target is the live NEAR Intents BTC/EURe system:
 - execution submission must be distinguishable from strategy approval
 - blocked, rejected, submitted, failed, and not-filled paths must be visibly different
 - quote identifiers must be directly usable by operators for tracing and support
+- operator-facing labels must not overclaim beyond the durable evidence actually stored

 ## Why this is a meaningful architecture test
 The current operator surface still fails a core thesis requirement:
@@ -22,6 +23,13 @@ The current operator surface still fails a core thesis requirement:
 That is not just a copy problem. It is an observability gap in the trading product itself.
 If the system cannot explain a quote outcome precisely, execution is outrunning observability.

+The immediate trigger for this turn is a real semantic failure:
+- the dashboard treated `trade_execution_results.status = submitted` as a successful trade
+- recent submitted quote terms were rendered as if they were realized asset deltas
+- tests passed because that wrong assumption had been encoded into the test suite itself
+
+This turn must therefore fix both the UI and the conditions that allowed the mistake through.
+
 ## Hypothesis
 `unrip` becomes more trustworthy if quote handling is modeled and rendered as an explicit lifecycle instead of a single strategy verdict:
 - strategy evaluation is only one stage in the lifecycle
@@ -49,9 +57,12 @@ The turn passes only if an operator can inspect a quote and immediately understa
 - The existing durable stores already contain enough information for at least the current live path through strategy decision and executor result.
 - Some downstream venue-outcome states may still be partially fake or unavailable for older rows; if so, the UI must say that plainly rather than implying more certainty.
 - The immediate turn should prioritize truthful lifecycle explanation over broader analytics such as markout or long-window outcome attribution.
+- The prevention strategy must be implemented in repo code and tests rather than left to reviewer judgment alone.

 ## Turn-shaping rules
 - `Actionable` is forbidden as an operator-facing state or label.
+- Operator-facing labels must not overstate event certainty.
+- Terms such as `trade`, `success`, `filled`, `completed`, `profit`, and `asset delta` are forbidden unless backed by a durable event explicitly representing that fact.
 - Do not add a second analytics product. Stay focused on per-quote lifecycle truth for the live active pair.
 - Do not invent lifecycle states that cannot be backed by durable repo-owned evidence.
 - If a state transition is inferred rather than durably observed, the UI must make that distinction explicit.
@@ -78,6 +89,19 @@ For each visible quote or decision row, the operator must be able to identify th

 Exact labels may vary, but they must be specific and mutually meaningful.

+The repo must adopt a hard evidence-state vocabulary for this turn. At minimum:
+- `observed`
+- `evaluated`
+- `command_emitted`
+- `rejected`
+- `blocked`
+- `submitted`
+- `failed`
+- `awaiting_outcome`
+- `completed`
+
+No operator surface may collapse these into softer or stronger claims.
+
 ### Reason truth
 Each non-terminal or terminal non-trade state must expose a clear decisive reason, such as:
 - unsupported pair
@@ -111,16 +135,27 @@ Any replacement label must answer a concrete operator question, such as:
 - was it submitted?
 - did it fail?

+### Semantic invariants
+The implementation and tests must enforce at least these invariants:
+- `submitted` is not `completed`
+- `submitted` is not a realized asset delta
+- executor-side blocking is not strategy rejection
+- stronger labels must not be rendered from weaker evidence
+
+These invariants are proof-critical, not optional cleanup.
+
 ## Definition of done
 - `Actionable` is removed from operator-facing dashboard surfaces.
 - A durable quote lifecycle model exists in repo-owned code and is used by the dashboard.
 - At least the current live quote path through strategy decision and executor result is rendered coherently per quote.
 - The operator can tell, from one row, why a recent quote did or did not turn into a submitted trade.
 - Quote ids are copyable and clearly visible enough for tracing.
+- overloaded backend and UI names that imply stronger certainty than the evidence supports are removed or renamed
 - Regression tests cover at least:
   - strategy-approved but executor-disarmed rows
   - submitted rows
   - forbidden ambiguous label removal
+  - forbidden semantic overclaims such as treating `submitted` as `completed`

 For this turn to close with status `passed`, the specific operator question:

@@ -134,6 +169,7 @@ must be answerable directly from the dashboard for recent rows without needing m
 - direct evidence that a submitted row renders as submitted
 - direct evidence that quote ids are directly usable for tracing
 - automated test evidence for lifecycle derivation and dashboard rendering
+- automated test evidence for negative semantic invariants, especially `submitted != completed`

 ## Failure conditions
 - `Actionable` still appears in the dashboard
@@ -141,6 +177,8 @@ must be answerable directly from the dashboard for recent rows without needing m
 - non-trade rows still lack a decisive reason
 - quote ids remain hidden or non-copyable
 - lifecycle labels are only cosmetic and not backed by durable repo-owned state
+- the repo still uses `trade` or `asset delta` language for mere submission evidence
+- tests still encode the old overclaiming semantics

 ## Current real before this turn
 - strategy decisions are stored durably
@@ -152,3 +190,11 @@ must be answerable directly from the dashboard for recent rows without needing m
 - full venue settlement attribution for all historic trades
 - generalized quote analytics beyond lifecycle explanation
 - multi-venue lifecycle harmonization
+
+## Prevention requirements for this proof
+- Add a truth-review checklist to the implementation work:
+  - what exact durable table or event backs this label?
+  - what is the strongest claim the evidence supports?
+  - what would make this wording false?
+  - what negative regression test prevents that overclaim from returning?
+- Separate lifecycle derivation from summary metrics so summaries are computed from lifecycle states rather than raw convenience queries.
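
Reviewer note: the semantic invariants this patch asks for (`submitted != completed`, executor blocking != strategy rejection, summaries computed from lifecycle states) could be sketched roughly as below. This is only an illustration, not the repo's implementation, which the commit message says is still outstanding. Every name here (`LifecycleState`, `QuoteEvidence`, `derive_state`, `completed_count`) is hypothetical, only a subset of the evidence vocabulary is modeled, and the status strings `"blocked"` and `"failed"` are assumed values; only `submitted` is attested in the patch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class LifecycleState(Enum):
    """Subset of the evidence-state vocabulary listed in PROOF.md."""
    EVALUATED = "evaluated"
    REJECTED = "rejected"
    BLOCKED = "blocked"
    SUBMITTED = "submitted"
    FAILED = "failed"
    COMPLETED = "completed"


@dataclass(frozen=True)
class QuoteEvidence:
    """Durable evidence attached to one quote row (hypothetical shape)."""
    strategy_approved: bool
    # "submitted" appears in the patch; "blocked" and "failed" are assumed values.
    execution_status: Optional[str] = None
    # True only when a durable event explicitly records completion.
    has_completion_event: bool = False


def derive_state(ev: QuoteEvidence) -> LifecycleState:
    """Return the strongest state the evidence supports, never a stronger one."""
    if not ev.strategy_approved:
        return LifecycleState.REJECTED
    if ev.execution_status == "blocked":
        # Executor-side blocking is kept distinct from strategy rejection.
        return LifecycleState.BLOCKED
    if ev.execution_status == "failed":
        return LifecycleState.FAILED
    if ev.execution_status == "submitted":
        # Submission alone never promotes a row to COMPLETED.
        if ev.has_completion_event:
            return LifecycleState.COMPLETED
        return LifecycleState.SUBMITTED
    # Approved by strategy, but no execution result is linked yet.
    return LifecycleState.EVALUATED


def completed_count(rows: Iterable[QuoteEvidence]) -> int:
    """Summary metric computed from lifecycle states, not from raw status rows."""
    return sum(1 for row in rows if derive_state(row) is LifecycleState.COMPLETED)
```

A negative regression test in this style would assert that a submission-only row is not `COMPLETED` and is not counted by `completed_count`, so the old overclaim fails loudly if it returns.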