Strengthen quote lifecycle proof guardrails

Proof: The active quote-lifecycle turn now explicitly requires semantic truth, forbidden overclaim labels, negative invariant tests, and naming cleanup so submission evidence cannot be presented as trade completion.

Assumptions: The repo can derive truthful first-pass quote lifecycle states from existing decision and execution records, and prevention should be enforced by code and tests rather than reviewer memory.

Still fake: This commit only tightens the live planning docs; the implementation work to rename legacy fields and derive lifecycle-backed summaries is still outstanding.
2026-04-09 14:52:56 +02:00 · 2026-04-09 14:52:56 +02:00 · b695f60bc6
commit b695f60bc6
parent 7ddefb500e
2 changed files with 104 additions and 3 deletions
--- a/IMPLEMENTATION.md
+++ b/IMPLEMENTATION.md
@@ -15,6 +15,8 @@ Replace ambiguous quote and decision wording with a truthful per-quote lifecycle
 - Prefer one explicit lifecycle derivation path shared by backend and dashboard over ad hoc page-specific wording.
 - Do not invent downstream certainty where durable evidence is absent.
 - Remove `Actionable` completely from operator-facing copy.
 - Do not use stronger operator words than the durable evidence supports.
 - Fix semantic bugs by changing both the code and the tests that encoded the wrong assumption.
 ## Problem statement for this turn
 The current dashboard still forces operators to infer too much:
@@ -29,6 +31,11 @@ The repo already stores enough of the real lifecycle to do better:
 - emitted command id
 - execution result status and result code
 The recent submission-versus-trade bug showed the broader prevention gap:
 - wrong semantics were encoded in backend query names
 - the dashboard rendered stronger claims than the evidence supported
 - tests asserted the wrong meaning instead of protecting the truth
 The turn therefore needs to improve:
 - lifecycle derivation
 - durable reason mapping
@@ -53,6 +60,8 @@ The first mandatory states are:
 - `Awaiting outcome`
 - `Completed`
 These states must become the repo-owned evidence vocabulary for operator surfaces and summaries.
 Suggested meanings:
 - `Filtered`
  quote never entered the active trade path or was excluded before strategy decision
@@ -94,13 +103,34 @@ If the exact reason is missing:
 - expose `reason_unknown`
 - keep the row truthful instead of synthesizing an explanation
 ## Semantic guardrails for this turn
 ### 1. Ban overloaded certainty words unless evidence justifies them
 Review and remove or rename operator-facing and backend terms such as:
 - `successfulTradeCount`
 - `lastSuccessfulTradeAt`
 - `loadSuccessfulTradesPage`
 - `trade_asset_changes`
 - any UI label implying trade completion, realized asset movement, or PnL attribution from mere submission evidence
 Allowed wording must be tied to the strongest durable evidence actually present.
 ### 2. Encode semantic invariants in code and tests
 Add explicit checks and regression coverage for:
 - `submitted != completed`
 - `submitted != realized asset delta`
 - executor blocking != strategy rejection
 - no UI label may claim trade completion from submission-only evidence
 Negative tests are required, not just positive-path tests.
 ## Backend changes
 ### 1. Add a lifecycle derivation helper
 Create or extend a backend module that derives quote lifecycle from:
 - recent trade decisions
 - recent execution results
- successful trade records
+- later terminal records only where they are real
 - any available quote-status or venue result surfaces
 It should emit a normalized row object with:
@@ -121,9 +151,11 @@ The backend should no longer leave the frontend to infer execution from isolated
 For each recent quote/decision row:
 - attach the matching execution result by `command_id`, `decision_id`, or `quote_id`
- attach successful-trade or later terminal evidence where available
+- attach terminal completion or non-fill evidence only where it is genuinely available
 - expose whether the row is strategy-only, strategy-plus-command, or strategy-plus-execution
 As part of this phase, rename misleading backend aggregation helpers and payload fields where practical so code meaning matches evidence meaning.
 ### 3. Preserve operator drilldown identifiers
 Ensure the bootstrap payload exposes:
 - full quote id
@@ -142,6 +174,8 @@ Remove `Actionable` from:
 Replace it with explicit state labels driven by lifecycle derivation.
 Also remove or rename any remaining wording that presents submission evidence as trade completion or realized asset movement.
 ### 5. Make recent rows self-explanatory
 For each row, render:
 - primary lifecycle state
@@ -179,6 +213,7 @@ If a strategy-only summary remains, it must be visually separate from per-quote
 Inspect quote and system surfaces for similar ambiguity and align the wording if they expose the same concepts.
 Do not let one page say `Submitted` while another page still says `Actionable` for the same row.
 Do not let one page say `trade` while another page only has `submitted` evidence for the same row.
 ## Data and state edge cases
 - Strategy decision exists, no command emitted:
@@ -193,6 +228,8 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
  render as `Submitted` or `Awaiting outcome`
 - Successful trade summary exists but no explicit per-quote completion event:
  only promote to `Completed` where the durable linkage is real
 - Submission evidence appears in profitability or summary widgets:
  rename and constrain those widgets so they do not imply realized trade truth
 ## Concrete implementation order
@@ -200,11 +237,14 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
 - inspect current durable decision and execution payloads
 - write the normalized lifecycle state mapping
 - define forbidden and allowed operator labels
 - list misleading backend and UI names that must be changed
 - define the semantic invariant tests up front
 ### Phase 2. Implement backend aggregation
 - derive unified recent lifecycle rows
 - expose full identifiers and reason codes
 - keep old consumers working until the frontend is switched
 - rename misleading submission-as-trade helpers and summary fields where touched
 ### Phase 3. Update Strategy page rendering
 - replace verdict column with lifecycle state
@@ -216,11 +256,13 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
 - remove `Actionable`
 - align supporting labels
 - ensure blocked vs rejected vs submitted are clearly distinct
 - ensure submitted vs completed vs realized asset movement are clearly distinct
 ### Phase 5. Validate with live recent rows
 - verify a row rejected due to executor disarmed renders as blocked with reason
 - verify a submitted row renders as submitted
 - verify quote ids can be copied and used for tracing
 - verify no submission-only row is rendered as a trade, completion, or realized asset delta
 ## Test plan
 - unit tests for lifecycle derivation from:
@@ -228,14 +270,18 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
  - executor-disarmed rows
  - submission-failed rows
  - submitted rows
 - unit tests for semantic invariants:
  - submitted rows must not be counted as completed trades
  - submission-only rows must not render as asset deltas
 - dashboard bootstrap tests for:
  - forbidden `Actionable` removal
  - explicit lifecycle labels
  - reason text rendering
  - identifier exposure
 - dashboard summary tests for renamed or narrowed submission metrics
 - frontend component tests if needed for copy affordance or row rendering logic
-No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording cannot return.
+No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording or overclaim cannot return.
 ## Validation checklist against the proof
 - `Actionable` no longer appears
@@ -243,9 +289,18 @@ No lifecycle ambiguity fix is complete without a regression test proving the old
 - recent blocked rows explain why they did not trade
 - recent submitted rows show that they were submitted
 - quote ids are directly usable from the dashboard
 - submission-only evidence is no longer rendered as trade completion or asset delta truth
 ## Failure modes to plan for
 - the backend joins rows incorrectly and attributes the wrong execution result
 - the UI uses softer wording than the backend lifecycle state
 - older rows lack enough evidence and the UI pretends certainty
 - ids are still truncated without a copy or expand path
 - misleading legacy names remain in place and create new semantic drift later
 ## Truth review checklist for this turn
 For every operator-facing label, metric, table, or badge touched in this proof:
 - what exact durable table or event backs it?
 - what is the strongest claim the evidence supports?
 - what wording would overclaim certainty?
 - what negative regression test locks that boundary in?
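A minimal sketch of how the lifecycle derivation and the negative invariants in the IMPLEMENTATION.md changes above could fit together. Everything named here is hypothetical: the `QuoteEvidence` fields, the status strings, and the `derive_state` and `derive_reason` helpers are illustrative assumptions, not the repo's actual schema or API. The state names follow the evidence vocabulary introduced in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuoteEvidence:
    """Durable evidence for one quote row (hypothetical field names)."""
    strategy_approved: bool
    command_id: Optional[str] = None            # emitted command, if any
    execution_status: Optional[str] = None      # assumed: "submitted", "blocked", "failed"
    completion_event_id: Optional[str] = None   # durable terminal record, if any
    reason: Optional[str] = None                # stored reason code, if any


def derive_state(ev: QuoteEvidence) -> str:
    """Return the strongest state the stored evidence supports, never more."""
    if not ev.strategy_approved:
        return "rejected"                        # strategy said no
    if ev.execution_status == "blocked":
        return "blocked"                         # executor-side, not a strategy rejection
    if ev.execution_status == "failed":
        return "failed"
    if ev.execution_status == "submitted":
        # Promote to completed only when a durable terminal record is linked.
        return "completed" if ev.completion_event_id else "submitted"
    if ev.command_id:
        return "command_emitted"
    return "evaluated"


def derive_reason(ev: QuoteEvidence) -> Optional[str]:
    """Expose the stored reason for non-trade states, or reason_unknown."""
    if derive_state(ev) in ("rejected", "blocked", "failed"):
        return ev.reason or "reason_unknown"
    return None


# Negative invariants: submission-only evidence must not read as completion,
# and executor blocking must not read as strategy rejection.
submitted = QuoteEvidence(strategy_approved=True, command_id="c1",
                          execution_status="submitted")
assert derive_state(submitted) != "completed"

disarmed = QuoteEvidence(strategy_approved=True, command_id="c2",
                         execution_status="blocked", reason="executor_disarmed")
assert derive_state(disarmed) != "rejected"
```

In this sketch a completion record without a submitted execution result is deliberately not promoted, on the assumption that the linkage would not be durable enough to claim `completed`.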
--- a/PROOF.md
+++ b/PROOF.md
@@ -12,6 +12,7 @@ The concrete target is the live NEAR Intents BTC/EURe system:
 - execution submission must be distinguishable from strategy approval
 - blocked, rejected, submitted, failed, and not-filled paths must be visibly different
 - quote identifiers must be directly usable by operators for tracing and support
 - operator-facing labels must not overclaim beyond the durable evidence actually stored
 ## Why this is a meaningful architecture test
 The current operator surface still fails a core thesis requirement:
@@ -22,6 +23,13 @@ The current operator surface still fails a core thesis requirement:
 That is not just a copy problem. It is an observability gap in the trading product itself. If the system cannot explain a quote outcome precisely, execution is outrunning observability.
 The immediate trigger for this turn is a real semantic failure:
 - the dashboard treated `trade_execution_results.status = submitted` as a successful trade
 - recent submitted quote terms were rendered as if they were realized asset deltas
 - tests passed because that wrong assumption had been encoded into the test suite itself
 This turn must therefore fix both the UI and the conditions that allowed the mistake through.
 ## Hypothesis
 `unrip` becomes more trustworthy if quote handling is modeled and rendered as an explicit lifecycle instead of a single strategy verdict:
 - strategy evaluation is only one stage in the lifecycle
@@ -49,9 +57,12 @@ The turn passes only if an operator can inspect a quote and immediately understa
 - The existing durable stores already contain enough information for at least the current live path through strategy decision and executor result.
 - Some downstream venue-outcome states may still be partially fake or unavailable for older rows; if so, the UI must say that plainly rather than implying more certainty.
 - The immediate turn should prioritize truthful lifecycle explanation over broader analytics such as markout or long-window outcome attribution.
 - The prevention strategy must be implemented in repo code and tests rather than left to reviewer judgment alone.
 ## Turn-shaping rules
 - `Actionable` is forbidden as an operator-facing state or label.
 - Operator-facing labels must not overstate event certainty.
 - Terms such as `trade`, `success`, `filled`, `completed`, `profit`, and `asset delta` are forbidden unless backed by a durable event explicitly representing that fact.
 - Do not add a second analytics product. Stay focused on per-quote lifecycle truth for the live active pair.
 - Do not invent lifecycle states that cannot be backed by durable repo-owned evidence.
 - If a state transition is inferred rather than durably observed, the UI must make that distinction explicit.
@@ -78,6 +89,19 @@ For each visible quote or decision row, the operator must be able to identify th
 Exact labels may vary, but they must be specific and mutually meaningful.
 The repo must adopt a hard evidence-state vocabulary for this turn. At minimum:
 - `observed`
 - `evaluated`
 - `command_emitted`
 - `rejected`
 - `blocked`
 - `submitted`
 - `failed`
 - `awaiting_outcome`
 - `completed`
 No operator surface may collapse these into softer or stronger claims.
 ### Reason truth
 Each non-terminal or terminal non-trade state must expose a clear decisive reason, such as:
 - unsupported pair
@@ -111,16 +135,27 @@ Any replacement label must answer a concrete operator question, such as:
 - was it submitted?
 - did it fail?
 ### Semantic invariants
 The implementation and tests must enforce at least these invariants:
 - `submitted` is not `completed`
 - `submitted` is not a realized asset delta
 - executor-side blocking is not strategy rejection
 - stronger labels must not be rendered from weaker evidence
 These invariants are proof-critical, not optional cleanup.
 ## Definition of done
 - `Actionable` is removed from operator-facing dashboard surfaces.
 - A durable quote lifecycle model exists in repo-owned code and is used by the dashboard.
 - At least the current live quote path through strategy decision and executor result is rendered coherently per quote.
 - The operator can tell, from one row, why a recent quote did or did not turn into a submitted trade.
 - Quote ids are copyable and clearly visible enough for tracing.
 - Overloaded backend and UI names that imply stronger certainty than the evidence supports are removed or renamed.
 - Regression tests cover at least:
  - strategy-approved but executor-disarmed rows
  - submitted rows
  - forbidden ambiguous label removal
  - forbidden semantic overclaims such as treating `submitted` as `completed`
 For this turn to close with status `passed`, the specific operator question:
@@ -134,6 +169,7 @@ must be answerable directly from the dashboard for recent rows without needing m
 - direct evidence that a submitted row renders as submitted
 - direct evidence that quote ids are directly usable for tracing
 - automated test evidence for lifecycle derivation and dashboard rendering
 - automated test evidence for negative semantic invariants, especially `submitted != completed`
 ## Failure conditions
 - `Actionable` still appears in the dashboard
@@ -141,6 +177,8 @@ must be answerable directly from the dashboard for recent rows without needing m
 - non-trade rows still lack a decisive reason
 - quote ids remain hidden or non-copyable
 - lifecycle labels are only cosmetic and not backed by durable repo-owned state
 - the repo still uses `trade` or `asset delta` language for mere submission evidence
 - tests still encode the old overclaiming semantics
 ## Current real before this turn
 - strategy decisions are stored durably
@@ -152,3 +190,11 @@ must be answerable directly from the dashboard for recent rows without needing m
 - full venue settlement attribution for all historic trades
 - generalized quote analytics beyond lifecycle explanation
 - multi-venue lifecycle harmonization
 ## Prevention requirements for this proof
 - Add a truth-review checklist to the implementation work:
  - what exact durable table or event backs this label?
  - what is the strongest claim the evidence supports?
  - what would make this wording false?
  - what negative regression test prevents that overclaim from returning?
 - Separate lifecycle derivation from summary metrics so summaries are computed from lifecycle states rather than raw convenience queries.
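One way the prevention requirements in PROOF.md could look in code, as a hedged sketch: summaries are counted from already-derived lifecycle states, and a small guard flags labels that use the forbidden terms without completion evidence. The names `summarize`, `label_overclaims`, `submitted_count`, and `completed_count` are assumptions for illustration, not existing repo identifiers. The substring matching is deliberately crude and would need refinement for legitimate labels such as "Not filled".

```python
from collections import Counter
from typing import Dict, Iterable

# Terms the turn-shaping rules forbid unless a durable event backs them.
FORBIDDEN_WITHOUT_COMPLETION = (
    "trade", "success", "filled", "completed", "profit", "asset delta",
)


def summarize(states: Iterable[str]) -> Dict[str, int]:
    """Count summary metrics from lifecycle states, not raw convenience queries."""
    counts = Counter(states)
    return {
        "submitted_count": counts["submitted"],
        "completed_count": counts["completed"],
    }


def label_overclaims(label: str, state: str) -> bool:
    """Return True when a label claims more than the lifecycle state supports."""
    text = label.lower()
    if "actionable" in text:
        return True                  # forbidden regardless of evidence
    if state == "completed":
        return False                 # completion evidence exists in this sketch
    return any(term in text for term in FORBIDDEN_WITHOUT_COMPLETION)


# Submission-only rows must not inflate the completed count.
summary = summarize(["submitted", "submitted", "blocked"])
assert summary["completed_count"] == 0

# A submission-only row must not carry a trade-completion label.
assert label_overclaims("Successful trade", "submitted")
assert not label_overclaims("Submitted", "submitted")
```

Keeping the counting step separate from derivation means a renamed metric such as `submitted_count` cannot silently drift back into meaning "trades", because it only ever sees lifecycle states.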