From b695f60bc68caa7881f594ea552f5016189d93e0 Mon Sep 17 00:00:00 2001
From: philipp <klein.philipp@gmail.com>
Date: Thu, 9 Apr 2026 14:52:56 +0200
Subject: [PATCH] Strengthen quote lifecycle proof guardrails

Proof: The active quote-lifecycle turn now explicitly requires semantic truth, a ban on overclaiming labels, negative invariant tests, and naming cleanup, so that submission evidence cannot be presented as trade completion.
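
Below is a minimal sketch of the label guardrail, for illustration only. The diff does not show the repo's implementation language, so TypeScript is assumed. `EvidenceState`, `RESERVED_WORDS`, `LABELS`, `overclaims`, and `labelFor` are hypothetical names. The state list mirrors the evidence-state vocabulary proposed in PROOF.md, and the reserved words mirror its forbidden-term rule.

```typescript
// Hypothetical evidence-state vocabulary, mirroring the list proposed in PROOF.md.
type EvidenceState =
  | "observed"
  | "evaluated"
  | "command_emitted"
  | "rejected"
  | "blocked"
  | "submitted"
  | "failed"
  | "awaiting_outcome"
  | "completed";

// Words PROOF.md reserves for durably observed facts.
const RESERVED_WORDS: readonly string[] = [
  "trade", "success", "filled", "completed", "profit", "asset delta",
];

// Hypothetical operator labels; "blocked" and "rejected" name different actors.
const LABELS: Record<EvidenceState, string> = {
  observed: "Observed",
  evaluated: "Evaluated",
  command_emitted: "Command emitted",
  rejected: "Rejected by strategy",
  blocked: "Blocked by executor",
  submitted: "Submitted",
  failed: "Submission failed",
  awaiting_outcome: "Awaiting outcome",
  completed: "Completed",
};

// True when a label uses a reserved word without completion evidence behind it.
// Simplification: only `completed` may use reserved words here; a fuller version
// would demand a matching durable event per word (e.g. profit needs PnL evidence).
function overclaims(state: EvidenceState, label: string): boolean {
  if (state === "completed") return false;
  const lower = label.toLowerCase();
  return RESERVED_WORDS.some((word) => lower.includes(word));
}

// Returns the operator label for a state, refusing any label that overclaims.
function labelFor(state: EvidenceState): string {
  const label = LABELS[state];
  if (overclaims(state, label)) {
    throw new Error(`label "${label}" overclaims evidence state "${state}"`);
  }
  return label;
}
```

The substring matching is deliberately crude. It errs toward flagging a label for review rather than letting an overclaim through, and it pairs naturally with negative tests such as `overclaims("submitted", "Successful trade") === true`.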

Assumptions: The repo can derive truthful first-pass quote lifecycle states from existing decision and execution records, and prevention should be enforced by code and tests rather than reviewer memory.
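
Under that assumption, derivation could look roughly like the following TypeScript sketch. The record shapes (`DecisionRecord`, `ExecutionResult`) and the `completedCommandIds` set of durable completion events are hypothetical stand-ins for the real decision and execution tables. The sketch shows two rules. First, `submitted` is promoted to `completed` only when a separate durable completion event exists. Second, a missing reason surfaces as `reason_unknown` instead of a synthesized explanation.

```typescript
// Hypothetical shape of a durable strategy decision row.
interface DecisionRecord {
  quoteId: string;
  verdict: "approved" | "rejected";
  commandId?: string; // present only when a command was emitted
  reasonCode?: string;
}

// Hypothetical shape of a durable execution result row.
interface ExecutionResult {
  commandId: string;
  status: "submitted" | "failed" | "blocked";
  resultCode?: string;
}

type LifecycleState =
  | "evaluated"
  | "command_emitted"
  | "rejected"
  | "blocked"
  | "submitted"
  | "failed"
  | "completed";

interface LifecycleRow {
  quoteId: string;
  state: LifecycleState;
  reason?: string; // set for non-trade states; "reason_unknown" if no durable reason
}

function deriveLifecycle(
  decision: DecisionRecord,
  executions: ReadonlyMap<string, ExecutionResult>,
  completedCommandIds: ReadonlySet<string>, // durable completion events only
): LifecycleRow {
  const quoteId = decision.quoteId;
  // Non-trade states always carry a reason, but never a synthesized one.
  const nonTrade = (state: LifecycleState, code?: string): LifecycleRow => ({
    quoteId,
    state,
    reason: code ?? "reason_unknown",
  });

  if (decision.verdict === "rejected") return nonTrade("rejected", decision.reasonCode);
  const commandId = decision.commandId;
  if (commandId === undefined) return nonTrade("evaluated", decision.reasonCode);

  const result = executions.get(commandId);
  if (result === undefined) return { quoteId, state: "command_emitted" };
  // Executor blocking is kept distinct from strategy rejection.
  if (result.status === "blocked") return nonTrade("blocked", result.resultCode);
  if (result.status === "failed") return nonTrade("failed", result.resultCode);

  // Submission alone never implies completion.
  return { quoteId, state: completedCommandIds.has(commandId) ? "completed" : "submitted" };
}
```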

Still fake: This commit only tightens the live planning docs; the implementation work to rename legacy fields and derive lifecycle-backed summaries is still outstanding.
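
For the outstanding summary work, one possible approach is to compute summaries from derived lifecycle states rather than from raw convenience queries. The field names below (`submissionOnlyCount`, `lastSubmissionOnlyAt`, `completedCount`, `lastCompletedAt`) are hypothetical replacements for `successfulTradeCount` and `lastSuccessfulTradeAt`. TypeScript is again assumed for illustration.

```typescript
// Hypothetical minimal lifecycle row, as produced by a lifecycle derivation step.
interface LifecycleRow {
  quoteId: string;
  state: "rejected" | "blocked" | "submitted" | "failed" | "completed";
  at: number; // epoch millis of the row's latest durable event
}

// Hypothetical replacement for successfulTradeCount / lastSuccessfulTradeAt.
// Submission-only rows and completed rows are never merged into one "trade" count.
interface LifecycleSummary {
  submissionOnlyCount: number;
  lastSubmissionOnlyAt: number | null;
  completedCount: number;
  lastCompletedAt: number | null;
}

function summarize(rows: readonly LifecycleRow[]): LifecycleSummary {
  const summary: LifecycleSummary = {
    submissionOnlyCount: 0,
    lastSubmissionOnlyAt: null,
    completedCount: 0,
    lastCompletedAt: null,
  };
  for (const row of rows) {
    if (row.state === "submitted") {
      summary.submissionOnlyCount += 1;
      summary.lastSubmissionOnlyAt = Math.max(summary.lastSubmissionOnlyAt ?? row.at, row.at);
    } else if (row.state === "completed") {
      summary.completedCount += 1;
      summary.lastCompletedAt = Math.max(summary.lastCompletedAt ?? row.at, row.at);
    }
    // Rejected, blocked, and failed rows contribute to neither count.
  }
  return summary;
}
```

Submission-only and completed counts live in separate fields. As a result, a widget cannot render a submission as a trade simply by reading the wrong total.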
---
 IMPLEMENTATION.md | 61 ++++++++++++++++++++++++++++++++++++++++++++---
 PROOF.md          | 46 +++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/IMPLEMENTATION.md b/IMPLEMENTATION.md
index 0b8a851..ec2546e 100644
--- a/IMPLEMENTATION.md
+++ b/IMPLEMENTATION.md
@@ -15,6 +15,8 @@ Replace ambiguous quote and decision wording with a truthful per-quote lifecycle
 - Prefer one explicit lifecycle derivation path shared by backend and dashboard over ad hoc page-specific wording.
 - Do not invent downstream certainty where durable evidence is absent.
 - Remove `Actionable` completely from operator-facing copy.
+- Do not use stronger operator words than the durable evidence supports.
+- Fix semantic bugs by changing both the code and the tests that encoded the wrong assumption.
 
 ## Problem statement for this turn
 The current dashboard still forces operators to infer too much:
@@ -29,6 +31,11 @@ The repo already stores enough of the real lifecycle to do better:
 - emitted command id
 - execution result status and result code
 
+The recent submission-versus-trade bug exposed a broader prevention gap:
+- wrong semantics were encoded in backend query names
+- the dashboard rendered stronger claims than the evidence supported
+- tests asserted the wrong meaning instead of protecting the truth
+
 The turn therefore needs to improve:
 - lifecycle derivation
 - durable reason mapping
@@ -53,6 +60,8 @@ The first mandatory states are:
 - `Awaiting outcome`
 - `Completed`
 
+These states must become the repo-owned evidence vocabulary for operator surfaces and summaries.
+
 Suggested meanings:
 - `Filtered`
   quote never entered the active trade path or was excluded before strategy decision
@@ -94,13 +103,34 @@ If the exact reason is missing:
 - expose `reason_unknown`
 - keep the row truthful instead of synthesizing an explanation
 
+## Semantic guardrails for this turn
+
+### 1. Ban overloaded certainty words unless evidence justifies them
+Review and remove or rename operator-facing and backend terms such as:
+- `successfulTradeCount`
+- `lastSuccessfulTradeAt`
+- `loadSuccessfulTradesPage`
+- `trade_asset_changes`
+- any UI label implying trade completion, realized asset movement, or PnL attribution from mere submission evidence
+
+Allowed wording must be tied to the strongest durable evidence actually present.
+
+### 2. Encode semantic invariants in code and tests
+Add explicit checks and regression coverage for:
+- `submitted != completed`
+- `submitted != realized asset delta`
+- executor blocking != strategy rejection
+- no UI label may claim trade completion from submission-only evidence
+
+Negative tests are required, not just positive-path tests.
+
 ## Backend changes
 
 ### 1. Add a lifecycle derivation helper
 Create or extend a backend module that derives quote lifecycle from:
 - recent trade decisions
 - recent execution results
-- successful trade records
+- later terminal records only where they are real
 - any available quote-status or venue result surfaces
 
 It should emit a normalized row object with:
@@ -121,9 +151,11 @@ The backend should no longer leave the frontend to infer execution from isolated
 
 For each recent quote/decision row:
 - attach the matching execution result by `command_id`, `decision_id`, or `quote_id`
-- attach successful-trade or later terminal evidence where available
+- attach terminal completion or non-fill evidence only where it is genuinely available
 - expose whether the row is strategy-only, strategy-plus-command, or strategy-plus-execution
 
+As part of this phase, rename misleading backend aggregation helpers and payload fields where practical, so that each name claims no more than its evidence supports.
+
 ### 3. Preserve operator drilldown identifiers
 Ensure the bootstrap payload exposes:
 - full quote id
@@ -142,6 +174,8 @@ Remove `Actionable` from:
 
 Replace it with explicit state labels driven by lifecycle derivation.
 
+Also remove or rename any remaining wording that presents submission evidence as trade completion or realized asset movement.
+
 ### 5. Make recent rows self-explanatory
 For each row, render:
 - primary lifecycle state
@@ -179,6 +213,7 @@ If a strategy-only summary remains, it must be visually separate from per-quote
 Inspect quote and system surfaces for similar ambiguity and align the wording if they expose the same concepts.
 
 Do not let one page say `Submitted` while another page still says `Actionable` for the same row.
+Do not let one page say `trade` for a row while another page shows only `submitted` evidence for that same row.
 
 ## Data and state edge cases
 - Strategy decision exists, no command emitted:
@@ -193,6 +228,8 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
   render as `Submitted` or `Awaiting outcome`
 - Successful trade summary exists but no explicit per-quote completion event:
   only promote to `Completed` where the durable linkage is real
+- Submission evidence appears in profitability or summary widgets:
+  rename and constrain those widgets so they do not imply realized trade truth
 
 ## Concrete implementation order
 
@@ -200,11 +237,14 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
 - inspect current durable decision and execution payloads
 - write the normalized lifecycle state mapping
 - define forbidden and allowed operator labels
+- list misleading backend and UI names that must be changed
+- define the semantic invariant tests up front
 
 ### Phase 2. Implement backend aggregation
 - derive unified recent lifecycle rows
 - expose full identifiers and reason codes
 - keep old consumers working until the frontend is switched
+- rename misleading submission-as-trade helpers and summary fields where touched
 
 ### Phase 3. Update Strategy page rendering
 - replace verdict column with lifecycle state
@@ -216,11 +256,13 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
 - remove `Actionable`
 - align supporting labels
 - ensure blocked vs rejected vs submitted are clearly distinct
+- ensure submitted vs completed vs realized asset movement are clearly distinct
 
 ### Phase 5. Validate with live recent rows
 - verify a row rejected due to executor disarmed renders as blocked with reason
 - verify a submitted row renders as submitted
 - verify quote ids can be copied and used for tracing
+- verify no submission-only row is rendered as a trade, completion, or realized asset delta
 
 ## Test plan
 - unit tests for lifecycle derivation from:
@@ -228,14 +270,18 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
   - executor-disarmed rows
   - submission-failed rows
   - submitted rows
+- unit tests for semantic invariants:
+  - submitted rows must not be counted as completed trades
+  - submission-only rows must not render as asset deltas
 - dashboard bootstrap tests for:
   - forbidden `Actionable` removal
   - explicit lifecycle labels
   - reason text rendering
   - identifier exposure
+- dashboard summary tests for renamed or narrowed submission metrics
 - frontend component tests if needed for copy affordance or row rendering logic
 
-No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording cannot return.
+No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording or overclaim cannot return.
 
 ## Validation checklist against the proof
 - `Actionable` no longer appears
@@ -243,9 +289,18 @@ No lifecycle ambiguity fix is complete without a regression test proving the old
 - recent blocked rows explain why they did not trade
 - recent submitted rows show that they were submitted
 - quote ids are directly usable from the dashboard
+- submission-only evidence is no longer rendered as trade completion or asset delta truth
 
 ## Failure modes to plan for
 - the backend joins rows incorrectly and attributes the wrong execution result
 - the UI uses softer wording than the backend lifecycle state
 - older rows lack enough evidence and the UI pretends certainty
 - ids are still truncated without a copy or expand path
+- misleading legacy names remain in place and create new semantic drift later
+
+## Truth review checklist for this turn
+For every operator-facing label, metric, table, or badge touched in this proof:
+- what exact durable table or event backs it?
+- what is the strongest claim the evidence supports?
+- what wording would overclaim certainty?
+- what negative regression test locks that boundary in?
diff --git a/PROOF.md b/PROOF.md
index d72565c..cf173b8 100644
--- a/PROOF.md
+++ b/PROOF.md
@@ -12,6 +12,7 @@ The concrete target is the live NEAR Intents BTC/EURe system:
 - execution submission must be distinguishable from strategy approval
 - blocked, rejected, submitted, failed, and not-filled paths must be visibly different
 - quote identifiers must be directly usable by operators for tracing and support
+- operator-facing labels must not overclaim beyond the durable evidence actually stored
 
 ## Why this is a meaningful architecture test
 The current operator surface still fails a core thesis requirement:
@@ -22,6 +23,13 @@ The current operator surface still fails a core thesis requirement:
 
 That is not just a copy problem. It is an observability gap in the trading product itself. If the system cannot explain a quote outcome precisely, execution is outrunning observability.
 
+The immediate trigger for this turn is a real semantic failure:
+- the dashboard treated `trade_execution_results.status = submitted` as a successful trade
+- recent submitted quote terms were rendered as if they were realized asset deltas
+- tests passed because that wrong assumption had been encoded into the test suite itself
+
+This turn must therefore fix both the UI and the conditions that allowed the mistake through.
+
 ## Hypothesis
 `unrip` becomes more trustworthy if quote handling is modeled and rendered as an explicit lifecycle instead of a single strategy verdict:
 - strategy evaluation is only one stage in the lifecycle
@@ -49,9 +57,12 @@ The turn passes only if an operator can inspect a quote and immediately understa
 - The existing durable stores already contain enough information for at least the current live path through strategy decision and executor result.
 - Some downstream venue-outcome states may still be partially fake or unavailable for older rows; if so, the UI must say that plainly rather than implying more certainty.
 - The immediate turn should prioritize truthful lifecycle explanation over broader analytics such as markout or long-window outcome attribution.
+- The prevention strategy must be implemented in repo code and tests rather than left to reviewer judgment alone.
 
 ## Turn-shaping rules
 - `Actionable` is forbidden as an operator-facing state or label.
+- Operator-facing labels must not overstate event certainty.
+- Terms such as `trade`, `success`, `filled`, `completed`, `profit`, and `asset delta` are forbidden unless backed by a durable event explicitly representing that fact.
 - Do not add a second analytics product. Stay focused on per-quote lifecycle truth for the live active pair.
 - Do not invent lifecycle states that cannot be backed by durable repo-owned evidence.
 - If a state transition is inferred rather than durably observed, the UI must make that distinction explicit.
@@ -78,6 +89,19 @@ For each visible quote or decision row, the operator must be able to identify th
 
 Exact labels may vary, but they must be specific and mutually meaningful.
 
+The repo must adopt a hard evidence-state vocabulary for this turn. At minimum:
+- `observed`
+- `evaluated`
+- `command_emitted`
+- `rejected`
+- `blocked`
+- `submitted`
+- `failed`
+- `awaiting_outcome`
+- `completed`
+
+No operator surface may collapse these into softer or stronger claims.
+
 ### Reason truth
 Each non-terminal or terminal non-trade state must expose a clear decisive reason, such as:
 - unsupported pair
@@ -111,16 +135,27 @@ Any replacement label must answer a concrete operator question, such as:
 - was it submitted?
 - did it fail?
 
+### Semantic invariants
+The implementation and tests must enforce at least these invariants:
+- `submitted` is not `completed`
+- `submitted` is not a realized asset delta
+- executor-side blocking is not strategy rejection
+- stronger labels must not be rendered from weaker evidence
+
+These invariants are proof-critical, not optional cleanup.
+
 ## Definition of done
 - `Actionable` is removed from operator-facing dashboard surfaces.
 - A durable quote lifecycle model exists in repo-owned code and is used by the dashboard.
 - At least the current live quote path through strategy decision and executor result is rendered coherently per quote.
 - The operator can tell, from one row, why a recent quote did or did not turn into a submitted trade.
 - Quote ids are copyable and clearly visible enough for tracing.
+- Overloaded backend and UI names that imply stronger certainty than the evidence supports are removed or renamed.
 - Regression tests cover at least:
   - strategy-approved but executor-disarmed rows
   - submitted rows
   - forbidden ambiguous label removal
+  - forbidden semantic overclaims such as treating `submitted` as `completed`
 
 For this turn to close with status `passed`, the specific operator question:
 
@@ -134,6 +169,7 @@ must be answerable directly from the dashboard for recent rows without needing m
 - direct evidence that a submitted row renders as submitted
 - direct evidence that quote ids are directly usable for tracing
 - automated test evidence for lifecycle derivation and dashboard rendering
+- automated test evidence for negative semantic invariants, especially `submitted != completed`
 
 ## Failure conditions
 - `Actionable` still appears in the dashboard
@@ -141,6 +177,8 @@ must be answerable directly from the dashboard for recent rows without needing m
 - non-trade rows still lack a decisive reason
 - quote ids remain hidden or non-copyable
 - lifecycle labels are only cosmetic and not backed by durable repo-owned state
+- the repo still uses `trade` or `asset delta` language for mere submission evidence
+- tests still encode the old overclaiming semantics
 
 ## Current real before this turn
 - strategy decisions are stored durably
@@ -152,3 +190,11 @@ must be answerable directly from the dashboard for recent rows without needing m
 - full venue settlement attribution for all historic trades
 - generalized quote analytics beyond lifecycle explanation
 - multi-venue lifecycle harmonization
+
+## Prevention requirements for this proof
+- Add a truth-review checklist to the implementation work:
+  - what exact durable table or event backs this label?
+  - what is the strongest claim the evidence supports?
+  - what would make this wording false?
+  - what negative regression test prevents that overclaim from returning?
+- Separate lifecycle derivation from summary metrics so summaries are computed from lifecycle states rather than raw convenience queries.