Strengthen quote lifecycle proof guardrails

Proof: The active quote-lifecycle turn now explicitly requires semantic truth, forbidden overclaim labels, negative invariant tests, and naming cleanup so submission evidence cannot be presented as trade completion.

Assumptions: The repo can derive truthful first-pass quote lifecycle states from existing decision and execution records, and prevention should be enforced by code and tests rather than reviewer memory.

Still fake: This commit only tightens the live planning docs; the implementation work to rename legacy fields and derive lifecycle-backed summaries is still outstanding.
philipp 2026-04-09 14:52:56 +02:00
parent 7ddefb500e
commit b695f60bc6
2 changed files with 104 additions and 3 deletions


@@ -15,6 +15,8 @@ Replace ambiguous quote and decision wording with a truthful per-quote lifecycle
- Prefer one explicit lifecycle derivation path shared by backend and dashboard over ad hoc page-specific wording.
- Do not invent downstream certainty where durable evidence is absent.
- Remove `Actionable` completely from operator-facing copy.
- Do not use stronger operator words than the durable evidence supports.
- Fix semantic bugs by changing both the code and the tests that encoded the wrong assumption.
## Problem statement for this turn
The current dashboard still forces operators to infer too much:
@@ -29,6 +31,11 @@ The repo already stores enough of the real lifecycle to do better:
- emitted command id
- execution result status and result code
The recent submission-versus-trade bug showed the broader prevention gap:
- wrong semantics were encoded in backend query names
- the dashboard rendered stronger claims than the evidence supported
- tests asserted the wrong meaning instead of protecting the truth
The turn therefore needs to improve:
- lifecycle derivation
- durable reason mapping
@@ -53,6 +60,8 @@ The first mandatory states are:
- `Awaiting outcome`
- `Completed`
These states must become the repo-owned evidence vocabulary for operator surfaces and summaries.
Suggested meanings:
- `Filtered`
quote never entered the active trade path or was excluded before strategy decision
@@ -94,13 +103,34 @@ If the exact reason is missing:
- expose `reason_unknown`
- keep the row truthful instead of synthesizing an explanation
## Semantic guardrails for this turn
### 1. Ban overloaded certainty words unless evidence justifies them
Review and remove or rename operator-facing and backend terms such as:
- `successfulTradeCount`
- `lastSuccessfulTradeAt`
- `loadSuccessfulTradesPage`
- `trade_asset_changes`
- any UI label implying trade completion, realized asset movement, or PnL attribution from mere submission evidence
Allowed wording must be tied to the strongest durable evidence actually present.
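Once the listed names are renamed, a regression check could stop them from drifting back in. The following is a minimal sketch in Python, assuming the caller supplies `(path, text)` pairs; the helper name `find_forbidden_names` is hypothetical, and how source files are collected is deliberately left out.

```python
import re
from typing import Iterable, List, Tuple

# Names the plan lists as overclaiming; extend as the review finds more.
FORBIDDEN_NAMES = (
    "successfulTradeCount",
    "lastSuccessfulTradeAt",
    "loadSuccessfulTradesPage",
    "trade_asset_changes",
)

_PATTERN = re.compile("|".join(re.escape(name) for name in FORBIDDEN_NAMES))


def find_forbidden_names(
    files: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, name) for every forbidden name found.

    `files` yields (path, text) pairs supplied by the caller.
    """
    hits: List[Tuple[str, int, str]] = []
    for path, text in files:
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in _PATTERN.finditer(line):
                hits.append((path, lineno, match.group(0)))
    return hits
```

A test would then assert that the returned list is empty for the backend and dashboard sources that are in scope.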
### 2. Encode semantic invariants in code and tests
Add explicit checks and regression coverage for:
- `submitted != completed`
- `submitted != realized asset delta`
- executor blocking != strategy rejection
- no UI label may claim trade completion from submission-only evidence
Negative tests are required, not just positive-path tests.
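As a rough illustration of what such negative tests could pin down, here is a small Python sketch. The function `derive_state`, its parameters, and the executor status strings `blocked`, `failed`, and `submitted` are assumptions made for this example; the plan does not specify the real record shapes, and returning `evaluated` for an approved decision with no executor result is likewise a simplification.

```python
from typing import Optional


def derive_state(
    strategy_approved: bool,
    executor_status: Optional[str],
    completion_recorded: bool,
) -> str:
    """Map hypothetical decision/execution evidence to one lifecycle state."""
    if not strategy_approved:
        # The strategy itself said no; the executor was never the cause.
        return "rejected"
    if executor_status is None:
        # Approved, but no executor result is durably recorded yet.
        return "evaluated"
    if executor_status == "blocked":
        # Executor-side blocking is distinct from strategy rejection.
        return "blocked"
    if executor_status == "failed":
        return "failed"
    if executor_status == "submitted":
        # Submission alone never implies completion.
        return "completed" if completion_recorded else "submitted"
    raise ValueError(f"unknown executor status: {executor_status!r}")


def test_submitted_is_not_completed() -> None:
    assert derive_state(True, "submitted", False) != "completed"


def test_executor_block_is_not_strategy_rejection() -> None:
    assert derive_state(True, "blocked", False) != "rejected"
```

The two test functions are the negative cases: each one fails if a weaker piece of evidence is ever promoted to a stronger state.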
## Backend changes
### 1. Add a lifecycle derivation helper
Create or extend a backend module that derives quote lifecycle from:
- recent trade decisions
- recent execution results
- later terminal records only where they are real (previously: successful trade records)
- any available quote-status or venue result surfaces
It should emit a normalized row object with:
@@ -121,9 +151,11 @@ The backend should no longer leave the frontend to infer execution from isolated
For each recent quote/decision row:
- attach the matching execution result by `command_id`, `decision_id`, or `quote_id`
- attach terminal completion or non-fill evidence only where it is genuinely available (previously: attach successful-trade or later terminal evidence where available)
- expose whether the row is strategy-only, strategy-plus-command, or strategy-plus-execution
As part of this phase, rename misleading backend aggregation helpers and payload fields where practical so code meaning matches evidence meaning.
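The join in this phase might look something like the following Python sketch. All class and function names are hypothetical, and for brevity it joins on `command_id` only, whereas the plan also allows `decision_id` or `quote_id`. Raising on duplicate results is an assumption chosen to avoid silently attributing the wrong execution result.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Decision:
    decision_id: str
    quote_id: str
    command_id: Optional[str]  # None when no command was emitted


@dataclass(frozen=True)
class ExecutionResult:
    command_id: str
    status: str


@dataclass(frozen=True)
class LifecycleRow:
    quote_id: str
    decision_id: str
    command_id: Optional[str]
    execution_status: Optional[str]
    evidence_level: str


def join_rows(
    decisions: Iterable[Decision], results: Iterable[ExecutionResult]
) -> List[LifecycleRow]:
    """Attach at most one execution result to each decision by command id."""
    by_command: Dict[str, ExecutionResult] = {}
    for result in results:
        if result.command_id in by_command:
            # Refuse to guess which result belongs to the command.
            raise ValueError(f"duplicate result for command {result.command_id}")
        by_command[result.command_id] = result

    rows: List[LifecycleRow] = []
    for d in decisions:
        result = by_command.get(d.command_id) if d.command_id is not None else None
        if d.command_id is None:
            level = "strategy-only"
        elif result is None:
            level = "strategy-plus-command"
        else:
            level = "strategy-plus-execution"
        rows.append(
            LifecycleRow(
                quote_id=d.quote_id,
                decision_id=d.decision_id,
                command_id=d.command_id,
                execution_status=result.status if result else None,
                evidence_level=level,
            )
        )
    return rows
```

Keeping `evidence_level` on the row lets the frontend render what is known without re-deriving it from separate widgets.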
### 3. Preserve operator drilldown identifiers
Ensure the bootstrap payload exposes:
- full quote id
@@ -142,6 +174,8 @@ Remove `Actionable` from:
Replace it with explicit state labels driven by lifecycle derivation.
Also remove or rename any remaining wording that presents submission evidence as trade completion or realized asset movement.
### 5. Make recent rows self-explanatory
For each row, render:
- primary lifecycle state
@@ -179,6 +213,7 @@ If a strategy-only summary remains, it must be visually separate from per-quote
Inspect quote and system surfaces for similar ambiguity and align the wording if they expose the same concepts.
Do not let one page say `Submitted` while another page still says `Actionable` for the same row.
Do not let one page say `trade` while another page only has `submitted` evidence for the same row.
## Data and state edge cases
- Strategy decision exists, no command emitted:
@@ -193,6 +228,8 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
render as `Submitted` or `Awaiting outcome`
- Successful trade summary exists but no explicit per-quote completion event:
only promote to `Completed` where the durable linkage is real
- Submission evidence appears in profitability or summary widgets:
rename and constrain those widgets so they do not imply realized trade truth
## Concrete implementation order
@@ -200,11 +237,14 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
- inspect current durable decision and execution payloads
- write the normalized lifecycle state mapping
- define forbidden and allowed operator labels
- list misleading backend and UI names that must be changed
- define the semantic invariant tests up front
### Phase 2. Implement backend aggregation
- derive unified recent lifecycle rows
- expose full identifiers and reason codes
- keep old consumers working until the frontend is switched
- rename misleading submission-as-trade helpers and summary fields where touched
### Phase 3. Update Strategy page rendering
- replace verdict column with lifecycle state
@@ -216,11 +256,13 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
- remove `Actionable`
- align supporting labels
- ensure blocked vs rejected vs submitted are clearly distinct
- ensure submitted vs completed vs realized asset movement are clearly distinct
### Phase 5. Validate with live recent rows
- verify a row rejected due to executor disarmed renders as blocked with reason
- verify a submitted row renders as submitted
- verify quote ids can be copied and used for tracing
- verify no submission-only row is rendered as a trade, completion, or realized asset delta
## Test plan
- unit tests for lifecycle derivation from:
@@ -228,14 +270,18 @@ Do not let one page say `Submitted` while another page still says `Actionable` f
- executor-disarmed rows
- submission-failed rows
- submitted rows
- unit tests for semantic invariants:
- submitted rows must not be counted as completed trades
- submission-only rows must not render as asset deltas
- dashboard bootstrap tests for:
- forbidden `Actionable` removal
- explicit lifecycle labels
- reason text rendering
- identifier exposure
- dashboard summary tests for renamed or narrowed submission metrics
- frontend component tests if needed for copy affordance or row rendering logic
No lifecycle ambiguity fix is complete without a regression test proving the old ambiguous wording or overclaim cannot return. (Previously: "the old ambiguous wording cannot return".)
## Validation checklist against the proof
- `Actionable` no longer appears
@@ -243,9 +289,18 @@ No lifecycle ambiguity fix is complete without a regression test proving the old
- recent blocked rows explain why they did not trade
- recent submitted rows show that they were submitted
- quote ids are directly usable from the dashboard
- submission-only evidence is no longer rendered as trade completion or asset delta truth
## Failure modes to plan for
- the backend joins rows incorrectly and attributes the wrong execution result
- the UI uses softer wording than the backend lifecycle state
- older rows lack enough evidence and the UI pretends certainty
- ids are still truncated without a copy or expand path
- misleading legacy names remain in place and create new semantic drift later
## Truth review checklist for this turn
For every operator-facing label, metric, table, or badge touched in this proof:
- what exact durable table or event backs it?
- what is the strongest claim the evidence supports?
- what wording would overclaim certainty?
- what negative regression test locks that boundary in?


@@ -12,6 +12,7 @@ The concrete target is the live NEAR Intents BTC/EURe system:
- execution submission must be distinguishable from strategy approval
- blocked, rejected, submitted, failed, and not-filled paths must be visibly different
- quote identifiers must be directly usable by operators for tracing and support
- operator-facing labels must not overclaim beyond the durable evidence actually stored
## Why this is a meaningful architecture test
The current operator surface still fails a core thesis requirement:
@@ -22,6 +23,13 @@ The current operator surface still fails a core thesis requirement:
That is not just a copy problem. It is an observability gap in the trading product itself. If the system cannot explain a quote outcome precisely, execution is outrunning observability.
The immediate trigger for this turn is a real semantic failure:
- the dashboard treated `trade_execution_results.status = submitted` as a successful trade
- recent submitted quote terms were rendered as if they were realized asset deltas
- tests passed because that wrong assumption had been encoded into the test suite itself
This turn must therefore fix both the UI and the conditions that allowed the mistake through.
## Hypothesis
`unrip` becomes more trustworthy if quote handling is modeled and rendered as an explicit lifecycle instead of a single strategy verdict:
- strategy evaluation is only one stage in the lifecycle
@@ -49,9 +57,12 @@ The turn passes only if an operator can inspect a quote and immediately understa
- The existing durable stores already contain enough information for at least the current live path through strategy decision and executor result.
- Some downstream venue-outcome states may still be partially fake or unavailable for older rows; if so, the UI must say that plainly rather than implying more certainty.
- The immediate turn should prioritize truthful lifecycle explanation over broader analytics such as markout or long-window outcome attribution.
- The prevention strategy must be implemented in repo code and tests rather than left to reviewer judgment alone.
## Turn-shaping rules
- `Actionable` is forbidden as an operator-facing state or label.
- Operator-facing labels must not overstate event certainty.
- Terms such as `trade`, `success`, `filled`, `completed`, `profit`, and `asset delta` are forbidden unless backed by a durable event explicitly representing that fact.
- Do not add a second analytics product. Stay focused on per-quote lifecycle truth for the live active pair.
- Do not invent lifecycle states that cannot be backed by durable repo-owned evidence.
- If a state transition is inferred rather than durably observed, the UI must make that distinction explicit.
@@ -78,6 +89,19 @@ For each visible quote or decision row, the operator must be able to identify th
Exact labels may vary, but they must be specific and mutually meaningful.
The repo must adopt a hard evidence-state vocabulary for this turn. At minimum:
- `observed`
- `evaluated`
- `command_emitted`
- `rejected`
- `blocked`
- `submitted`
- `failed`
- `awaiting_outcome`
- `completed`
No operator surface may collapse these into softer or stronger claims.
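One way to make this vocabulary hard rather than advisory is a closed enumeration, so that any value outside the list fails loudly. A minimal Python sketch follows; the names `EvidenceState` and `parse_state` are hypothetical, while the nine values are exactly those listed above.

```python
from enum import Enum


class EvidenceState(str, Enum):
    """Closed evidence-state vocabulary taken from the list above."""

    OBSERVED = "observed"
    EVALUATED = "evaluated"
    COMMAND_EMITTED = "command_emitted"
    REJECTED = "rejected"
    BLOCKED = "blocked"
    SUBMITTED = "submitted"
    FAILED = "failed"
    AWAITING_OUTCOME = "awaiting_outcome"
    COMPLETED = "completed"


def parse_state(raw: str) -> EvidenceState:
    """Parse a stored state string; unknown values raise ValueError."""
    return EvidenceState(raw)
```

With this shape, a legacy value such as `actionable` cannot be parsed at all, which gives tests a simple boundary to assert on.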
### Reason truth
Each non-terminal or terminal non-trade state must expose a clear decisive reason, such as:
- unsupported pair
@@ -111,16 +135,27 @@ Any replacement label must answer a concrete operator question, such as:
- was it submitted?
- did it fail?
### Semantic invariants
The implementation and tests must enforce at least these invariants:
- `submitted` is not `completed`
- `submitted` is not a realized asset delta
- executor-side blocking is not strategy rejection
- stronger labels must not be rendered from weaker evidence
These invariants are proof-critical, not optional cleanup.
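The last invariant could be guarded by a deliberately blunt label check such as the Python sketch below. The guarded terms are the ones this document forbids without backing evidence. Treating `completed` as the only state that unlocks them is a simplifying assumption for illustration: a term like `profit` would really need its own durable event, and negated wording would need an allow-list. The function name `overclaims` is hypothetical.

```python
# Terms the turn-shaping rules forbid unless a durable event backs them.
GUARDED_TERMS = ("trade", "success", "filled", "completed", "profit", "asset delta")


def overclaims(label: str, state: str) -> bool:
    """Return True if `label` uses a guarded term while `state` is weaker.

    Simplification: only the `completed` state unlocks guarded wording here.
    """
    if state == "completed":
        return False
    lowered = label.lower()
    return any(term in lowered for term in GUARDED_TERMS)
```

A dashboard test could run every rendered label and its row state through a check like this and fail on the first `True`.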
## Definition of done
- `Actionable` is removed from operator-facing dashboard surfaces.
- A durable quote lifecycle model exists in repo-owned code and is used by the dashboard.
- At least the current live quote path through strategy decision and executor result is rendered coherently per quote.
- The operator can tell, from one row, why a recent quote did or did not turn into a submitted trade.
- Quote ids are copyable and clearly visible enough for tracing.
- Overloaded backend and UI names that imply stronger certainty than the evidence supports are removed or renamed.
- Regression tests cover at least:
- strategy-approved but executor-disarmed rows
- submitted rows
- forbidden ambiguous label removal
- forbidden semantic overclaims such as treating `submitted` as `completed`
For this turn to close with status `passed`, the specific operator question:
@@ -134,6 +169,7 @@ must be answerable directly from the dashboard for recent rows without needing m
- direct evidence that a submitted row renders as submitted
- direct evidence that quote ids are directly usable for tracing
- automated test evidence for lifecycle derivation and dashboard rendering
- automated test evidence for negative semantic invariants, especially `submitted != completed`
## Failure conditions
- `Actionable` still appears in the dashboard
@@ -141,6 +177,8 @@ must be answerable directly from the dashboard for recent rows without needing m
- non-trade rows still lack a decisive reason
- quote ids remain hidden or non-copyable
- lifecycle labels are only cosmetic and not backed by durable repo-owned state
- the repo still uses `trade` or `asset delta` language for mere submission evidence
- tests still encode the old overclaiming semantics
## Current real before this turn
- strategy decisions are stored durably
@@ -152,3 +190,11 @@ must be answerable directly from the dashboard for recent rows without needing m
- full venue settlement attribution for all historic trades
- generalized quote analytics beyond lifecycle explanation
- multi-venue lifecycle harmonization
## Prevention requirements for this proof
- Add a truth-review checklist to the implementation work:
- what exact durable table or event backs this label?
- what is the strongest claim the evidence supports?
- what would make this wording false?
- what negative regression test prevents that overclaim from returning?
- Separate lifecycle derivation from summary metrics so summaries are computed from lifecycle states rather than raw convenience queries.
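To illustrate the separation in the last bullet, summary metrics can be computed purely from already-derived lifecycle states rather than from a raw status query. A small Python sketch, assuming states arrive as plain strings; the function name `summarize` and the metric names `submitted_count` and `completed_count` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(states: Iterable[str]) -> Dict[str, int]:
    """Count rows by their current derived lifecycle state.

    A row counts toward exactly one metric, so a submitted row can never
    inflate the completed figure.
    """
    counts = Counter(states)
    return {
        "submitted_count": counts["submitted"],
        "completed_count": counts["completed"],
    }
```

Because the summary only sees derived states, fixing the derivation fixes every widget built on it at once.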