orderbooks/docs/BOOK_RECONSTRUCTION.md
2026-04-19 19:17:56 +02:00

# Book Reconstruction Method
Checkpoint 10C reconstructs order-book state from raw Polymarket market websocket messages captured in Checkpoint 10B.
## Source Of Truth
Raw websocket and REST checkpoint gzip JSONL files are immutable source evidence. Reconstruction outputs are derived and reference the input file paths, line numbers, websocket message sequence spans, and REST checkpoint sequences.
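One way to keep file-path and line-number references available to downstream outputs is to carry them alongside every parsed record. The following is a minimal sketch, not the project's actual reader; the function name `read_jsonl_gz` is hypothetical, and it assumes one JSON object per line.

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield (path, line_number, record) for each non-empty line.

    Line numbers are 1-indexed so they can be cited directly in
    reconstruction outputs. The source file is opened read-only and is
    never modified.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            if line.strip():
                yield path, line_number, json.loads(line)
```

Because blank lines are skipped without renumbering, a yielded line number always points at the physical line in the archived file.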
## Applied Events
- `book` and `book_without_event_type` messages initialize or replace the full per-token bid/ask maps.
- `price_change` messages are applied after initialization. Observed `side=BUY` updates bids and `side=SELL` updates asks.
- Observed `size=0` is treated as level removal. Non-zero size replaces the level size at that price.
- `best_bid_ask`, `last_trade_price`, and unrelated `new_market` messages are preserved and counted but do not mutate the book map.
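The rules above can be sketched as a single apply function. This is an illustration under stated assumptions, not the reconstructor's code: `apply_message` is a hypothetical name, the message kind is assumed to be classified before the call, and the field names read from the message (`asset_id`, `bids`, `asks`, `price`, `size`, `side`) are assumed rather than taken from this document.

```python
from decimal import Decimal

# Kinds that are preserved and counted but never mutate the book map.
NON_MUTATING = {"best_bid_ask", "last_trade_price", "new_market"}


def apply_message(books, counts, kind, msg):
    """Apply one classified websocket message to per-token book state.

    books:  {token: {"bids": {price: size}, "asks": {price: size}}}
    counts: {kind: number of messages seen}
    Field names in `msg` are assumptions made for this sketch.
    """
    counts[kind] = counts.get(kind, 0) + 1
    if kind in NON_MUTATING:
        return

    token = msg["asset_id"]

    if kind in ("book", "book_without_event_type"):
        # Full snapshot: initialize or replace both sides of the book.
        books[token] = {
            "bids": {Decimal(l["price"]): Decimal(l["size"]) for l in msg["bids"]},
            "asks": {Decimal(l["price"]): Decimal(l["size"]) for l in msg["asks"]},
        }
        return

    if kind == "price_change":
        book = books.get(token)
        if book is None:
            return  # deltas are applied only after initialization
        if msg["side"] == "BUY":
            levels = book["bids"]
        elif msg["side"] == "SELL":
            levels = book["asks"]
        else:
            raise ValueError(f"unexpected side: {msg['side']!r}")
        price, size = Decimal(msg["price"]), Decimal(msg["size"])
        if size == 0:
            levels.pop(price, None)  # size=0 removes the level
        else:
            levels[price] = size     # non-zero size replaces the level size
```

Prices and sizes are parsed as `Decimal` in this sketch so that `0.5` and `0.50` address the same level and size comparisons are exact.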
## Comparison
For each REST checkpoint, the reconstructor compares REST `/books` payloads with local websocket state after applying all websocket messages received at or before the REST checkpoint receive time. The comparison includes best bid, best ask, spread, bid/ask level counts, and top 10 levels by default.
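The compared fields can be sketched as a per-book summary followed by a field-by-field equality check. This is a hedged illustration: `summarize` and `compare` are hypothetical names, and both books are assumed to be already normalized into `{"bids": {price: size}, "asks": {price: size}}` maps with `Decimal` keys and values.

```python
from decimal import Decimal


def summarize(book, depth=10):
    """Reduce one book to the fields used in the comparison."""
    bids = sorted(book["bids"].items(), reverse=True)  # highest bid first
    asks = sorted(book["asks"].items())                # lowest ask first
    best_bid = bids[0][0] if bids else None
    best_ask = asks[0][0] if asks else None
    spread = best_ask - best_bid if bids and asks else None
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "spread": spread,
        "bid_levels": len(bids),
        "ask_levels": len(asks),
        "top_bids": bids[:depth],
        "top_asks": asks[:depth],
    }


def compare(ws_book, rest_book, depth=10):
    """Return {field: True if websocket state and REST payload agree}."""
    ws = summarize(ws_book, depth)
    rest = summarize(rest_book, depth)
    return {field: ws[field] == rest[field] for field in ws}
```

A row would count as an exact match in this sketch only when every field in the result is `True`.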
## Limits
The sample is short, and network timing can produce REST-vs-websocket divergences. Divergence rows include raw websocket and REST references so that follow-up can determine whether a difference stems from timing, feed semantics, or a reconstruction defect.
## Checkpoint 10C Divergence Result
The accepted 10C sample produced 20 REST comparison rows: 8 exact top-10 matches and 12 divergent rows. In every divergent row, best bid, best ask, spread, level counts, and top-10 price membership matched. The observed divergences were size-only deltas at price levels present in both top-10 lists.
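A size-only divergence of this kind can be distinguished from a price-membership divergence with a small check. The sketch below is illustrative only; `size_deltas` is a hypothetical helper, and each side is assumed to be a list of `(price, size)` pairs for the compared depth.

```python
from decimal import Decimal


def size_deltas(ws_levels, rest_levels):
    """Classify a divergence between two top-of-book level lists.

    Returns None when the two sides do not list the same prices
    (not a size-only divergence). Otherwise returns
    {price: rest_size - ws_size} for every shared price whose sizes
    differ; an empty dict means the lists match exactly.
    """
    ws, rest = dict(ws_levels), dict(rest_levels)
    if ws.keys() != rest.keys():
        return None
    return {price: rest[price] - ws[price] for price in ws if rest[price] != ws[price]}
```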
Size-only divergence still matters. It can change depth, fillability assumptions, queue-size estimates, and any later answer about whether a hypothetical trade was observable and reproducible from the archived feed.
This result is useful evidence for the websocket path, but it does not establish production readiness. The sample is bounded, the timing relationship between REST checkpoints and websocket delivery is imperfect, and long-running reconnect, stale-feed, rotation, upload, and alert behavior still need their own checkpoint before deployment.