# Polymarket Websocket Sample Recorder

This document describes the bounded Checkpoint 10B sample path. It is separate from the live Kubernetes REST collector and does not replace it.

## Scope

The recorder captures public Polymarket market websocket messages for active BTC up/down outcome tokens and writes REST `/books` checkpoints during the same run. It does not trade, sign requests, use private keys, require API keys, or handle private account data.

## Discovery

Run the existing discovery first so token IDs are current:

```bash
python scripts/discover_polymarket_btc_markets.py
```

The recorder reads `data/discovery/polymarket_btc_markets_latest.json`, selects active BTC up/down markets, and preserves `market_slug`, `condition_id`, `token_id`, `outcome`, and `end_time_utc` in every raw websocket envelope.

## Sample Run

Default bounded run:

```bash
python scripts/record_polymarket_ws_sample.py --config config/polymarket_ws_sample.example.yaml
```

Useful overrides:

```bash
python scripts/record_polymarket_ws_sample.py --market-limit 2 --duration-seconds 150 --rest-checkpoint-interval-seconds 30
```

The default endpoint is:

```text
wss://ws-subscriptions-clob.polymarket.com/ws/market
```

The subscription body is:

```json
{"assets_ids":[""],"type":"market","custom_feature_enabled":true}
```

For multiple tokens, `assets_ids` contains all selected Up/Down token IDs.

## Raw Websocket Output

Websocket text messages are written as gzip JSONL under:

```text
data/ws_sample/polymarket/ws_raw//polymarket_ws_raw_.jsonl.gz
```

Each row preserves the raw text payload in `raw_text`, plus parsed JSON in `json` when parsing succeeds. Unknown message shapes are retained and counted in the manifest.
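The following minimal Python sketch shows how one raw envelope row could be assembled and appended. It makes several assumptions: `websocket.url` and `subscription.assets_ids` are nested objects, `opcode` is recorded as the string `"text"`, and event classification reads an `event_type` key from each parsed message. `build_subscription`, `classify`, `build_envelope`, and `append_rows` are hypothetical helpers, not the recorder's actual functions, and the websocket client itself is not shown.

```python
import gzip
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Optional

# Default endpoint documented above.
WS_URL = "wss://ws-subscriptions-clob.polymarket.com/ws/market"


def build_subscription(token_ids: list[str]) -> str:
    """Subscription body in the documented shape, listing every selected token ID."""
    body = {"assets_ids": token_ids, "type": "market", "custom_feature_enabled": True}
    return json.dumps(body, separators=(",", ":"))


def classify(parsed: Any) -> list[str]:
    """Collect `event_type` values (assumed key); anything else counts as 'unknown'."""
    events = parsed if isinstance(parsed, list) else [parsed]
    kinds = set()
    for event in events:
        kind = event.get("event_type") if isinstance(event, dict) else None
        kinds.add(kind if isinstance(kind, str) else "unknown")
    return sorted(kinds) or ["unknown"]


def build_envelope(
    raw_text: str,
    *,
    session_id: str,
    connection_sequence: int,
    message_sequence: int,
    global_message_sequence: int,
    token_ids: list[str],
    tokens_tracked: list[dict],
) -> dict:
    """Wrap one websocket text frame; the raw payload is kept even if parsing fails."""
    payload = raw_text.encode("utf-8")
    parsed: Any = None
    json_error: Optional[str] = None
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        json_error = str(exc)
    return {
        "received_at_utc": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "connection_sequence": connection_sequence,
        "message_sequence": message_sequence,
        "global_message_sequence": global_message_sequence,
        "websocket": {"url": WS_URL},
        "subscription": {"assets_ids": token_ids},
        # Per-token discovery metadata (market_slug, condition_id, token_id, outcome, end_time_utc).
        "tokens_tracked": tokens_tracked,
        "opcode": "text",
        "payload_length_bytes": len(payload),
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "raw_text": raw_text,
        "json": parsed,
        "json_error": json_error,
        "classified_event_types": [] if json_error is not None else classify(parsed),
    }


def append_rows(path: str, rows: list[dict]) -> None:
    """Append rows as gzip JSONL; each call adds a gzip member, which readers concatenate."""
    with gzip.open(path, "at", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, separators=(",", ":")) + "\n")
```

Hashing the exact UTF-8 bytes of `raw_text` lets a later audit confirm that the stored payload was not altered by re-serialization. An unparseable frame such as a bare `PONG` is still stored with `json_error` set instead of being dropped.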
Important envelope fields include:

- `received_at_utc`
- `session_id`
- `connection_sequence`
- `message_sequence`
- `global_message_sequence`
- `websocket.url`
- `subscription.assets_ids`
- `tokens_tracked`
- `opcode`
- `payload_length_bytes`
- `payload_sha256`
- `raw_text`
- `json`
- `json_error`
- `classified_event_types`

## REST Checkpoints

REST checkpoints are written as gzip JSONL under:

```text
data/ws_sample/polymarket/rest_checkpoints//polymarket_rest_checkpoints_.jsonl.gz
```

Each row records one POST to:

```text
https://clob.polymarket.com/books
```

The request body contains the same token IDs as the websocket subscription. The response JSON is preserved in `response.raw_response_json`, with safe response headers only. Secret-bearing headers are not recorded.

## Manifest And Gate

The checkpoint manifest is:

```text
data/manifests/checkpoint_010b_ws_raw_sample.json
```

The report is:

```text
reports/checkpoints/checkpoint_010b_ws_raw_sample.md
```

`WS_RAW_SAMPLE_PASS` requires:

- at least one selected BTC market with both outcome tokens
- at least one parseable websocket text message
- at least two successful REST checkpoints
- parseable gzip JSONL outputs
- checksum summaries

If the websocket connects but no market messages arrive, the recorder must gate as `WS_RAW_SAMPLE_NEEDS_REVIEW` rather than pretending the websocket path is proven.

## Checkpoint 10D Runtime Direction

The long-running runtime recorder is `scripts/collect_polymarket_ws_orderbooks.py`. It is separate from the bounded 10B sample script.

The runtime recorder is intended to run as `orderbooks-ws-recorder` beside the existing REST collector. It preserves raw websocket messages under `raw_orderbooks/polymarket/ws_raw/`, keeps REST `/books` checkpoints under `raw_orderbooks/polymarket/rest_checkpoints/`, rotates closed gzip archives hourly, writes manifests under `/var/lib/orderbooks/manifests`, and records reconnect, stale-feed, REST failure, parser, and divergence counters.
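The hourly rotation can be sketched as follows. This sketch assumes rows are bucketed by the UTC hour of their receive timestamp. The per-day directory and the `polymarket_ws_raw_YYYYMMDDTHH.jsonl.gz` file name are hypothetical layouts, and `hour_bucket`, `archive_path`, and `HourlyRotator` are illustrative names rather than the runtime recorder's code.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional

# Root documented for the runtime recorder; the layout beneath it is assumed.
WS_RAW_ROOT = PurePosixPath("raw_orderbooks/polymarket/ws_raw")


def hour_bucket(ts: datetime) -> str:
    """UTC hour key such as '20240101T13'; naive timestamps are rejected."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y%m%dT%H")


def archive_path(bucket: str) -> PurePosixPath:
    """Hypothetical layout: one directory per UTC day, one archive per hour."""
    return WS_RAW_ROOT / bucket[:8] / f"polymarket_ws_raw_{bucket}.jsonl.gz"


class HourlyRotator:
    """Tracks the active hour and reports which archive to close on rollover."""

    def __init__(self) -> None:
        self.active: Optional[str] = None

    def observe(self, ts: datetime) -> Optional[str]:
        """Return the bucket that must be closed before writing `ts`, else None."""
        bucket = hour_bucket(ts)
        if bucket == self.active:
            return None
        closed, self.active = self.active, bucket
        return closed
```

Returning the closed bucket, rather than closing the file inside the rotator, keeps the rotation decision separate from file I/O so the boundary logic can be tested without touching disk.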
The gzip file currently being written uses a hidden `.open` name until it is closed. The uploader skips open and temporary files. It deletes local archives only when `--cleanup-after-verify` is set and rclone verification has succeeded.
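The following sketch shows the open-file convention and the uploader's selection and cleanup rules. It assumes the hidden name is the final name with a leading `.` and a trailing `.open`, and that `.tmp` and `.part` suffixes also mark temporary files. All helper names are hypothetical, and the rclone upload and verification step is not shown.

```python
import os
from pathlib import Path

OPEN_SUFFIX = ".open"
# Suffixes treated as in-progress; `.tmp` and `.part` are assumptions.
TEMP_SUFFIXES = (OPEN_SUFFIX, ".tmp", ".part")


def open_name(final_name: str) -> str:
    """Hidden in-progress name, e.g. 'x.jsonl.gz' -> '.x.jsonl.gz.open' (assumed scheme)."""
    return f".{final_name}{OPEN_SUFFIX}"


def finalize(open_path: Path) -> Path:
    """Publish a closed archive by renaming it to its visible final name."""
    name = open_path.name
    if not (name.startswith(".") and name.endswith(OPEN_SUFFIX)) or len(name) <= len(OPEN_SUFFIX) + 1:
        raise ValueError(f"not an open archive: {open_path}")
    final_path = open_path.with_name(name[1 : -len(OPEN_SUFFIX)])
    os.replace(open_path, final_path)  # atomic rename on the same filesystem
    return final_path


def is_uploadable(name: str) -> bool:
    """Skip hidden files and anything that still looks open or temporary."""
    return not name.startswith(".") and not name.endswith(TEMP_SUFFIXES)


def may_delete_local(cleanup_after_verify: bool, verify_succeeded: bool) -> bool:
    """Local deletion needs both the opt-in flag and a successful verification."""
    return cleanup_after_verify and verify_succeeded
```

Because the rename happens only after the gzip stream is closed, a concurrent uploader never sees a visible file with a truncated gzip trailer.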