Polymarket Websocket Sample Recorder
This document describes the bounded Checkpoint 10B sample path. It is separate from the live Kubernetes REST collector and does not replace it.
Scope
The recorder captures public Polymarket market websocket messages for active BTC up/down outcome tokens and writes REST /books checkpoints during the same run. It does not trade, sign requests, use private keys, require API keys, or handle private account data.
Discovery
Run the existing discovery first so token IDs are current:
python scripts/discover_polymarket_btc_markets.py
The recorder reads data/discovery/polymarket_btc_markets_latest.json, selects active BTC up/down markets, and preserves market_slug, condition_id, token_id, outcome, and end_time_utc in every raw websocket envelope.
Sample Run
Default bounded run:
python scripts/record_polymarket_ws_sample.py --config config/polymarket_ws_sample.example.yaml
Useful overrides:
python scripts/record_polymarket_ws_sample.py --market-limit 2 --duration-seconds 150 --rest-checkpoint-interval-seconds 30
The default endpoint is:
wss://ws-subscriptions-clob.polymarket.com/ws/market
The subscription body is:
{"assets_ids":["<token_id>"],"type":"market","custom_feature_enabled":true}
For multiple tokens, assets_ids contains all selected Up/Down token IDs.
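As an illustration, the subscription body above could be assembled from a list of token IDs as in the following minimal sketch. The function names build_market_subscription and encode_subscription are hypothetical and are not taken from the recorder script; only the field names and values come from the body shown above.

```python
import json


def build_market_subscription(token_ids):
    """Return the market-channel subscription body for the given token IDs.

    Hypothetical helper for illustration; not the recorder's own code.
    """
    # Drop duplicates while preserving order so each token appears once.
    unique_ids = list(dict.fromkeys(str(token_id) for token_id in token_ids))
    if not unique_ids:
        raise ValueError("at least one token ID is required")
    return {
        "assets_ids": unique_ids,
        "type": "market",
        "custom_feature_enabled": True,
    }


def encode_subscription(token_ids):
    # Compact separators reproduce the single-line JSON form shown above.
    return json.dumps(build_market_subscription(token_ids), separators=(",", ":"))


if __name__ == "__main__":
    # Placeholder IDs; real values come from the discovery output.
    print(encode_subscription(["<up_token_id>", "<down_token_id>"]))
```

Keeping the body construction separate from the encoding step makes it easy to record the same assets_ids list in the envelope's subscription.assets_ids field.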
Raw Websocket Output
Websocket text messages are written as gzip JSONL under:
data/ws_sample/polymarket/ws_raw/<run_id>/polymarket_ws_raw_<run_id>.jsonl.gz
Each row preserves the raw text payload in raw_text, plus parsed JSON in json when parsing succeeds. Unknown message shapes are retained and counted in the manifest.
Important envelope fields include:
received_at_utc
session_id
connection_sequence
message_sequence
global_message_sequence
websocket.url
subscription.assets_ids
tokens_tracked
opcode
payload_length_bytes
payload_sha256
raw_text
json
json_error
classified_event_types
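The following is a minimal sketch of how an envelope with some of these fields might be built for one text message. It assumes that payload_length_bytes and payload_sha256 are computed over the UTF-8 bytes of the text payload, which this document does not confirm, and the function name build_envelope is hypothetical. Fields such as websocket.url, tokens_tracked, opcode, and classified_event_types are omitted for brevity.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_envelope(raw_text, session_id, connection_sequence,
                   message_sequence, global_message_sequence):
    """Wrap one websocket text message in a raw envelope (illustrative only)."""
    # Assumption: length and hash are taken over the UTF-8 encoded payload.
    payload = raw_text.encode("utf-8")
    envelope = {
        "received_at_utc": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "connection_sequence": connection_sequence,
        "message_sequence": message_sequence,
        "global_message_sequence": global_message_sequence,
        "payload_length_bytes": len(payload),
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "raw_text": raw_text,
        "json": None,
        "json_error": None,
    }
    try:
        envelope["json"] = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Keep the raw text and record why parsing failed instead of dropping the row.
        envelope["json_error"] = str(exc)
    return envelope
```

Because raw_text is always stored, a message that fails to parse is still recoverable later, and its hash can be checked against the stored payload.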
REST Checkpoints
REST checkpoints are written as gzip JSONL under:
data/ws_sample/polymarket/rest_checkpoints/<run_id>/polymarket_rest_checkpoints_<run_id>.jsonl.gz
Each row records one POST to:
https://clob.polymarket.com/books
The request body contains the same token IDs as the websocket subscription. The response JSON is preserved in response.raw_response_json, with safe response headers only. Secret-bearing headers are not recorded.
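One way to keep secret-bearing headers out of the checkpoint rows is an allow-list filter, sketched below. The header names in SAFE_RESPONSE_HEADERS and the function name filter_safe_headers are assumptions for illustration; the recorder's actual list is not given in this document.

```python
# Hypothetical allow-list; the recorder's real set of safe headers may differ.
SAFE_RESPONSE_HEADERS = frozenset({
    "content-type",
    "content-length",
    "date",
    "etag",
    "cache-control",
})


def filter_safe_headers(headers):
    """Keep only allow-listed response headers, matching names case-insensitively."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in SAFE_RESPONSE_HEADERS
    }


if __name__ == "__main__":
    sample = {
        "Content-Type": "application/json",
        "Set-Cookie": "session=redacted",
        "Date": "Thu, 01 Jan 2026 00:00:00 GMT",
    }
    # Set-Cookie is dropped because it is not on the allow-list.
    print(filter_safe_headers(sample))
```

An allow-list is used here rather than a deny-list so that an unexpected new header is excluded by default.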
Manifest And Gate
The checkpoint manifest is:
data/manifests/checkpoint_010b_ws_raw_sample.json
The report is:
reports/checkpoints/checkpoint_010b_ws_raw_sample.md
WS_RAW_SAMPLE_PASS requires at least one selected BTC market with both outcome tokens, at least one parseable websocket text message, at least two successful REST checkpoints, parseable gzip JSONL outputs, and checksum summaries.
If the websocket connects but no market messages arrive, the recorder must gate as WS_RAW_SAMPLE_NEEDS_REVIEW rather than pretending the websocket path is proven.
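The gate conditions above can be sketched as a single check. The SampleStats fields and the evaluate_gate function are hypothetical names, and the sketch assumes that any unmet requirement maps to WS_RAW_SAMPLE_NEEDS_REVIEW; this document only states that outcome for the case where no market messages arrive, so the real recorder may distinguish other outcomes.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    """Hypothetical summary of one sample run."""
    markets_with_both_outcomes: int
    parseable_ws_text_messages: int
    successful_rest_checkpoints: int
    outputs_parseable: bool      # gzip JSONL outputs could be read back
    checksums_present: bool      # checksum summaries were written


def evaluate_gate(stats):
    """Return the gate label for a run (illustrative only)."""
    passed = (
        stats.markets_with_both_outcomes >= 1
        and stats.parseable_ws_text_messages >= 1
        and stats.successful_rest_checkpoints >= 2
        and stats.outputs_parseable
        and stats.checksums_present
    )
    # Assumption: every failing case is reported as NEEDS_REVIEW.
    return "WS_RAW_SAMPLE_PASS" if passed else "WS_RAW_SAMPLE_NEEDS_REVIEW"
```

For example, a run that connected and wrote three REST checkpoints but received zero websocket messages would evaluate to WS_RAW_SAMPLE_NEEDS_REVIEW under this sketch.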
Checkpoint 10D Runtime Direction
The long-running runtime recorder is scripts/collect_polymarket_ws_orderbooks.py.
It is separate from the bounded 10B sample script. The runtime recorder is
intended to run as orderbooks-ws-recorder beside the existing REST collector.
It preserves raw websocket messages under raw_orderbooks/polymarket/ws_raw/,
keeps REST /books checkpoints under raw_orderbooks/polymarket/rest_checkpoints/,
rotates closed gzip archives hourly, writes manifests under /var/lib/orderbooks/manifests,
and records reconnect, stale-feed, REST failure, parser, and divergence counters.
The gzip file currently being written uses a hidden .open name until it is
closed. The uploader skips open and temporary files, and deletes local archives
only when --cleanup-after-verify is set and rclone verification has succeeded.
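The open-name convention can be illustrated with the sketch below, which writes rows to a hidden temporary name and renames the file only after it is fully closed. The exact naming scheme (a leading dot plus a .open suffix) and the helper names write_closed_archive and list_uploadable are assumptions for illustration, not the runtime recorder's code.

```python
import gzip
import json
import os
import tempfile


def write_closed_archive(directory, final_name, rows):
    """Write rows as gzip JSONL under a hidden .open name, then rename on close."""
    # Assumed naming: ".<final_name>.open" while the file is still being written.
    open_path = os.path.join(directory, "." + final_name + ".open")
    final_path = os.path.join(directory, final_name)
    with gzip.open(open_path, "wt", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    # The rename happens only after the gzip stream is closed and flushed.
    os.replace(open_path, final_path)
    return final_path


def list_uploadable(directory):
    """Return closed archives only, skipping hidden and .open files."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(".jsonl.gz") and not name.startswith(".")
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        write_closed_archive(workdir, "example.jsonl.gz", [{"n": 1}])
        print(list_uploadable(workdir))
```

Renaming after close means an uploader that lists the directory never sees a partially written archive under its final name.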