orderbooks/docs/POLYMARKET_WEBSOCKET_RECORDER.md
2026-04-19 19:17:56 +02:00

Polymarket Websocket Sample Recorder

This document describes the bounded Checkpoint 10B sample path. It is separate from the live Kubernetes REST collector and does not replace it.

Scope

The recorder captures public Polymarket market websocket messages for active BTC up/down outcome tokens and writes REST /books checkpoints during the same run. It does not trade, sign requests, use private keys, require API keys, or handle private account data.

Discovery

Run the existing discovery first so token IDs are current:

python scripts/discover_polymarket_btc_markets.py

The recorder reads data/discovery/polymarket_btc_markets_latest.json, selects active BTC up/down markets, and preserves market_slug, condition_id, token_id, outcome, and end_time_utc in every raw websocket envelope.
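
The selection step can be pictured with a small sketch. It assumes the discovery data is available as a flat list of per-token records carrying the fields named above plus a hypothetical active flag. The real layout of polymarket_btc_markets_latest.json may differ, and the sample records here are made up for illustration.

```python
from collections import defaultdict


def select_markets(records, market_limit):
    """Return up to market_limit markets that have both an Up and a Down token."""
    by_market = defaultdict(dict)
    for rec in records:
        # "active" is an assumed field name; inactive tokens are ignored.
        if not rec.get("active", False):
            continue
        by_market[rec["condition_id"]][rec["outcome"]] = rec
    # Keep only markets where both outcome tokens were found.
    complete = [m for m in by_market.values() if {"Up", "Down"} <= set(m)]
    # Assumption: end_time_utc is an ISO 8601 string, so string order is time order.
    complete.sort(key=lambda m: m["Up"]["end_time_utc"])
    return complete[:market_limit]


# Made-up records, for illustration only.
records = [
    {"market_slug": "m1", "condition_id": "c1", "token_id": "t1",
     "outcome": "Up", "end_time_utc": "2026-04-19T18:00:00Z", "active": True},
    {"market_slug": "m1", "condition_id": "c1", "token_id": "t2",
     "outcome": "Down", "end_time_utc": "2026-04-19T18:00:00Z", "active": True},
    {"market_slug": "m2", "condition_id": "c2", "token_id": "t3",
     "outcome": "Up", "end_time_utc": "2026-04-19T19:00:00Z", "active": True},
]
selected = select_markets(records, market_limit=2)
token_ids = [m[o]["token_id"] for m in selected for o in ("Up", "Down")]
```

Market c2 is dropped in this example because only one of its outcome tokens is present.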

Sample Run

Default bounded run:

python scripts/record_polymarket_ws_sample.py --config config/polymarket_ws_sample.example.yaml

Useful overrides:

python scripts/record_polymarket_ws_sample.py --market-limit 2 --duration-seconds 150 --rest-checkpoint-interval-seconds 30

The default endpoint is:

wss://ws-subscriptions-clob.polymarket.com/ws/market

The subscription body is:

{"assets_ids":["<token_id>"],"type":"market","custom_feature_enabled":true}

For multiple tokens, assets_ids contains all selected Up/Down token IDs.
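
As a minimal sketch, the subscription body for several tokens could be built like this. The helper name build_subscription and the order-preserving deduplication are assumptions for illustration, not necessarily what the recorder script does.

```python
import json


def build_subscription(token_ids):
    """Serialize a market-channel subscription for the given token IDs."""
    # Deduplicate while preserving order (an assumption, not confirmed by the recorder).
    unique_ids = list(dict.fromkeys(str(t) for t in token_ids))
    if not unique_ids:
        raise ValueError("at least one token ID is required")
    message = {
        "assets_ids": unique_ids,
        "type": "market",
        "custom_feature_enabled": True,
    }
    # Compact separators give the same shape as the body shown above.
    return json.dumps(message, separators=(",", ":"))


body = build_subscription(["<up_token_id>", "<down_token_id>"])
print(body)
```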

Raw Websocket Output

Websocket text messages are written as gzip JSONL under:

data/ws_sample/polymarket/ws_raw/<run_id>/polymarket_ws_raw_<run_id>.jsonl.gz

Each row preserves the raw text payload in raw_text, plus parsed JSON in json when parsing succeeds. Unknown message shapes are retained and counted in the manifest.

Important envelope fields include:

  • received_at_utc
  • session_id
  • connection_sequence
  • message_sequence
  • global_message_sequence
  • websocket.url
  • subscription.assets_ids
  • tokens_tracked
  • opcode
  • payload_length_bytes
  • payload_sha256
  • raw_text
  • json
  • json_error
  • classified_event_types
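
The sketch below shows how a subset of these fields might be filled in for one message, and how rows could be written to and read back from gzip JSONL. It assumes that the length and SHA-256 are computed over the UTF-8 bytes of the text payload. The helper names are hypothetical, and the remaining envelope fields are omitted for brevity.

```python
import gzip
import hashlib
import json
from datetime import datetime, timezone


def make_envelope(raw_text, message_sequence):
    """Wrap one websocket text message, keeping the raw text even if parsing fails."""
    payload = raw_text.encode("utf-8")  # assumed basis for length and checksum
    envelope = {
        "received_at_utc": datetime.now(timezone.utc).isoformat(),
        "message_sequence": message_sequence,
        "payload_length_bytes": len(payload),
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "raw_text": raw_text,
        "json": None,
        "json_error": None,
    }
    try:
        envelope["json"] = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Unparseable messages are retained, with the parser error recorded.
        envelope["json_error"] = str(exc)
    return envelope


def write_rows(path, envelopes):
    """Write one JSON object per line into a gzip file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for envelope in envelopes:
            fh.write(json.dumps(envelope) + "\n")


def read_rows(path):
    """Read a gzip JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Recomputing payload_sha256 from raw_text when reading a file back is one way to confirm that a row was not altered after capture.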

REST Checkpoints

REST checkpoints are written as gzip JSONL under:

data/ws_sample/polymarket/rest_checkpoints/<run_id>/polymarket_rest_checkpoints_<run_id>.jsonl.gz

Each row records one POST to:

https://clob.polymarket.com/books

The request body contains the same token IDs as the websocket subscription. The response JSON is preserved in response.raw_response_json, with safe response headers only. Secret-bearing headers are not recorded.
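
One way to keep only safe response headers is an allowlist, sketched below. The allowlist contents and every row field other than response.raw_response_json are hypothetical. The recorder's actual list of safe headers and its row layout are not shown in this document.

```python
# Hypothetical allowlist; the recorder's real list of safe headers may differ.
SAFE_HEADERS = {"content-type", "content-length", "date"}

BOOKS_URL = "https://clob.polymarket.com/books"


def checkpoint_row(token_ids, status_code, headers, response_json):
    """Build one checkpoint row, dropping any header not on the allowlist."""
    safe = {k: v for k, v in headers.items() if k.lower() in SAFE_HEADERS}
    return {
        "request": {"url": BOOKS_URL, "token_ids": list(token_ids)},
        "response": {
            "status_code": status_code,
            "headers": safe,
            "raw_response_json": response_json,
        },
    }


row = checkpoint_row(
    ["<token_id>"],
    200,
    {"Content-Type": "application/json", "Set-Cookie": "session=secret"},
    [],
)
```

An allowlist fails closed: a header the recorder has never seen is dropped by default instead of being written to disk.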

Manifest And Gate

The checkpoint manifest is:

data/manifests/checkpoint_010b_ws_raw_sample.json

The report is:

reports/checkpoints/checkpoint_010b_ws_raw_sample.md

WS_RAW_SAMPLE_PASS requires at least one selected BTC market with both outcome tokens, at least one parseable websocket text message, at least two successful REST checkpoints, parseable gzip JSONL outputs, and checksum summaries.

If the websocket connects but no market messages arrive, the recorder must gate as WS_RAW_SAMPLE_NEEDS_REVIEW rather than report the websocket path as proven.
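
The gate conditions can be read as a single conjunction, sketched below. The SampleStats fields are hypothetical names for the counts the manifest would need. Mapping every unmet requirement to WS_RAW_SAMPLE_NEEDS_REVIEW is an assumption, since only the no-messages case is spelled out above.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    markets_with_both_tokens: int
    parseable_ws_messages: int
    successful_rest_checkpoints: int
    outputs_parseable: bool    # both gzip JSONL outputs could be read back
    checksums_present: bool    # checksum summaries were produced


def gate(stats):
    """Return the gate status for one bounded sample run."""
    passed = (
        stats.markets_with_both_tokens >= 1
        and stats.parseable_ws_messages >= 1
        and stats.successful_rest_checkpoints >= 2
        and stats.outputs_parseable
        and stats.checksums_present
    )
    # Assumption: anything short of a full pass is routed to review.
    return "WS_RAW_SAMPLE_PASS" if passed else "WS_RAW_SAMPLE_NEEDS_REVIEW"


# A run that connected but received no market messages must not pass.
silent_run = SampleStats(1, 0, 2, True, True)
print(gate(silent_run))
```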

Checkpoint 10D Runtime Direction

The long-running runtime recorder is scripts/collect_polymarket_ws_orderbooks.py. It is separate from the bounded 10B sample script. The runtime recorder is intended to run as orderbooks-ws-recorder beside the existing REST collector. It preserves raw websocket messages under raw_orderbooks/polymarket/ws_raw/, keeps REST /books checkpoints under raw_orderbooks/polymarket/rest_checkpoints/, rotates closed gzip archives hourly, writes manifests under /var/lib/orderbooks/manifests, and records reconnect, stale-feed, REST failure, parser, and divergence counters.

Gzip files that are still being written use hidden .open names until they are closed. The uploader skips open and temporary files, and deletes local archives only when --cleanup-after-verify is set and rclone verification has succeeded.
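
The open-file convention can be sketched as follows. The exact naming scheme (a leading dot plus a .open suffix), the .tmp suffix, and the helper names are assumptions. The runtime recorder and uploader may use different names.

```python
import gzip
import os
from pathlib import Path


def open_name(final_path):
    """Hidden in-progress name for an archive, e.g. .archive.jsonl.gz.open (assumed scheme)."""
    return final_path.with_name("." + final_path.name + ".open")


def write_archive(final_path, lines):
    """Write to the hidden .open name, then rename once the gzip stream is closed."""
    tmp_path = open_name(final_path)
    with gzip.open(tmp_path, "wt", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    # os.replace is atomic on the same filesystem, so readers never see a partial archive.
    os.replace(tmp_path, final_path)
    return final_path


def uploadable(directory):
    """List closed archives, skipping hidden, .open, and .tmp files."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and not p.name.endswith((".open", ".tmp"))
    )
```

Renaming only after the file is closed means that any archive visible under its final name is a complete gzip stream, which is what makes skipping open files in the uploader sufficient.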