orderbooks/docs/POLYMARKET_COLLECTOR.md
philipp 284e465588
Prepare Kubernetes orderbooks deployment
2026-04-18 11:23:28 +02:00

Polymarket Collector

Artifact status: valid

Scope

This document covers the Checkpoint 4 bounded raw order-book sample collector.

It does not describe a production service. It does not include normalization, upload, systemd, dashboards, databases, strategies, trading, wallet logic, private keys, API keys, or private endpoints.

Inputs

The collector reads active BTC markets from:

data/discovery/polymarket_btc_markets_latest.json

Checkpoint 3 writes normalized market records to this file. Each record carries a condition_id and a tokens entry that preserves the Up and Down outcome-token mapping. The collector uses only those records and does not perform market discovery itself.
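As a rough illustration of how such records could be consumed, here is a minimal sketch. The exact file layout is not specified in this document, so the top-level "markets" key and the per-token "outcome" and "token_id" fields are assumptions, and select_markets and load_markets are hypothetical helper names, not the collector's actual functions.

```python
import json
from pathlib import Path

DISCOVERY_PATH = Path("data/discovery/polymarket_btc_markets_latest.json")


def select_markets(records, market_limit):
    """Return up to market_limit (condition_id, {outcome: token_id}) pairs."""
    selected = []
    for record in records:
        if len(selected) >= market_limit:
            break
        # Assumption: "tokens" is a list of {"outcome": ..., "token_id": ...} objects.
        mapping = {t["outcome"]: t["token_id"] for t in record.get("tokens", [])}
        # Skip records that do not carry both sides of the Up/Down mapping.
        if {"Up", "Down"} <= mapping.keys():
            selected.append((record["condition_id"], mapping))
    return selected


def load_markets(path=DISCOVERY_PATH, market_limit=2):
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file is either a bare list or an object with a "markets" list.
    records = payload["markets"] if isinstance(payload, dict) else payload
    return select_markets(records, market_limit)
```

Keeping the selection logic separate from file I/O makes it easy to check against a handful of hand-written records.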

If the discovery file is stale or contains no usable active markets, run:

python3 scripts/discover_polymarket_btc_markets.py

Endpoint

The sample uses the public CLOB batch order-book endpoint:

POST https://clob.polymarket.com/books

Request body shape:

[
  {"token_id": "<up_token_id>"},
  {"token_id": "<down_token_id>"}
]

No authentication is used.
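The request above can be sketched with the Python standard library alone. This is not the collector's code: build_books_body and fetch_books are hypothetical names, the 10-second timeout is an arbitrary choice, and the response is assumed to be a JSON document.

```python
import json
import urllib.request

BOOKS_URL = "https://clob.polymarket.com/books"


def build_books_body(token_ids):
    """Build the batch request body: one {"token_id": ...} object per token."""
    return [{"token_id": str(token_id)} for token_id in token_ids]


def fetch_books(token_ids, timeout_seconds=10.0):
    """POST the batch body and return (status_code, parsed_json)."""
    body = json.dumps(build_books_body(token_ids)).encode("utf-8")
    request = urllib.request.Request(
        BOOKS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # No authentication headers are attached, matching the note above.
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status, json.loads(response.read().decode("utf-8"))
```

Note that urllib raises urllib.error.HTTPError for 4xx and 5xx responses, so a caller that wants to inspect the status code of a failed request has to catch that exception.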

Running A Bounded Sample

Default sample command:

python3 scripts/collect_polymarket_orderbooks.py

The default config is:

config/polymarket_collector.example.yaml

The example config is deliberately small:

  • market_limit: 2
  • interval_seconds: 30
  • duration_seconds: 300
  • market_end_safety_seconds: 420

This produces a 5-minute sample for at most 2 markets, fetching both the Up and Down outcome tokens of each market via batch requests.
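The arithmetic behind these values can be sketched as follows. Two assumptions are made here that this document does not confirm: polling starts at t=0 and stops before duration_seconds elapses, and market_end_safety_seconds means "skip any market that ends within this many seconds of now". The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def planned_polls(duration_seconds, interval_seconds):
    """Number of polls at t = 0, interval, 2*interval, ... strictly before duration."""
    return -(-duration_seconds // interval_seconds)  # integer ceiling division


def ends_too_soon(market_end_time_utc, now, market_end_safety_seconds=420):
    """True if the market ends within the safety window (assumed semantics)."""
    # Replace the trailing "Z" so fromisoformat also works on older Python versions.
    end = datetime.fromisoformat(market_end_time_utc.replace("Z", "+00:00"))
    return end - now < timedelta(seconds=market_end_safety_seconds)


now = datetime(2026, 4, 14, 21, 0, 0, tzinfo=timezone.utc)
print(planned_polls(300, 30))                      # 10
print(ends_too_soon("2026-04-14T22:00:00Z", now))  # False
print(ends_too_soon("2026-04-14T21:05:00Z", now))  # True
```

Under these assumptions the example config yields 10 batch polls, and the 420-second safety margin exceeds the 300-second run, so a selected market would not reach its end time mid-sample.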

Outputs

Raw gzip JSONL snapshots are written under:

data/live_sample/polymarket/orderbooks/<run_id>/

The sample manifest is written to:

data/manifests/orderbook_collector_sample_manifest.json

For this checkpoint, files rotate per run: each run writes to its own <run_id> directory. Hourly rotation is intentionally left for a later sustained runtime checkpoint.

Raw JSONL Envelope

Each gzip JSONL line is a raw-first envelope:

{
  "schema_name": "raw_orderbook_snapshot",
  "schema_version": 1,
  "collector": {
    "name": "polymarket_orderbook_collector",
    "version": "0.1.0"
  },
  "market": {
    "market_name": "polymarket",
    "market_slug": "example",
    "condition_id": "0x...",
    "token_id": "123",
    "outcome": "Up",
    "market_end_time_utc": "2026-04-14T22:00:00Z"
  },
  "collection": {
    "collected_at_utc": "2026-04-14T21:00:00Z",
    "sequence": 1,
    "response_index": 0
  },
  "request": {
    "method": "POST",
    "url": "https://clob.polymarket.com/books",
    "params": null,
    "json_body": [{"token_id": "123"}],
    "status_code": 200,
    "duration_ms": 123,
    "attempts": []
  },
  "raw": {}
}

The raw object is the unmodified order-book object returned by CLOB for that token.
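For inspecting a sample file, a reader along these lines should be enough. It relies only on the envelope fields shown above; read_envelopes and count_by_outcome are hypothetical helper names, and the file name inside the <run_id> directory is not specified here.

```python
import gzip
import json
from collections import Counter


def read_envelopes(path):
    """Yield one parsed envelope per non-empty line of a gzip JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_by_outcome(path):
    """Count snapshots per (token_id, outcome) pair using the market block."""
    counts = Counter()
    for envelope in read_envelopes(path):
        market = envelope["market"]
        counts[(market["token_id"], market["outcome"])] += 1
    return counts
```

Because the raw object is stored untouched, a reader like this never needs to know the CLOB order-book schema in order to index or count snapshots.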

Rate-Limit Handling

The sample is conservative:

  • Uses a small market cap by default.
  • Uses a fixed interval between batch requests.
  • Applies a request timeout.
  • Retries 429 and 5xx responses with exponential backoff.
  • Does not use concurrent requests.
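The retry rule in the list above can be sketched as follows. The attempt cap, the base delay, and the shape of the recorded attempts are all assumptions made for illustration; send is a hypothetical zero-argument callable that returns a (status_code, payload) pair instead of raising on HTTP errors.

```python
import time


def is_retryable(status_code):
    """Retry only on 429 (rate limited) and 5xx (server error) responses."""
    return status_code == 429 or 500 <= status_code <= 599


def send_with_backoff(send, max_attempts=4, base_delay_seconds=1.0, sleep=time.sleep):
    """Call send() sequentially until it succeeds or attempts run out.

    Returns (status_code, payload, attempts), where attempts records every try.
    """
    attempts = []
    for attempt in range(1, max_attempts + 1):
        status_code, payload = send()
        attempts.append({"attempt": attempt, "status_code": status_code})
        if not is_retryable(status_code) or attempt == max_attempts:
            return status_code, payload, attempts
        # Exponential backoff: base, 2*base, 4*base, ... between attempts.
        sleep(base_delay_seconds * 2 ** (attempt - 1))
```

Passing sleep as a parameter keeps the sketch testable without real waiting. Requests are issued one at a time, consistent with the no-concurrency rule.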

Shutdown

SIGINT and SIGTERM set a stop flag. The current request, if any, finishes or times out, the gzip file closes, and the manifest is written with a shutdown warning.
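A stop flag of this kind can be sketched with signal and threading.Event. This is an illustration under assumptions, not the collector's implementation: run_loop, poll_once, and the warning text are hypothetical, and closing the gzip file and writing the manifest are left to the caller.

```python
import signal
import threading

stop_requested = threading.Event()


def _request_stop(signum, frame):
    # The handler only sets a flag; the in-flight request is not interrupted here.
    stop_requested.set()


def install_stop_handlers():
    signal.signal(signal.SIGINT, _request_stop)
    signal.signal(signal.SIGTERM, _request_stop)


def run_loop(poll_once, max_polls, interval_seconds, stop=stop_requested):
    """Poll until max_polls is reached or the stop flag is set.

    Returns (polls_done, warnings) so the caller can record them in a manifest.
    """
    polls_done = 0
    while polls_done < max_polls and not stop.is_set():
        poll_once()  # runs to completion (or its own timeout) before the flag is checked
        polls_done += 1
        if polls_done < max_polls:
            # Event.wait doubles as an interruptible sleep between polls.
            stop.wait(interval_seconds)
    warnings = ["shutdown requested before run completed"] if polls_done < max_polls else []
    return polls_done, warnings
```

Using Event.wait instead of time.sleep means a signal that arrives between polls ends the wait immediately, so shutdown does not have to sit out the remainder of the interval.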

Known Gaps

  • This is a short run-rotated sample, not a daemon.
  • It does not prove 24/7 reliability.
  • It does not implement hourly rotation.
  • It does not refresh discovery during a run.
  • It does not normalize snapshots.
  • It does not upload files.
  • It does not use websockets.

The project must not claim production readiness until the later 24h soak test passes with documented quality metrics.