# Polymarket Collector

Artifact status: `valid`

## Scope

This document covers the Checkpoint 4 bounded raw order-book sample collector. It does not describe a production service.

It does not include normalization, upload, systemd, dashboards, databases, strategies, trading, wallet logic, private keys, API keys, or private endpoints.

## Inputs

The collector reads active BTC markets from:

```text
data/discovery/polymarket_btc_markets_latest.json
```

Checkpoint 3 writes normalized market records with `condition_id` and `tokens` preserving the `Up` and `Down` outcome-token mapping. The collector uses only those records and does not perform market discovery itself.

If the discovery file is stale or contains no usable active markets, run:

```sh
python3 scripts/discover_polymarket_btc_markets.py
```

## Endpoint

The sample uses the public CLOB batch order-book endpoint:

```text
POST https://clob.polymarket.com/books
```

Request body shape:

```json
[
  {"token_id": ""},
  {"token_id": ""}
]
```

No authentication is used.

## Running A Bounded Sample

Default sample command:

```sh
python3 scripts/collect_polymarket_orderbooks.py
```

The default config is:

```text
config/polymarket_collector.example.yaml
```

The example config is deliberately small:

- `market_limit: 2`
- `interval_seconds: 30`
- `duration_seconds: 300`
- `market_end_safety_seconds: 420`

This produces a 5-minute sample for at most 2 markets, fetching both `Up` and `Down` outcome tokens by batch request.

## Outputs

Raw gzip JSONL snapshots are written under:

```text
data/live_sample/polymarket/orderbooks//
```

The sample manifest is written to:

```text
data/manifests/orderbook_collector_sample_manifest.json
```

Files rotate by run for this checkpoint. Hourly rotation is intentionally left for a later sustained runtime checkpoint.

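To sanity-check a finished run, the gzip JSONL output can be read back with the Python standard library. The sketch below is illustrative only and is not part of the collector. It assumes each line is one JSON object carrying `market.token_id` and `market.outcome`, as in the envelope this document describes. The helper name `count_snapshots` is hypothetical, and the caller supplies the file path because the run directory and file naming are not specified here.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def count_snapshots(path: Path) -> Counter:
    """Count envelope lines per (token_id, outcome) in one gzip JSONL file.

    Hypothetical helper: assumes every non-blank line is a JSON object
    with a "market" object containing "token_id" and "outcome".
    """
    counts: Counter = Counter()
    # "rt" opens the gzip stream in text mode, so iteration yields lines.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate a trailing blank line
            envelope = json.loads(line)
            market = envelope["market"]
            counts[(market["token_id"], market["outcome"])] += 1
    return counts
```

With the example config (30-second interval, 300-second duration), each token should show roughly ten snapshots. A much lower count for a token would be worth checking against the manifest warnings.
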
## Raw JSONL Envelope

Each gzip JSONL line is a raw-first envelope:

```json
{
  "schema_name": "raw_orderbook_snapshot",
  "schema_version": 1,
  "collector": {
    "name": "polymarket_orderbook_collector",
    "version": "0.1.0"
  },
  "market": {
    "market_name": "polymarket",
    "market_slug": "example",
    "condition_id": "0x...",
    "token_id": "123",
    "outcome": "Up",
    "market_end_time_utc": "2026-04-14T22:00:00Z"
  },
  "collection": {
    "collected_at_utc": "2026-04-14T21:00:00Z",
    "sequence": 1,
    "response_index": 0
  },
  "request": {
    "method": "POST",
    "url": "https://clob.polymarket.com/books",
    "params": null,
    "json_body": [{"token_id": "123"}],
    "status_code": 200,
    "duration_ms": 123,
    "attempts": []
  },
  "raw": {}
}
```

The `raw` object is the unmodified order-book object returned by CLOB for that token.

## Rate-Limit Handling

The sample is conservative:

- Uses a small market cap by default.
- Uses a fixed interval between batch requests.
- Applies request timeout.
- Retries `429` and `5xx` responses with exponential backoff.
- Does not use concurrent requests.

## Shutdown

`SIGINT` and `SIGTERM` set a stop flag. The current request, if any, finishes or times out, the gzip file closes, and the manifest is written with a shutdown warning.

## Known Gaps

- This is a short run-rotated sample, not a daemon.
- It does not prove 24/7 reliability.
- It does not implement hourly rotation.
- It does not refresh discovery during a run.
- It does not normalize snapshots.
- It does not upload files.
- It does not use websockets.

The project must not claim production readiness until the later 24h soak test passes with documented quality metrics.
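
For reference, the retry policy listed under Rate-Limit Handling (retry `429` and `5xx` with exponential backoff, one request at a time) can be sketched as follows. This is a minimal illustration, not the collector's actual code. The function names, the `max_attempts` and `base_delay_seconds` defaults, and the shape of each `attempts` entry are assumptions made for the example. The `send` callable stands in for whatever performs the real `POST` with its timeout.

```python
import time
from typing import Callable, Dict, List, Tuple


def is_retryable(status_code: int) -> bool:
    """Only 429 and 5xx responses are retried, as the document states."""
    return status_code == 429 or 500 <= status_code <= 599


def post_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,  # assumed value, not taken from the config
    base_delay_seconds: float = 1.0,  # assumed value, not taken from the config
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes, List[Dict[str, int]]]:
    """Call send() sequentially, backing off between retryable failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts: List[Dict[str, int]] = []
    status_code, body = 0, b""
    for attempt in range(1, max_attempts + 1):
        status_code, body = send()
        # Hypothetical attempt record; the real envelope's entry shape is unknown.
        attempts.append({"attempt": attempt, "status_code": status_code})
        if not is_retryable(status_code) or attempt == max_attempts:
            break
        # Delay doubles on each retry: base, 2 * base, 4 * base, ...
        sleep(base_delay_seconds * 2 ** (attempt - 1))
    return status_code, body, attempts
```

Injecting `sleep` keeps the sketch testable without real waiting. A production loop would likely also cap the delay and check the stop flag between attempts.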