orderbooks/docs/POLYMARKET_COLLECTOR.md
philipp 284e465588
Prepare Kubernetes orderbooks deployment
2026-04-18 11:23:28 +02:00

# Polymarket Collector
Artifact status: `valid`
## Scope
This document covers the Checkpoint 4 bounded raw order-book sample collector.
It does not describe a production service. It does not include normalization, upload, systemd, dashboards, databases, strategies, trading, wallet logic, private keys, API keys, or private endpoints.
## Inputs
The collector reads active BTC markets from:
```text
data/discovery/polymarket_btc_markets_latest.json
```
Checkpoint 3 writes normalized market records to this file. Each record carries a `condition_id` and a `tokens` field that preserves the `Up` and `Down` outcome-token mapping. The collector uses only those records and does not perform market discovery itself.
If the discovery file is stale or contains no usable active markets, run:
```sh
python3 scripts/discover_polymarket_btc_markets.py
```
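As an illustration of how the discovery records feed the collector, the sketch below pulls `Up` and `Down` token IDs out of the file. The record shape is an assumption: a top-level JSON list whose entries hold `tokens` as a mapping from outcome name to token ID. The function names are hypothetical and not taken from the project scripts.

```python
import json
from pathlib import Path

DISCOVERY_PATH = Path("data/discovery/polymarket_btc_markets_latest.json")


def extract_token_pairs(records):
    """Return (condition_id, up_token_id, down_token_id) for usable records.

    Assumed record shape (hypothetical):
    {"condition_id": "0x...", "tokens": {"Up": "<id>", "Down": "<id>"}}
    """
    pairs = []
    for record in records:
        tokens = record.get("tokens") or {}
        up, down = tokens.get("Up"), tokens.get("Down")
        # Skip records that lack a condition_id or either side of the mapping.
        if record.get("condition_id") and up and down:
            pairs.append((record["condition_id"], str(up), str(down)))
    return pairs


def load_token_pairs(path=DISCOVERY_PATH):
    """Read the discovery file; assumes the top level is a JSON list of records."""
    with open(path, encoding="utf-8") as handle:
        return extract_token_pairs(json.load(handle))
```

An empty result from `load_token_pairs()` would correspond to the "no usable active markets" case described above.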
## Endpoint
The sample uses the public CLOB batch order-book endpoint:
```text
POST https://clob.polymarket.com/books
```
Request body shape:
```json
[
{"token_id": "<up_token_id>"},
{"token_id": "<down_token_id>"}
]
```
No authentication is used.
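A minimal sketch of the request described in this section, built with the standard library only. The helper name `build_books_request` and the `Content-Type` header are assumptions, not confirmed details of the collector script. The request is constructed but not sent.

```python
import json
import urllib.request

BOOKS_URL = "https://clob.polymarket.com/books"


def build_books_request(token_ids):
    """Build (but do not send) the batch POST request for the given token IDs."""
    body = json.dumps([{"token_id": t} for t in token_ids]).encode("utf-8")
    return urllib.request.Request(
        BOOKS_URL,
        data=body,
        method="POST",
        # Assumed header; no authentication header is set.
        headers={"Content-Type": "application/json"},
    )


request = build_books_request(["<up_token_id>", "<down_token_id>"])
# Sending would look like: urllib.request.urlopen(request, timeout=10)
# (the timeout value here is illustrative only).
```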
## Running A Bounded Sample
Default sample command:
```sh
python3 scripts/collect_polymarket_orderbooks.py
```
The default config is:
```text
config/polymarket_collector.example.yaml
```
The example config is deliberately small:
- `market_limit: 2`
- `interval_seconds: 30`
- `duration_seconds: 300`
- `market_end_safety_seconds: 420`
With these values, a run lasts 5 minutes, polls every 30 seconds, and covers at most 2 markets, fetching both the `Up` and `Down` outcome tokens through the batch endpoint.
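To make the size of the default sample concrete, the sketch below works out an upper bound on polling cycles and snapshot lines from the example values. The `SampleConfig` class is hypothetical, and the estimate assumes one polling cycle per full interval and one snapshot line per token per cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleConfig:
    market_limit: int = 2
    interval_seconds: int = 30
    duration_seconds: int = 300
    market_end_safety_seconds: int = 420  # carried as-is; not used in the estimate

    def polling_cycles(self) -> int:
        # Assumes one cycle per full interval. A collector that polls at both
        # t=0 and t=duration would perform one more cycle than this.
        return self.duration_seconds // self.interval_seconds

    def max_snapshot_lines(self, tokens_per_market: int = 2) -> int:
        # Upper bound: every market returns a book for Up and Down in every cycle.
        return self.polling_cycles() * self.market_limit * tokens_per_market


config = SampleConfig()
print(config.polling_cycles(), config.max_snapshot_lines())
```

Under these assumptions the default run performs 10 cycles and writes at most 40 snapshot lines.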
## Outputs
Raw gzip JSONL snapshots are written under:
```text
data/live_sample/polymarket/orderbooks/<run_id>/
```
The sample manifest is written to:
```text
data/manifests/orderbook_collector_sample_manifest.json
```
For this checkpoint, files rotate by run: each run writes into its own `<run_id>` directory. Hourly rotation is intentionally left for a later sustained-runtime checkpoint.
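The following sketch shows one way to write and read gzip JSONL files inside a per-run directory using the standard library. The file name `snapshots.jsonl.gz` and the function names are hypothetical; the actual naming inside `<run_id>/` is not specified here.

```python
import gzip
import json
from pathlib import Path

OUTPUT_ROOT = Path("data/live_sample/polymarket/orderbooks")


def append_snapshots(run_dir, envelopes, filename="snapshots.jsonl.gz"):
    """Append envelopes as gzip-compressed JSON lines (file name is hypothetical)."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / filename
    # Mode "at" appends text. Each call adds a new gzip member, and the gzip
    # module reads multi-member files back as one continuous stream.
    with gzip.open(path, "at", encoding="utf-8") as handle:
        for envelope in envelopes:
            handle.write(json.dumps(envelope, separators=(",", ":")) + "\n")
    return path


def read_snapshots(path):
    """Yield one decoded envelope per line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield json.loads(line)
```

A caller would pass something like `OUTPUT_ROOT / run_id` as `run_dir`, where the `run_id` format is left to the collector.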
## Raw JSONL Envelope
Each gzip JSONL line is a raw-first envelope:
```json
{
"schema_name": "raw_orderbook_snapshot",
"schema_version": 1,
"collector": {
"name": "polymarket_orderbook_collector",
"version": "0.1.0"
},
"market": {
"market_name": "polymarket",
"market_slug": "example",
"condition_id": "0x...",
"token_id": "123",
"outcome": "Up",
"market_end_time_utc": "2026-04-14T22:00:00Z"
},
"collection": {
"collected_at_utc": "2026-04-14T21:00:00Z",
"sequence": 1,
"response_index": 0
},
"request": {
"method": "POST",
"url": "https://clob.polymarket.com/books",
"params": null,
"json_body": [{"token_id": "123"}],
"status_code": 200,
"duration_ms": 123,
"attempts": []
},
"raw": {}
}
```
The `raw` object is the unmodified order-book object returned by CLOB for that token.
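A sketch of assembling one envelope around a book object, following the field layout shown above. The function `build_envelope` and its parameters are hypothetical; the point it illustrates is that `raw` is stored by reference, without any transformation.

```python
from datetime import datetime, timezone


def build_envelope(market, raw_book, sequence, response_index, request_info):
    """Wrap one raw order-book object in the envelope layout shown above."""
    collected_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "schema_name": "raw_orderbook_snapshot",
        "schema_version": 1,
        "collector": {"name": "polymarket_orderbook_collector", "version": "0.1.0"},
        "market": dict(market),
        "collection": {
            "collected_at_utc": collected_at,
            "sequence": sequence,
            "response_index": response_index,
        },
        "request": dict(request_info),
        "raw": raw_book,  # kept exactly as received, no normalization
    }
```

Here `response_index` is taken to be the position of the book within the batch response, which is an interpretation of the sample envelope rather than a documented rule.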
## Rate-Limit Handling
The sample is conservative:
- Uses a small market cap by default.
- Uses a fixed interval between batch requests.
- Applies a request timeout.
- Retries `429` and `5xx` responses with exponential backoff.
- Does not use concurrent requests.
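The retry rule in the list above can be sketched as follows. The attempt limit, base delay, and the shape of the `attempts` records are illustrative assumptions, and `send` is a hypothetical callable that performs one request and returns `(status_code, payload)`.

```python
import time


def is_retryable(status_code):
    """Retry only on 429 and 5xx, as listed above."""
    return status_code == 429 or 500 <= status_code <= 599


def fetch_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() sequentially until success, a non-retryable status, or exhaustion."""
    attempts = []
    for attempt in range(1, max_attempts + 1):
        status_code, payload = send()
        attempts.append({"attempt": attempt, "status_code": status_code})
        if not is_retryable(status_code) or attempt == max_attempts:
            return status_code, payload, attempts
        # Exponential backoff: base_delay, then 2x, then 4x, and so on.
        sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the function testable without real delays; the returned `attempts` list could populate the envelope's `request.attempts` field if the collector records retries that way.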
## Shutdown
`SIGINT` and `SIGTERM` set a stop flag. The current request, if any, finishes or times out, the gzip file closes, and the manifest is written with a shutdown warning.
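A minimal sketch of the stop-flag pattern described above, using the standard `signal` module. The names `StopFlag`, `install_stop_flag`, and `run_cycles` are hypothetical, and the interval wait and file handling are omitted for brevity.

```python
import signal


class StopFlag:
    """Callable signal handler that only records that a stop was requested."""

    def __init__(self):
        self.stopped = False
        self.signal_name = None

    def __call__(self, signum, frame):
        self.stopped = True
        self.signal_name = signal.Signals(signum).name


def install_stop_flag():
    """Register one flag for SIGINT and SIGTERM; must be called from the main thread."""
    flag = StopFlag()
    signal.signal(signal.SIGINT, flag)
    signal.signal(signal.SIGTERM, flag)
    return flag


def run_cycles(flag, poll_once, max_cycles):
    """Run polling cycles until done or stopped; the in-flight cycle always completes."""
    completed = 0
    while completed < max_cycles and not flag.stopped:
        poll_once()
        completed += 1
    warnings = [f"stopped early by {flag.signal_name}"] if flag.stopped else []
    return completed, warnings
```

Because the handler only sets a flag, the loop checks it between cycles, so an in-flight request is never interrupted midway. The returned `warnings` list stands in for the shutdown warning that goes into the manifest.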
## Known Gaps
- This is a short run-rotated sample, not a daemon.
- It does not prove 24/7 reliability.
- It does not implement hourly rotation.
- It does not refresh discovery during a run.
- It does not normalize snapshots.
- It does not upload files.
- It does not use websockets.
The project must not claim production readiness until the later 24h soak test passes with documented quality metrics.