# Polymarket Collector

Artifact status: `valid`

## Scope

This document covers the Checkpoint 4 bounded raw order-book sample collector.

It does not describe a production service. It does not include normalization, upload, systemd, dashboards, databases, strategies, trading, wallet logic, private keys, API keys, or private endpoints.

## Inputs

The collector reads active BTC markets from:

```text
data/discovery/polymarket_btc_markets_latest.json
```

Checkpoint 3 writes normalized market records to this file. Each record carries a `condition_id` and a `tokens` field that preserves the `Up` and `Down` outcome-token mapping. The collector uses only those records and does not perform market discovery itself.

If the discovery file is stale or contains no usable active markets, run:

```sh
python3 scripts/discover_polymarket_btc_markets.py
```

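As a rough illustration of how the discovery file might be consumed, the sketch below loads the records and keeps those that expose both outcome tokens. The record layout is an assumption: the file is taken to be a JSON array, and `tokens` is taken to be a list of objects with `outcome` and `token_id` keys. The helper names (`load_records`, `outcome_tokens`, `usable_markets`) are hypothetical and are not taken from the collector's source.

```python
import json
from pathlib import Path

DISCOVERY_PATH = Path("data/discovery/polymarket_btc_markets_latest.json")


def load_records(path: Path = DISCOVERY_PATH) -> list:
    # ASSUMPTION: the top level of the file is a JSON array of market records.
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def outcome_tokens(record: dict) -> dict:
    """Map outcome name -> token_id for one market record.

    ASSUMPTION: `tokens` is a list of {"outcome": ..., "token_id": ...}
    objects. The real Checkpoint 3 layout may differ.
    """
    return {t["outcome"]: t["token_id"] for t in record.get("tokens", [])}


def usable_markets(records: list, limit: int) -> list:
    """Keep records that expose both Up and Down tokens, up to `limit`."""
    usable = []
    for record in records:
        if len(usable) >= limit:
            break
        tokens = outcome_tokens(record)
        if "Up" in tokens and "Down" in tokens:
            usable.append(record)
    return usable
```

If `usable_markets` returns an empty list, that corresponds to the "no usable active markets" case above, and rerunning the discovery script is the documented remedy.
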
## Endpoint

The sample uses the public CLOB batch order-book endpoint:

```text
POST https://clob.polymarket.com/books
```

Request body shape:

```json
[
  {"token_id": "<up_token_id>"},
  {"token_id": "<down_token_id>"}
]
```

No authentication is used.

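The request described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the collector's actual code: the function names, the `Content-Type` header, the 10-second default timeout, and the expectation that the response body is a JSON array are all assumptions.

```python
import json
import urllib.request

BOOKS_URL = "https://clob.polymarket.com/books"


def build_books_request(token_ids: list) -> urllib.request.Request:
    """Build an unauthenticated POST for the batch order-book endpoint."""
    body = json.dumps([{"token_id": t} for t in token_ids]).encode("utf-8")
    return urllib.request.Request(
        BOOKS_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def fetch_books(token_ids: list, timeout: float = 10.0) -> list:
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_books_request(token_ids)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from the network call keeps the body shape easy to check without touching the live endpoint.
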
## Running A Bounded Sample

Default sample command:

```sh
python3 scripts/collect_polymarket_orderbooks.py
```

The default config is:

```text
config/polymarket_collector.example.yaml
```

The example config is deliberately small:

- `market_limit: 2`
- `interval_seconds: 30`
- `duration_seconds: 300`
- `market_end_safety_seconds: 420`

This produces a 5-minute sample for at most 2 markets, fetching the `Up` and `Down` outcome tokens of each market by batch request.

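To make the size of the default sample concrete, the following sketch works through the arithmetic implied by the example config. Two points are assumptions rather than documented behavior: that one polling cycle runs per full interval, and that `market_end_safety_seconds` is used to skip markets ending inside that window. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the example config listed above.
EXAMPLE_CONFIG = {
    "market_limit": 2,
    "interval_seconds": 30,
    "duration_seconds": 300,
    "market_end_safety_seconds": 420,
}


def polling_cycles(cfg: dict) -> int:
    """Cycles that fit in the run, assuming one cycle per full interval."""
    return cfg["duration_seconds"] // cfg["interval_seconds"]


def max_snapshot_lines(cfg: dict) -> int:
    """Upper bound: each cycle yields one book for Up and Down per market."""
    return polling_cycles(cfg) * cfg["market_limit"] * 2


def ends_too_soon(market_end: datetime, now: datetime, cfg: dict) -> bool:
    """ASSUMPTION: a market is skipped when it ends inside the safety window."""
    window = timedelta(seconds=cfg["market_end_safety_seconds"])
    return market_end - now < window
```

With the example values this gives 10 cycles and at most 40 snapshot lines. The real count depends on how the collector schedules its first and last request and on how many requests fail.
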
## Outputs

Raw gzip JSONL snapshots are written under:

```text
data/live_sample/polymarket/orderbooks/<run_id>/
```

The sample manifest is written to:

```text
data/manifests/orderbook_collector_sample_manifest.json
```

For this checkpoint, files rotate per run rather than by time: each run writes under its own `<run_id>` directory. Hourly rotation is intentionally left for a later sustained runtime checkpoint.

## Raw JSONL Envelope

Each gzip JSONL line is a raw-first envelope:

```json
{
  "schema_name": "raw_orderbook_snapshot",
  "schema_version": 1,
  "collector": {
    "name": "polymarket_orderbook_collector",
    "version": "0.1.0"
  },
  "market": {
    "market_name": "polymarket",
    "market_slug": "example",
    "condition_id": "0x...",
    "token_id": "123",
    "outcome": "Up",
    "market_end_time_utc": "2026-04-14T22:00:00Z"
  },
  "collection": {
    "collected_at_utc": "2026-04-14T21:00:00Z",
    "sequence": 1,
    "response_index": 0
  },
  "request": {
    "method": "POST",
    "url": "https://clob.polymarket.com/books",
    "params": null,
    "json_body": [{"token_id": "123"}],
    "status_code": 200,
    "duration_ms": 123,
    "attempts": []
  },
  "raw": {}
}
```

The `raw` object is the unmodified order-book object returned by CLOB for that token.

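A sample file in this format can be inspected with a few lines of standard-library Python. The sketch below is a minimal reader that relies only on the envelope keys shown above. The function names are hypothetical, and the file name inside the `<run_id>` directory is not specified in this document, so the caller must supply the path.

```python
import gzip
import json


def iter_envelopes(path):
    """Yield one decoded envelope per non-empty line of a gzip JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def summarize(path):
    """Return (sequence, outcome, token_id) for every snapshot in the file."""
    return [
        (
            env["collection"]["sequence"],
            env["market"]["outcome"],
            env["market"]["token_id"],
        )
        for env in iter_envelopes(path)
    ]
```

Because the reader touches only the envelope metadata, the `raw` payload stays untouched, in keeping with the raw-first design.
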
## Rate-Limit Handling

The sample is conservative:

- Uses a small market cap by default.
- Uses a fixed interval between batch requests.
- Applies a request timeout.
- Retries `429` and `5xx` responses with exponential backoff.
- Does not use concurrent requests.

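The retry rule in the list above can be sketched as follows. This is an illustration, not the collector's implementation: the retry count, base delay, delay cap, and function names are assumptions, and `send` stands in for whatever performs the single sequential HTTP request.

```python
import time


def is_retryable(status_code: int) -> bool:
    """Only 429 and 5xx responses are retried."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(max_retries: int, base_seconds: float = 1.0,
                   cap_seconds: float = 30.0) -> list:
    """Exponential schedule: base, 2*base, 4*base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(max_retries)]


def call_with_retries(send, max_retries: int = 3, base_seconds: float = 1.0,
                      sleep=time.sleep):
    """Call `send()` sequentially until it succeeds or retries run out.

    `send` is a hypothetical callable returning (status_code, payload).
    Returns (status_code, payload, attempts), where attempts lists every
    status code seen, in order.
    """
    delays = backoff_delays(max_retries, base_seconds)
    attempts = []
    for attempt in range(max_retries + 1):
        status, payload = send()
        attempts.append(status)
        if not is_retryable(status) or attempt == max_retries:
            return status, payload, attempts
        sleep(delays[attempt])
```

Injecting `sleep` keeps the schedule testable without real waiting, and the returned `attempts` list is the kind of information the envelope's `request.attempts` field could carry.
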
## Shutdown

`SIGINT` and `SIGTERM` set a stop flag. The current request, if any, finishes or times out, the gzip file closes, and the manifest is written with a shutdown warning.

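A minimal sketch of this stop-flag pattern is shown below. The class and function names, the 0.2-second sleep slice, and the shape of the returned summary are assumptions. Closing the gzip file and writing the real manifest are left out, with the returned dictionary standing in for the manifest's warning field.

```python
import signal
import time


class StopFlag:
    """Plain boolean flag; the signal handler only flips it."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        self.requested = True


def install(flag: StopFlag) -> None:
    """Register the flag for SIGINT and SIGTERM (call from the main thread)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, flag.handle)


def run_sample(poll_once, flag, interval_seconds, max_cycles, sleep=time.sleep):
    """Poll until max_cycles is reached or a stop has been requested."""
    cycles = 0
    while cycles < max_cycles and not flag.requested:
        poll_once()  # an in-flight request is allowed to finish or time out
        cycles += 1
        waited = 0.0
        # Sleep in short slices so a stop request is noticed promptly.
        while cycles < max_cycles and waited < interval_seconds and not flag.requested:
            step = min(0.2, interval_seconds - waited)
            sleep(step)
            waited += step
    warnings = ["stopped by signal before completion"] if flag.requested else []
    return {"cycles": cycles, "warnings": warnings}
```

The handler does no I/O of its own, so the cleanup happens in the main loop after the current request returns.
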
## Known Gaps

- This is a short run-rotated sample, not a daemon.
- It does not prove 24/7 reliability.
- It does not implement hourly rotation.
- It does not refresh discovery during a run.
- It does not normalize snapshots.
- It does not upload files.
- It does not use websockets.

The project must not claim production readiness until the later 24h soak test passes with documented quality metrics.