orderbooks/ROADMAP.md
philipp 284e465588
Prepare Kubernetes orderbooks deployment
2026-04-18 11:23:28 +02:00

Roadmap

Project: Cross-Market Live Orderbook Archive

Goal: build a reliable, minimal, always-on archive of live market microstructure data so future research agents can test whether strategies were actually observable, fillable, and reproducible in real time.

The roadmap is checkpoint-driven. Each checkpoint must leave durable artifacts, validation evidence, and an explicit gate result.

Current Status

  • Latest completed checkpoint: Checkpoint 7, Google Drive Offload
  • Latest gate: PASS
  • Next checkpoint: Checkpoint 8, 24h Soak Test Plan
  • Initial market: Polymarket
  • Future market work: gated until Polymarket is stable

Checkpoint 1: Project Scaffold And Methodology

Goal: create the minimum repository structure and rules that keep future agents on track.

Artifacts:

  • AGENTS.md
  • ROADMAP.md
  • docs/METHODOLOGY.md
  • docs/DATA_CONTRACT.md
  • docs/OPERATIONS.md
  • orchestration/prompts/

Requirements:

  • Define project goal.
  • Define anti-fake-progress rules.
  • Define raw-first storage policy.
  • Define checkpoint reporting format.
  • Define no-trading/no-private-key policy.
  • Define how to label deprecated or misleading artifacts instead of deleting them.
  • Define how new market connectors should be added later.

Pass condition: the repo contains durable project rules and the next checkpoint is specific enough to execute.

Checkpoint 2: Polymarket Public Data Source Probe

Goal: determine exactly which public Polymarket endpoints can support live collection.

Questions:

  • How to discover active Polymarket markets?
  • How to filter BTC up/down markets?
  • How to resolve conditionId and token IDs?
  • How to fetch current order book for one token?
  • Is there a batch order-book endpoint?
  • Is there a market websocket for order-book updates?
  • Is there a trade websocket or recent trades endpoint?
  • What rate limits are documented or observed?
  • What fields are returned?
  • What timestamps exist?

Artifacts:

  • scripts/probe_polymarket_public_sources.py
  • data/probes/polymarket_public_sources_probe_v1.json
  • data/probes/polymarket_public_sources_probe_v1.md

Pass condition: we know the exact endpoint set and can fetch at least one active market metadata record and one current order book.
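
The questions above (fields returned, timestamps, rate limits) can be answered with one bounded request per endpoint. The following is a minimal sketch, not the contents of scripts/probe_polymarket_public_sources.py. The function names, the probe record layout, and the keyword heuristic for spotting timestamp fields are assumptions made for illustration; the endpoint URL is passed in by the caller and must come from the official Polymarket docs.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone

# Substrings that suggest a field carries a time value (a heuristic, not a schema).
TIME_HINTS = ("time", "date", "_at", "timestamp")


def summarize_payload(payload):
    """Return field names and timestamp-like field names of one JSON payload."""
    # A non-empty list response is summarized by its first element.
    record = payload[0] if isinstance(payload, list) and payload else payload
    fields = sorted(record.keys()) if isinstance(record, dict) else []
    time_fields = [f for f in fields if any(h in f.lower() for h in TIME_HINTS)]
    return {"fields": fields, "timestamp_fields": time_fields}


def is_rate_limit_header(name):
    """True for headers that commonly describe rate limiting."""
    lowered = name.lower()
    return "ratelimit" in lowered.replace("-", "") or lowered == "retry-after"


def probe(name, url, timeout=10.0):
    """Fetch one public URL once (no auth, no retries) and return a probe record.

    A non-2xx response raises urllib.error.HTTPError; the caller decides
    how to record it.
    """
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        status = resp.status
        headers = dict(resp.headers.items())
        body = json.loads(resp.read().decode("utf-8"))
    record = {
        "name": name,
        "url": url,
        "status": status,
        "elapsed_s": round(time.monotonic() - started, 3),
        "probed_at_utc": datetime.now(timezone.utc).isoformat(),
        "rate_limit_headers": {
            k: v for k, v in headers.items() if is_rate_limit_header(k)
        },
    }
    record.update(summarize_payload(body))
    return record
```

Only `probe` touches the network; `summarize_payload` can be run against saved responses, which keeps the probe report reproducible from the raw files.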

Checkpoint 3: Minimal BTC Market Discovery

Goal: build a small script that finds active BTC up/down Polymarket markets and resolves both outcome token IDs.

Artifacts:

  • scripts/discover_polymarket_btc_markets.py
  • data/discovery/polymarket_btc_markets_latest.json
  • data/discovery/polymarket_btc_markets_manifest.json
  • data/discovery/polymarket_btc_markets.md

Requirements:

  • Public endpoints only.
  • No trading.
  • No API keys unless strictly needed for public data.
  • Never store secrets in the repo.
  • Preserve raw metadata responses.
  • Write normalized market records with slug, question, conditionId, token IDs, outcomes, times, status, source, and fetched_at_utc.

Pass condition: the script reliably outputs currently active BTC markets with token IDs.
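
The normalized record listed in the requirements could look roughly like the sketch below. The raw field names (`clobTokenIds`, `outcomes`, `startDate`, `endDate`, `active`, `closed`) and the possibility that list fields arrive as JSON-encoded strings are assumptions to be confirmed against the Checkpoint 2 probe output. The keyword filter is a hypothetical heuristic, not the project's actual rule.

```python
import json
from datetime import datetime, timezone


def _as_list(value):
    """Accept a list or a JSON-encoded list string and return a list."""
    if isinstance(value, str):
        value = json.loads(value)
    return list(value or [])


def is_btc_up_down(raw):
    """Keyword heuristic (assumption): BTC/bitcoin plus 'up' and 'down'."""
    text = f"{raw.get('slug') or ''} {raw.get('question') or ''}".lower()
    return ("btc" in text or "bitcoin" in text) and "up" in text and "down" in text


def normalize_market(raw, source, fetched_at_utc=None):
    """Build one normalized market record from a raw metadata dict."""
    outcomes = _as_list(raw.get("outcomes"))
    token_ids = [str(t) for t in _as_list(raw.get("clobTokenIds"))]
    # Refuse to guess: both outcome tokens must be resolvable.
    if len(outcomes) != 2 or len(token_ids) != 2:
        raise ValueError(f"expected 2 outcomes and 2 token IDs for {raw.get('slug')!r}")
    is_active = bool(raw.get("active")) and not raw.get("closed")
    return {
        "slug": raw.get("slug"),
        "question": raw.get("question"),
        "conditionId": raw.get("conditionId"),
        "token_ids": dict(zip(outcomes, token_ids)),
        "outcomes": outcomes,
        "start_time": raw.get("startDate"),
        "end_time": raw.get("endDate"),
        "status": "active" if is_active else "inactive",
        "source": source,
        "fetched_at_utc": fetched_at_utc or datetime.now(timezone.utc).isoformat(),
    }
```

Raising on a missing token ID, rather than emitting a partial record, keeps the discovery output from silently listing markets the collector cannot actually fetch.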

Checkpoint 4: Minimal Orderbook Snapshot Collector

Goal: collect raw order-book snapshots for active BTC markets at a fixed interval.

Artifacts:

  • scripts/collect_polymarket_orderbooks.py
  • config/polymarket_collector.example.yaml
  • data/live_sample/...
  • data/manifests/orderbook_collector_sample_manifest.json
  • docs/POLYMARKET_COLLECTOR.md

Requirements:

  • Collect active BTC markets only.
  • Fetch order books for both outcome tokens.
  • Store raw API responses as gzip JSONL.
  • Add local collected_at_utc, collector version, endpoint URL, and request params.
  • Rotate files by hour or run.
  • Include a manifest with timing, markets, request counts, status codes, rows, output files, and checksums.
  • Handle graceful shutdown and rate limits.
  • Do not add a database.

Pass condition: a 5-10 minute sample run creates valid compressed raw snapshots and a manifest.
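
The storage requirements above (raw response plus local metadata, gzip JSONL, hourly rotation, manifest with checksums) can be sketched with the standard library alone. The file naming pattern, the envelope key names other than `collected_at_utc`, and the version string are assumptions for illustration, not the project's actual layout.

```python
import gzip
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

COLLECTOR_VERSION = "0.0.0-sketch"  # hypothetical version label


def wrap_response(raw_body, endpoint_url, request_params, status_code, now=None):
    """Wrap one raw API response body (as received text) with local metadata."""
    now = now or datetime.now(timezone.utc)
    return {
        "collected_at_utc": now.isoformat(),
        "collector_version": COLLECTOR_VERSION,
        "endpoint_url": endpoint_url,
        "request_params": request_params,
        "status_code": status_code,
        "raw": raw_body,  # stored unparsed so nothing is lost to re-serialization
    }


def hourly_path(out_dir, now):
    """One file per UTC hour; the name pattern is an assumption."""
    return Path(out_dir) / f"orderbooks_{now:%Y%m%dT%H}Z.jsonl.gz"


def append_record(out_dir, record, now=None):
    """Append one JSON line to the current hour's gzip file and return its path."""
    now = now or datetime.now(timezone.utc)
    path = hourly_path(out_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "at" appends a new gzip member; readers see one continuous stream.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
    return path


def manifest_entry(path):
    """Describe one finished output file: size, row count, SHA-256 checksum."""
    data = Path(path).read_bytes()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        rows = sum(1 for _ in fh)
    return {
        "file": str(path),
        "bytes": len(data),
        "rows": rows,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Opening and closing the file per record is slow but means a crash loses at most the record in flight; a real collector might instead keep the handle open and flush on a timer or on shutdown.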

Checkpoint 5: Normalized Snapshot Extract

Goal: create a derived normalized dataset from raw snapshots while preserving raw files as source of truth.

Artifacts:

  • scripts/normalize_polymarket_orderbooks.py
  • data/normalized_sample/...
  • data/manifests/orderbook_normalization_sample_manifest.json
  • docs/ORDERBOOK_SCHEMA.md

Pass condition: a sample raw file can be normalized and basic sanity checks pass.
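
A derived extract and its "basic sanity checks" might look like the sketch below. It assumes the raw order book is a dict with `bids` and `asks` lists whose entries carry string `price` and `size` fields, and that valid prices lie strictly between 0 and 1. Both assumptions, and the output column names, should be checked against docs/ORDERBOOK_SCHEMA.md and real raw files.

```python
from decimal import Decimal


def flatten_book(book, token_id, collected_at_utc):
    """Turn one raw order book into per-level rows, best price first per side."""
    rows = []
    for side in ("bids", "asks"):
        levels = book.get(side) or []
        # Best bid is the highest price; best ask is the lowest.
        ordered = sorted(
            levels,
            key=lambda lvl: Decimal(lvl["price"]),
            reverse=(side == "bids"),
        )
        for idx, lvl in enumerate(ordered):
            rows.append({
                "collected_at_utc": collected_at_utc,
                "token_id": token_id,
                "side": side[:-1],  # "bid" or "ask"
                "level": idx,
                "price": Decimal(lvl["price"]),
                "size": Decimal(lvl["size"]),
            })
    return rows


def sanity_issues(rows):
    """Return a list of human-readable problems; an empty list means checks pass."""
    issues = []
    for r in rows:
        if not (Decimal(0) < r["price"] < Decimal(1)):
            issues.append(f"price out of (0, 1): {r['price']}")
        if r["size"] <= 0:
            issues.append(f"non-positive size: {r['size']}")
    best_bid = max((r["price"] for r in rows if r["side"] == "bid"), default=None)
    best_ask = min((r["price"] for r in rows if r["side"] == "ask"), default=None)
    if best_bid is not None and best_ask is not None and best_bid >= best_ask:
        issues.append(f"crossed book: best bid {best_bid} >= best ask {best_ask}")
    return issues
```

Using `Decimal` avoids float rounding when parsing string prices. Rows that fail a check should be reported in the manifest rather than dropped, since the raw files remain the source of truth.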

Checkpoint 6: VPS Runtime Package

Goal: make the collector deployable on a small VPS.

Artifacts:

  • systemd/polymarket-orderbook-collector.service
  • config/polymarket_collector.vps.example.yaml
  • scripts/run_polymarket_collector_cycle.sh
  • docs/VPS_DEPLOYMENT.md

Uploader service and timer units are deferred to Checkpoint 7 with Google Drive offload. Creating empty uploader units in Checkpoint 6 would be fake progress.

Pass condition: a user can follow docs on a VPS and run the collector.

Checkpoint 7: Google Drive Offload

Goal: add periodic upload to Google Drive using rclone.

Artifacts:

  • scripts/upload_archive_rclone.sh
  • config/rclone.example.md
  • docs/GOOGLE_DRIVE_OFFLOAD.md
  • sample upload manifest format

Pass condition: a dry-run and a real small test upload succeed and are documented.
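
The dry-run-then-real sequence in the pass condition can be illustrated as follows. The project's artifact is a shell script; this Python version is only a sketch of the same idea. The remote name `gdrive:orderbooks`, the 15-minute minimum age, and the manifest keys are hypothetical. `rclone copy`, `--dry-run`, and `--min-age` are standard rclone options.

```python
import subprocess
from datetime import datetime, timezone


def rclone_copy_command(local_dir, remote, dry_run=True, min_age="15m"):
    """Build an rclone copy command as an argument list (no shell involved)."""
    # --min-age skips files modified recently, i.e. files still being written.
    cmd = ["rclone", "copy", str(local_dir), remote, "--min-age", min_age]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_upload(local_dir, remote, dry_run=True):
    """Run the upload and return a small manifest-style record of the attempt."""
    cmd = rclone_copy_command(local_dir, remote, dry_run=dry_run)
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "started_at_utc": started,
        "finished_at_utc": datetime.now(timezone.utc).isoformat(),
        "command": cmd,
        "dry_run": dry_run,
        "returncode": result.returncode,
        "stderr_tail": result.stderr[-2000:],  # keep the log excerpt bounded
    }
```

A typical sequence would call `run_upload("data/live", "gdrive:orderbooks", dry_run=True)` first, inspect the record, and only then repeat with `dry_run=False`.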

Checkpoint 8: 24h Soak Test Plan

Goal: run the collector for a real 24h period and validate reliability.

Artifacts:

  • reports/soak_test_YYYY-MM-DD.md
  • data/manifests/...

Metrics:

  • uptime
  • markets tracked
  • total snapshots
  • missed interval estimate
  • API errors
  • rate limits
  • file sizes
  • compression ratio
  • Google Drive upload status
  • restart behavior
  • disk usage
  • data quality checks

Pass condition: a 24h run completes with acceptable data quality and documented issues.
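
Two of the metrics above, the missed interval estimate and the compression ratio, reduce to short calculations. The sketch below assumes snapshot timestamps are ISO 8601 strings such as the `collected_at_utc` values, and that a gap counts as a miss once it exceeds the nominal interval by a tolerance; the 50% default is an arbitrary illustrative choice.

```python
from datetime import datetime


def missed_intervals(timestamps_utc, interval_s, tolerance=0.5):
    """Estimate how many collection ticks were skipped.

    A gap of roughly k intervals between consecutive snapshots
    is counted as k - 1 missed ticks.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps_utc)
    missed = 0
    for prev, cur in zip(times, times[1:]):
        gap = (cur - prev).total_seconds()
        if gap > interval_s * (1 + tolerance):
            missed += round(gap / interval_s) - 1
    return missed


def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Uncompressed size divided by compressed size; NaN if nothing was written."""
    if not compressed_bytes:
        return float("nan")
    return uncompressed_bytes / compressed_bytes
```

For example, with a 10-second interval and snapshots at 0 s, 10 s, 40 s, and 50 s, the 30-second gap spans three intervals, so two ticks are counted as missed.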

Checkpoint 9: Add Second Market Only After Polymarket Is Stable

Goal: prepare for NEAR or another market only after Polymarket collector reliability is proven.

Do not start this checkpoint until:

  • Polymarket discovery works.
  • Polymarket order-book collection works.
  • Google Drive offload works.
  • The 24h soak test is complete.

Architecture principles:

  • Use collectors/<market_name>/ only when adding the second market.
  • Keep shared code minimal.
  • Avoid abstract base classes until duplication is painful.
  • Keep raw-first, normalized-second, manifest-always file format consistent across markets.

Anti-Fake-Progress Gates

  • No dashboard before 24h data reliability.
  • No database before the file archive becomes painful.
  • No strategy or backtest code in this project.
  • No live trading.
  • No generic multi-market abstraction before the second market exists.
  • No claiming "production-ready" before a 24h soak test.
  • No deleting bad artifacts; label them deprecated or invalid and write why.

Next Smallest Step

Checkpoint 8 is next, as recorded under Current Status. It should run the collector for a real 24h period, record the metrics listed under Checkpoint 8, and document any issues found so the reliability gate can be evaluated.