# Roadmap

Project: Cross-Market Live Orderbook Archive

Goal: build a reliable, minimal, always-on archive of live market microstructure data so future research agents can test whether strategies were actually observable, fillable, and reproducible in real time.

The roadmap is checkpoint-driven. Each checkpoint must leave durable artifacts, validation evidence, and an explicit gate result.
## Current Status

- Latest completed checkpoint: Checkpoint 7, Google Drive Offload
- Latest gate: PASS
- Next checkpoint: Checkpoint 8, 24h Soak Test Plan
- Initial market: Polymarket
- Future market work: gated until Polymarket is stable
## Checkpoint 1: Project Scaffold And Methodology

Goal: create the minimum repository structure and rules that keep future agents on track.

Artifacts:

- `AGENTS.md`
- `ROADMAP.md`
- `docs/METHODOLOGY.md`
- `docs/DATA_CONTRACT.md`
- `docs/OPERATIONS.md`
- `orchestration/prompts/`

Requirements:

- Define the project goal.
- Define anti-fake-progress rules.
- Define the raw-first storage policy.
- Define the checkpoint reporting format.
- Define the no-trading/no-private-key policy.
- Define how to label deprecated or misleading artifacts instead of deleting them.
- Define how new market connectors should be added later.

Pass condition: the repo contains durable project rules and the next checkpoint is specific enough to execute.
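One way to make the checkpoint reporting format concrete is a small record type that refuses a PASS gate without artifacts and evidence. This is a minimal Python sketch; `CheckpointReport`, `Gate`, and the Markdown layout it renders are hypothetical, not a format the methodology docs are known to define.

```python
from dataclasses import dataclass, field
from enum import Enum


class Gate(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"


@dataclass
class CheckpointReport:
    """Hypothetical checkpoint report: durable artifacts, validation evidence, explicit gate."""

    number: int
    title: str
    gate: Gate
    artifacts: list[str]
    evidence: list[str]
    issues: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Anti-fake-progress rule: a PASS must point at real artifacts and evidence.
        if self.gate is Gate.PASS and not (self.artifacts and self.evidence):
            raise ValueError("a PASS gate requires artifacts and validation evidence")

    def to_markdown(self) -> str:
        lines = [f"## Checkpoint {self.number}: {self.title}", "", f"Gate: {self.gate.value}", ""]
        lines += ["Artifacts:", ""] + [f"- `{a}`" for a in self.artifacts]
        lines += ["", "Validation evidence:", ""] + [f"- {e}" for e in self.evidence]
        # An empty issues list is written out explicitly rather than silently omitted.
        lines += ["", "Known issues:", ""] + ([f"- {i}" for i in self.issues] or ["- none recorded"])
        return "\n".join(lines) + "\n"
```

Raising on an evidence-free PASS turns one of the anti-fake-progress rules into a check instead of a convention.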
## Checkpoint 2: Polymarket Public Data Source Probe

Goal: determine exactly which public Polymarket endpoints can support live collection.

Questions:

- How do we discover active Polymarket markets?
- How do we filter BTC up/down markets?
- How do we resolve a market's `conditionId` and outcome token IDs?
- How do we fetch the current order book for one token?
- Is there a batch order-book endpoint?
- Is there a market websocket for order-book updates?
- Is there a trade websocket or a recent-trades endpoint?
- What rate limits are documented or observed?
- Which fields are returned?
- Which timestamps are present?

Artifacts:

- `scripts/probe_polymarket_public_sources.py`
- `data/probes/polymarket_public_sources_probe_v1.json`
- `data/probes/polymarket_public_sources_probe_v1.md`

Pass condition: we know the exact endpoint set and can fetch at least one active market metadata record and one current order book.
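A bounded probe can record, for each endpoint, the status, latency, returned top-level fields, and raw payload. The sketch below assumes `https://gamma-api.polymarket.com/markets` for market metadata and `https://clob.polymarket.com/book` with a `token_id` query parameter for one token's order book; both URLs and the `probe_endpoint` helper are assumptions to verify against the official Polymarket docs during this checkpoint. The HTTP call is injected so the probe logic can be exercised offline.

```python
import json
import time
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import Callable

# Assumed public endpoints; confirm against the official Polymarket docs.
GAMMA_MARKETS_URL = "https://gamma-api.polymarket.com/markets"
CLOB_BOOK_URL = "https://clob.polymarket.com/book"

Fetcher = Callable[[str], tuple[int, bytes]]


def http_fetch(url: str, timeout: float = 10.0) -> tuple[int, bytes]:
    """One GET request. urlopen raises HTTPError on non-2xx; a real probe should record that too."""
    req = urllib.request.Request(url, headers={"User-Agent": "orderbook-archive-probe"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()


def probe_endpoint(base_url: str, params: dict[str, str], fetch: Fetcher = http_fetch) -> dict:
    """Call one endpoint once and summarize what came back."""
    url = f"{base_url}?{urllib.parse.urlencode(params)}" if params else base_url
    started = time.monotonic()
    status, body = fetch(url)
    elapsed_ms = round((time.monotonic() - started) * 1000, 1)
    payload = json.loads(body)
    # List the top-level field names so the probe report can document the schema.
    if isinstance(payload, list):
        first = payload[0] if payload and isinstance(payload[0], dict) else {}
        kind, fields = "list", sorted(first.keys())
    else:
        kind, fields = "object", sorted(payload.keys())
    return {
        "url": url,
        "status": status,
        "elapsed_ms": elapsed_ms,
        "payload_kind": kind,
        "fields": fields,
        "probed_at_utc": datetime.now(timezone.utc).isoformat(),
        "raw": payload,  # keep the full response, per the raw-first policy
    }
```

Keeping each probe to a single request per endpoint keeps the probe bounded; observed rate-limit behavior can then be tested separately with an explicit request budget.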
## Checkpoint 3: Minimal BTC Market Discovery

Goal: build a small script that finds active BTC up/down Polymarket markets and resolves both outcome token IDs.

Artifacts:

- `scripts/discover_polymarket_btc_markets.py`
- `data/discovery/polymarket_btc_markets_latest.json`
- `data/discovery/polymarket_btc_markets_manifest.json`
- `data/discovery/polymarket_btc_markets.md`

Requirements:

- Public endpoints only.
- No trading.
- No API keys unless strictly needed for public data.
- Never store secrets in the repo.
- Preserve raw metadata responses.
- Write normalized market records with slug, question, `conditionId`, token IDs, outcomes, times, status, source, and `fetched_at_utc`.

Pass condition: the script reliably outputs currently active BTC markets with both outcome token IDs.
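The discovery filter and the normalized record can be sketched as follows. It assumes Gamma market objects carry `slug`, `question`, `conditionId`, `startDate`, `endDate`, `active`, `closed`, and `outcomes`/`clobTokenIds` encoded as JSON strings; these field names should be confirmed from the Checkpoint 2 probe output. The text match in `is_btc_up_down` is a hypothetical heuristic, not a documented Polymarket category filter.

```python
import json
from datetime import datetime, timezone


def _as_list(value) -> list:
    """Some list fields appear to arrive as JSON-encoded strings; accept both forms."""
    if isinstance(value, str):
        return json.loads(value)
    return list(value or [])


def is_btc_up_down(market: dict) -> bool:
    """Hypothetical heuristic on slug/question text; verify against real market slugs."""
    text = f"{market.get('slug', '')} {market.get('question', '')}".lower().replace("-", " ")
    mentions_btc = "bitcoin" in text or "btc" in text
    return mentions_btc and "up or down" in text


def normalize_market(market: dict, source: str) -> dict:
    outcomes = _as_list(market.get("outcomes"))
    token_ids = _as_list(market.get("clobTokenIds"))
    # Up/down markets should have exactly two outcomes, each with its own token.
    if len(outcomes) != 2 or len(token_ids) != 2:
        raise ValueError(f"expected two outcomes and two tokens for {market.get('slug')!r}")
    return {
        "slug": market.get("slug"),
        "question": market.get("question"),
        "conditionId": market.get("conditionId"),
        "outcomes": outcomes,
        "token_ids": token_ids,
        "start_time": market.get("startDate"),
        "end_time": market.get("endDate"),
        "status": {"active": market.get("active"), "closed": market.get("closed")},
        "source": source,
        "fetched_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def select_active_btc_markets(markets: list[dict], source: str) -> list[dict]:
    return [
        normalize_market(m, source)
        for m in markets
        if m.get("active") and not m.get("closed") and is_btc_up_down(m)
    ]
```

Failing loudly when a market does not have exactly two tokens is deliberate: silently emitting a half-resolved market would let the collector record only one side of the book.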
## Checkpoint 4: Minimal Orderbook Snapshot Collector

Goal: collect raw order-book snapshots for active BTC markets at a fixed interval.

Artifacts:

- `scripts/collect_polymarket_orderbooks.py`
- `config/polymarket_collector.example.yaml`
- `data/live_sample/...`
- `data/manifests/orderbook_collector_sample_manifest.json`
- `docs/POLYMARKET_COLLECTOR.md`

Requirements:

- Collect active BTC markets only.
- Fetch order books for both outcome tokens.
- Store raw API responses as gzip-compressed JSONL.
- Add local `collected_at_utc`, collector version, endpoint URL, and request params.
- Rotate files by hour or by run.
- Include a manifest with timing, markets, request counts, status codes, row counts, output files, and checksums.
- Shut down gracefully and handle rate limits.
- Do not add a database.

Pass condition: a 5-10 minute sample run creates valid compressed raw snapshots and a manifest.
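A minimal sketch of the storage side, assuming one gzip JSONL file per UTC hour: each line wraps the unmodified response body with the local metadata listed above, and the manifest records markets, request and status counts, row counts, sizes, and SHA-256 checksums. The directory layout, `COLLECTOR_VERSION`, and all field names are hypothetical.

```python
import gzip
import hashlib
import json
from datetime import datetime
from pathlib import Path

COLLECTOR_VERSION = "0.1.0"  # hypothetical version string


def hourly_path(root: Path, market: str, now: datetime) -> Path:
    """Hypothetical layout: one file per UTC hour, grouped by day. `now` should be UTC-aware."""
    return root / market / now.strftime("%Y-%m-%d") / f"orderbooks_{now:%H}.jsonl.gz"


def append_raw(path: Path, endpoint_url: str, params: dict, status: int,
               raw_body: str, now: datetime) -> None:
    """Wrap one unmodified API response with local metadata and append it as a JSONL line."""
    record = {
        "collected_at_utc": now.isoformat(),
        "collector_version": COLLECTOR_VERSION,
        "endpoint_url": endpoint_url,
        "request_params": params,
        "status_code": status,
        "raw_response": raw_body,  # verbatim response text, never reshaped
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # "at" appends a new gzip member; gzip.open reads multi-member files transparently.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")


def sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files: list[Path], started: datetime, finished: datetime,
                   markets: list[str], request_count: int, status_counts: dict) -> dict:
    entries = []
    for f in sorted(files):
        with gzip.open(f, "rt", encoding="utf-8") as fh:
            rows = sum(1 for _ in fh)
        entries.append({"path": str(f), "rows": rows,
                        "bytes": f.stat().st_size, "sha256": sha256_file(f)})
    return {
        "collector_version": COLLECTOR_VERSION,
        "started_at_utc": started.isoformat(),
        "finished_at_utc": finished.isoformat(),
        "markets": markets,
        "request_count": request_count,
        "status_counts": status_counts,
        "total_rows": sum(e["rows"] for e in entries),
        "output_files": entries,
    }
```

Storing `raw_response` as the body text rather than re-serialized JSON keeps the archive faithful to exactly what the API returned.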
## Checkpoint 5: Normalized Snapshot Extract

Goal: create a derived normalized dataset from raw snapshots while preserving the raw files as the source of truth.

Artifacts:

- `scripts/normalize_polymarket_orderbooks.py`
- `data/normalized_sample/...`
- `data/manifests/orderbook_normalization_sample_manifest.json`
- `docs/ORDERBOOK_SCHEMA.md`

Pass condition: a sample raw file can be normalized and basic sanity checks pass.
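A sketch of the normalization and the basic sanity checks, assuming the raw order-book body has `market`, `asset_id`, and `bids`/`asks` lists of levels with string `price` and `size` values; that shape should be confirmed from the Checkpoint 2 probe output. Prices are parsed as `Decimal` to avoid float rounding, and best bid/ask use `max`/`min` so the check does not depend on the order in which levels are returned. The specific checks (prices within [0, 1], positive sizes, no crossed or locked book) are illustrative choices.

```python
from decimal import Decimal


def normalize_book(book: dict, collected_at_utc: str) -> list[dict]:
    """Flatten one parsed order-book response into one row per price level."""
    rows = []
    for side, label in (("bids", "bid"), ("asks", "ask")):
        for level in book.get(side, []):
            rows.append({
                "collected_at_utc": collected_at_utc,
                "market": book.get("market"),
                "asset_id": book.get("asset_id"),
                "side": label,
                "price": Decimal(str(level["price"])),
                "size": Decimal(str(level["size"])),
            })
    return rows


def sanity_check(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    for r in rows:
        # Outcome prices are probabilities, so anything outside [0, 1] is suspect.
        if not (Decimal("0") <= r["price"] <= Decimal("1")):
            problems.append(f"price out of [0, 1]: {r['price']}")
        if r["size"] <= 0:
            problems.append(f"non-positive size at price {r['price']}")
    bids = [r["price"] for r in rows if r["side"] == "bid"]
    asks = [r["price"] for r in rows if r["side"] == "ask"]
    if bids and asks and max(bids) >= min(asks):
        problems.append(f"crossed or locked book: best bid {max(bids)} >= best ask {min(asks)}")
    return problems
```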
## Checkpoint 6: VPS Runtime Package

Goal: make the collector deployable on a small VPS.

Artifacts:

- `systemd/polymarket-orderbook-collector.service`
- `config/polymarket_collector.vps.example.yaml`
- `scripts/run_polymarket_collector_cycle.sh`
- `docs/VPS_DEPLOYMENT.md`

Uploader service and timer units are deferred to Checkpoint 7, alongside Google Drive offload. Creating empty uploader units in Checkpoint 6 would be fake progress.

Pass condition: a user can follow the docs on a VPS and run the collector.
## Checkpoint 7: Google Drive Offload

Goal: add periodic upload to Google Drive using `rclone`.

Artifacts:

- `scripts/upload_archive_rclone.sh`
- `config/rclone.example.md`
- `docs/GOOGLE_DRIVE_OFFLOAD.md`
- a sample upload manifest format

Pass condition: a dry run and a small real test upload both succeed and are documented.
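The offload step can be sketched as a wrapper that runs `rclone copy` (which, unlike `rclone sync`, never deletes files on the destination) and produces an upload manifest listing every local file with its size and SHA-256. The remote name `gdrive:orderbook-archive`, the manifest fields, and `run_upload` are hypothetical; `--dry-run` and `--checksum` are standard rclone flags. The subprocess runner is injectable so the sketch can be tested without rclone installed.

```python
import hashlib
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_rclone_command(local_dir: Path, remote: str, dry_run: bool) -> list[str]:
    # `copy` only adds or updates files on the remote; `--checksum` compares hashes, not mod-times.
    cmd = ["rclone", "copy", str(local_dir), remote, "--checksum"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def file_entry(path: Path, root: Path) -> dict:
    return {
        "relative_path": path.relative_to(root).as_posix(),
        "bytes": path.stat().st_size,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def run_upload(local_dir: Path, remote: str, dry_run: bool, runner=subprocess.run) -> dict:
    """Run one upload attempt and return a manifest describing it."""
    files = sorted(p for p in local_dir.rglob("*") if p.is_file())
    cmd = build_rclone_command(local_dir, remote, dry_run)
    started = datetime.now(timezone.utc)
    result = runner(cmd, capture_output=True, text=True)
    finished = datetime.now(timezone.utc)
    return {
        "command": cmd,
        "dry_run": dry_run,
        "remote": remote,
        "started_at_utc": started.isoformat(),
        "finished_at_utc": finished.isoformat(),
        "exit_code": result.returncode,
        "stderr_tail": (result.stderr or "")[-2000:],
        "files": [file_entry(p, local_dir) for p in files],
    }
```

Running it once with `dry_run=True` and once with `dry_run=False` on a small directory mirrors the pass condition above, and the two manifests become the documented evidence.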
## Checkpoint 8: 24h Soak Test Plan

Goal: run the collector for a real 24h period and validate reliability.

Artifacts:

- `reports/soak_test_YYYY-MM-DD.md`
- `data/manifests/...`

Metrics:

- uptime
- markets tracked
- total snapshots
- missed interval estimate
- API errors
- rate limits
- file sizes
- compression ratio
- Google Drive upload status
- restart behavior
- disk usage
- data quality checks

Pass condition: a 24h run completes with acceptable data quality and documented issues.
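Two of the listed metrics lend themselves to a short calculation. The sketch below estimates missed intervals from the gaps between consecutive snapshot timestamps (a gap of about k nominal intervals implies k - 1 skipped polls) and computes each file's compression ratio as uncompressed bytes over compressed bytes. Gaps before the first or after the last snapshot are not counted, and the function names are hypothetical.

```python
import gzip
from datetime import datetime, timedelta
from pathlib import Path


def missed_intervals(timestamps: list[datetime], interval: timedelta) -> int:
    """Estimate skipped polls from gaps between consecutive snapshots."""
    ts = sorted(timestamps)
    missed = 0
    for prev, cur in zip(ts, ts[1:]):
        # Rounding absorbs normal jitter; a gap of ~k intervals means k - 1 polls were skipped.
        slots = round((cur - prev) / interval)
        missed += max(slots - 1, 0)
    return missed


def compression_ratio(path: Path) -> float:
    """Uncompressed bytes divided by on-disk gzip bytes."""
    compressed = path.stat().st_size
    with gzip.open(path, "rb") as fh:
        uncompressed = sum(len(chunk) for chunk in iter(lambda: fh.read(1 << 20), b""))
    return uncompressed / compressed if compressed else 0.0


def soak_summary(timestamps: list[datetime], interval: timedelta, files: list[Path]) -> dict:
    return {
        "total_snapshots": len(timestamps),
        "first_snapshot_utc": min(timestamps).isoformat(),
        "last_snapshot_utc": max(timestamps).isoformat(),
        "missed_interval_estimate": missed_intervals(timestamps, interval),
        "compression_ratios": {str(f): round(compression_ratio(f), 2) for f in files},
    }
```

Because the estimate only looks at gaps, it should be computed per token (or per poll cycle), so that near-simultaneous snapshots of different tokens are not mistaken for a healthy cadence.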
## Checkpoint 9: Add Second Market Only After Polymarket Is Stable

Goal: prepare for NEAR or another market only after Polymarket collector reliability is proven.

Do not start this checkpoint until:

- Polymarket discovery works.
- Polymarket order-book collection works.
- Google Drive offload works.
- The 24h soak test is complete.

Architecture principles:

- Introduce `collectors/<market_name>/` only when adding the second market.
- Keep shared code minimal.
- Avoid abstract base classes until duplication is painful.
- Keep the raw-first, normalized-second, manifest-always file layout consistent across markets.
## Anti-Fake-Progress Gates

- No dashboard before 24h data reliability is demonstrated.
- No database before the file archive becomes painful.
- No strategy or backtest code in this project.
- No live trading.
- No generic multi-market abstraction before the second market exists.
- No claiming "production-ready" before a 24h soak test.
- No deleting bad artifacts; label them deprecated or invalid and record why.
## Next Smallest Step

Checkpoint 8 is next. It should run the collector for a real 24h period and record the listed reliability metrics, Google Drive upload status, and any issues in `reports/soak_test_YYYY-MM-DD.md`.