orderbooks/docs/OPERATIONS.md
philipp 284e465588
Prepare Kubernetes orderbooks deployment
2026-04-18 11:23:28 +02:00

# Operations
This document defines the operational rules that apply before the collector exists. Update it with exact commands as checkpoints add scripts, services, and upload jobs.
## Current Operational Status
- Collector implementation: not started.
- Supported market: none yet; Polymarket is the first planned market.
- Deployment target: small VPS.
- Offload target: Google Drive through `rclone`.
- Reliability status: not production-ready until a documented 24h soak test passes.
## Safety Rules
- No trading.
- No order placement.
- No wallet signing.
- No private keys.
- No secrets in git.
- No dashboards, databases, ML, or strategy code before the roadmap gate allows them.
## Local Runtime Principles
Future scripts should:
- accept a configurable data directory
- write logs to a predictable location
- write raw gzip JSONL snapshots
- rotate files by hour or run
- close files cleanly on shutdown
- write manifests after runs
- avoid corrupting closed files on restart
- handle public endpoint errors and rate limits conservatively
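The file-handling principles above could be sketched as follows. This is a minimal illustration, not the collector: the class name `HourlyGzipJsonlWriter`, the `snapshots-<hour>.jsonl.gz` naming scheme, and the choice of append mode are all assumptions made for the example.

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path


class HourlyGzipJsonlWriter:
    """Hypothetical sketch: one gzip JSONL file per UTC hour."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)  # configurable data directory
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self._hour_key = None
        self._fh = None

    def write(self, record, now=None):
        now = now or datetime.now(timezone.utc)
        hour_key = now.strftime("%Y%m%dT%H")
        if hour_key != self._hour_key:
            # Close the previous hour's file before opening the next one.
            self.close()
            path = self.data_dir / f"snapshots-{hour_key}.jsonl.gz"
            # Assumption: append mode, so a restart within the same hour adds
            # a new gzip member instead of truncating existing data.
            self._fh = gzip.open(path, "at", encoding="utf-8")
            self._hour_key = hour_key
        self._fh.write(json.dumps(record, separators=(",", ":")) + "\n")

    def close(self):
        if self._fh is not None:
            self._fh.close()  # flushes data and writes the gzip trailer
            self._fh = None
            self._hour_key = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False
```

Using the writer as a context manager (`with HourlyGzipJsonlWriter(data_dir) as w:`) closes the open file on a normal exit or an exception. A real service would also need to route termination signals to `close()`.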
## VPS Deployment Principles
Checkpoint 6 should document:
- Python version and virtualenv setup
- package installation
- environment variables
- systemd or Docker Compose runtime
- service user and file permissions
- data directory ownership
- log locations
- restart policy
- disk usage checks
- safe upgrade and rollback steps
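One item from the list above, the disk usage check, might look like the sketch below. The function name `disk_usage_ok` and the 15% free-space threshold are assumptions for illustration; the real threshold should follow from the VPS disk size and the observed data rate.

```python
import shutil


def disk_usage_ok(path, min_free_fraction=0.15):
    """Return (ok, free_fraction) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Defensive guard: treat an unreadable or empty filesystem as not ok.
        return False, 0.0
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction
```

A collector could call this before opening each new file and refuse to start a new hour when the check fails, rather than filling the disk.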
## Google Drive Offload Principles
Checkpoint 7 should use `rclone` and must:
- avoid hardcoded credentials
- upload only closed or rotated files
- support dry-run mode
- verify upload success
- preserve local files until upload is verified
- maintain checksums
- keep the last N days locally
- write an upload manifest
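A possible shape for the offload step is sketched below, assuming `rclone` is installed and a remote is already configured in rclone's own config file, so no credentials appear in the script. The function names, the manifest fields, and the remote name in the usage note are hypothetical. `--checksum` and `--dry-run` are standard rclone flags.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rclone_copy_command(local_path, remote_dir, dry_run=True):
    """Build the rclone command; dry-run is the default to stay conservative."""
    cmd = ["rclone", "copy", str(local_path), remote_dir, "--checksum"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def offload(local_path, remote_dir, manifest_path, dry_run=True):
    """Upload one closed file and append a manifest line (hypothetical format).

    The local file is never deleted here; retention is a separate step that
    should run only after the upload has been verified.
    """
    local_path = Path(local_path)
    entry = {
        "file": local_path.name,
        "size": local_path.stat().st_size,
        "sha256": sha256_of(local_path),
        "remote": remote_dir,
        "dry_run": dry_run,
    }
    # check=True raises CalledProcessError if rclone exits non-zero.
    subprocess.run(rclone_copy_command(local_path, remote_dir, dry_run), check=True)
    with open(manifest_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

For example, `offload("data/snapshots-20260101T10.jsonl.gz", "gdrive:orderbooks", "upload-manifest.jsonl")` would perform a dry run against a hypothetical remote named `gdrive`. A zero exit code from `rclone copy` is not by itself proof that the remote copy matches, so a separate verification pass (for instance with `rclone check`) is still needed before any local cleanup.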
## Incident And Bad-Data Handling
If data looks wrong:
1. Preserve the raw files.
2. Stop relying on the affected derived files.
3. Label the artifact `invalid` or `deprecated`.
4. Write a short note explaining the issue and replacement, if any.
5. Keep the learning in docs or reports.
Examples of bad-data conditions:
- endpoint returned a schema different from expected
- token/outcome mapping was wrong
- timestamps were misunderstood
- rate limits caused large gaps
- gzip file was not closed cleanly
- upload succeeded but checksum did not match
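One of the conditions above, a gzip file that was not closed cleanly, can be detected mechanically. The sketch below uses a hypothetical helper name, `gzip_is_complete`, and reads the file to the end so that Python's `gzip` module validates the trailer.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole file decompresses, including its trailer."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end; a truncated stream raises EOFError, and a
            # corrupt header or CRC mismatch raises BadGzipFile.
            while fh.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
    return True
```

A file that fails this check should be kept and labeled `invalid` rather than deleted or repaired in place, in line with step 1 above.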
## Minimum Reliability Claim
A short sample run can prove that code writes files. It cannot prove 24/7 reliability.
The project may only claim production readiness after:
- discovery works
- raw order-book collection works
- offload works
- a documented 24h soak test passes
- data quality and gap metrics are documented
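The gap metrics in the last item could be computed along the following lines. The definitions here are assumptions for illustration: a gap is any interval between consecutive snapshots longer than `tolerance` times the expected polling interval, and coverage is the share of the observed time span not lost to such gaps. The function name and output fields are hypothetical.

```python
def gap_metrics(timestamps, expected_interval_s, tolerance=2.0):
    """Summarize gaps between consecutive snapshot timestamps (epoch seconds)."""
    ts = sorted(timestamps)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    threshold = expected_interval_s * tolerance
    gaps = [d for d in deltas if d > threshold]
    span = ts[-1] - ts[0] if len(ts) > 1 else 0.0
    # Time lost is the excess of each gap over one expected interval.
    missing = sum(d - expected_interval_s for d in gaps)
    coverage = (1.0 - missing / span) if span > 0 else 0.0
    return {
        "snapshots": len(ts),
        "gap_count": len(gaps),
        "max_gap_s": max(deltas, default=0.0),
        "coverage": coverage,
    }
```

For snapshots at 0, 10, 20, 60, and 70 seconds with a 10-second expected interval, this reports one gap with a maximum of 40 seconds and a coverage of 4/7.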