orderbooks/docs/OPERATIONS.md
philipp 284e465588
Prepare Kubernetes orderbooks deployment
2026-04-18 11:23:28 +02:00

Operations

This document defines the operational rules that apply before the collector exists. Update it with exact commands as checkpoints add scripts, services, and upload jobs.

Current Operational Status

  • Collector implementation: not started.
  • Supported market: none yet; Polymarket is the first planned market.
  • Deployment target: small VPS.
  • Offload target: Google Drive through rclone.
  • Reliability status: not production-ready until a documented 24h soak test passes.

Safety Rules

  • No trading.
  • No order placement.
  • No wallet signing.
  • No private keys.
  • No secrets in git.
  • No dashboards, databases, ML, or strategy code before the roadmap gate allows them.

Local Runtime Principles

Future scripts should:

  • accept a configurable data directory
  • write logs to a predictable location
  • write raw gzip JSONL snapshots
  • rotate files by hour or run
  • close files cleanly on shutdown
  • write manifests after runs
  • avoid corrupting closed files on restart
  • handle public endpoint errors and rate limits conservatively
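As an illustration of several of these principles together (configurable data directory, gzip JSONL output, hourly rotation, clean close), here is a minimal sketch. No collector code exists yet, so everything in it is an assumption: the class name `HourlyJsonlWriter`, the `orderbook-` file prefix, and the `.part` suffix for files that are still open are all hypothetical.

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path


class HourlyJsonlWriter:
    """Write JSON records to gzip JSONL files, starting a new file each UTC hour."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)  # configurable data directory
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self._hour = None
        self._file = None
        self._part_path = None

    def write(self, record, now=None):
        if now is None:
            now = datetime.now(timezone.utc)
        hour = now.strftime("%Y%m%dT%H")
        if hour != self._hour:
            self.close()  # finish the previous hour before opening a new file
            # Hypothetical naming: the open time makes each file unique per
            # hour and run, and ".part" marks a file that is still open.
            name = f"orderbook-{now:%Y%m%dT%H%M%SZ}.jsonl.gz.part"
            self._part_path = self.data_dir / name
            # "xt" refuses to overwrite an existing file.
            self._file = gzip.open(self._part_path, "xt", encoding="utf-8")
            self._hour = hour
        self._file.write(json.dumps(record) + "\n")

    def close(self):
        if self._file is None:
            return
        self._file.close()  # flushes data and writes the gzip trailer
        # Drop ".part" only after a clean close.
        self._part_path.rename(self._part_path.with_suffix(""))
        self._file = None
        self._hour = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Using the writer as a context manager (`with HourlyJsonlWriter(data_dir) as w:`) closes the current file on normal exit and on exceptions. A file that still carries the `.part` suffix after a crash was never closed cleanly, and a restart never reopens a finished file because each new file gets a fresh name.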

VPS Deployment Principles

Checkpoint 6 should document:

  • Python version and virtualenv setup
  • package installation
  • environment variables
  • systemd or Docker Compose runtime
  • service user and file permissions
  • data directory ownership
  • log locations
  • restart policy
  • disk usage checks
  • safe upgrade and rollback steps
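The disk usage check could be as small as the sketch below, which uses only the standard library. The function name `disk_usage_ok` and the 80% default threshold are assumptions for illustration; the real limit depends on the VPS disk size and the local retention window.

```python
import shutil


def disk_usage_ok(path, max_used_fraction=0.8):
    """Return (ok, used_fraction) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        raise ValueError(f"filesystem for {path!r} reports zero capacity")
    used_fraction = usage.used / usage.total
    # The 0.8 default is a hypothetical threshold, not a project decision.
    return used_fraction <= max_used_fraction, used_fraction
```

A scheduled job or the collector itself could call this against the data directory and log a warning when `ok` is false.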

Google Drive Offload Principles

Checkpoint 7 should use rclone and must:

  • avoid hardcoded credentials
  • upload only closed or rotated files
  • support dry-run mode
  • verify upload success
  • preserve local files until upload is verified
  • maintain checksums
  • keep the last N days locally
  • write an upload manifest
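A sketch of the local half of such an offload job follows: select only closed files, compute checksums, build manifest entries, and construct an `rclone copy` command with `--dry-run` enabled by default. The helper names, the `*.jsonl.gz` and `.part` naming convention, and the remote name `gdrive:orderbooks` are hypothetical. Credentials stay in rclone's own configuration, outside the code and outside git.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def closed_files(data_dir):
    """List finished files only; names ending in ".part" do not match."""
    return sorted(Path(data_dir).glob("*.jsonl.gz"))


def manifest_entry(path):
    """Describe one file for the upload manifest."""
    path = Path(path)
    return {
        "name": path.name,
        "bytes": path.stat().st_size,
        "sha256": sha256_of(path),
    }


def rclone_copy_command(path, remote, dry_run=True):
    """Build the argument list for `rclone copy`; dry-run is the default."""
    cmd = ["rclone", "copy", str(path), remote]
    if dry_run:
        cmd.append("--dry-run")
    return cmd
```

The command list can be passed to `subprocess.run(cmd, check=True)`. Deleting a local file should happen only after a separate verification step has succeeded, for example by comparing the stored checksum against the remote copy.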

Incident And Bad-Data Handling

If data looks wrong:

  1. Preserve the raw files.
  2. Stop relying on the affected derived files.
  3. Label the artifact invalid or deprecated.
  4. Write a short note explaining the issue and replacement, if any.
  5. Record what was learned in docs or reports.
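One way to label an artifact without touching it is a sidecar note written next to the file. The sketch below assumes a hypothetical `.invalid.json` suffix and note layout; the project has not defined either.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def mark_invalid(artifact_path, reason, replacement=None):
    """Write a sidecar note next to the artifact; the artifact is not modified."""
    artifact = Path(artifact_path)
    # Hypothetical convention: "<artifact name>.invalid.json" beside the file.
    note_path = artifact.with_name(artifact.name + ".invalid.json")
    note = {
        "artifact": artifact.name,
        "status": "invalid",
        "reason": reason,
        "replacement": replacement,  # None when there is no replacement
        "noted_at": datetime.now(timezone.utc).isoformat(),
    }
    note_path.write_text(json.dumps(note, indent=2) + "\n", encoding="utf-8")
    return note_path
```

Because the raw file is left in place, the note can be corrected or removed later without any loss of data.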

Examples of bad-data conditions:

  • endpoint returned a schema different from expected
  • token/outcome mapping was wrong
  • timestamps were misinterpreted
  • rate limits caused large gaps
  • gzip file was not closed cleanly
  • upload succeeded but checksum did not match
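The unclean-gzip condition can be detected by reading a file to the end, as in this sketch. The function name `gzip_is_complete` is hypothetical. It relies on Python's `gzip` module raising an error when the stream ends before the gzip trailer or when the data is corrupt.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole gzip file decompresses without error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read and discard the data; only the absence of errors matters.
            while f.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        # EOFError: stream ended before the end-of-stream marker.
        # BadGzipFile / zlib.error: bad header, bad checksum, or corrupt data.
        return False
    return True
```

Running this over files before upload, and again after a restart, gives a cheap check that only complete files are treated as closed. A missing file still raises `FileNotFoundError`, so it is not silently reported as incomplete.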

Minimum Reliability Claim

A short sample run can prove that code writes files. It cannot prove 24/7 reliability.

The project may only claim production readiness after:

  • discovery works
  • raw order-book collection works
  • offload works
  • the documented 24h soak test passes
  • data quality and gap metrics are documented
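Gap metrics could be derived from snapshot timestamps along the lines of the sketch below. The function names, the idea of a fixed expected polling interval, and the tolerance factor of 2 are assumptions; the project has not yet defined how gaps are measured.

```python
def find_gaps(timestamps, expected_interval, tolerance=2.0):
    """Return (start, end, duration) for each gap above tolerance * expected_interval.

    Timestamps are numeric seconds (for example Unix time), in any order.
    """
    ordered = sorted(timestamps)
    threshold = expected_interval * tolerance
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > threshold:
            gaps.append((prev, cur, delta))
    return gaps


def gap_summary(timestamps, expected_interval, tolerance=2.0):
    """Summarize gaps as a count, total duration, and longest duration."""
    gaps = find_gaps(timestamps, expected_interval, tolerance)
    durations = [duration for _, _, duration in gaps]
    return {
        "gap_count": len(gaps),
        "total_gap_seconds": sum(durations),
        "longest_gap_seconds": max(durations, default=0),
    }
```

For example, with an expected interval of 1 second, the timestamps `[0, 1, 2, 10, 11, 30]` contain two gaps, from 2 to 10 and from 11 to 30, for a total of 27 seconds. A soak-test report could include this summary per market and per hour.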