Kubernetes Deployment

Status: draft runtime package for Checkpoint 8G

This document describes the Kubernetes package for the Polymarket raw order-book collector. It follows the shared Hetzner k3s cluster model from ../nuri/unrip3: application code, Dockerfile, manifests, and Forgejo workflow live in this repository; platform services, the shared registry, and the shared Forgejo runner remain platform-owned.

This package does not claim production readiness. Production readiness still requires a real Kubernetes runtime smoke run with preserved evidence.

Cluster Decisions

  • Namespace: orderbooks
  • Workstation kubeconfig for validation: ../nuri/unrip3/.state/hetzner/kubeconfig.yaml
  • Shared registry and shared Forgejo runner
  • Existing rclone Secret: orderbooks/orderbooks-rclone-config
  • Secret key mounted by the uploader: rclone.conf

Do not commit or print rclone config contents.

Runtime Layout

The collector and uploader share one PVC:

PVC: orderbooks-data
mount: /var/lib/orderbooks
raw files: /var/lib/orderbooks/raw_orderbooks
manifests: /var/lib/orderbooks/manifests
discovery: /var/lib/orderbooks/discovery

The collector uses one Deployment with one replica. The container runs /app/scripts/run_polymarket_collector_loop.sh, which repeatedly executes the existing bounded collector cycle and records loop failure/interruption manifests instead of relying on Kubernetes crash loops for normal operation.
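The loop behaviour described above can be sketched roughly as follows. This is not the contents of run_polymarket_collector_loop.sh: the function name run_loop, the manifest file name pattern, the JSON fields, and the cycle limit (added so the sketch terminates) are all assumptions made for illustration. Interruption handling is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; NOT the real run_polymarket_collector_loop.sh.
# All names and the manifest format below are illustrative assumptions.

# run_loop MANIFEST_DIR MAX_CYCLES CMD [ARGS...]
# Runs CMD up to MAX_CYCLES times. A failing cycle is recorded as a small
# JSON manifest and the loop continues, so one bad cycle does not end the
# container process and trigger a Kubernetes restart.
# Prints the number of failed cycles when the loop ends.
run_loop() {
  local manifest_dir=$1 max_cycles=$2
  shift 2
  local cycle=0 failures=0 rc
  mkdir -p "$manifest_dir"
  while [ "$cycle" -lt "$max_cycles" ]; do
    cycle=$((cycle + 1))
    rc=0
    "$@" || rc=$?          # capture the exit code instead of aborting
    if [ "$rc" -ne 0 ]; then
      failures=$((failures + 1))
      printf '{"event":"loop_failure","cycle":%d,"exit_code":%d,"utc":"%s"}\n' \
        "$cycle" "$rc" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        > "$manifest_dir/loop_failure_${cycle}.json"
    fi
  done
  echo "$failures"
}
```

Under these assumptions, `run_loop /var/lib/orderbooks/manifests 3 some_cycle_command` would run three cycles and leave one loop_failure_<n>.json file per failed cycle. The point of the design is that a failed cycle becomes reviewable evidence on the PVC rather than a pod restart.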

The uploader uses one CronJob. It runs the existing rclone uploader in execute mode, mounts the same PVC, mounts orderbooks-rclone-config read-only at /etc/rclone/rclone.conf, sets RCLONE_CONFIG to that file, and uploads only closed/aged files.
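As an illustration of the "aged" part of that selection, the sketch below lists files that have not been modified for a minimum number of seconds. How the real uploader decides that a file is closed is not described here; treating an unmodified file as closed is an assumption, as is the function name list_aged_files. It relies on GNU stat (`stat -c %Y`), which is typical in Linux container images.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real uploader's selection logic may differ.

# list_aged_files DIR MIN_AGE_SECONDS
# Prints, sorted, every regular file under DIR whose last modification is
# at least MIN_AGE_SECONDS in the past.
list_aged_files() {
  local dir=$1 min_age=$2 now mtime f
  now=$(date +%s)
  find "$dir" -type f -print | sort | while IFS= read -r f; do
    mtime=$(stat -c %Y "$f")      # modification time as epoch seconds (GNU stat)
    if [ $((now - mtime)) -ge "$min_age" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `list_aged_files /var/lib/orderbooks/raw_orderbooks 600` would print only files untouched for ten minutes or more, which keeps a file the collector is still writing out of the upload set.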

Bootstrap This App Repo

Run the orderbooks-specific bootstrap from this repository:

scripts/deploy/bootstrap_orderbooks_k8s.sh

The bootstrap loads platform defaults and resolved secrets from the local platform state without printing secret values. It ensures namespace orderbooks, creates or updates orderbooks-registry-creds, verifies the existing orderbooks-rclone-config secret has key rclone.conf, creates or updates the Forgejo repo philipp/orderbooks, and upserts the required Actions secret and variables.

After bootstrap, push a clean source tree to Forgejo main. Do not push local data/, artifacts/, reports/, orchestration/, kubeconfigs, rclone config, .env, private keys, or other local evidence/secrets.

Image Build And Deploy

The Forgejo workflow is .forgejo/workflows/deploy.yml. It follows the shared runner pattern:

  1. load KUBECONFIG_B64 from Forgejo secrets;
  2. clone this repo inside the runner;
  3. create an in-cluster Kaniko Job;
  4. build and push REGISTRY_HOST/orderbooks:<git-sha>;
  5. apply deploy/k8s/base with the built image;
  6. wait for deployment/orderbooks-collector rollout.
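Two of the steps above lend themselves to a small sketch: decoding the kubeconfig secret (step 1) and composing the image reference (step 4). The helper names write_kubeconfig and image_ref are hypothetical and are not taken from deploy.yml; the Kaniko Job and the way the image is substituted into deploy/k8s/base are not shown because this document does not describe them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not copied from .forgejo/workflows/deploy.yml.

# write_kubeconfig B64_VALUE OUT_PATH
# Decodes a base64-encoded kubeconfig into OUT_PATH. The umask keeps a newly
# created file readable by the owner only. The value is never echoed.
write_kubeconfig() {
  local b64=$1 out=$2
  ( umask 077 && printf '%s' "$b64" | base64 -d > "$out" )
}

# image_ref REGISTRY_HOST GIT_SHA
# Prints REGISTRY_HOST/orderbooks:GIT_SHA, or fails if either part is empty.
image_ref() {
  local registry_host=$1 git_sha=$2
  if [ -z "$registry_host" ] || [ -z "$git_sha" ]; then
    echo "image_ref: REGISTRY_HOST and git sha are required" >&2
    return 1
  fi
  printf '%s/orderbooks:%s\n' "$registry_host" "$git_sha"
}
```

A workflow step could then call `write_kubeconfig "$KUBECONFIG_B64" "$some_temp_path"` and `image_ref "$REGISTRY_HOST" "$(git rev-parse HEAD)"`. Step 6 corresponds to a command along the lines of `kubectl -n orderbooks rollout status deployment/orderbooks-collector`; any timeout value passed to it would be a workflow choice not stated here.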

Required Forgejo repo secret:

KUBECONFIG_B64

Required Forgejo repo variable:

REGISTRY_HOST

Project defaults used by the workflow:

PROJECT_NAME=orderbooks
PROJECT_NAMESPACE=orderbooks
PROJECT_DEPLOYMENTS=orderbooks-collector
PROJECT_REGISTRY_SECRET_NAME=orderbooks-registry-creds

The registry pull/build secret orderbooks-registry-creds must exist in the orderbooks namespace before the workflow builds and deploys.

Pre-Deploy Validation

From this repository:

bash -n scripts/run_polymarket_collector_loop.sh
bash -n scripts/k8s_runtime_smoke_check.sh
kubectl kustomize deploy/k8s/base
KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml \
  kubectl apply -k deploy/k8s/base --dry-run=server
KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml \
  kubectl -n orderbooks get secret orderbooks-rclone-config \
  -o go-template='{{if index .data "rclone.conf"}}rclone_secret_key_present{{else}}rclone_secret_key_missing{{end}}{{"\n"}}'

The last command checks only whether the key exists. It must not print secret data.

Runtime Smoke Gate

After the image is built and the workload is actually deployed, run:

KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml \
  scripts/k8s_runtime_smoke_check.sh \
  --namespace orderbooks \
  --deployment orderbooks-collector \
  --cronjob orderbooks-uploader \
  --raw-dir /var/lib/orderbooks/raw_orderbooks \
  --manifest-dir /var/lib/orderbooks/manifests \
  --wait-seconds 1800 \
  --upload-min-age-seconds 600

The smoke gate uses kubectl, not systemd. By default it writes local JSON evidence to data/manifests/k8s_runtime_smoke_<UTC_TIMESTAMP>.json. It verifies:

  • collector pod is running;
  • latest collector manifest has gate_status: PASS, rows_written > 0, and failure_count: 0;
  • raw gzip JSONL parses and is under /var/lib/orderbooks/raw_orderbooks;
  • deleting the collector pod does not change the checksum or row count of the raw file written before the restart;
  • a later post-restart collector cycle writes valid rows;
  • an uploader Job created from the CronJob completes;
  • the latest upload manifest records a verified rclone upload with at least one verified file.
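The restart check in the list above amounts to comparing a file fingerprint taken before the pod is deleted with one taken afterwards. The sketch below shows one way to compute such a fingerprint; it is not taken from k8s_runtime_smoke_check.sh, the function name raw_fingerprint is hypothetical, and it assumes one JSON row per newline-terminated line. In the cluster it would have to run where the PVC is mounted, for example through kubectl exec.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real smoke check may compute this differently.

# raw_fingerprint FILE
# Prints "<sha256> <row_count>" for a gzip JSONL file and fails if the
# gzip stream is truncated or corrupt.
raw_fingerprint() {
  local file=$1 sum rows
  gzip -t "$file" || return 1                    # integrity check of the gzip stream
  sum=$(sha256sum "$file" | cut -d' ' -f1)       # checksum of the compressed bytes
  rows=$(gzip -cd "$file" | wc -l | tr -d ' ')   # one row per newline
  printf '%s %s\n' "$sum" "$rows"
}
```

Taking `before=$(raw_fingerprint "$file")` ahead of the pod deletion and `after=$(raw_fingerprint "$file")` once the new pod is running, the check passes only if the two strings are identical.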

A failed smoke run still writes JSON evidence and exits nonzero. Preserve failed manifests, raw files, upload manifests, and pod logs for review.

Not Included

  • No trading, signing, wallets, private keys, or API keys.
  • No dashboard, database, strategy, backtest, or second-market connector.
  • No websocket rewrite.
  • No rclone config contents in this repository.