orderbooks/docs/KUBERNETES_DEPLOYMENT.md
philipp 284e465588
Prepare Kubernetes orderbooks deployment
2026-04-18 11:23:28 +02:00


# Kubernetes Deployment
Status: draft runtime package for Checkpoint 8G

This document describes the Kubernetes package for the Polymarket raw
order-book collector. It follows the shared Hetzner k3s cluster model from
`../nuri/unrip3`: application code, Dockerfile, manifests, and Forgejo workflow
live in this repository; platform services, the shared registry, and the shared
Forgejo runner remain platform-owned.

This package does not claim production readiness. Production readiness still
requires a real Kubernetes runtime smoke run with preserved evidence.
## Cluster Decisions
- Namespace: `orderbooks`
- Workstation kubeconfig for validation: `../nuri/unrip3/.state/hetzner/kubeconfig.yaml`
- Shared registry and shared Forgejo runner
- Existing rclone Secret: `orderbooks/orderbooks-rclone-config`
- Secret key mounted by the uploader: `rclone.conf`

Do not commit or print rclone config contents.
## Runtime Layout
The collector and uploader share one PVC:
```text
PVC: orderbooks-data
mount: /var/lib/orderbooks
raw files: /var/lib/orderbooks/raw_orderbooks
manifests: /var/lib/orderbooks/manifests
discovery: /var/lib/orderbooks/discovery
```
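
As a quick sanity check of the layout above, the following sketch verifies that the three subdirectories exist under a given mount root. `check_layout` and `ORDERBOOKS_ROOT` are illustrative names, not something this repository is known to define.

```sh
# Sketch only: ORDERBOOKS_ROOT and check_layout are hypothetical names.
ORDERBOOKS_ROOT="${ORDERBOOKS_ROOT:-/var/lib/orderbooks}"

check_layout() {
  # Print one line per expected subdirectory under the root passed as $1.
  # Returns nonzero if any of them is missing.
  root="$1"
  missing=0
  for sub in raw_orderbooks manifests discovery; do
    if [ -d "$root/$sub" ]; then
      echo "ok      $root/$sub"
    else
      echo "missing $root/$sub"
      missing=1
    fi
  done
  return "$missing"
}
```

Inside a pod that mounts the PVC, `check_layout "$ORDERBOOKS_ROOT"` would report which directories are present.
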
The collector uses one Deployment with one replica. The container runs
`/app/scripts/run_polymarket_collector_loop.sh`, which repeatedly executes the
existing bounded collector cycle and records loop failure/interruption manifests
instead of relying on Kubernetes crash loops for normal operation.
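
The loop-and-record idea can be sketched as follows. This is not the content of `run_polymarket_collector_loop.sh`: the function name, its arguments, and the manifest fields are assumptions chosen for illustration, and interruption handling (for example a `TERM` trap) is left out.

```sh
# Sketch only: run_cycle_loop and the JSON fields are hypothetical.
run_cycle_loop() {
  manifest_dir="$1"   # where loop-failure manifests are written
  max_cycles="$2"     # bound for demonstration; 0 means loop forever
  shift 2             # remaining arguments are the bounded cycle command
  cycle=0
  mkdir -p "$manifest_dir"
  while [ "$max_cycles" -eq 0 ] || [ "$cycle" -lt "$max_cycles" ]; do
    cycle=$((cycle + 1))
    if "$@"; then
      :  # cycle succeeded; continue with the next one
    else
      status=$?
      ts=$(date -u +%Y%m%dT%H%M%SZ)
      # Record the failure and keep looping instead of exiting the container.
      printf '{"event":"loop_failure","cycle":%d,"exit_code":%d,"utc":"%s"}\n' \
        "$cycle" "$status" "$ts" > "$manifest_dir/loop_failure_${ts}_${cycle}.json"
    fi
  done
}
```

The point of the design is that a failed cycle leaves evidence on the PVC and the container keeps running, so pod restarts stay reserved for genuinely abnormal conditions.
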

The uploader uses one CronJob. It runs the existing rclone uploader in execute
mode, mounts the same PVC, mounts `orderbooks-rclone-config` read-only at
`/etc/rclone/rclone.conf`, sets `RCLONE_CONFIG` to that file, and uploads only
closed/aged files.
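
One way to picture the "aged files only" rule is a modification-time filter. The sketch below assumes that file age is judged by mtime and that raw files end in `.jsonl.gz`; the actual uploader may decide differently. `list_upload_candidates` is a hypothetical helper, and `-mmin` is supported by GNU and BSD `find` but is not POSIX.

```sh
# Sketch only: list_upload_candidates is hypothetical, and using mtime as the
# "closed/aged" signal is an assumption.
list_upload_candidates() {
  raw_dir="$1"
  min_age_seconds="$2"
  # find -mmin works in whole minutes, so round the threshold up.
  min_age_minutes=$(( (min_age_seconds + 59) / 60 ))
  find "$raw_dir" -type f -name '*.jsonl.gz' -mmin "+$min_age_minutes" | sort
}
```

For example, `list_upload_candidates /var/lib/orderbooks/raw_orderbooks 600` would list only files that have not been modified for more than ten minutes, which keeps a file the collector is still writing out of the upload set.
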
## Bootstrap This App Repo
Run the orderbooks-specific bootstrap from this repository:
```sh
scripts/deploy/bootstrap_orderbooks_k8s.sh
```
The bootstrap loads platform defaults and resolved secrets from the local
platform state without printing secret values. It ensures namespace `orderbooks`,
creates or updates `orderbooks-registry-creds`, verifies the existing
`orderbooks-rclone-config` secret has key `rclone.conf`, creates or updates the
Forgejo repo `philipp/orderbooks`, and upserts the required Actions secret and
variables.

After bootstrap, push a clean source tree to Forgejo `main`. Do not push local
`data/`, `artifacts/`, `reports/`, `orchestration/`, kubeconfigs, rclone config,
`.env`, private keys, or other local evidence/secrets.
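
A pre-push guard for the exclusions above might look like the sketch below. `forbidden_paths` is a hypothetical helper, and the patterns are one illustrative reading of the list (the `.pem` and `.key` suffixes for private keys are an assumption); it is not an exhaustive secret scanner.

```sh
# Sketch only: forbidden_paths is hypothetical and the patterns are illustrative.
forbidden_paths() {
  # Reads newline-separated repository paths on stdin and prints those that
  # should not be pushed. Exits 0 when at least one path matches.
  grep -E '^(data|artifacts|reports|orchestration)/|(^|/)\.env$|kubeconfig|rclone\.conf$|\.(pem|key)$'
}

# Typical use inside the work tree (assumes git is available):
#   if git ls-files | forbidden_paths; then
#     echo "refusing to push: forbidden paths are tracked" >&2
#     exit 1
#   fi
```
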
## Image Build And Deploy
The Forgejo workflow is `.forgejo/workflows/deploy.yml`. It follows the shared
runner pattern:
1. load `KUBECONFIG_B64` from Forgejo secrets;
2. clone this repo inside the runner;
3. create an in-cluster Kaniko Job;
4. build and push `REGISTRY_HOST/orderbooks:<git-sha>`;
5. apply `deploy/k8s/base` with the built image;
6. wait for `deployment/orderbooks-collector` rollout.
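
Two of the steps above can be sketched in shell. The helper names are hypothetical; the real logic lives in `.forgejo/workflows/deploy.yml`, and how that workflow substitutes the built image into `deploy/k8s/base` is not shown here.

```sh
# Sketch only: image_ref and decode_kubeconfig are hypothetical helpers.
image_ref() {
  # Compose REGISTRY_HOST/orderbooks:<git-sha> from its two inputs.
  registry_host="$1"
  git_sha="$2"
  printf '%s/orderbooks:%s\n' "$registry_host" "$git_sha"
}

decode_kubeconfig() {
  # Decode the base64 kubeconfig in $1 into file $2 with owner-only
  # permissions, without echoing its contents.
  ( umask 077; printf '%s' "$1" | base64 -d > "$2" )
}

# With a reachable cluster, the last step corresponds to a standard kubectl
# command (kept as a comment so the sketch has no side effects):
#   kubectl -n orderbooks rollout status deployment/orderbooks-collector
```

Tagging the image with the commit SHA rather than a moving tag means each rollout points at exactly one build, which makes it easier to match smoke evidence to the code that produced it.
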

Required Forgejo repo secret:
```text
KUBECONFIG_B64
```
Required Forgejo repo variable:
```text
REGISTRY_HOST
```
Project defaults used by the workflow:
```text
PROJECT_NAME=orderbooks
PROJECT_NAMESPACE=orderbooks
PROJECT_DEPLOYMENTS=orderbooks-collector
PROJECT_REGISTRY_SECRET_NAME=orderbooks-registry-creds
```
The registry pull/build secret `orderbooks-registry-creds` must exist in the
`orderbooks` namespace before the workflow builds and deploys.
## Pre-Deploy Validation
From this repository:
```sh
bash -n scripts/run_polymarket_collector_loop.sh
bash -n scripts/k8s_runtime_smoke_check.sh
kubectl kustomize deploy/k8s/base
KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml kubectl apply -k deploy/k8s/base --dry-run=server
KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml kubectl -n orderbooks get secret orderbooks-rclone-config -o go-template='{{if index .data "rclone.conf"}}rclone_secret_key_present{{else}}rclone_secret_key_missing{{end}}{{"\n"}}'
```
The last command checks only whether the key exists. It must not print secret
data.
## Runtime Smoke Gate
After the image is built and the workload is actually deployed, run:
```sh
KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml \
  scripts/k8s_runtime_smoke_check.sh \
  --namespace orderbooks \
  --deployment orderbooks-collector \
  --cronjob orderbooks-uploader \
  --raw-dir /var/lib/orderbooks/raw_orderbooks \
  --manifest-dir /var/lib/orderbooks/manifests \
  --wait-seconds 1800 \
  --upload-min-age-seconds 600
```
The smoke gate uses `kubectl`, not systemd. It writes local JSON evidence under
`data/manifests/k8s_runtime_smoke_<UTC_TIMESTAMP>.json` by default. It verifies:
- collector pod is running;
- latest collector manifest has `gate_status: PASS`, `rows_written > 0`, and
`failure_count: 0`;
- raw gzip JSONL parses and is under `/var/lib/orderbooks/raw_orderbooks`;
- deleting the collector pod leaves the checksum and row count of the
pre-existing raw file unchanged;
- a later post-restart collector cycle writes valid rows;
- an uploader Job created from the CronJob completes;
- the latest upload manifest records a verified rclone upload with at least one
verified file.
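
The checksum and row-count comparison in the list above can be illustrated with a small helper. This is a sketch assuming GNU coreutils (`sha256sum`) and `gzip`; `raw_file_snapshot` is a hypothetical name, and the real checks are in `scripts/k8s_runtime_smoke_check.sh`.

```sh
# Sketch only: raw_file_snapshot is a hypothetical helper.
raw_file_snapshot() {
  # Print "<sha256> <row_count>" for a gzip JSONL file, or fail if the gzip
  # stream is damaged. Does not check that each row is well-formed JSON.
  file="$1"
  gzip -t "$file" || return 1
  sum=$(sha256sum "$file" | cut -d' ' -f1)
  rows=$(gzip -dc "$file" | wc -l | tr -d ' ')
  printf '%s %s\n' "$sum" "$rows"
}
```

Taking one snapshot before deleting the pod and one after, then comparing the two strings, is enough to detect both a changed file and a truncated one.
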

A failed smoke run still writes JSON evidence and exits nonzero. Preserve failed
manifests, raw files, upload manifests, and pod logs for review.
## Not Included
- No trading, signing, wallets, private keys, or API keys.
- No dashboard, database, strategy, backtest, or second-market connector.
- No websocket rewrite.
- No rclone config contents in this repository.