orderbooks/reports/checkpoints/checkpoint_011_verified_purge.md
2026-05-02 17:44:33 +02:00

Checkpoint 11: Verified Upload Purge

Gate

READY_FOR_DEPLOY_NOT_LIVE

The purge implementation is validated locally and the Kubernetes apply set passes a server-side dry run, but the change has not yet been built into a new cluster image.

Goal

Add periodic local deletion of files that have already been uploaded and verified on the remote, so that purging no longer depends solely on the current upload run.

What Changed

  • scripts/upload_archive_rclone.sh
    • writes/updates a durable verified-upload index at /var/lib/orderbooks/manifests/upload_verified_index.json
    • records verified-index update summary in each upload manifest
  • scripts/purge_uploaded_local_files.sh
    • reads the verified-upload index
    • deletes only files that are older than the retention window and whose local SHA-256 matches the verified-upload index
    • protects the verified-upload index itself
    • writes a purge manifest under /var/lib/orderbooks/manifests/
  • deploy/k8s/base/cronjob-uploader.yaml
    • runs upload verification and purge in the same periodic CronJob cycle
  • systemd/polymarket-orderbook-uploader.service
    • runs upload verification and purge in the same periodic service execution
  • docs updated:
    • docs/GOOGLE_DRIVE_OFFLOAD.md
    • docs/KUBERNETES_DEPLOYMENT.md
    • docs/POLYMARKET_WEBSOCKET_RECORDER.md
    • docs/VPS_CUTOVER_RUNBOOK.md
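
The purge rules listed above can be sketched in Python to make the deletion conditions explicit. This is only an illustration, not the contents of scripts/purge_uploaded_local_files.sh: the index schema (a "files" map from relative path to a "sha256" entry), the function names, and the return shape are all assumptions.

```python
import hashlib
import json
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def purge_verified(data_dir: Path, index_path: Path, retention_days: float, now=None) -> dict:
    """Delete files that are verified, unchanged, and older than retention."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    # Hypothetical schema: {"files": {"<relative path>": {"sha256": "<hex>"}}}
    index = json.loads(index_path.read_text())
    result = {"deleted": [], "retained": []}
    for rel, entry in sorted(index.get("files", {}).items()):
        path = data_dir / rel
        if not path.is_file():
            continue  # already removed locally, nothing to do
        protected = path.resolve() == index_path.resolve()  # never delete the index
        too_new = path.stat().st_mtime > cutoff  # still inside the retention window
        if protected or too_new or sha256_of(path) != entry.get("sha256"):
            result["retained"].append(rel)
            continue
        path.unlink()
        result["deleted"].append(rel)
    return result
```

A file is removed only when all three checks agree: it is not the index, it is older than the cutoff, and its current hash equals the recorded one. A file that changed after verification fails the hash comparison and is kept.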

Validation Evidence

Local validation used a temporary data directory and a local rclone destination path, not Google Drive, to prove the full flow:

  1. real rclone copy
  2. real rclone check
  3. verified-upload index update
  4. purge of files older than retention
  5. retention of a newer local file
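
Steps 1 to 3 of this flow can be sketched as follows. The sketch assumes rclone is on PATH and relies on `rclone check` exiting non-zero when source and destination differ; the index schema, the function names, and the injectable `run` parameter are hypothetical and do not describe scripts/upload_archive_rclone.sh itself.

```python
import hashlib
import json
import os
import subprocess
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_and_record(src: Path, dest: str, index_path: Path, run=subprocess.run) -> dict:
    """Copy, verify, then merge verified files into the durable index."""
    # Steps 1 and 2: check=True raises CalledProcessError on a non-zero exit,
    # so the index below is only touched after both commands succeed.
    run(["rclone", "copy", str(src), dest], check=True)
    run(["rclone", "check", str(src), dest], check=True)

    # Step 3: merge into the existing index (hypothetical schema).
    # The index is assumed to live outside src so it is not hashed itself.
    if index_path.exists():
        index = json.loads(index_path.read_text())
    else:
        index = {"files": {}}
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src).as_posix()
        index["files"][rel] = {"sha256": file_sha256(path)}

    # Write to a temporary file and rename it into place, so a crash
    # mid-write cannot leave a truncated index behind.
    tmp = index_path.with_name(index_path.name + ".tmp")
    tmp.write_text(json.dumps(index, indent=2, sort_keys=True))
    os.replace(tmp, index_path)
    return index
```

Passing a local directory as `dest` mirrors the validation setup described here, where a local rclone destination path stood in for Google Drive.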

Durable artifacts:

  • data/manifests/upload_archive_purge_validation_sample.json
  • data/manifests/purge_uploaded_local_validation_sample.json
  • data/manifests/purge_uploaded_local_validation_summary.json

Observed result:

  • upload gate: PASS
  • upload operation: UPLOAD_VERIFIED
  • verified index status: updated
  • purge gate: PASS
  • purge operation: PURGE_PASS
  • deleted files: 2
  • retained newer file: 1

Kubernetes validation:

  • kubectl kustomize deploy/k8s/base
  • KUBECONFIG=../nuri/unrip3/.state/hetzner/kubeconfig.yaml kubectl apply -k deploy/k8s/base --dry-run=server

Both passed.

Live Runtime Context

Before this change, the live cluster was already deleting files older than the 3-day retention window, but only during successful upload runs. Live disk usage still showed many recent files retained, especially manifests within the retention window. This checkpoint adds a separate verified-file purge phase so older, already-verified files can be removed based on durable local evidence.

Strongest Fake-Progress Risk

This is not deployed yet. The current cluster image still runs the previous uploader behavior until a new image is built and the canary deploy is applied.

Next Smallest Step

Commit and push this source change to Forgejo main, run scripts/deploy/deploy_ws_canary_kaniko.sh --git-ref <new-sha>, then check the next upload_archive_*.json and purge_uploaded_local_*.json manifests, along with PVC usage, to confirm the live CronJob is purging as designed.
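
For the manifest check, a small helper can pick the newest file matching each pattern. This is a sketch under assumptions: the helper name is hypothetical, "newest" is taken to mean latest modification time, and the manifest fields are not interpreted because their schema is not stated here.

```python
from pathlib import Path
from typing import Optional


def newest_manifest(manifest_dir: Path, pattern: str) -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    candidates = [p for p in manifest_dir.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Run inside the uploader environment, `newest_manifest(Path("/var/lib/orderbooks/manifests"), "purge_uploaded_local_*.json")` would return the latest purge manifest to inspect, assuming upload and purge manifests both land in that directory.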