orderbooks/docs/VPS_CUTOVER_RUNBOOK.md
2026-05-02 17:44:33 +02:00

# VPS Cutover Runbook
Status: valid
Checkpoint 8 status is `WAIVED_BY_USER`, not `PASS`. This runbook prepares a
VPS cutover for the existing Polymarket raw order-book collector only. It does
not claim production readiness, second-market support, dashboards, databases,
strategies, or trading.
## Scope
Included:
- VPS prerequisite checks.
- Repository copy/update steps.
- Public Polymarket collector service install.
- Google Drive offload timer install with rclone.
- Liveness, cycle health, and upload verification commands.
- Rollback and stop commands.
Excluded:
- Private API access.
- Wallets, keys, mnemonics, signing, order placement, or trading.
- Database, dashboard, strategy, or second-market work.
## Recommended VPS Layout
Use the existing package paths unless the VPS has a reason to differ:
```text
repository: /opt/orderbooks
python virtualenv: /opt/orderbooks/.venv
config: /etc/orderbooks/polymarket_collector.vps.yaml
collector env: /etc/orderbooks/polymarket-orderbook-collector.env
uploader env: /etc/orderbooks/orderbook-uploader.env
data root: /var/lib/orderbooks
raw files: /var/lib/orderbooks/raw_orderbooks
manifests: /var/lib/orderbooks/manifests
discovery: /var/lib/orderbooks/discovery
```
The `orderbooks` system user should own `/var/lib/orderbooks`. The repository
under `/opt/orderbooks` can be root-owned and world-readable.
## VPS Prerequisites
On Ubuntu or Debian:
```sh
sudo apt-get update
sudo apt-get install -y git python3 python3-venv rclone
sudo useradd --system --home /var/lib/orderbooks --shell /usr/sbin/nologin orderbooks || true
sudo mkdir -p /opt /etc/orderbooks /var/lib/orderbooks/discovery /var/lib/orderbooks/raw_orderbooks /var/lib/orderbooks/manifests /var/log/orderbooks
sudo chown -R orderbooks:orderbooks /var/lib/orderbooks /var/log/orderbooks
```
No API keys, private keys, mnemonics, wallets, or trading credentials are
required by this project. rclone credentials are the only machine-local
credential material expected for Google Drive offload, and they must stay
outside the repository.
## Copy Or Update The Repository
First install:
```sh
cd /opt
sudo git clone <repo-url> orderbooks
```
Update an existing checkout:
```sh
cd /opt/orderbooks
sudo git fetch --all --prune
sudo git pull --ff-only
```
Prepare repository permissions and the Python virtualenv:
```sh
cd /opt/orderbooks
sudo chmod +x scripts/run_polymarket_collector_cycle.sh scripts/upload_archive_rclone.sh scripts/purge_uploaded_local_files.sh scripts/vps_preflight_check.sh scripts/vps_runtime_smoke_check.sh
sudo python3 -m venv .venv
sudo .venv/bin/python -m pip install --upgrade pip
sudo chown -R root:root /opt/orderbooks
sudo chmod -R a+rX /opt/orderbooks
```
The current collector scripts use only the Python standard library, so no
additional packages need to be installed into the virtualenv.
## Configure Public Collector Runtime
Install the example config, then review it:
```sh
sudo install -o root -g root -m 0644 /opt/orderbooks/config/polymarket_collector.vps.example.yaml /etc/orderbooks/polymarket_collector.vps.yaml
sudo editor /etc/orderbooks/polymarket_collector.vps.yaml
```
Optional collector env overrides:
```sh
sudo install -o root -g orderbooks -m 0640 /dev/null /etc/orderbooks/polymarket-orderbook-collector.env
sudo editor /etc/orderbooks/polymarket-orderbook-collector.env
```
Example values:
```text
ORDERBOOKS_DATA_DIR=/var/lib/orderbooks
ORDERBOOKS_OUTPUT_DIR=/var/lib/orderbooks/raw_orderbooks
ORDERBOOKS_DISCOVERY_MAX_PAGES=3
```
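Before restarting the service, the env file can be sanity-checked. The following is a minimal sketch, not part of the repository: `parse_env_file` and `env_problems` are hypothetical helpers, the parser handles only simple `KEY=VALUE` lines (not the full systemd `EnvironmentFile=` syntax), and treating every key as optional is an assumption based on the overrides being described as optional.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore it
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def env_problems(values: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the optional overrides."""
    problems = []
    for key in ("ORDERBOOKS_DATA_DIR", "ORDERBOOKS_OUTPUT_DIR"):
        if key in values and not values[key].startswith("/"):
            problems.append(f"{key} should be an absolute path, got {values[key]!r}")
    pages = values.get("ORDERBOOKS_DISCOVERY_MAX_PAGES")
    if pages is not None and not (pages.isdigit() and int(pages) > 0):
        problems.append(f"ORDERBOOKS_DISCOVERY_MAX_PAGES should be a positive integer, got {pages!r}")
    return problems
```

Read the env file as a user that is allowed to see it, pass its text to `parse_env_file`, and treat an empty list from `env_problems` as a pass.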
## Configure Rclone
Configure rclone as the `orderbooks` user. Do not print or commit
`rclone.conf`.
```sh
sudo -u orderbooks rclone config
sudo -u orderbooks rclone listremotes
sudo -u orderbooks rclone lsf gdrive: --max-depth 1
```
Create the uploader env file:
```sh
sudo install -o root -g orderbooks -m 0640 /dev/null /etc/orderbooks/orderbook-uploader.env
sudo editor /etc/orderbooks/orderbook-uploader.env
```
Example:
```text
ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket
ORDERBOOKS_RCLONE_BIN=/usr/bin/rclone
ORDERBOOKS_UPLOAD_MIN_AGE_SECONDS=600
```
The uploader verifies uploads with `rclone check`. Dry runs do not prove remote
write access. Successful uploads update
`/var/lib/orderbooks/manifests/upload_verified_index.json`, and the uploader
service also runs a purge step that deletes older previously verified local
files after the retention window.
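To preview which raw files are old enough for the next uploader run, a sketch like the following can help. It assumes the minimum age is compared against file modification time and that raw files match `*.jsonl.gz`; the uploader script's actual selection logic may differ, and `files_older_than` is a hypothetical helper.

```python
import time
from pathlib import Path


def files_older_than(raw_dir, min_age_seconds, now=None):
    """List *.jsonl.gz files whose mtime is at least min_age_seconds in the past."""
    now = time.time() if now is None else now
    eligible = []
    for path in sorted(Path(raw_dir).rglob("*.jsonl.gz")):
        if path.is_file() and now - path.stat().st_mtime >= min_age_seconds:
            eligible.append(path)
    return eligible
```

For example, `files_older_than("/var/lib/orderbooks/raw_orderbooks", 600)` lists candidates under the example `ORDERBOOKS_UPLOAD_MIN_AGE_SECONDS=600`. An empty result suggests waiting before starting the uploader manually.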
## Run VPS Preflight
Run the preflight before installing or starting services:
```sh
cd /opt/orderbooks
sudo -u orderbooks /opt/orderbooks/scripts/vps_preflight_check.sh \
  --app-dir /opt/orderbooks \
  --python-bin /opt/orderbooks/.venv/bin/python \
  --rclone-bin /usr/bin/rclone \
  --rclone-remote gdrive:orderbooks/polymarket \
  --data-dir /var/lib/orderbooks \
  --manifest-dir /var/lib/orderbooks/manifests \
  --log-dir /var/log/orderbooks \
  --min-free-gib 5
```
The preflight does not print rclone configuration. It checks repository files,
Python compilation, shell syntax, systemd unit parsing when available, rclone
availability, optional remote readability, target directory writability, disk
space, and the absence of required project secrets.
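Two of these checks, directory writability and free disk space, are easy to reproduce by hand when a preflight failure needs a second opinion. This is an illustrative sketch only, not the logic of `vps_preflight_check.sh`; `free_gib` and `directory_problems` are hypothetical names.

```python
import os
import shutil


def free_gib(path):
    """Free space on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def directory_problems(path, min_free_gib):
    """Return problems for one target directory; an empty list means it looks usable."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    # Write and search permission are both needed to create files inside a directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    available = free_gib(path)
    if available < min_free_gib:
        problems.append(f"{path} has {available:.1f} GiB free, need {min_free_gib}")
    return problems
```

Run it as the `orderbooks` user so the writability result reflects the service account rather than root, for example with `directory_problems("/var/lib/orderbooks", 5)`.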
## Install Systemd Units
Install collector and uploader units:
```sh
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-collector.service /etc/systemd/system/polymarket-orderbook-collector.service
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-uploader.service /etc/systemd/system/polymarket-orderbook-uploader.service
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-uploader.timer /etc/systemd/system/polymarket-orderbook-uploader.timer
sudo systemctl daemon-reload
sudo systemd-analyze verify /etc/systemd/system/polymarket-orderbook-collector.service /etc/systemd/system/polymarket-orderbook-uploader.service /etc/systemd/system/polymarket-orderbook-uploader.timer
```
Enable and start:
```sh
sudo systemctl enable --now polymarket-orderbook-collector.service
sudo systemctl enable --now polymarket-orderbook-uploader.timer
```
Run one uploader cycle immediately after the collector has produced closed raw
files:
```sh
sudo systemctl start polymarket-orderbook-uploader.service
```
Run the minimal runtime reliability smoke gate after both units are installed,
rclone is configured, and at least one closed raw file is older than the
uploader minimum age (default: 600 seconds):
```sh
sudo /opt/orderbooks/scripts/vps_runtime_smoke_check.sh \
  --app-dir /opt/orderbooks \
  --data-dir /var/lib/orderbooks \
  --raw-dir /var/lib/orderbooks/raw_orderbooks \
  --manifest-dir /var/lib/orderbooks/manifests \
  --collector-service polymarket-orderbook-collector.service \
  --uploader-service polymarket-orderbook-uploader.service \
  --wait-seconds 900
```
This command is the minimal production reliability gate. It records a JSON
evidence manifest under `/var/lib/orderbooks/manifests/`, verifies a valid
collector cycle, forces one collector service restart, verifies the prior raw
gzip file still parses with the same checksum, waits for a later valid cycle,
starts the uploader, and records upload success or failure evidence. Preserve
failed smoke manifests and journal logs for review.
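The "still parses with the same checksum" step can be reproduced manually around a restart. The sketch below assumes SHA-256 over the compressed bytes; the smoke script's actual checksum algorithm is not stated here, and `gzip_fingerprint` is a hypothetical helper.

```python
import gzip
import hashlib
import json


def gzip_fingerprint(path):
    """Return (sha256 hex digest of the compressed file, count of non-empty JSON lines)."""
    digest = hashlib.sha256()
    with open(path, "rb") as raw:
        # Hash in 1 MiB chunks so large raw files are not loaded into memory.
        for chunk in iter(lambda: raw.read(1 << 20), b""):
            digest.update(chunk)
    rows = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)  # raises if a line is not valid JSON
                rows += 1
    return digest.hexdigest(), rows
```

Record the fingerprint of a closed raw file before restarting the collector and compare it afterwards. Any difference in digest or row count means the file changed across the restart.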
## Check Liveness
Collector service:
```sh
sudo systemctl status polymarket-orderbook-collector.service
sudo journalctl -u polymarket-orderbook-collector.service --since "30 minutes ago"
```
Uploader timer and service:
```sh
sudo systemctl list-timers polymarket-orderbook-uploader.timer
sudo systemctl status polymarket-orderbook-uploader.service
sudo journalctl -u polymarket-orderbook-uploader.service --since "2 hours ago"
```
Recent artifacts:
```sh
find /var/lib/orderbooks/raw_orderbooks -type f -name '*.jsonl.gz' -printf '%TY-%Tm-%TdT%TH:%TM:%TS %s %p\n' | sort | tail
find /var/lib/orderbooks/manifests -type f -name '*.json' -printf '%TY-%Tm-%TdT%TH:%TM:%TS %s %p\n' | sort | tail
```
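The same listing can be reduced to a single freshness number for quick checks. This is a sketch under the assumption that file modification time is a reasonable liveness signal; `newest_file_age_seconds` is a hypothetical helper, and what counts as "too old" depends on the collector's rotation interval, which is not specified here.

```python
import time
from pathlib import Path


def newest_file_age_seconds(directory, pattern, now=None):
    """Age in seconds of the most recently modified match, or None if nothing matches."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob(pattern) if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)
```

For example, `newest_file_age_seconds("/var/lib/orderbooks/raw_orderbooks", "*.jsonl.gz")` returning `None`, or a value that keeps growing between checks, suggests the collector is not writing.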
## Check Latest Cycle Health
Inspect the newest collector manifest:
```sh
latest_collector="$(find /var/lib/orderbooks/manifests -type f -name 'polymarket_orderbook_collector_*.json' | sort | tail -n 1)"
python3 -m json.tool "$latest_collector" | sed -n '1,180p'
```
Minimum healthy signs:
```text
gate_status: PASS
rows_written: greater than 0
failure_count: 0
failures: []
```
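These signs can be checked mechanically instead of by eye. The sketch below assumes the four fields are top-level keys of the collector manifest, which should be confirmed against a real manifest; `collector_manifest_problems` is a hypothetical helper.

```python
import json
from pathlib import Path


def collector_manifest_problems(manifest: dict) -> list[str]:
    """Return the reasons a collector manifest misses the minimum healthy signs."""
    problems = []
    if manifest.get("gate_status") != "PASS":
        problems.append(f"gate_status is {manifest.get('gate_status')!r}, expected 'PASS'")
    rows = manifest.get("rows_written")
    if not isinstance(rows, int) or rows <= 0:
        problems.append(f"rows_written is {rows!r}, expected an integer greater than 0")
    if manifest.get("failure_count") != 0:
        problems.append(f"failure_count is {manifest.get('failure_count')!r}, expected 0")
    if manifest.get("failures") != []:
        problems.append(f"failures is {manifest.get('failures')!r}, expected []")
    return problems


def load_manifest(path) -> dict:
    """Read a manifest JSON file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

An empty list from `collector_manifest_problems(load_manifest(path))` means all four signs hold. Anything else lists exactly which sign failed.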
Verify the latest raw gzip parses and row count matches its manifest:
```sh
python3 - "$latest_collector" <<'PY'
import gzip
import json
import sys
from pathlib import Path
manifest = json.loads(Path(sys.argv[1]).read_text())
for item in manifest.get("output_files", []):
    path = Path(item["path"])
    rows = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)
                rows += 1
    print({"path": str(path), "rows": rows, "manifest_rows": item.get("rows"), "matches": rows == item.get("rows")})
PY
```
## Verify Uploads
Inspect the newest upload manifest:
```sh
latest_upload="$(find /var/lib/orderbooks/manifests -type f -name 'upload_archive_*.json' | sort | tail -n 1)"
python3 -m json.tool "$latest_upload" | sed -n '1,220p'
```
Minimum healthy signs:
```text
operation_status: UPLOAD_VERIFIED
gate_status: PASS
rclone.copy_exit_code: 0
rclone.check_exit_code: 0
counts.uploaded equals counts.verified
```
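A scripted version of this checklist is sketched below. It assumes the dotted names above denote nested JSON objects (for example `rclone.copy_exit_code` as `manifest["rclone"]["copy_exit_code"]`), which should be confirmed against a real upload manifest; `lookup` and `upload_manifest_problems` are hypothetical helpers.

```python
def lookup(manifest, dotted):
    """Follow a dotted path through nested dicts; return None if any part is missing."""
    current = manifest
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def upload_manifest_problems(manifest):
    """Return the reasons an upload manifest misses the minimum healthy signs."""
    expected = {
        "operation_status": "UPLOAD_VERIFIED",
        "gate_status": "PASS",
        "rclone.copy_exit_code": 0,
        "rclone.check_exit_code": 0,
    }
    problems = []
    for field, want in expected.items():
        got = lookup(manifest, field)
        if got != want:
            problems.append(f"{field} is {got!r}, expected {want!r}")
    uploaded = lookup(manifest, "counts.uploaded")
    verified = lookup(manifest, "counts.verified")
    if uploaded is None or uploaded != verified:
        problems.append(f"counts.uploaded is {uploaded!r} but counts.verified is {verified!r}")
    return problems
```

Load the manifest with `json.load` and pass the resulting dict in. An empty list means every listed sign holds.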
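Manual remote spot-check without printing config. `ORDERBOOKS_RCLONE_DEST` is
normally defined only in the uploader env file, not in an interactive shell, so
load it from there first:
```sh
sudo -u orderbooks sh -c '. /etc/orderbooks/orderbook-uploader.env && rclone lsf "$ORDERBOOKS_RCLONE_DEST" --max-depth 2' | head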
```
## Rollback Or Stop
Stop uploader timer first:
```sh
sudo systemctl disable --now polymarket-orderbook-uploader.timer
sudo systemctl stop polymarket-orderbook-uploader.service
```
Stop collector:
```sh
sudo systemctl stop polymarket-orderbook-collector.service
```
Disable collector if needed:
```sh
sudo systemctl disable polymarket-orderbook-collector.service
```
Preserve `/var/lib/orderbooks` and `/var/lib/orderbooks/manifests` for evidence.
If an artifact is wrong, label it as invalid or deprecated in a sibling note
rather than deleting it.
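One way to write such a sibling note is sketched below. The note naming (`<artifact>.INVALID.txt`) and its fields are illustrative assumptions, not an established project convention, and `write_sibling_note` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_sibling_note(artifact, reason, label="INVALID"):
    """Write '<artifact name>.<label>.txt' next to the artifact and leave the artifact untouched."""
    artifact = Path(artifact)
    note = artifact.with_name(f"{artifact.name}.{label}.txt")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    note.write_text(
        f"{label}: {artifact.name}\nmarked_at: {stamp}\nreason: {reason}\n",
        encoding="utf-8",
    )
    return note
```

Pass `label="DEPRECATED"` for artifacts that are superseded rather than wrong. The artifact itself is never modified or removed, so the evidence trail stays intact.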
## Still Not Production Proven
Because the domestic 24h soak wait was waived by the user, the following remain
unproven:
- A completed 24h collector run with reviewed final metrics.
- 24h interaction between collector rotation and uploader timer.
- VPS-specific long-run disk, network, rclone, and systemd behavior.
- Retention cleanup behavior under verified upload load.
Treat this as cutover preparation. The VPS is not deployed until the commands
are run on the VPS and evidence is written.