# VPS Deployment
Status: valid

This document covers the Checkpoint 6 systemd runtime package for the raw
Polymarket order-book collector.

It does not claim production readiness or 24/7 reliability. That remains gated
on the later 24h soak test.
## Scope
Included:

- systemd service for the raw collector cycle
- Python virtualenv setup
- service user and directory permissions
- configurable data directory
- discovery refresh before each collector cycle
- journal-based logs
- safe restart model for finite collector runs

Excluded:

- Google Drive offload
- `rclone`
- uploader scripts, services, or timers
- normalization changes
- dashboards
- databases
- strategies or backtests
- trading, order placement, signing, or wallet logic

Uploader service and timer units are intentionally deferred to Checkpoint 7.
## Runtime Model
The systemd service runs:
```text
/opt/orderbooks/scripts/run_polymarket_collector_cycle.sh
```
Each cycle:

1. Refreshes BTC market discovery into the configured data directory.
2. Runs `scripts/collect_polymarket_orderbooks.py` once.
3. Writes run-rotated raw gzip JSONL files.
4. Writes a per-cycle collector manifest.
5. Exits after the configured finite duration.

The unit uses `Restart=always`, so systemd starts the next cycle after the prior
cycle exits or fails, once the `RestartSec=30s` delay has elapsed. No order-book
data is collected during that delay or while discovery is refreshing.

The example config uses a 300-second collection cycle. This is deliberately
short because current BTC up/down markets are short-lived and the collector
refreshes discovery only before a cycle starts. Do not lengthen the cycle beyond
the practical market horizon unless the collector later learns to refresh market
selection during a run.
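
As a rough illustration of the duty cycle this restart model implies, the sketch below estimates the fraction of wall-clock time spent collecting. It assumes the 300-second example cycle and the unit's `RestartSec=30s`; the discovery duration is a hypothetical input you would measure from the journal, process startup time is ignored, and the helper name `collection_coverage` is illustrative only.

```python
def collection_coverage(duration_s: float, restart_s: float, discovery_s: float) -> float:
    """Fraction of each restart period spent collecting order books.

    One period = discovery refresh + finite collection run + systemd RestartSec delay.
    Process startup time is ignored in this sketch.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    period = discovery_s + duration_s + restart_s
    return duration_s / period


if __name__ == "__main__":
    # 300 s example cycle, RestartSec=30s, and an assumed 10 s discovery refresh.
    print(f"{collection_coverage(300, 30, 10):.1%}")  # 88.2%
```

Shorter cycles lose proportionally more time to the fixed restart and discovery overhead, which is the trade-off against the short market horizon.
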
## Paths
Default VPS paths:

| Purpose | Path |
| --- | --- |
| Application checkout | `/opt/orderbooks` |
| Python virtualenv | `/opt/orderbooks/.venv` |
| Service config | `/etc/orderbooks/polymarket_collector.vps.yaml` |
| Optional env override file | `/etc/orderbooks/polymarket-orderbook-collector.env` |
| Data directory | `/var/lib/orderbooks` |
| Discovery artifacts | `/var/lib/orderbooks/discovery` |
| Raw order-book output base | `/var/lib/orderbooks/raw_orderbooks` |
| Per-cycle manifests | `/var/lib/orderbooks/manifests` |

If the repository is installed somewhere other than `/opt/orderbooks`, adjust
these paths, the matching environment variables, and the script path the
service runs.
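
Before enabling the service, it can help to confirm that the data directories from the table exist and belong to the service user. A minimal, Unix-only sketch using the standard library, assuming the default layout and the `orderbooks` user; the function name `check_data_dirs` is hypothetical.

```python
import pwd
from pathlib import Path

# Subdirectories the install steps create under the data directory.
SUBDIRS = ("discovery", "raw_orderbooks", "manifests")


def check_data_dirs(data_dir: Path, owner: str) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks right."""
    try:
        uid = pwd.getpwnam(owner).pw_uid
    except KeyError:
        return [f"user {owner!r} does not exist"]
    problems = []
    for path in [data_dir, *(data_dir / name for name in SUBDIRS)]:
        if not path.is_dir():
            problems.append(f"missing directory: {path}")
        elif path.stat().st_uid != uid:
            problems.append(f"{path} is not owned by {owner}")
    return problems


if __name__ == "__main__":
    for problem in check_data_dirs(Path("/var/lib/orderbooks"), "orderbooks"):
        print(problem)
```
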
## Environment Variables
The service defines safe defaults and can load overrides from:
```text
/etc/orderbooks/polymarket-orderbook-collector.env
```
Supported variables:

| Variable | Default | Meaning |
| --- | --- | --- |
| `ORDERBOOKS_APP_DIR` | `/opt/orderbooks` | Repository checkout path. |
| `ORDERBOOKS_DATA_DIR` | `/var/lib/orderbooks` | Base directory for data files. |
| `ORDERBOOKS_PYTHON` | `/opt/orderbooks/.venv/bin/python` | Python interpreter. |
| `ORDERBOOKS_COLLECTOR_CONFIG` | `/etc/orderbooks/polymarket_collector.vps.yaml` | Collector config path. |
| `ORDERBOOKS_DISCOVERY_DIR` | `$ORDERBOOKS_DATA_DIR/discovery` | Discovery artifact directory. |
| `ORDERBOOKS_OUTPUT_DIR` | `$ORDERBOOKS_DATA_DIR/raw_orderbooks` | Collector output base directory. |
| `ORDERBOOKS_MANIFEST_DIR` | `$ORDERBOOKS_DATA_DIR/manifests` | Per-cycle manifest directory. |
| `ORDERBOOKS_DISCOVERY_LIMIT` | `100` | Gamma events requested per discovery page. |
| `ORDERBOOKS_DISCOVERY_MAX_PAGES` | `3` | Maximum discovery pages fetched per cycle. |
| `ORDERBOOKS_DISCOVERY_TIMEOUT` | `15` | Discovery request timeout, in seconds. |

Example override file:
```text
ORDERBOOKS_DATA_DIR=/var/lib/orderbooks
ORDERBOOKS_DISCOVERY_MAX_PAGES=3
```
No API keys are required for this checkpoint.
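
Reading the table, the discovery, output, and manifest defaults are expressed relative to `ORDERBOOKS_DATA_DIR`, so overriding only the data directory should move all three, while an explicit override of one of them takes precedence. The sketch below mirrors that resolution order; it is an illustration, not the wrapper script's actual logic. Its parser handles only simple `KEY=VALUE` lines and `#` comments (systemd's `EnvironmentFile=` syntax also supports quoting, which is ignored here), and the names `parse_env_file` and `resolve_env` are hypothetical.

```python
# Defaults copied from the variable table.
DEFAULTS = {
    "ORDERBOOKS_APP_DIR": "/opt/orderbooks",
    "ORDERBOOKS_DATA_DIR": "/var/lib/orderbooks",
    "ORDERBOOKS_PYTHON": "/opt/orderbooks/.venv/bin/python",
    "ORDERBOOKS_COLLECTOR_CONFIG": "/etc/orderbooks/polymarket_collector.vps.yaml",
    "ORDERBOOKS_DISCOVERY_LIMIT": "100",
    "ORDERBOOKS_DISCOVERY_MAX_PAGES": "3",
    "ORDERBOOKS_DISCOVERY_TIMEOUT": "15",
}
# Directories whose default is a subdirectory of ORDERBOOKS_DATA_DIR.
DERIVED = {
    "ORDERBOOKS_DISCOVERY_DIR": "discovery",
    "ORDERBOOKS_OUTPUT_DIR": "raw_orderbooks",
    "ORDERBOOKS_MANIFEST_DIR": "manifests",
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments (no quoting)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def resolve_env(overrides: dict[str, str]) -> dict[str, str]:
    """Apply overrides on top of defaults, then fill derived directories not set explicitly."""
    env = {**DEFAULTS, **overrides}
    for key, subdir in DERIVED.items():
        env.setdefault(key, f"{env['ORDERBOOKS_DATA_DIR']}/{subdir}")
    return env


if __name__ == "__main__":
    env = resolve_env(parse_env_file("ORDERBOOKS_DATA_DIR=/srv/orderbooks\n"))
    print(env["ORDERBOOKS_OUTPUT_DIR"])  # /srv/orderbooks/raw_orderbooks
```
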
## Install On Ubuntu Or Debian
Run package and account setup as root or with `sudo`:
```sh
sudo apt-get update
sudo apt-get install -y git python3 python3-venv
sudo useradd --system --home /var/lib/orderbooks --shell /usr/sbin/nologin orderbooks
sudo mkdir -p /opt /etc/orderbooks /var/lib/orderbooks/discovery /var/lib/orderbooks/raw_orderbooks /var/lib/orderbooks/manifests
```
Install or update the repository under `/opt/orderbooks`. One option is:
```sh
cd /opt
sudo git clone <repo-url> orderbooks
```
If the checkout already exists:
```sh
cd /opt/orderbooks
sudo git pull --ff-only
```
Prepare permissions:
```sh
sudo chown -R root:root /opt/orderbooks
sudo chmod -R a+rX /opt/orderbooks
sudo chmod +x /opt/orderbooks/scripts/run_polymarket_collector_cycle.sh
sudo chown -R orderbooks:orderbooks /var/lib/orderbooks
```
Create the virtualenv:
```sh
cd /opt/orderbooks
sudo python3 -m venv .venv
sudo .venv/bin/python -m pip install --upgrade pip
sudo chown -R root:root .venv
sudo chmod -R a+rX .venv
```
The current Checkpoint 6 scripts use only the Python standard library, so no
`pip install` of project dependencies is needed.

Install the VPS config and service unit:
```sh
sudo install -o root -g root -m 0644 /opt/orderbooks/config/polymarket_collector.vps.example.yaml /etc/orderbooks/polymarket_collector.vps.yaml
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-collector.service /etc/systemd/system/polymarket-orderbook-collector.service
```
Review `/etc/orderbooks/polymarket_collector.vps.yaml` before starting the
service. The example writes under `/var/lib/orderbooks`.

Enable and start:
```sh
sudo systemctl daemon-reload
sudo systemctl enable --now polymarket-orderbook-collector.service
```
## Logs And Status
Use the systemd journal:
```sh
sudo systemctl status polymarket-orderbook-collector.service
sudo journalctl -u polymarket-orderbook-collector.service -f
```
Recent logs without following:
```sh
sudo journalctl -u polymarket-orderbook-collector.service --since "1 hour ago"
```
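
For scripted checks, `journalctl` can emit one JSON object per entry with `-o json`. By default, output a service writes to stdout and stderr is logged at priority `info` unless a line carries a syslog `<N>` prefix, so this sketch matches on message text rather than `PRIORITY`. The default pattern, the script name `journal_errors.py`, and the function `matching_entries` are hypothetical; `MESSAGE` and `__REALTIME_TIMESTAMP` are standard journal fields.

```python
import json
import re
import sys
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Hypothetical default: lines that look like errors or Python tracebacks.
DEFAULT_PATTERN = re.compile(r"error|traceback|exception", re.IGNORECASE)


def matching_entries(lines: Iterable[str], pattern: "re.Pattern[str]" = DEFAULT_PATTERN) -> Iterator[tuple[str, str]]:
    """Yield (UTC ISO timestamp, message) for journal JSON entries whose MESSAGE matches."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        if message is None:
            # Oversized fields are exported as null unless journalctl is run with --all.
            continue
        if isinstance(message, list):
            if all(isinstance(b, int) for b in message):
                # Non-printable or non-UTF-8 messages are exported as arrays of byte values.
                message = bytes(message).decode("utf-8", errors="replace")
            else:
                # A field that occurs more than once is exported as an array of its values.
                message = " | ".join(str(v) for v in message)
        if not pattern.search(message):
            continue
        micros = int(entry.get("__REALTIME_TIMESTAMP", "0"))
        ts = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc).isoformat()
        yield ts, message


if __name__ == "__main__":
    # Read journal JSON from stdin only when "-" is passed explicitly.
    if sys.argv[1:] == ["-"]:
        for ts, message in matching_entries(sys.stdin):
            print(ts, message)
```

A typical invocation would pipe `sudo journalctl -u polymarket-orderbook-collector.service --since "1 hour ago" -o json` into `python3 journal_errors.py -`.
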
## Output Files
Raw gzip JSONL files are written under:
```text
/var/lib/orderbooks/raw_orderbooks/polymarket/orderbooks/<run_id>/
```
Per-cycle manifests are written under:
```text
/var/lib/orderbooks/manifests/polymarket_orderbook_collector_<cycle_id>.json
```
Discovery artifacts are refreshed under:
```text
/var/lib/orderbooks/discovery/
```
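
To spot-check a cycle's output, a short script can locate the newest manifest and count records in a run directory. This sketch assumes the default directories, and that raw files end in `.jsonl.gz` and hold one JSON object per line; the manifest's internal fields are not documented here, so the sketch only locates it. Function names are hypothetical.

```python
import gzip
import json
import sys
from pathlib import Path
from typing import Optional

# Manifest naming pattern from the output layout.
MANIFEST_GLOB = "polymarket_orderbook_collector_*.json"
# Assumed suffix for the raw gzip JSONL files inside a run directory.
RAW_GLOB = "*.jsonl.gz"


def latest_manifest(manifest_dir: Path) -> Optional[Path]:
    """Return the most recently modified per-cycle manifest, or None if there is none."""
    manifests = sorted(manifest_dir.glob(MANIFEST_GLOB), key=lambda p: p.stat().st_mtime)
    return manifests[-1] if manifests else None


def count_records(run_dir: Path) -> int:
    """Count non-empty JSONL lines across a run directory's gzip files.

    Each line is parsed, so malformed JSON raises ValueError, and a gzip file cut
    short (for example by SIGKILL) raises EOFError when read to the end.
    """
    total = 0
    for path in sorted(run_dir.glob(RAW_GLOB)):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)
                    total += 1
    return total


if __name__ == "__main__":
    base = Path("/var/lib/orderbooks")
    print("latest manifest:", latest_manifest(base / "manifests"))
    # Pass one or more <run_id> directory names as arguments.
    for run_id in sys.argv[1:]:
        run_dir = base / "raw_orderbooks" / "polymarket" / "orderbooks" / run_id
        print(run_id, count_records(run_dir))
```
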
## Restart And Stop Behavior
The unit uses:
```text
Restart=always
RestartSec=30s
TimeoutStopSec=90s
KillSignal=SIGTERM
KillMode=control-group
```
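On stop, systemd sends `SIGTERM` to every process in the unit's control group.
The collector handles `SIGTERM` by finishing or timing out the current request,
closing the gzip output, and writing the manifest. If the service has not exited
within `TimeoutStopSec=90s`, systemd escalates to `SIGKILL`, in which case the
current cycle's gzip file and manifest may be incomplete. Every cycle writes to a
new run directory, so closed files are not reopened by the next cycle.
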
Stop the service with:
```sh
sudo systemctl stop polymarket-orderbook-collector.service
```
Start it again with:
```sh
sudo systemctl start polymarket-orderbook-collector.service
```
## Local Validation Without Starting The Service
Run these checks from the repository root; they do not require root:
```sh
python3 -m py_compile scripts/discover_polymarket_btc_markets.py scripts/collect_polymarket_orderbooks.py
bash -n scripts/run_polymarket_collector_cycle.sh
python3 - <<'PY'
from pathlib import Path
from scripts.collect_polymarket_orderbooks import load_flat_yaml

# Parse the example VPS config with the collector's own loader.
cfg = load_flat_yaml(Path('config/polymarket_collector.vps.example.yaml'))

# Keys the collector cycle needs from the config.
required = {
    'discovery_path',
    'output_dir',
    'manifest_path',
    'market_limit',
    'interval_seconds',
    'duration_seconds',
}
missing = sorted(required - set(cfg))
assert not missing, missing
# A positive, finite duration is what lets each cycle exit and restart.
assert cfg['duration_seconds'] > 0
print('config parse ok')
PY
```
If systemd tools are available locally:
```sh
systemd-analyze verify systemd/polymarket-orderbook-collector.service
```
The local machine may not have `/opt/orderbooks` or the `orderbooks` service
user. Treat warnings about missing VPS paths or a missing user as
deployment-environment issues, not collector syntax failures.
## Safe Upgrade
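Stop the service, update files, rerun validation, then start the service:
```sh
sudo systemctl stop polymarket-orderbook-collector.service
cd /opt/orderbooks
sudo git pull --ff-only
# Re-validate the updated scripts before restarting.
sudo .venv/bin/python -m py_compile scripts/discover_polymarket_btc_markets.py scripts/collect_polymarket_orderbooks.py
bash -n scripts/run_polymarket_collector_cycle.sh
# The installed unit is a copy; refresh it in case the repository version changed.
sudo install -o root -g root -m 0644 systemd/polymarket-orderbook-collector.service /etc/systemd/system/polymarket-orderbook-collector.service
sudo systemctl daemon-reload
sudo systemctl start polymarket-orderbook-collector.service
```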
Do not remove existing data files during an upgrade. If a bad artifact is found,
preserve it and label it invalid or deprecated with a replacement path when one
exists.
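
One way to preserve a bad artifact while marking it is a sidecar label file written next to it, leaving the original bytes untouched. The `.status.json` suffix, its fields, and the function `label_artifact` are a hypothetical convention for illustration; this document does not define a labeling format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

VALID_STATUSES = {"invalid", "deprecated"}


def label_artifact(artifact: Path, status: str, reason: str, replacement: Optional[Path] = None) -> Path:
    """Write <artifact>.status.json beside the artifact without modifying or moving it."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    if not artifact.exists():
        raise FileNotFoundError(artifact)
    label = {
        "artifact": str(artifact),
        "status": status,
        "reason": reason,
        "replacement": str(replacement) if replacement else None,
        "labeled_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = artifact.with_name(artifact.name + ".status.json")
    # Mode "x" refuses to overwrite an existing label, so earlier decisions stay on record.
    with sidecar.open("x", encoding="utf-8") as fh:
        json.dump(label, fh, indent=2)
    return sidecar
```
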
## Current Limits
- This package runs the existing raw collector; it does not add a daemon inside
Python.
- The systemd loop is a restart model around finite collector cycles.
- It does not upload files.
- It does not prove long-run reliability.
- Production readiness remains blocked until discovery, raw collection, offload,
and a documented 24h soak test all pass.