# Google Drive Offload

Status: valid

This document covers Checkpoint 7: offloading closed raw collector files and
manifests to Google Drive with `rclone`.

This checkpoint does not prove production readiness or 24/7 reliability. A real
small upload must be run with a configured remote, and the later 24h soak test
must still pass.

## Scope

Included:

- `scripts/upload_archive_rclone.sh`
- `systemd/polymarket-orderbook-uploader.service`
- `systemd/polymarket-orderbook-uploader.timer`
- dry-run mode by default
- real upload only with `--execute`
- rclone verification with `rclone check`
- per-run upload manifests
- optional local cleanup only after successful verification

Excluded:

- dashboards
- databases
- strategies or backtests
- trading, signing, order placement, or wallet logic
- hardcoded private auth material

## Install rclone

On Ubuntu or Debian:

```sh
sudo apt-get update
sudo apt-get install -y rclone
```

Confirm:

```sh
rclone version
```

## Configure A Google Drive Remote

Configure the remote outside this repository. For a service-user setup:

```sh
sudo -u orderbooks rclone config
sudo -u orderbooks rclone lsd gdrive:
```

The example remote path is:

```text
gdrive:orderbooks/polymarket
```

Any valid `rclone` destination may be used. The uploader reads it from:

```text
ORDERBOOKS_RCLONE_DEST
```

For systemd, create:

```text
/etc/orderbooks/orderbook-uploader.env
```

Example:

```text
ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket
```

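Because the destination comes from the environment, a wrapper or pre-flight step can fail early when it is missing. The following is a minimal sketch only: `require_rclone_dest` is a hypothetical helper and is not taken from `upload_archive_rclone.sh`.

```sh
# Hypothetical pre-flight check; not part of upload_archive_rclone.sh.
# Prints the destination and succeeds only when ORDERBOOKS_RCLONE_DEST
# is set to a non-empty value.
require_rclone_dest() {
    if [ -z "${ORDERBOOKS_RCLONE_DEST:-}" ]; then
        echo "ORDERBOOKS_RCLONE_DEST is not set" >&2
        return 1
    fi
    printf '%s\n' "$ORDERBOOKS_RCLONE_DEST"
}
```
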
Do not commit the machine-local rclone config or any private auth material.

## What Gets Uploaded

By default the script targets:

| Source | Default path |
| --- | --- |
| raw collector files | `/var/lib/orderbooks/raw_orderbooks` |
| collector manifests | `/var/lib/orderbooks/manifests` |

It does not target normalized sample files by default.

Files modified within the last 10 minutes are skipped so that files the
collector may still be writing are not uploaded. The default threshold is:

```text
ORDERBOOKS_UPLOAD_MIN_AGE_SECONDS=600
```

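One way to picture the minimum-age filter is a `find` call keyed on modification time. This is a sketch under assumptions, not the script's actual selection logic: `list_closed_files` is a hypothetical name, and the threshold is rounded up to whole minutes because `find -mmin` works in minutes.

```sh
# Hypothetical helper; the real selection logic lives in
# upload_archive_rclone.sh and may differ.
# Prints files under DIR last modified more than MIN_AGE_SECONDS ago.
list_closed_files() {
    dir=$1
    min_age_seconds=${2:-600}
    # find(1) compares in whole minutes, so round the threshold up.
    min_age_minutes=$(( (min_age_seconds + 59) / 60 ))
    find "$dir" -type f -mmin +"$min_age_minutes" | sort
}

# Example: list_closed_files /var/lib/orderbooks/raw_orderbooks 600
```
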
The script preserves each file's path relative to the data directory on the
remote. For example:

```text
/var/lib/orderbooks/raw_orderbooks/polymarket/orderbooks/<run_id>/file.jsonl.gz
```

uploads to:

```text
<remote>/raw_orderbooks/polymarket/orderbooks/<run_id>/file.jsonl.gz
```

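The mapping above amounts to stripping the data-directory prefix and appending the remainder to the destination. A minimal sketch, assuming a hypothetical helper `remote_path_for` and a made-up run id `run1`:

```sh
# Hypothetical helper; illustrates the path mapping only.
# Prints the remote path for FILE, keeping its path relative to DATA_DIR.
remote_path_for() {
    data_dir=${1%/}   # drop a trailing slash, if any
    file=$2
    dest=${3%/}
    relative=${file#"$data_dir"/}
    printf '%s/%s\n' "$dest" "$relative"
}

remote_path_for /var/lib/orderbooks \
    /var/lib/orderbooks/raw_orderbooks/polymarket/orderbooks/run1/file.jsonl.gz \
    gdrive:orderbooks/polymarket
# prints gdrive:orderbooks/polymarket/raw_orderbooks/polymarket/orderbooks/run1/file.jsonl.gz
```
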
## Dry Run

Dry-run is the default. It plans files, stages a temporary copy, invokes
`rclone copy --dry-run`, and writes an upload manifest.

Example for a VPS:

```sh
/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"
```

Example against the repository sample data:

```sh
scripts/upload_archive_rclone.sh \
  --data-dir data \
  --dest gdrive:orderbooks/polymarket/checkpoint7-test \
  --manifest-path data/manifests/upload_archive_real_test_dry_run_manifest.json \
  --min-age-seconds 0 \
  --rclone-bin /usr/bin/rclone
```

Dry-run does not prove remote write access.

## Execute Upload

Run a real upload only after the remote is configured and the dry-run plan looks
right:

```sh
/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --execute \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"
```

The script runs:

```text
rclone copy <staged files> <remote> --checksum
rclone check <staged files> <remote> --one-way --checksum
```

The upload gate is `PASS` only when the copy succeeds and verification succeeds.

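The gate rule can be sketched as a copy step followed by a check step, with `PASS` reported only when both exit 0. This is an illustration under assumptions rather than the script's own code: `upload_and_verify`, the `RCLONE_BIN` override, and the `FAIL` label are hypothetical names.

```sh
# Hypothetical wrapper; the real logic lives in upload_archive_rclone.sh.
# Prints PASS only when both rclone steps exit 0, otherwise FAIL.
# RCLONE_BIN is an assumed override that defaults to "rclone" on PATH.
upload_and_verify() {
    staged_dir=$1
    dest=$2
    rclone_bin=${RCLONE_BIN:-rclone}

    # The check step runs only if the copy step succeeded.
    if "$rclone_bin" copy "$staged_dir" "$dest" --checksum &&
        "$rclone_bin" check "$staged_dir" "$dest" --one-way --checksum; then
        echo PASS
    else
        echo FAIL
        return 1
    fi
}
```

Setting `RCLONE_BIN=true` or `RCLONE_BIN=false` exercises both branches without touching a remote.
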
## Retention And Cleanup

Local files are kept by default, even after upload verification.

Cleanup requires an explicit flag:

```sh
/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --execute \
  --cleanup-after-verify \
  --retention-days 7 \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"
```

Cleanup deletes only files that were selected for upload, uploaded, and verified,
and that are older than the retention window. The default retention window is
7 days.

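The cleanup rule combines two conditions: the file was verified, and it is older than the retention window. A minimal sketch, assuming a hypothetical helper `cleanup_candidates` that reads verified paths from standard input and only prints candidates (it deletes nothing); the script's own implementation may differ.

```sh
# Hypothetical helper; not taken from upload_archive_rclone.sh.
# Reads verified file paths from stdin (one per line) and prints those
# last modified more than RETENTION_DAYS days ago. It deletes nothing.
cleanup_candidates() {
    retention_days=${1:-7}
    retention_minutes=$(( retention_days * 24 * 60 ))
    while IFS= read -r path; do
        # Skip entries that no longer exist locally.
        [ -f "$path" ] || continue
        find "$path" -type f -mmin +"$retention_minutes"
    done
}

# Example: cleanup_candidates 7 < verified_paths.txt
```
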
## Upload Manifest

Each run writes a manifest such as:

```text
/var/lib/orderbooks/manifests/upload_archive_YYYYMMDDTHHMMSSZ.json
```

The manifest records:

- planned files
- attempted files
- dry-run files
- uploaded files
- verified files
- skipped open or recent files
- retained local files
- deleted local files
- SHA-256 checksums
- command mode
- start/end time
- rclone copy/check exit codes
- gate status

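To spot-check a SHA-256 checksum from a manifest against a local file, the digest can be recomputed with `sha256sum` from GNU coreutils. A small sketch, assuming a hypothetical helper `sha256_of`; the manifest's field names are not shown here, so the comparison itself is left to the reader.

```sh
# Hypothetical helper; how the script computes checksums is not shown here.
# Prints only the hex SHA-256 digest of FILE.
sha256_of() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Example: sha256_of /var/lib/orderbooks/manifests/<manifest>.json
```
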
For this repository, the sample manifest path is:

```text
data/manifests/upload_archive_sample_manifest.json
```

The verified Checkpoint 7 real-test manifest is:

```text
data/manifests/upload_archive_real_test_manifest.json
```

## systemd Timer

Install the unit files:

```sh
sudo install -o root -g root -m 0644 \
  /opt/orderbooks/systemd/polymarket-orderbook-uploader.service \
  /etc/systemd/system/polymarket-orderbook-uploader.service
sudo install -o root -g root -m 0644 \
  /opt/orderbooks/systemd/polymarket-orderbook-uploader.timer \
  /etc/systemd/system/polymarket-orderbook-uploader.timer
sudo systemctl daemon-reload
```

Create the environment file:

```sh
sudo install -o root -g orderbooks -m 0640 /dev/null /etc/orderbooks/orderbook-uploader.env
sudo editor /etc/orderbooks/orderbook-uploader.env
```

At minimum, set:

```text
ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket
```

Enable the timer:

```sh
sudo systemctl enable --now polymarket-orderbook-uploader.timer
```

Run one upload immediately:

```sh
sudo systemctl start polymarket-orderbook-uploader.service
```

## Logs

Use the systemd journal:

```sh
sudo systemctl status polymarket-orderbook-uploader.service
sudo journalctl -u polymarket-orderbook-uploader.service -f
sudo systemctl list-timers polymarket-orderbook-uploader.timer
```

## Current Checkpoint 7 Result

Initial local validation was blocked because `rclone` was unavailable. The
manifest from that blocked run remains at:

```text
data/manifests/upload_archive_sample_manifest.json
```

After `rclone` was installed at `/usr/bin/rclone` and configured with the remote
`gdrive:`, a dry run and one small real upload were run against:

```text
gdrive:orderbooks/polymarket/checkpoint7-test
```

The real upload manifest records `rclone copy` exit code 0 and `rclone check`
exit code 0:

```text
data/manifests/upload_archive_real_test_manifest.json
```

Current gate:

```text
PASS
```

## What Remains Unproven

- Long-run upload reliability.
- Interaction between hourly uploads and a 24h collector soak test.
- Retention cleanup after verified upload.
- Production readiness.