orderbooks/docs/GOOGLE_DRIVE_OFFLOAD.md
2026-05-02 17:44:33 +02:00

Google Drive Offload

Status: valid

This document covers Checkpoint 7: offloading closed raw collector files and manifests to Google Drive with rclone.

This checkpoint does not prove production readiness or 24/7 reliability. It requires one small real upload to a configured remote, and the separate 24h soak test must still pass later.

Scope

Included:

  • scripts/upload_archive_rclone.sh
  • scripts/purge_uploaded_local_files.sh
  • systemd/polymarket-orderbook-uploader.service
  • systemd/polymarket-orderbook-uploader.timer
  • dry-run mode by default
  • real upload only with --execute
  • rclone verification with rclone check
  • per-run upload manifests
  • verified-upload index tracking
  • periodic local purge of previously verified files

Excluded:

  • dashboards
  • databases
  • strategies or backtests
  • trading, signing, order placement, or wallet logic
  • hardcoded private auth material

Install rclone

On Ubuntu or Debian:

sudo apt-get update
sudo apt-get install -y rclone

Confirm:

rclone version

Configure A Google Drive Remote

Configure the remote outside this repository. For a service-user setup, run rclone config as the orderbooks user, then confirm that the remote lists:

sudo -u orderbooks rclone config
sudo -u orderbooks rclone lsd gdrive:

The example remote path is:

gdrive:orderbooks/polymarket

Any valid rclone destination may be used. The uploader reads it from:

ORDERBOOKS_RCLONE_DEST

For systemd, create:

/etc/orderbooks/orderbook-uploader.env

Example:

ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket

Do not commit the machine-local rclone config or any private auth material.

What Gets Uploaded

By default the script targets:

  • raw collector files: /var/lib/orderbooks/raw_orderbooks
  • collector manifests: /var/lib/orderbooks/manifests

It does not target normalized sample files by default.

Files modified within the last 10 minutes are skipped so that the uploader does not pick up files the collector may still be writing. The threshold is controlled by:

ORDERBOOKS_UPLOAD_MIN_AGE_SECONDS=600
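The age filter can be sketched as a small shell function. This is an illustration, not code taken from scripts/upload_archive_rclone.sh. The function name list_closed_files is hypothetical, and the sketch assumes GNU find, sort, and stat (stat -c %Y prints the modification time in epoch seconds).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository scripts): print regular files
# under DIR whose modification time is at least MIN_AGE seconds in the past.
list_closed_files() {
  local dir=$1 min_age=${2:-600}
  local now mtime f
  now=$(date +%s)
  # -print0 with read -d '' keeps paths containing spaces intact.
  while IFS= read -r -d '' f; do
    mtime=$(stat -c %Y -- "$f")   # GNU stat: mtime as epoch seconds
    if (( now - mtime >= min_age )); then
      printf '%s\n' "$f"
    fi
  done < <(find "$dir" -type f -print0 | sort -z)
}
```

For example, list_closed_files /var/lib/orderbooks/raw_orderbooks 600 would print only files that have not been modified for at least 10 minutes.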

On the remote, the script preserves each file's path relative to the data directory. For example:

/var/lib/orderbooks/raw_orderbooks/polymarket/orderbooks/<run_id>/file.jsonl.gz

uploads to:

<remote>/raw_orderbooks/polymarket/orderbooks/<run_id>/file.jsonl.gz
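A minimal sketch of that path mapping, assuming the data directory is a plain path prefix of every selected file. remote_path_for is a hypothetical name, not a function in the repository scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the data-directory prefix from FILE and append
# the remainder to the rclone destination DEST.
remote_path_for() {
  local data_dir=${1%/} file=$2 dest=${3%/}
  case $file in
    "$data_dir"/*)
      # Quoting $data_dir inside ${...#...} makes it a literal prefix.
      printf '%s/%s\n' "$dest" "${file#"$data_dir"/}"
      ;;
    *)
      echo "not under $data_dir: $file" >&2
      return 1
      ;;
  esac
}
```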

Dry Run

Dry-run is the default. It plans files, stages a temporary copy, invokes rclone copy --dry-run, and writes an upload manifest.
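The staging step can be sketched as follows, assuming the selected files are passed as paths relative to the data directory. Copying them into a temporary tree with the same relative layout lets a single rclone copy reproduce that layout on the remote. stage_files and dry_run_upload are hypothetical names; only rclone copy --dry-run is a real rclone invocation.

```shell
#!/usr/bin/env bash
# Hypothetical staging helper: copy each selected file (relative to DATA_DIR)
# into STAGE_DIR under the same relative path.
stage_files() {
  local data_dir=$1 stage_dir=$2
  shift 2
  local rel
  for rel in "$@"; do
    mkdir -p -- "$stage_dir/$(dirname -- "$rel")"
    # -p keeps mtimes so later age checks see the original timestamps.
    cp -p -- "$data_dir/$rel" "$stage_dir/$rel" || return 1
  done
}

# Plan the upload of the staged tree without writing anything remotely.
dry_run_upload() {
  local stage_dir=$1 dest=$2 rclone_bin=${3:-rclone}
  "$rclone_bin" copy --dry-run "$stage_dir" "$dest"
}
```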

Example for a VPS:

/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"

Example against the repository sample data:

scripts/upload_archive_rclone.sh \
  --data-dir data \
  --dest gdrive:orderbooks/polymarket/checkpoint7-test \
  --manifest-path data/manifests/upload_archive_real_test_dry_run_manifest.json \
  --min-age-seconds 0 \
  --rclone-bin /usr/bin/rclone

Dry-run does not prove remote write access.
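One way to confirm write access separately is to write and then remove a small marker object. The sketch below is hypothetical and not part of the repository scripts. It uses rclone rcat, which uploads standard input to a remote path, and rclone deletefile, which removes a single remote file; the marker name is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: write a tiny marker object to DEST, then delete it.
# Returns non-zero if either rclone call fails.
probe_remote_write() {
  local dest=${1%/} rclone_bin=${2:-rclone}
  local marker="$dest/.write_probe_$(date -u +%Y%m%dT%H%M%SZ)"
  # rclone rcat streams stdin into the remote object.
  printf 'probe\n' | "$rclone_bin" rcat "$marker" || return 1
  # Remove the marker so the archive stays clean.
  "$rclone_bin" deletefile "$marker"
}
```

Run it as the service user (for example with sudo -u orderbooks) so the probe uses the same rclone config as the uploader.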

Execute Upload

Run a real upload only after the remote is configured and the dry-run plan looks right:

/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --execute \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"

The script runs:

rclone copy <staged files> <remote> --checksum
rclone check <staged files> <remote> --one-way --checksum

The upload gate is PASS only when the copy succeeds and verification succeeds.
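The gate rule can be sketched as below. The function names are hypothetical, and the rclone arguments mirror the two commands above. In this sketch the check runs even after a failed copy so that both exit codes can be recorded; the real script may order this differently.

```shell
#!/usr/bin/env bash
# PASS only when both the copy and the check exited with code 0.
upload_gate() {
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then echo PASS; else echo FAIL; fi
}

# Hypothetical wrapper: run copy and check, capture both exit codes, print the gate.
copy_and_verify() {
  local stage_dir=$1 dest=$2 rclone_bin=${3:-rclone}
  local copy_rc=0 check_rc=0
  "$rclone_bin" copy "$stage_dir" "$dest" --checksum || copy_rc=$?
  "$rclone_bin" check "$stage_dir" "$dest" --one-way --checksum || check_rc=$?
  upload_gate "$copy_rc" "$check_rc"
}
```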

Retention And Cleanup

Local files are kept by default, even after upload verification.

Immediate same-run cleanup requires an explicit flag:

/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --execute \
  --cleanup-after-verify \
  --retention-days 7 \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"

Cleanup deletes only files that were selected for upload, uploaded, verified, and older than the retention window. The default retention window is 7 days.
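The cleanup rule reduces to two conditions per file. This hypothetical helper assumes that age is measured from the file's modification time and that a file verified in this run was necessarily selected and uploaded; neither detail is confirmed by the script source.

```shell
#!/usr/bin/env bash
# Hypothetical eligibility test for same-run cleanup.
# Args: VERIFIED (yes/no), MTIME and NOW (epoch seconds), RETENTION_DAYS (default 7).
cleanup_eligible() {
  local verified=$1 mtime=$2 now=$3 retention_days=${4:-7}
  [ "$verified" = yes ] || return 1
  # Older than the retention window?
  (( now - mtime >= retention_days * 86400 ))
}
```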

The uploader also maintains a durable verified-upload index at:

/var/lib/orderbooks/manifests/upload_verified_index.json

That index records files that have already passed rclone copy and rclone check. The periodic purge step uses that index to delete previously verified local files after the retention window, even when the current upload run is not the one that first verified them.
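The index-driven purge can be sketched as a loop over verified records. This illustration rests on a loud assumption: it reads the index as tab-separated path and SHA-256 lines, whereas the real upload_verified_index.json is JSON with a schema not documented here. Files newer than the retention window are left alone, and files whose current SHA-256 no longer matches the recorded value are skipped rather than deleted. Like the purge script, the sketch only reports unless told to execute.

```shell
#!/usr/bin/env bash
# Hypothetical purge sketch. ASSUMPTION: INDEX holds "path<TAB>sha256" lines,
# not the real JSON index format.
purge_verified() {
  local index=$1 retention_days=${2:-7} execute=${3:-no}
  local now path sha actual mtime
  now=$(date +%s)
  while IFS=$'\t' read -r path sha; do
    [ -f "$path" ] || continue                                  # already removed
    mtime=$(stat -c %Y -- "$path")
    (( now - mtime >= retention_days * 86400 )) || continue     # still within retention
    actual=$(sha256sum < "$path")                               # prints "<hash>  -"
    actual=${actual%% *}
    if [ "$actual" != "$sha" ]; then
      echo "skip checksum-mismatch $path"                       # changed since verification
      continue
    fi
    if [ "$execute" = yes ]; then
      rm -f -- "$path" && echo "deleted $path"
    else
      echo "would-delete $path"
    fi
  done < "$index"
}
```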

Run the purge manually with:

/opt/orderbooks/scripts/purge_uploaded_local_files.sh \
  --execute \
  --data-dir /var/lib/orderbooks \
  --retention-days 7

In the periodic systemd or Kubernetes runtime, each cycle runs the upload first and then the purge.

Upload Manifest

Each run writes a manifest such as:

/var/lib/orderbooks/manifests/upload_archive_YYYYMMDDTHHMMSSZ.json

The manifest records:

  • planned files
  • attempted files
  • dry-run files
  • uploaded files
  • verified files
  • skipped open or recent files
  • retained local files
  • deleted local files
  • SHA-256 checksums
  • command mode
  • start/end time
  • rclone copy/check exit codes
  • gate status
  • verified-upload index update summary

Each purge run writes a separate manifest such as:

/var/lib/orderbooks/manifests/purge_uploaded_local_YYYYMMDDTHHMMSSZ.json

The purge manifest records:

  • verified-index path and record count
  • eligible files older than retention
  • deleted local files
  • skipped files such as checksum mismatches
  • retention configuration
  • gate and operation status
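Both manifest names follow a <prefix>_YYYYMMDDTHHMMSSZ.json pattern. A hypothetical helper that builds such a path, assuming the trailing Z means the timestamp is in UTC:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<dir>/<prefix>_YYYYMMDDTHHMMSSZ.json" from the
# current UTC time. Prefix defaults to upload_archive.
manifest_path() {
  local manifests_dir=${1%/} prefix=${2:-upload_archive}
  printf '%s/%s_%s.json\n' "$manifests_dir" "$prefix" "$(date -u +%Y%m%dT%H%M%SZ)"
}
```

Fixed-width UTC timestamps also make the manifest names sort chronologically with a plain ls.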

For this repository, the sample manifest path is:

data/manifests/upload_archive_sample_manifest.json

The verified Checkpoint 7 real-test manifest is:

data/manifests/upload_archive_real_test_manifest.json

systemd Timer

Install the unit files:

sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-uploader.service /etc/systemd/system/polymarket-orderbook-uploader.service
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-uploader.timer /etc/systemd/system/polymarket-orderbook-uploader.timer
sudo systemctl daemon-reload

Create the environment file:

sudo install -o root -g orderbooks -m 0640 /dev/null /etc/orderbooks/orderbook-uploader.env
sudo editor /etc/orderbooks/orderbook-uploader.env

At minimum, set:

ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket
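Before enabling the timer, a quick preflight can confirm that the environment file assigns a non-empty destination. This helper is hypothetical; it greps the file instead of sourcing it, so nothing in the file is executed. Because the file is created with mode 0640 and group orderbooks, run the check as root or as the orderbooks user.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: succeed only if ENV_FILE is readable and contains an
# uncommented, non-empty ORDERBOOKS_RCLONE_DEST assignment.
check_uploader_env() {
  local env_file=${1:-/etc/orderbooks/orderbook-uploader.env}
  [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
  grep -Eq '^[[:space:]]*ORDERBOOKS_RCLONE_DEST=[^[:space:]]+' "$env_file" || {
    echo "ORDERBOOKS_RCLONE_DEST is missing or empty in $env_file" >&2
    return 1
  }
}
```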

Enable the timer:

sudo systemctl enable --now polymarket-orderbook-uploader.timer

Run one upload immediately:

sudo systemctl start polymarket-orderbook-uploader.service

The service runs the upload and its verification first, then the verified-file purge step, in the same timer cycle.

Logs

Use the systemd journal:

sudo systemctl status polymarket-orderbook-uploader.service
sudo journalctl -u polymarket-orderbook-uploader.service -f
sudo systemctl list-timers polymarket-orderbook-uploader.timer

Current Checkpoint 7 Result

Initial local validation was blocked because rclone was not available. That blocked manifest remains at:

data/manifests/upload_archive_sample_manifest.json

After rclone was installed at /usr/bin/rclone and the gdrive: remote was configured, a dry run and one tiny real upload were run against:

gdrive:orderbooks/polymarket/checkpoint7-test

The real upload manifest records rclone copy exit code 0 and rclone check exit code 0:

data/manifests/upload_archive_real_test_manifest.json

Current gate:

PASS

What Remains Unproven

  • Long-run upload reliability.
  • Interaction between hourly uploads and a 24h collector soak test.
  • Long-run purge behavior under repeated intermittent rclone check failures.
  • Production readiness.