# Google Drive Offload

Status: valid

This document covers Checkpoint 7: offloading closed raw collector files and manifests to Google Drive with `rclone`.

This checkpoint does not prove production readiness or 24/7 reliability. A real small upload must be run with a configured remote, and the later 24h soak test must still pass.

## Scope

Included:

- `scripts/upload_archive_rclone.sh`
- `scripts/purge_uploaded_local_files.sh`
- `systemd/polymarket-orderbook-uploader.service`
- `systemd/polymarket-orderbook-uploader.timer`
- dry-run mode by default
- real upload only with `--execute`
- rclone verification with `rclone check`
- per-run upload manifests
- verified-upload index tracking
- periodic local purge of previously verified files

Excluded:

- dashboards
- databases
- strategies or backtests
- trading, signing, order placement, or wallet logic
- hardcoded private auth material

## Install rclone

On Ubuntu or Debian:

```sh
sudo apt-get update
sudo apt-get install -y rclone
```

Confirm:

```sh
rclone version
```

## Configure A Google Drive Remote

Configure the remote outside this repository. For a service-user setup:

```sh
sudo -u orderbooks rclone config
sudo -u orderbooks rclone lsd gdrive:
```

The example remote path is:

```text
gdrive:orderbooks/polymarket
```

Any valid `rclone` destination may be used. The uploader reads it from:

```text
ORDERBOOKS_RCLONE_DEST
```

For systemd, create:

```text
/etc/orderbooks/orderbook-uploader.env
```

Example:

```text
ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket
```

Do not commit the machine-local rclone config or any private auth material.

## What Gets Uploaded

By default the script targets:

| Source | Default path |
| --- | --- |
| raw collector files | `/var/lib/orderbooks/raw_orderbooks` |
| collector manifests | `/var/lib/orderbooks/manifests` |

It does not target normalized sample files by default.
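
To see which local files fall under the two default sources listed above, a read-only listing can help. The following is a minimal sketch, not the uploader's own selection logic: the function name `list_default_upload_sources` is hypothetical, and it assumes the data directory layout from the table.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not part of the repository scripts.
# Prints every regular file under the two default upload sources
# (raw_orderbooks and manifests) of a data directory.
list_default_upload_sources() {
  data_dir=${1:-/var/lib/orderbooks}
  for sub in raw_orderbooks manifests; do
    # Skip a source directory that does not exist yet.
    [ -d "$data_dir/$sub" ] || continue
    find "$data_dir/$sub" -type f
  done
}
```

Calling `list_default_upload_sources /var/lib/orderbooks` only reads the filesystem. It applies no age filter, so it may list files that the uploader would skip.
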

Files modified within the last 10 minutes are skipped to avoid active collector files:

```text
ORDERBOOKS_UPLOAD_MIN_AGE_SECONDS=600
```

The script preserves repository/data-directory relative paths on the remote. For example:

```text
/var/lib/orderbooks/raw_orderbooks/polymarket/orderbooks//file.jsonl.gz
```

uploads to:

```text
/raw_orderbooks/polymarket/orderbooks//file.jsonl.gz
```

## Dry Run

Dry-run is the default. It plans files, stages a temporary copy, invokes `rclone copy --dry-run`, and writes an upload manifest.

Example for a VPS:

```sh
/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"
```

Example against the repository sample data:

```sh
scripts/upload_archive_rclone.sh \
  --data-dir data \
  --dest gdrive:orderbooks/polymarket/checkpoint7-test \
  --manifest-path data/manifests/upload_archive_real_test_dry_run_manifest.json \
  --min-age-seconds 0 \
  --rclone-bin /usr/bin/rclone
```

Dry-run does not prove remote write access.

## Execute Upload

Run a real upload only after the remote is configured and the dry-run plan looks right:

```sh
/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --execute \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"
```

The script runs:

```text
rclone copy --checksum
rclone check --one-way --checksum
```

The upload gate is `PASS` only when the copy succeeds and verification succeeds.

## Retention And Cleanup

Local files are kept by default, even after upload verification. Immediate same-run cleanup requires an explicit flag:

```sh
/opt/orderbooks/scripts/upload_archive_rclone.sh \
  --execute \
  --cleanup-after-verify \
  --retention-days 7 \
  --data-dir /var/lib/orderbooks \
  --dest "$ORDERBOOKS_RCLONE_DEST"
```

Cleanup deletes only files that were selected for upload, uploaded, verified, and older than the retention window. The default retention window is 7 days.
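
The age part of the cleanup condition above can be previewed without deleting anything. This is a minimal sketch under stated assumptions: the function name `list_older_than_days` is hypothetical, it checks modification time only, and it knows nothing about whether a file was selected, uploaded, or verified, so its output is a superset of what cleanup may remove.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not part of the repository scripts.
# Prints regular files under a directory last modified more than N days ago.
# It checks age only; upload and verification state are not consulted.
list_older_than_days() {
  dir=$1
  days=${2:-7}
  # -mmin is supported by GNU and BSD find; 1440 minutes = 1 day.
  find "$dir" -type f -mmin "+$((days * 1440))"
}
```

The sketch uses `-mmin` rather than `-mtime +7` because `-mtime` discards fractional days, so `+7` would only match files that are at least 8 days old.
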

The uploader also maintains a durable verified-upload index at:

```text
/var/lib/orderbooks/manifests/upload_verified_index.json
```

That index records files that have already passed `rclone copy` and `rclone check`. The periodic purge step uses that index to delete previously verified local files after the retention window, even when the current upload run is not the one that first verified them.

Run the purge manually with:

```sh
/opt/orderbooks/scripts/purge_uploaded_local_files.sh \
  --execute \
  --data-dir /var/lib/orderbooks \
  --retention-days 7
```

The periodic systemd timer runs upload and purge together.

## Upload Manifest

Each run writes a manifest such as:

```text
/var/lib/orderbooks/manifests/upload_archive_YYYYMMDDTHHMMSSZ.json
```

The manifest records:

- planned files
- attempted files
- dry-run files
- uploaded files
- verified files
- skipped open or recent files
- retained local files
- deleted local files
- SHA-256 checksums
- command mode
- start/end time
- rclone copy/check exit codes
- gate status
- verified-upload index update summary

Each purge run writes a separate manifest such as:

```text
/var/lib/orderbooks/manifests/purge_uploaded_local_YYYYMMDDTHHMMSSZ.json
```

The purge manifest records:

- verified-index path and record count
- eligible files older than retention
- deleted local files
- skipped files such as checksum mismatches
- retention configuration
- gate and operation status

For this repository, the sample manifest path is:

```text
data/manifests/upload_archive_sample_manifest.json
```

The verified Checkpoint 7 real-test manifest is:

```text
data/manifests/upload_archive_real_test_manifest.json
```

## systemd Timer

Install the unit files:

```sh
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-uploader.service /etc/systemd/system/polymarket-orderbook-uploader.service
sudo install -o root -g root -m 0644 /opt/orderbooks/systemd/polymarket-orderbook-uploader.timer /etc/systemd/system/polymarket-orderbook-uploader.timer
sudo systemctl daemon-reload
```

Create the environment file:

```sh
sudo install -o root -g orderbooks -m 0640 /dev/null /etc/orderbooks/orderbook-uploader.env
sudo editor /etc/orderbooks/orderbook-uploader.env
```

At minimum, set:

```text
ORDERBOOKS_RCLONE_DEST=gdrive:orderbooks/polymarket
```

Enable the timer:

```sh
sudo systemctl enable --now polymarket-orderbook-uploader.timer
```

Run one upload immediately:

```sh
sudo systemctl start polymarket-orderbook-uploader.service
```

The service runs the upload and its verification first, then runs the verified-file purge step in the same timer cycle.

## Logs

Use the systemd journal:

```sh
sudo systemctl status polymarket-orderbook-uploader.service
sudo journalctl -u polymarket-orderbook-uploader.service -f
sudo systemctl list-timers polymarket-orderbook-uploader.timer
```

## Current Checkpoint 7 Result

Initial local validation was blocked when `rclone` was unavailable. That blocked manifest remains at:

```text
data/manifests/upload_archive_sample_manifest.json
```

After `rclone` was configured as `/usr/bin/rclone` with remote `gdrive:`, a dry run and one tiny real upload were run against:

```text
gdrive:orderbooks/polymarket/checkpoint7-test
```

The real upload manifest records `rclone copy` exit code 0 and `rclone check` exit code 0:

```text
data/manifests/upload_archive_real_test_manifest.json
```

Current gate:

```text
PASS
```

## What Remains Unproven

- Long-run upload reliability.
- Interaction between hourly uploads and a 24h collector soak test.
- Long-run purge behavior under repeated intermittent `rclone check` failures.
- Production readiness.