orderbooks/AGENTS.md

# Agent Instructions

Project: Cross-Market Live Orderbook Archive

This repository exists to preserve live market microstructure data that is usually lost: order books, spreads, liquidity, depth, timestamps, request metadata, and enough raw context to later decide whether a trading idea was observable, fillable, and reproducible at the time.

The first market is Polymarket. Future markets may include NEAR-related venues and other prediction or crypto markets, but do not build generic multi-market infrastructure before the second market exists.

## Active Collaboration Model

This project uses a two-role workflow:

- `orchestrator`: coordinates checkpoints with the user, keeps scope narrow, records decisions, reviews evidence, states gates, and decides the next smallest step.
- `builder`: works in a separate session to implement the active checkpoint artifacts, run commands, collect evidence, and write manifests/reports.

The current primary chat session is the `orchestrator`. The orchestrator should not silently become the builder unless the user explicitly asks. The builder should treat `AGENTS.md`, `ROADMAP.md`, `docs/METHODOLOGY.md`, and the active checkpoint report as the durable source of instructions.

Hand-offs between orchestrator and builder must be written to disk under `orchestration/` or `reports/checkpoints/` when they contain decisions, scope changes, endpoint findings, or validation results. Chat-only instructions are not enough for project-critical state.

## Non-Negotiable Rules

1. Preserve raw data first. Raw API and websocket payloads are the source of truth. Derived datasets are secondary and must reference raw files.
2. No trading. Do not add order placement, signing, private-key handling, wallet logic, strategy execution, or bot behavior.
3. No secrets in the repo. Never commit API keys, rclone credentials, wallet material, cookies, or private endpoints.
4. Every checkpoint needs durable evidence on disk: code or docs, config or run instructions, manifest/report, and validation evidence.
5. Do not claim success without commands, outputs, files, checksums, or real collected data to support the claim.
6. Do not delete mistakes. If an artifact is wrong, misleading, partial, or deprecated, preserve it and label it with a reason and replacement.
7. Keep the scope narrow. No dashboard, database, ML, strategy, backtest, or generic framework until the roadmap gate allows it.
8. Use public data only. Authenticated access to public data is allowed only after a later checkpoint explicitly documents why it is required.
9. Do not describe the collector as "production-ready" until it has completed a documented 24h soak test with acceptable quality.
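
As an illustration of rule 1, a minimal sketch of a raw-first write might look like the following. The function names, the record fields, and especially the payload shape (a `bids` list of objects with a `price` field) are assumptions made for this example. They are not the Polymarket schema or an existing script in this repository.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_raw_payload(raw_dir, name, payload):
    """Write payload bytes untouched and return a provenance record (hypothetical helper)."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    raw_path = raw_dir / name
    if raw_path.exists():
        # In the spirit of rule 6: never overwrite an existing artifact.
        raise FileExistsError(raw_path)
    raw_path.write_bytes(payload)
    return {
        "raw_path": str(raw_path),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "bytes": len(payload),
        "saved_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def derive_best_bid(raw_record):
    """Build a derived row that points back at its raw source file.

    Assumes an illustrative payload shape: {"bids": [{"price": "..."}, ...]}.
    """
    book = json.loads(Path(raw_record["raw_path"]).read_bytes())
    best_bid = max(float(level["price"]) for level in book["bids"])
    return {
        "best_bid": best_bid,
        "source_raw_path": raw_record["raw_path"],
        "source_sha256": raw_record["sha256"],
    }
```

The point of the sketch is the ordering: the bytes are written and checksummed before anything is parsed, and every derived row carries the path and checksum of the raw file it came from.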

## Expected Workflow

For each checkpoint:

1. Define the smallest useful checkpoint.
2. Build only what is needed for that checkpoint.
3. Validate with real commands and, when applicable, real public data.
4. Write a machine-readable manifest and a short markdown note.
5. State PASS, FAIL, or BLOCKED.
6. Identify the strongest fake-progress risk.
7. Recommend the next smallest step.
8. Stop only when a real user or orchestrator decision is needed.
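
Steps 4 to 7 above could be captured in a manifest along these lines. This is a sketch under stated assumptions: the field names, the `build_manifest` and `write_manifest` helpers, and the rule that a PASS must list evidence files that exist on disk are illustrative choices, not a manifest format this repository has already defined.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

ALLOWED_RESULTS = ("PASS", "FAIL", "BLOCKED")


def build_manifest(checkpoint_id, result, evidence_paths, fake_progress_risk, next_step):
    """Return a manifest dict, refusing a PASS that lacks on-disk evidence."""
    if result not in ALLOWED_RESULTS:
        raise ValueError(f"result must be one of {ALLOWED_RESULTS}, got {result!r}")
    missing = [str(p) for p in evidence_paths if not Path(p).is_file()]
    if result == "PASS" and (not evidence_paths or missing):
        raise ValueError(f"PASS needs existing evidence files; missing: {missing}")
    return {
        "checkpoint_id": checkpoint_id,
        "result": result,
        "evidence_paths": [str(p) for p in evidence_paths],
        "missing_evidence": missing,
        "fake_progress_risk": fake_progress_risk,
        "next_step": next_step,
        "written_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_manifest(manifest, out_path):
    """Write the manifest as stable, diff-friendly JSON and return its path."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out_path
```

A FAIL or BLOCKED manifest is still allowed to list missing evidence, since recording what could not be produced is itself useful evidence.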

## Repository Conventions

- `scripts/`: executable probes, discovery scripts, collectors, normalizers, and upload helpers.
- `config/`: example configuration only. Real secrets and machine-local config stay outside git.
- `docs/`: durable methodology, data contracts, operational runbooks, and endpoint notes.
- `orchestration/prompts/`: prompts and templates used by future agents.
- `data/probes/`: bounded endpoint probe outputs and probe notes.
- `data/discovery/`: market discovery outputs and manifests.
- `data/live_sample/`: short sample collector runs.
- `data/normalized_sample/`: derived sample outputs generated from raw samples.
- `data/manifests/`: machine-readable manifests for probes, collectors, normalization, uploads, and checkpoints.
- `reports/`: human-readable checkpoint, soak test, and incident reports.
- `systemd/`: VPS runtime units when added.

The initial Polymarket implementation should remain simple scripts until the collector works. Introduce `collectors/<market_name>/` only when adding a second market or when duplication proves painful.

## Artifact Status Labels

Every durable artifact has exactly one of these statuses:

- `valid`: current and usable.
- `partial`: useful but incomplete.
- `deprecated`: superseded by a newer artifact.
- `invalid`: known to be wrong or misleading.

When marking an artifact `deprecated` or `invalid`, write a sibling markdown note or manifest entry with:

- original artifact path
- status
- reason
- replacement path, if any
- labeled_at_utc
- labeled_by

Do not remove the original artifact unless the user explicitly asks and there is a written reason.
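
A labeling helper might be sketched as follows. The `label_artifact` name and the `<artifact>.status.md` naming for the sibling note are assumptions for illustration. Only the six fields come from the list above.

```python
from datetime import datetime, timezone
from pathlib import Path

NOTE_REQUIRED = ("deprecated", "invalid")


def label_artifact(artifact, status, reason, labeled_by, replacement=None):
    """Write a sibling markdown note next to the artifact; never touch the artifact itself."""
    artifact = Path(artifact)
    if status not in NOTE_REQUIRED:
        raise ValueError(f"status must be one of {NOTE_REQUIRED}, got {status!r}")
    if not artifact.exists():
        raise FileNotFoundError(artifact)
    # Hypothetical naming convention: <artifact name>.status.md in the same directory.
    note = artifact.with_name(artifact.name + ".status.md")
    lines = [
        f"- original artifact path: {artifact}",
        f"- status: {status}",
        f"- reason: {reason}",
        f"- replacement path: {replacement if replacement else 'none'}",
        f"- labeled_at_utc: {datetime.now(timezone.utc).isoformat()}",
        f"- labeled_by: {labeled_by}",
    ]
    # Mode "x" fails if a note already exists, so an earlier label is never overwritten.
    with open(note, "x", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return note
```

The helper only adds a file. The original artifact stays in place, which keeps the label consistent with the rule against deleting mistakes.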

## Adding New Market Connectors Later

Before adding a second market, Polymarket must have working discovery, raw order-book collection, Google Drive offload, and a 24h soak test.

When the gate is met:

1. Create `collectors/<market_name>/` for market-specific code.
2. Keep shared code minimal and concrete.
3. Reuse the same raw-first file layout and manifest format.
4. Document endpoint quirks, timestamp semantics, rate limits, and schema differences in `docs/`.
5. Avoid abstract base classes until at least two real collectors expose repeated code that is painful to maintain.
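
The gate stated at the top of this section can be expressed as a small check. The gate keys and the idea of feeding it a mapping of gate name to checkpoint result are assumptions for this sketch. How those results are actually recorded is left to the checkpoint manifests.

```python
# Hypothetical gate names, one per requirement listed for Polymarket.
REQUIRED_GATES = (
    "discovery",
    "raw_orderbook_collection",
    "google_drive_offload",
    "soak_test_24h",
)


def second_market_gate(results):
    """Return (gate_met, unmet_gates) given a mapping of gate name to result string."""
    unmet = [gate for gate in REQUIRED_GATES if results.get(gate) != "PASS"]
    return (not unmet, unmet)


if __name__ == "__main__":
    # Example: three gates passed, the soak test has not been run yet.
    met, unmet = second_market_gate({
        "discovery": "PASS",
        "raw_orderbook_collection": "PASS",
        "google_drive_offload": "PASS",
    })
    print(met, unmet)
```

A missing entry counts as unmet, so the check fails closed when a checkpoint has not been run at all.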