# near-intents-monitor

Production-shaped first slice of the trading system:

- **venue ingest**: NEAR Intents solver-bus quote flow
- **bus**: Redpanda first, Kafka-compatible by design
- **reactor**: dummy decision engine emitting commands
- **executor**: dummy execution worker with durable idempotency state
- **result consumer**: downstream observer of execution outcomes

## Canonical repo shape

```text
src/
  apps/
    near-intents-ingest.mjs
    dummy-reactor.mjs
    dummy-executor.mjs
    dummy-consumer.mjs
  bus/
    kafka/
      producer.mjs
      consumer.mjs
  core/
    event-envelope.mjs
    executor-state-store.mjs
    log.mjs
    pair-filter.mjs
    schemas.mjs
  lib/
    config.mjs
    env.mjs
  venues/
    near-intents/
      ingest.mjs
      normalize.mjs
      ws.mjs
compose.yml
Dockerfile
docs/contracts.md
deploy/hetzner/README.md
```

## Event flow

```text
NEAR Intents WebSocket
  |
  +--> raw.near_intents.quote
         |
         v
       norm.swap_demand
         |
         v
       cmd.execute_trade
         |
         v
       exec.trade_result
```

Core rule: services do not call each other directly for trading flow; they communicate through bus topics only.

## Contracts

See `docs/contracts.md`.

Current topics:

- `raw.near_intents.quote`
- `norm.swap_demand`
- `cmd.execute_trade`
- `exec.trade_result`

## Primary deployment path: repo-driven Hetzner bootstrap

The primary production path is no longer a Compose-only VM workflow.
The intended operating model is:

- Terraform provisions a Hetzner single-node environment
- cloud-init installs k3s automatically on first boot
- a local operator workstation performs the first repo-driven bootstrap
- Kubernetes manifests install Redpanda, the app workloads, Forgejo, runner, registry, and ingress-related components
- once the in-cluster Git + CI stack is alive, routine app deploys move to self-hosted CI

This is a two-phase model:

- **Phase 0:** local workstation bootstrap of a brand-new cluster
- **Phase 1:** self-hosted Forgejo + runner takes over app delivery

Compose still exists for local development and optional single-machine testing, but it is not the canonical production story.

## Prerequisites for first deployment

Install locally on the operator workstation:

- Terraform `>= 1.6`
- `kubectl`
- `docker`
- `curl`

You also need:

- a Hetzner Cloud API token
- a local SSH public key file for Terraform node provisioning
- DNS control for your chosen base domain and Forgejo hostname
- preferably a Tailscale tailnet and auth key for private admin/control-plane access
- the repo checked out locally

## Required bootstrap secrets and inputs

Create the bootstrap env file:

```bash
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
```

Set at least:

- `HCLOUD_TOKEN`
- `SSH_PUBLIC_KEY_PATH`
- `PUBLIC_DOMAIN`
- recommended:
  - `TAILSCALE_AUTH_KEY`
  - `TAILSCALE_CONTROL_PLANE_HOSTNAME`
- optional fallback:
  - `TF_ADMIN_CIDR_BLOCKS`
- `BASE_DOMAIN`
- `FORGEJO_DOMAIN`
- `FORGEJO_ROOT_URL`
- `REGISTRY_DOMAIN`
- `LETSENCRYPT_EMAIL`
- `REGISTRY_USERNAME`
- `REGISTRY_PASSWORD`
- `NEAR_INTENTS_API_KEY`
- `FORGEJO_RUNNER_REGISTRATION_TOKEN`
- optional DNS automation:
  - Cloudflare:
    - `CLOUDFLARE_API_TOKEN`
    - `CLOUDFLARE_ZONE_ID`
  - Porkbun:
    - `PORKBUN_API_KEY`
    - `PORKBUN_SECRET_API_KEY`

Then load them:

```bash
source scripts/hetzner/bootstrap-secrets.env
```

## First bootstrap sequence

Run the end-to-end bootstrap from repo root:

```bash
bash scripts/hetzner/bootstrap.sh
```

Current repo behavior of that script:

1. runs Terraform in `infra/terraform/hetzner`
2. optionally creates DNS records for the base, Forgejo, and registry hosts via Cloudflare or Porkbun
3. if configured, joins the node to Tailscale and prefers the Tailscale control-plane hostname for Kubernetes API access
4. waits for SSH and the k3s API endpoint to become ready
5. fetches the real k3s kubeconfig from the node and writes it to `.state/hetzner/kubeconfig.yaml`
6. renders the Hetzner single-node overlay from local operator inputs
7. creates registry pull/auth secrets
8. applies the Kubernetes bootstrap manifests
9. builds the app image locally and imports it into k3s on the node
10. performs the first rollout using the imported bootstrap image

Use the generated kubeconfig afterward:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n unrip get deploy,pods
kubectl -n forgejo get deploy,pods,svc
```

## What is deployed into k3s

The repo-managed Kubernetes assets are under `deploy/k8s/`.

Current single-node target includes resources for:

- `unrip` workloads in namespace `unrip`
- Redpanda
- Forgejo
- Forgejo runner
- private registry
- ingress-nginx namespace/resources
- cert-manager namespace/resources
- ACME issuers and ingress definitions
- a bootstrap job for Redpanda topic creation

Shared platform namespaces:

- `forgejo`
- `registry`
- `ingress-nginx`
- `cert-manager`

Project-specific namespaces:

- `unrip`
- future projects should get their own namespace rather than sharing `unrip`

Important current-state nuance:

- the bootstrap script currently applies `deploy/k8s/base`
- the longer-term intended target is `deploy/k8s/overlays/hetzner-single-node`

## Executor persistence in k3s

The executor is stateful by design because it persists idempotency/execution tracking.
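
Why that state has to outlive the process can be seen in a minimal sketch of a skip-on-replay check. This is not the repo's `executor-state-store.mjs`: `createStateStore`, `handleCommand`, and the snapshot format are hypothetical names, and an in-memory map with a JSON snapshot stands in for storage that would really live on disk. Only the `command_id` and `idempotency_key` field names come from this README.

```javascript
// Hypothetical sketch of idempotent command handling; not the repo's API.

// Build a store from a serialized snapshot. A durable implementation would
// load the snapshot from disk on startup and write it back after each update.
// Results are assumed to be JSON-serializable.
function createStateStore(snapshot = "{}") {
  const completed = new Map(Object.entries(JSON.parse(snapshot)));
  return {
    has: (key) => completed.has(key),
    get: (key) => completed.get(key),
    markCompleted: (key, result) => { completed.set(key, result); },
    snapshot: () => JSON.stringify(Object.fromEntries(completed)),
  };
}

// Run `execute` at most once per idempotency key and report what happened.
function handleCommand(store, command, execute) {
  const key = command.idempotency_key;
  if (store.has(key)) {
    return { command_id: command.command_id, status: "skipped", result: store.get(key) };
  }
  const result = execute(command);
  store.markCompleted(key, result);
  return { command_id: command.command_id, status: "executed", result };
}

// Usage: the same command is delivered three times.
const command = { command_id: "cmd-1", idempotency_key: "trade-42" };
let executions = 0;
const execute = () => { executions += 1; return { filled: true }; };

const beforeRestart = createStateStore();
const first = handleCommand(beforeRestart, command, execute);

// State survived the restart: the replayed command is skipped.
const afterRestart = createStateStore(beforeRestart.snapshot());
const replayed = handleCommand(afterRestart, command, execute);

// State was lost: the same command executes a second time.
const afterStateLoss = createStateStore();
const duplicated = handleCommand(afterStateLoss, command, execute);

console.log(first.status, replayed.status, duplicated.status, executions);
// prints: executed skipped executed 2
```

The third delivery is the failure mode to plan for: without the earlier snapshot the executor cannot tell a replay from a new command.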

Current persistence boundary:

- app env uses `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
- in Kubernetes, the executor deployment mounts storage at that path
- the Hetzner single-node overlay pins storage to the k3s `local-path` storage class
- cloud-init also prepares the host directory boundary for executor state on first boot

Operational meaning:

- executor state lives on node-backed storage in the single-node k3s environment
- if that PVC or underlying node storage is lost, duplicate-suppression history is lost too
- treat executor persistence as part of the minimal durable state of the cluster

## Failure recovery and operator checks

### If bootstrap fails before Terraform completes

Re-run after fixing the local input problem:

- missing token
- invalid CIDRs
- invalid SSH public key path

If the infrastructure must be torn down:

```bash
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
```

### If Terraform succeeds but Kubernetes is not ready

Check the public API and cluster state from the workstation:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
```

Typical next checks:

- cloud-init may still be finishing
- k3s may still be starting
- a workload may be crash-looping due to missing secret values or image-delivery issues

### If workloads do not roll out

Inspect the affected namespace:

```bash
kubectl -n unrip get pods
kubectl -n unrip describe pod
kubectl -n unrip logs deploy/dummy-executor --tail=100
kubectl -n forgejo logs deploy/forgejo --tail=100
```

### If you need to recreate secrets

The workstation bootstrap creates these Secrets:

- `unrip/unrip-secrets`
- `forgejo/forgejo-secrets`

Verify them:

```bash
kubectl -n unrip get secret unrip-secrets
kubectl -n forgejo get secret forgejo-secrets
```

### Current known limitations

Current colony state already identified an important gap:

- bootstrap and CI are
  not yet fully production-hardened, even though the first deploy path now fetches the real kubeconfig and imports the bootstrap image directly into k3s

Treat the current bootstrap as a repo-driven first-deploy path suitable for testing, with hardening still pending.

## Self-hosted CI handoff

After cluster bootstrap:

- open Forgejo at `https://${FORGEJO_DOMAIN}`
- seed or push this repo into Forgejo
- create Forgejo repository secrets:
  - `KUBECONFIG_B64`
  - `REGISTRY_USERNAME`
  - `REGISTRY_PASSWORD`
- create Forgejo repository variables:
  - `REGISTRY_HOST=${REGISTRY_DOMAIN}`
  - optional: `PROJECT_NAME=unrip`
  - optional: `PROJECT_NAMESPACE=unrip`
  - optional: `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
- push to `main`

Routine application deploys then follow `.forgejo/workflows/deploy.yml`:

- build image as `REGISTRY_HOST/PROJECT_NAME:${GIT_SHA}`
- push to the private registry
- `kubectl set image` for each deployment listed in `PROJECT_DEPLOYMENTS` inside `PROJECT_NAMESPACE`
- wait for rollout

If project variables are omitted, the workflow defaults to the current repo project:

- `PROJECT_NAME=unrip`
- `PROJECT_NAMESPACE=unrip`
- `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`

Infrastructure changes remain Terraform-driven from the operator workstation unless and until that responsibility is also automated.

For the detailed operator runbooks, see:

- `docs/hetzner-k3s-bootstrap.md`
- `docs/hetzner-self-hosted-ci-runbook.md`
- `deploy/k8s/projects/README.md`
- `docs/next-session-architecture.md`

## Local development with Compose

Compose remains available for local development and debugging.

```bash
npm install
cp .env.example .env
# edit .env
docker compose build
docker compose up -d
```

Useful commands:

```bash
docker compose ps
docker compose logs -f
docker compose logs -f near-intents-ingest dummy-reactor dummy-executor dummy-consumer
docker compose restart dummy-executor
docker compose down
docker compose down -v
```

### Individual services

```bash
npm run near-intents:ingest
npm run dummy-reactor
npm run dummy-executor
npm run dummy-consumer
```

Optional pair filter:

```bash
npm run near-intents:ingest -- --pair 'asset_a->asset_b'
```

## Idempotent executor behavior

- every command has a `command_id`
- commands carry `idempotency_key` and `execution_key`
- executor persists state under `EXECUTOR_STATE_DIR`
- completed commands are skipped after restart or replay

## Env

```env
NEAR_INTENTS_API_KEY=your_solver_jwt
NEAR_INTENTS_WS_URL=wss://solver-relay-v2.chaindefuser.com/ws
KAFKA_BROKERS=redpanda:9092
KAFKA_CLIENT_ID=unrip
KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE=raw.near_intents.quote
KAFKA_TOPIC_NORM_SWAP_DEMAND=norm.swap_demand
KAFKA_TOPIC_CMD_EXECUTE_TRADE=cmd.execute_trade
KAFKA_TOPIC_EXEC_TRADE_RESULT=exec.trade_result
KAFKA_CONSUMER_GROUP_DUMMY=dummy-reactor-v1
KAFKA_CONSUMER_GROUP_EXECUTOR=dummy-executor-v1
EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
```
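
As a rough illustration of how a service might turn these variables into a typed config object, here is a minimal sketch. It is not the repo's `src/lib/config.mjs`: `loadConfig`, `DEFAULTS`, and the `requireApiKey` option are hypothetical. It also assumes that the sample values above act as defaults, that `KAFKA_BROKERS` may hold a comma-separated list, and that only services opening the WebSocket need the API key.

```javascript
// Hypothetical config loader; not the repo's src/lib/config.mjs.
// Defaults mirror the sample values in the Env block. Treating them as
// defaults is an assumption of this sketch.
const DEFAULTS = {
  NEAR_INTENTS_WS_URL: "wss://solver-relay-v2.chaindefuser.com/ws",
  KAFKA_BROKERS: "redpanda:9092",
  KAFKA_CLIENT_ID: "unrip",
  KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE: "raw.near_intents.quote",
  KAFKA_TOPIC_NORM_SWAP_DEMAND: "norm.swap_demand",
  KAFKA_TOPIC_CMD_EXECUTE_TRADE: "cmd.execute_trade",
  KAFKA_TOPIC_EXEC_TRADE_RESULT: "exec.trade_result",
  KAFKA_CONSUMER_GROUP_DUMMY: "dummy-reactor-v1",
  KAFKA_CONSUMER_GROUP_EXECUTOR: "dummy-executor-v1",
  EXECUTOR_STATE_DIR: "/var/lib/unrip/executor-state",
};

function loadConfig(env = process.env, { requireApiKey = false } = {}) {
  // Fall back to the default when a variable is unset or empty.
  const read = (name) => (env[name] ? env[name] : DEFAULTS[name]);

  // Assumption: only services that open the WebSocket pass requireApiKey.
  if (requireApiKey && !env.NEAR_INTENTS_API_KEY) {
    throw new Error("NEAR_INTENTS_API_KEY is required");
  }

  return {
    apiKey: env.NEAR_INTENTS_API_KEY,
    wsUrl: read("NEAR_INTENTS_WS_URL"),
    kafka: {
      // Assumption: a comma-separated list of host:port pairs is accepted.
      brokers: read("KAFKA_BROKERS").split(",").map((b) => b.trim()).filter(Boolean),
      clientId: read("KAFKA_CLIENT_ID"),
      topics: {
        rawQuote: read("KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE"),
        swapDemand: read("KAFKA_TOPIC_NORM_SWAP_DEMAND"),
        executeTrade: read("KAFKA_TOPIC_CMD_EXECUTE_TRADE"),
        tradeResult: read("KAFKA_TOPIC_EXEC_TRADE_RESULT"),
      },
      groups: {
        reactor: read("KAFKA_CONSUMER_GROUP_DUMMY"),
        executor: read("KAFKA_CONSUMER_GROUP_EXECUTOR"),
      },
    },
    executorStateDir: read("EXECUTOR_STATE_DIR"),
  };
}

// Usage: override one variable and let everything else fall back.
const config = loadConfig({ KAFKA_BROKERS: "broker-1:9092, broker-2:9092" });
console.log(config.kafka.brokers, config.kafka.topics.executeTrade);
```

Passing the environment in as an argument keeps the function easy to exercise in tests without mutating `process.env`.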