
# near-intents-monitor

Production-shaped first slice of the trading system:

- **venue ingest**: NEAR Intents solver-bus quote flow
- **bus**: Redpanda first, Kafka-compatible by design
- **reactor**: dummy decision engine emitting commands
- **executor**: dummy execution worker with durable idempotency state
- **result consumer**: downstream observer of execution outcomes

## Canonical repo shape

```text
src/
  apps/
    near-intents-ingest.mjs
    dummy-reactor.mjs
    dummy-executor.mjs
    dummy-consumer.mjs
  bus/
    kafka/
      producer.mjs
      consumer.mjs
  core/
    event-envelope.mjs
    executor-state-store.mjs
    log.mjs
    pair-filter.mjs
    schemas.mjs
  lib/
    config.mjs
    env.mjs
  venues/
    near-intents/
      ingest.mjs
      normalize.mjs
      ws.mjs
compose.yml
Dockerfile
docs/contracts.md
deploy/hetzner/README.md
```

## Event flow

```text
NEAR Intents WebSocket
        |
        +--> raw.near_intents.quote
        |
        v
norm.swap_demand
        |
        v
cmd.execute_trade
        |
        v
exec.trade_result
```

Core rule: services do not call each other directly for trading flow; they communicate through bus topics only.
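
As a rough illustration of that rule, the sketch below writes the flow down as a wiring table and checks that every consumed topic has a producer. The service names come from `src/apps/`; the consume/produce split per service is inferred from the diagram above rather than taken from `docs/contracts.md`, and `WIRING`, `unproducedTopics`, and `consumersOf` are hypothetical names.

```javascript
// Hypothetical wiring table: which topics each service reads and writes.
// The split is an assumption inferred from the event-flow diagram.
const WIRING = {
  'near-intents-ingest': {
    consumes: [],
    produces: ['raw.near_intents.quote', 'norm.swap_demand'],
  },
  'dummy-reactor': {
    consumes: ['norm.swap_demand'],
    produces: ['cmd.execute_trade'],
  },
  'dummy-executor': {
    consumes: ['cmd.execute_trade'],
    produces: ['exec.trade_result'],
  },
  'dummy-consumer': {
    consumes: ['exec.trade_result'],
    produces: [],
  },
};

// Topics that some service consumes but no service produces.
function unproducedTopics(wiring) {
  const services = Object.values(wiring);
  const produced = new Set(services.flatMap((s) => s.produces));
  const consumed = services.flatMap((s) => s.consumes);
  return [...new Set(consumed.filter((topic) => !produced.has(topic)))];
}

// Names of the services that read a given topic.
function consumersOf(wiring, topic) {
  return Object.entries(wiring)
    .filter(([, s]) => s.consumes.includes(topic))
    .map(([name]) => name);
}
```

Because services only meet at topics, replacing a dummy service with a real one should only require keeping its row in such a table unchanged.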

## Contracts
See `docs/contracts.md`.

Current topics:
- `raw.near_intents.quote`
- `norm.swap_demand`
- `cmd.execute_trade`
- `exec.trade_result`

## Primary deployment path: repo-driven Hetzner bootstrap

The primary production path is no longer a Compose-only VM workflow.

The intended operating model is:
- Terraform provisions a Hetzner single-node environment
- cloud-init installs k3s automatically on first boot
- a local operator workstation performs the first repo-driven bootstrap
- Kubernetes manifests install Redpanda, the app workloads, Forgejo, runner, registry, and ingress-related components
- once the in-cluster Git + CI stack is alive, routine app deploys move to self-hosted CI

This is a two-phase model:
- **Phase 0:** local workstation bootstrap of a brand-new cluster
- **Phase 1:** self-hosted Forgejo + runner takes over app delivery

Compose still exists for local development and optional single-machine testing, but it is not the canonical production story.

## Prerequisites for first deployment

Install locally on the operator workstation:
- Terraform `>= 1.6`
- `kubectl`
- `docker`
- `curl`

You also need:
- a Hetzner Cloud API token
- a local SSH public key file for Terraform node provisioning
- DNS control for your chosen base domain and Forgejo hostname
- preferably a Tailscale tailnet and auth key for private admin/control-plane access
- the repo checked out locally

## Required bootstrap secrets and inputs

Create the bootstrap env file:

```bash
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
```

Set at least:
- `HCLOUD_TOKEN`
- `SSH_PUBLIC_KEY_PATH`
- `PUBLIC_DOMAIN`
- `BASE_DOMAIN`
- `FORGEJO_DOMAIN`
- `FORGEJO_ROOT_URL`
- `REGISTRY_DOMAIN`
- `LETSENCRYPT_EMAIL`
- `REGISTRY_USERNAME`
- `REGISTRY_PASSWORD`
- `NEAR_INTENTS_API_KEY`
- `FORGEJO_RUNNER_REGISTRATION_TOKEN`

Recommended:
- `TAILSCALE_AUTH_KEY`
- `TAILSCALE_CONTROL_PLANE_HOSTNAME`

Optional fallback:
- `TF_ADMIN_CIDR_BLOCKS`

Optional DNS automation:
- Cloudflare:
  - `CLOUDFLARE_API_TOKEN`
  - `CLOUDFLARE_ZONE_ID`
- Porkbun:
  - `PORKBUN_API_KEY`
  - `PORKBUN_SECRET_API_KEY`
Then load them:

```bash
source scripts/hetzner/bootstrap-secrets.env
```

## First bootstrap sequence

Run the end-to-end bootstrap from repo root:

```bash
bash scripts/hetzner/bootstrap.sh
```

The script currently does the following:
1. runs Terraform in `infra/terraform/hetzner`
2. optionally creates DNS records for the base, Forgejo, and registry hosts via Cloudflare or Porkbun
3. if configured, joins the node to Tailscale and prefers the Tailscale control-plane hostname for Kubernetes API access
4. waits for SSH and the k3s API endpoint to become ready
5. fetches the real k3s kubeconfig from the node and writes it to `.state/hetzner/kubeconfig.yaml`
6. renders the Hetzner single-node overlay from local operator inputs
7. creates registry pull/auth secrets
8. applies the Kubernetes bootstrap manifests
9. builds the app image locally and imports it into k3s on the node
10. performs the first rollout using the imported bootstrap image

Use the generated kubeconfig afterward:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n unrip get deploy,pods
kubectl -n forgejo get deploy,pods,svc
```

## What is deployed into k3s

The repo-managed Kubernetes assets are under `deploy/k8s/`.

Current single-node target includes resources for:
- `unrip` workloads in namespace `unrip`
- Redpanda
- Forgejo
- Forgejo runner
- private registry
- ingress-nginx namespace/resources
- cert-manager namespace/resources
- ACME issuers and ingress definitions
- a bootstrap job for Redpanda topic creation

Shared platform namespaces:
- `forgejo`
- `registry`
- `ingress-nginx`
- `cert-manager`

Project-specific namespaces:
- `unrip`
- future projects should get their own namespace rather than sharing `unrip`

Important current-state nuance:
- the bootstrap script currently applies `deploy/k8s/base`
- the longer-term intended target is `deploy/k8s/overlays/hetzner-single-node`

## Executor persistence in k3s

The executor is stateful by design: it persists idempotency and execution-tracking state.

Current persistence boundary:
- app env uses `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
- in Kubernetes, the executor deployment mounts storage at that path
- the Hetzner single-node overlay pins storage to the k3s `local-path` storage class
- cloud-init also prepares the host directory for executor state on first boot

Operational meaning:
- executor state lives on node-backed storage in the single-node k3s environment
- if that PVC or underlying node storage is lost, duplicate-suppression history is lost too
- treat executor persistence as part of the minimal durable state of the cluster

## Failure recovery and operator checks

### If bootstrap fails before Terraform completes
Re-run after fixing the local input problem:
- missing token
- invalid CIDRs
- invalid SSH public key path

If the infrastructure must be torn down:

```bash
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
```

### If Terraform succeeds but Kubernetes is not ready
Check the Kubernetes API and cluster state from the workstation:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
```

Typical next checks:
- cloud-init may still be finishing
- k3s may still be starting
- a workload may be crash-looping due to missing secret values or image-delivery issues

### If workloads do not roll out
Inspect the affected namespace:

```bash
kubectl -n unrip get pods
kubectl -n unrip describe pod <pod-name>
kubectl -n unrip logs deploy/dummy-executor --tail=100
kubectl -n forgejo logs deploy/forgejo --tail=100
```

### If you need to recreate secrets
The workstation bootstrap creates these Secrets:
- `unrip/unrip-secrets`
- `forgejo/forgejo-secrets`

Verify them:

```bash
kubectl -n unrip get secret unrip-secrets
kubectl -n forgejo get secret forgejo-secrets
```

### Current known limitations
One important gap is already known:
- bootstrap and CI are not yet fully production-hardened, even though the first-deploy path now fetches the real kubeconfig and imports the bootstrap image directly into k3s

Treat the current bootstrap as a repo-driven first-deploy path suitable for testing, with hardening still pending.

## Self-hosted CI handoff

After cluster bootstrap:
- open Forgejo at `https://${FORGEJO_DOMAIN}`
- seed or push this repo into Forgejo
- create Forgejo repository secrets:
  - `KUBECONFIG_B64`
  - `REGISTRY_USERNAME`
  - `REGISTRY_PASSWORD`
- create Forgejo repository variables:
  - `REGISTRY_HOST=${REGISTRY_DOMAIN}`
  - optional: `PROJECT_NAME=unrip`
  - optional: `PROJECT_NAMESPACE=unrip`
  - optional: `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
- push to `main`

Routine application deploys then follow `.forgejo/workflows/deploy.yml`:
- build image as `REGISTRY_HOST/PROJECT_NAME:${GIT_SHA}`
- push to the private registry
- `kubectl set image` for each deployment listed in `PROJECT_DEPLOYMENTS` inside `PROJECT_NAMESPACE`
- wait for rollout

If project variables are omitted, the workflow defaults to the current repo project:
- `PROJECT_NAME=unrip`
- `PROJECT_NAMESPACE=unrip`
- `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
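
To make the variable handling concrete, here is a minimal sketch of how the image reference and `kubectl` commands could be derived from those variables. `planDeploy` and `PROJECT_DEFAULTS` are hypothetical names, and targeting every container with `*` is an assumption; `.forgejo/workflows/deploy.yml` remains the source of truth for what actually runs.

```javascript
// Defaults as listed in this README.
const PROJECT_DEFAULTS = {
  PROJECT_NAME: 'unrip',
  PROJECT_NAMESPACE: 'unrip',
  PROJECT_DEPLOYMENTS:
    'near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer',
};

// Builds the image reference and the kubectl commands for one deploy.
// vars: Forgejo repository variables; gitSha: the commit being deployed.
function planDeploy(vars, gitSha) {
  if (!vars.REGISTRY_HOST) throw new Error('REGISTRY_HOST is required');
  if (!gitSha) throw new Error('gitSha is required');

  // Unset or empty variables fall back to the README defaults.
  const pick = (name) => vars[name] || PROJECT_DEFAULTS[name];
  const namespace = pick('PROJECT_NAMESPACE');
  const deployments = pick('PROJECT_DEPLOYMENTS')
    .split(',')
    .map((d) => d.trim())
    .filter(Boolean);

  const image = `${vars.REGISTRY_HOST}/${pick('PROJECT_NAME')}:${gitSha}`;

  // Assumption: '*' updates every container in each deployment. It is
  // quoted so a shell does not glob-expand it.
  const setImage = deployments.map(
    (d) => `kubectl -n ${namespace} set image deployment/${d} '*=${image}'`
  );
  const waitRollout = deployments.map(
    (d) => `kubectl -n ${namespace} rollout status deployment/${d}`
  );
  return { image, commands: [...setImage, ...waitRollout] };
}
```

With only `REGISTRY_HOST` set, the plan covers the four default deployments in namespace `unrip`.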

Infrastructure changes remain Terraform-driven from the operator workstation unless and until that responsibility is also automated.

For the detailed operator runbooks, see:
- `docs/hetzner-k3s-bootstrap.md`
- `docs/hetzner-self-hosted-ci-runbook.md`
- `deploy/k8s/projects/README.md`
- `docs/next-session-architecture.md`

## Local development with Compose

Compose remains available for local development and debugging.

```bash
npm install
cp .env.example .env
# edit .env

docker compose build
docker compose up -d
```

Useful commands:

```bash
docker compose ps
docker compose logs -f
docker compose logs -f near-intents-ingest dummy-reactor dummy-executor dummy-consumer
docker compose restart dummy-executor
docker compose down
docker compose down -v
```

### Individual services
```bash
npm run near-intents:ingest
npm run dummy-reactor
npm run dummy-executor
npm run dummy-consumer
```

Optional pair filter:
```bash
npm run near-intents:ingest -- --pair 'asset_a->asset_b'
```
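
For illustration, a filter value of that form could be parsed and applied roughly as below. This is a sketch, not the contents of `src/core/pair-filter.mjs`: `parsePair` and `matchesPair` are hypothetical names, reading `a->b` as input asset to output asset is an assumption, and so is the `asset_in` / `asset_out` event shape.

```javascript
// Splits 'asset_a->asset_b' into its two sides.
function parsePair(spec) {
  const parts = String(spec).split('->');
  if (parts.length !== 2) {
    throw new Error(`expected 'asset_a->asset_b', got: ${spec}`);
  }
  const [assetIn, assetOut] = parts.map((p) => p.trim());
  if (!assetIn || !assetOut) {
    throw new Error(`both sides of the pair must be non-empty: ${spec}`);
  }
  return { assetIn, assetOut };
}

// Assumed event shape: { asset_in, asset_out }.
// A missing filter lets every event through.
function matchesPair(filter, event) {
  if (!filter) return true;
  return (
    event.asset_in === filter.assetIn && event.asset_out === filter.assetOut
  );
}
```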

## Idempotent executor behavior
- every command has a `command_id`
- commands carry `idempotency_key` and `execution_key`
- executor persists state under `EXECUTOR_STATE_DIR`
- completed commands are skipped after restart or replay
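
The skip-on-replay behavior can be sketched as follows. This is a simplified, in-memory stand-in under stated assumptions: `ExecutorState` and `handleCommand` are hypothetical names, the real store in `src/core/executor-state-store.mjs` writes under `EXECUTOR_STATE_DIR`, and the role of `execution_key` is not specified here, so the sketch keys only on `idempotency_key`.

```javascript
// In-memory stand-in for the durable state store. `saved` plays the role
// of state reloaded from disk after a restart.
class ExecutorState {
  constructor(saved = {}) {
    this.records = new Map(Object.entries(saved));
  }
  isCompleted(key) {
    const record = this.records.get(key);
    return record !== undefined && record.status === 'completed';
  }
  markCompleted(key, result) {
    this.records.set(key, { status: 'completed', result });
  }
  // Plain-object copy of the state, i.e. what would be persisted.
  snapshot() {
    return Object.fromEntries(this.records);
  }
}

// Runs `execute` at most once per idempotency_key; repeats are skipped.
function handleCommand(state, command, execute) {
  const key = command.idempotency_key;
  if (state.isCompleted(key)) {
    return { command_id: command.command_id, status: 'skipped_duplicate' };
  }
  const result = execute(command);
  // Recorded only after execute returns, so a crash during execution
  // leaves the command eligible for retry.
  state.markCompleted(key, result);
  return { command_id: command.command_id, status: 'completed', result };
}
```

Replaying the same command against a state rebuilt from `snapshot()` yields `skipped_duplicate`, which mirrors the restart case described above.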

## Env

```env
NEAR_INTENTS_API_KEY=your_solver_jwt
NEAR_INTENTS_WS_URL=wss://solver-relay-v2.chaindefuser.com/ws
KAFKA_BROKERS=redpanda:9092
KAFKA_CLIENT_ID=unrip
KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE=raw.near_intents.quote
KAFKA_TOPIC_NORM_SWAP_DEMAND=norm.swap_demand
KAFKA_TOPIC_CMD_EXECUTE_TRADE=cmd.execute_trade
KAFKA_TOPIC_EXEC_TRADE_RESULT=exec.trade_result
KAFKA_CONSUMER_GROUP_DUMMY=dummy-reactor-v1
KAFKA_CONSUMER_GROUP_EXECUTOR=dummy-executor-v1
EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
```
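
A minimal sketch of turning these variables into a config object is shown below. It is not the contents of `src/lib/config.mjs`: `loadConfig` and `EXAMPLE_DEFAULTS` are hypothetical names, and treating the example values above as fallbacks is an assumption.

```javascript
// Assumption: the example values above double as defaults.
const EXAMPLE_DEFAULTS = {
  NEAR_INTENTS_WS_URL: 'wss://solver-relay-v2.chaindefuser.com/ws',
  KAFKA_BROKERS: 'redpanda:9092',
  KAFKA_CLIENT_ID: 'unrip',
  KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE: 'raw.near_intents.quote',
  KAFKA_TOPIC_NORM_SWAP_DEMAND: 'norm.swap_demand',
  KAFKA_TOPIC_CMD_EXECUTE_TRADE: 'cmd.execute_trade',
  KAFKA_TOPIC_EXEC_TRADE_RESULT: 'exec.trade_result',
  KAFKA_CONSUMER_GROUP_DUMMY: 'dummy-reactor-v1',
  KAFKA_CONSUMER_GROUP_EXECUTOR: 'dummy-executor-v1',
  EXECUTOR_STATE_DIR: '/var/lib/unrip/executor-state',
};

// env: a plain object such as process.env.
function loadConfig(env) {
  // The API key is a secret, so it has no fallback.
  if (!env.NEAR_INTENTS_API_KEY) {
    throw new Error('NEAR_INTENTS_API_KEY is not set');
  }
  const get = (name) => env[name] || EXAMPLE_DEFAULTS[name];
  return {
    apiKey: env.NEAR_INTENTS_API_KEY,
    wsUrl: get('NEAR_INTENTS_WS_URL'),
    // Comma-separated broker list becomes an array.
    brokers: get('KAFKA_BROKERS')
      .split(',')
      .map((b) => b.trim())
      .filter(Boolean),
    clientId: get('KAFKA_CLIENT_ID'),
    topics: {
      rawQuote: get('KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE'),
      swapDemand: get('KAFKA_TOPIC_NORM_SWAP_DEMAND'),
      executeTrade: get('KAFKA_TOPIC_CMD_EXECUTE_TRADE'),
      tradeResult: get('KAFKA_TOPIC_EXEC_TRADE_RESULT'),
    },
    groups: {
      reactor: get('KAFKA_CONSUMER_GROUP_DUMMY'),
      executor: get('KAFKA_CONSUMER_GROUP_EXECUTOR'),
    },
    executorStateDir: get('EXECUTOR_STATE_DIR'),
  };
}
```

In a service entry point this would typically be called once as `loadConfig(process.env)`.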