# near-intents-monitor

Production-shaped first slice of the trading system:

- **venue ingest**: NEAR Intents solver-bus quote flow
- **bus**: Redpanda first, Kafka-compatible by design
- **reactor**: dummy decision engine emitting commands
- **executor**: dummy execution worker with durable idempotency state
- **result consumer**: downstream observer of execution outcomes

## Canonical repo shape

```text
src/
  apps/
    near-intents-ingest.mjs
    dummy-reactor.mjs
    dummy-executor.mjs
    dummy-consumer.mjs
  bus/
    kafka/
      producer.mjs
      consumer.mjs
  core/
    event-envelope.mjs
    executor-state-store.mjs
    log.mjs
    pair-filter.mjs
    schemas.mjs
  lib/
    config.mjs
    env.mjs
  venues/
    near-intents/
      ingest.mjs
      normalize.mjs
      ws.mjs
compose.yml
Dockerfile
docs/contracts.md
deploy/hetzner/README.md
```

## Event flow

```text
NEAR Intents WebSocket
        |
        +--> raw.near_intents.quote
        |
        v
norm.swap_demand
        |
        v
cmd.execute_trade
        |
        v
exec.trade_result
```

Core rule: services do not call each other directly for trading flow; they communicate through bus topics only.

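To make the core rule concrete, here is a minimal sketch of the four stages wired through topics only. It uses an in-memory stand-in for the bus rather than Redpanda, and the payload fields (`quote_id`, `asset_in`, `asset_out`, `pair`, `status`) are assumptions for illustration; the real contracts live in `docs/contracts.md`.

```javascript
// In-memory stand-in for the bus. The real system uses Redpanda/Kafka;
// this only illustrates that stages share topics, not function calls.
function createBus() {
  const handlers = new Map(); // topic name -> array of subscriber callbacks
  return {
    subscribe(topic, handler) {
      if (!handlers.has(topic)) handlers.set(topic, []);
      handlers.get(topic).push(handler);
    },
    publish(topic, event) {
      for (const handler of handlers.get(topic) ?? []) handler(event);
    },
  };
}

const TOPICS = {
  raw: 'raw.near_intents.quote',
  norm: 'norm.swap_demand',
  cmd: 'cmd.execute_trade',
  result: 'exec.trade_result',
};

// Ingest: archive the raw quote, then emit a normalized demand event.
// The payload shapes here are hypothetical.
function ingest(bus, quote) {
  bus.publish(TOPICS.raw, quote);
  bus.publish(TOPICS.norm, {
    quote_id: quote.quote_id,
    pair: `${quote.asset_in}->${quote.asset_out}`,
  });
}

// Reactor and executor each see only the bus, never another stage.
function wireStages(bus) {
  bus.subscribe(TOPICS.norm, (demand) =>
    bus.publish(TOPICS.cmd, { command_id: `cmd-${demand.quote_id}`, pair: demand.pair }));
  bus.subscribe(TOPICS.cmd, (command) =>
    bus.publish(TOPICS.result, { command_id: command.command_id, status: 'completed' }));
}

const bus = createBus();
const results = [];
wireStages(bus);
bus.subscribe(TOPICS.result, (result) => results.push(result)); // result consumer
ingest(bus, { quote_id: 'q1', asset_in: 'asset_a', asset_out: 'asset_b' });
console.log(results); // one result for command "cmd-q1"
```

Because every stage depends only on topic names, any stage can be replaced or replayed independently, which is the property the core rule is meant to protect.
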
## Contracts

See `docs/contracts.md`.

Current topics:

- `raw.near_intents.quote`
- `norm.swap_demand`
- `cmd.execute_trade`
- `exec.trade_result`

## Primary deployment path: repo-driven Hetzner bootstrap

The primary production path is no longer a Compose-only VM workflow.

The intended operating model is:

- Terraform provisions a Hetzner single-node environment
- cloud-init installs k3s automatically on first boot
- a local operator workstation performs the first repo-driven bootstrap
- Kubernetes manifests install Redpanda, the app workloads, Forgejo, runner, registry, and ingress-related components
- once the in-cluster Git + CI stack is alive, routine app deploys move to self-hosted CI

This is a two-phase model:

- **Phase 0:** local workstation bootstrap of a brand-new cluster
- **Phase 1:** self-hosted Forgejo + runner takes over app delivery

Compose still exists for local development and optional single-machine testing, but it is not the canonical production story.

## Prerequisites for first deployment

Install locally on the operator workstation:

- Terraform `>= 1.6`
- `kubectl`
- `docker`
- `curl`

You also need:

- a Hetzner Cloud API token
- a local SSH public key file for Terraform node provisioning
- DNS control for your chosen base domain and Forgejo hostname
- preferably a Tailscale tailnet and auth key for private admin/control-plane access
- the repo checked out locally

## Required bootstrap secrets and inputs

Create the bootstrap env file:

```bash
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
```

Set at least:

- `HCLOUD_TOKEN`
- `SSH_PUBLIC_KEY_PATH`
- `PUBLIC_DOMAIN`
- recommended:
  - `TAILSCALE_AUTH_KEY`
  - `TAILSCALE_CONTROL_PLANE_HOSTNAME`
- optional fallback:
  - `TF_ADMIN_CIDR_BLOCKS`
- `BASE_DOMAIN`
- `FORGEJO_DOMAIN`
- `FORGEJO_ROOT_URL`
- `REGISTRY_DOMAIN`
- `LETSENCRYPT_EMAIL`
- `REGISTRY_USERNAME`
- `REGISTRY_PASSWORD`
- `NEAR_INTENTS_API_KEY`
- `FORGEJO_RUNNER_REGISTRATION_TOKEN`
- optional DNS automation:
  - Cloudflare:
    - `CLOUDFLARE_API_TOKEN`
    - `CLOUDFLARE_ZONE_ID`
  - Porkbun:
    - `PORKBUN_API_KEY`
    - `PORKBUN_SECRET_API_KEY`

Then load them:

```bash
source scripts/hetzner/bootstrap-secrets.env
```

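A quick preflight check can catch an incomplete env file before Terraform runs. The sketch below is not part of the repo; `missingBootstrapVars` is a hypothetical helper that checks only the first three variables listed above, and it assumes the env file exports its variables so that a child Node.js process can see them.

```javascript
// Hypothetical preflight helper: report which minimum bootstrap variables are unset.
const REQUIRED_BOOTSTRAP_VARS = ['HCLOUD_TOKEN', 'SSH_PUBLIC_KEY_PATH', 'PUBLIC_DOMAIN'];

function missingBootstrapVars(env) {
  // Treat undefined, empty, and whitespace-only values as missing.
  return REQUIRED_BOOTSTRAP_VARS.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === '',
  );
}

const missing = missingBootstrapVars(process.env);
if (missing.length > 0) {
  console.error(`Missing bootstrap variables: ${missing.join(', ')}`);
} else {
  console.log('Minimum bootstrap variables are set.');
}
```

The same list could be extended with the domain and registry variables once you know which of them your setup actually requires.
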
## First bootstrap sequence

Run the end-to-end bootstrap from repo root:

```bash
bash scripts/hetzner/bootstrap.sh
```

Current repo behavior of that script:

1. runs Terraform in `infra/terraform/hetzner`
2. optionally creates DNS records for the base, Forgejo, and registry hosts via Cloudflare or Porkbun
3. if configured, joins the node to Tailscale and prefers the Tailscale control-plane hostname for Kubernetes API access
4. waits for SSH and the k3s API endpoint to become ready
5. fetches the real k3s kubeconfig from the node and writes it to `.state/hetzner/kubeconfig.yaml`
6. renders the Hetzner single-node overlay from local operator inputs
7. creates registry pull/auth secrets
8. applies the Kubernetes bootstrap manifests
9. builds the app image locally and imports it into k3s on the node
10. performs the first rollout using the imported bootstrap image

Use the generated kubeconfig afterward:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n unrip get deploy,pods
kubectl -n forgejo get deploy,pods,svc
```

## What is deployed into k3s

The repo-managed Kubernetes assets are under `deploy/k8s/`.

Current single-node target includes resources for:

- `unrip` workloads in namespace `unrip`
- Redpanda
- Forgejo
- Forgejo runner
- private registry
- ingress-nginx namespace/resources
- cert-manager namespace/resources
- ACME issuers and ingress definitions
- a bootstrap job for Redpanda topic creation

Shared platform namespaces:

- `forgejo`
- `registry`
- `ingress-nginx`
- `cert-manager`

Project-specific namespaces:

- `unrip`
- future projects should get their own namespace rather than sharing `unrip`

Important current-state nuance:

- the bootstrap script currently applies `deploy/k8s/base`
- the longer-term intended target is `deploy/k8s/overlays/hetzner-single-node`

## Executor persistence in k3s

The executor is stateful by design because it persists idempotency/execution tracking.

Current persistence boundary:

- app env uses `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
- in Kubernetes, the executor deployment mounts storage at that path
- the Hetzner single-node overlay pins storage to the k3s `local-path` storage class
- cloud-init also prepares the host directory boundary for executor state on first boot

Operational meaning:

- executor state lives on node-backed storage in the single-node k3s environment
- if that PVC or underlying node storage is lost, duplicate-suppression history is lost too
- treat executor persistence as part of the minimal durable state of the cluster

## Failure recovery and operator checks

### If bootstrap fails before Terraform completes
Fix the local input problem, then re-run the bootstrap. Common causes:

- missing token
- invalid CIDRs
- invalid SSH public key path

If the infrastructure must be torn down:

```bash
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
```

### If Terraform succeeds but Kubernetes is not ready
Check the public API and cluster state from the workstation:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
```

Typical causes to rule out next:

- cloud-init may still be finishing
- k3s may still be starting
- a workload may be crash-looping due to missing secret values or image-delivery issues

### If workloads do not roll out
Inspect the affected namespace:

```bash
kubectl -n unrip get pods
kubectl -n unrip describe pod <pod-name>
kubectl -n unrip logs deploy/dummy-executor --tail=100
kubectl -n forgejo logs deploy/forgejo --tail=100
```

### If you need to recreate secrets
The workstation bootstrap creates these Secrets:

- `unrip/unrip-secrets`
- `forgejo/forgejo-secrets`

Verify them:

```bash
kubectl -n unrip get secret unrip-secrets
kubectl -n forgejo get secret forgejo-secrets
```

### Current known limitations
One important gap has already been identified:

- bootstrap and CI are not yet fully production-hardened, even though the first deploy path now fetches the real kubeconfig and imports the bootstrap image directly into k3s

Treat the current bootstrap as a repo-driven first-deploy path suitable for testing, with hardening still pending.

## Self-hosted CI handoff

After cluster bootstrap:

- open Forgejo at `https://${FORGEJO_DOMAIN}`
- seed or push this repo into Forgejo
- create Forgejo repository secrets:
  - `KUBECONFIG_B64`
  - `REGISTRY_USERNAME`
  - `REGISTRY_PASSWORD`
- create Forgejo repository variables:
  - `REGISTRY_HOST=${REGISTRY_DOMAIN}`
  - optional: `PROJECT_NAME=unrip`
  - optional: `PROJECT_NAMESPACE=unrip`
  - optional: `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
- push to `main`

Routine application deploys then follow `.forgejo/workflows/deploy.yml`:

- build image as `REGISTRY_HOST/PROJECT_NAME:${GIT_SHA}`
- push to the private registry
- `kubectl set image` for each deployment listed in `PROJECT_DEPLOYMENTS` inside `PROJECT_NAMESPACE`
- wait for rollout

If project variables are omitted, the workflow defaults to the current repo project:

- `PROJECT_NAME=unrip`
- `PROJECT_NAMESPACE=unrip`
- `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`

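The variable defaulting and the resulting commands can be sketched as follows. This is not the content of `.forgejo/workflows/deploy.yml`; `planDeploy` is a hypothetical helper, and the `*=` container selector (which updates every container in a deployment) is an assumption about how the workflow addresses containers.

```javascript
// Hypothetical sketch: derive the image reference and kubectl commands
// from the Forgejo repository variables described above.
const DEFAULT_DEPLOYMENTS =
  'near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer';

function planDeploy(vars, gitSha) {
  const projectName = vars.PROJECT_NAME || 'unrip';
  const namespace = vars.PROJECT_NAMESPACE || 'unrip';
  const deployments = (vars.PROJECT_DEPLOYMENTS || DEFAULT_DEPLOYMENTS)
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);

  const image = `${vars.REGISTRY_HOST}/${projectName}:${gitSha}`;
  const commands = [];
  // Update every listed deployment first, then wait for each rollout.
  for (const name of deployments) {
    // '*=' is quoted so the shell does not glob-expand it (assumed selector).
    commands.push(`kubectl -n ${namespace} set image deployment/${name} '*=${image}'`);
  }
  for (const name of deployments) {
    commands.push(`kubectl -n ${namespace} rollout status deployment/${name}`);
  }
  return { image, commands };
}

const plan = planDeploy({ REGISTRY_HOST: 'registry.example.com' }, 'abc123');
console.log(plan.image);
console.log(plan.commands.join('\n'));
```

With only `REGISTRY_HOST` set, the plan targets the four default deployments in the `unrip` namespace, matching the defaults listed above.
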
Infrastructure changes remain Terraform-driven from the operator workstation unless and until that responsibility is also automated.

For the detailed operator runbooks, see:

- `docs/hetzner-k3s-bootstrap.md`
- `docs/hetzner-self-hosted-ci-runbook.md`
- `deploy/k8s/projects/README.md`
- `docs/next-session-architecture.md`

## Local development with Compose

Compose remains available for local development and debugging.

```bash
npm install
cp .env.example .env
# edit .env

docker compose build
docker compose up -d
```

Useful commands:

```bash
docker compose ps
docker compose logs -f
docker compose logs -f near-intents-ingest dummy-reactor dummy-executor dummy-consumer
docker compose restart dummy-executor
docker compose down
docker compose down -v
```

### Individual services
```bash
npm run near-intents:ingest
npm run dummy-reactor
npm run dummy-executor
npm run dummy-consumer
```

Optional pair filter:
```bash
npm run near-intents:ingest -- --pair 'asset_a->asset_b'
```

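One way to interpret the `--pair` value is as a directional `input->output` filter. The sketch below is an assumption about the semantics, not the code in `src/core/pair-filter.mjs`; `parsePair`, `matchesPair`, and the `assetIn`/`assetOut` field names are hypothetical.

```javascript
// Hypothetical sketch: parse "asset_a->asset_b" into a directional filter.
function parsePair(spec) {
  const parts = spec.split('->').map((part) => part.trim());
  if (parts.length !== 2 || parts.some((part) => part === '')) {
    throw new Error(`Invalid pair "${spec}", expected "asset_a->asset_b"`);
  }
  return { assetIn: parts[0], assetOut: parts[1] };
}

// Assumed semantics: a quote matches only in the stated direction.
function matchesPair(filter, quote) {
  return quote.assetIn === filter.assetIn && quote.assetOut === filter.assetOut;
}

const filter = parsePair('asset_a->asset_b');
console.log(matchesPair(filter, { assetIn: 'asset_a', assetOut: 'asset_b' })); // true
console.log(matchesPair(filter, { assetIn: 'asset_b', assetOut: 'asset_a' })); // false
```

If the real filter is meant to be bidirectional, `matchesPair` would also need to accept the reversed pair.
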
## Idempotent executor behavior
- every command has a `command_id`
- commands carry `idempotency_key` and `execution_key`
- executor persists state under `EXECUTOR_STATE_DIR`
- completed commands are skipped after restart or replay

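The skip-on-replay behavior can be sketched as below. This is an illustration under assumptions, not the repo's `executor-state-store.mjs`: it keys state on `idempotency_key`, uses an in-memory `Map` where the real executor persists under `EXECUTOR_STATE_DIR`, and the status names are hypothetical.

```javascript
// Hypothetical sketch of an idempotent command handler.
// `stateStore` needs only get/set; a Map stands in for durable storage here.
function createExecutor(stateStore, execute) {
  return function handleCommand(command) {
    const key = command.idempotency_key; // assumed to be the dedup key
    const previous = stateStore.get(key);
    if (previous && previous.status === 'completed') {
      // Replay or restart: the work was already done, so do not execute again.
      return { command_id: command.command_id, status: 'skipped_duplicate' };
    }
    // Record intent before executing so an interrupted run is visible later.
    stateStore.set(key, { status: 'started', command_id: command.command_id });
    const outcome = execute(command);
    stateStore.set(key, { status: 'completed', command_id: command.command_id, outcome });
    return { command_id: command.command_id, status: 'completed', outcome };
  };
}

const stateStore = new Map();
let executions = 0;
const handleCommand = createExecutor(stateStore, () => {
  executions += 1;
  return 'filled';
});

const command = { command_id: 'c1', idempotency_key: 'k1', execution_key: 'e1' };
const firstRun = handleCommand(command);
const replayRun = handleCommand(command);
console.log(firstRun.status, replayRun.status, executions); // completed skipped_duplicate 1
```

The guarantee only holds as long as the store survives restarts, which is why losing the executor's storage also loses duplicate-suppression history.
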
## Env

```env
NEAR_INTENTS_API_KEY=your_solver_jwt
NEAR_INTENTS_WS_URL=wss://solver-relay-v2.chaindefuser.com/ws
KAFKA_BROKERS=redpanda:9092
KAFKA_CLIENT_ID=unrip
KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE=raw.near_intents.quote
KAFKA_TOPIC_NORM_SWAP_DEMAND=norm.swap_demand
KAFKA_TOPIC_CMD_EXECUTE_TRADE=cmd.execute_trade
KAFKA_TOPIC_EXEC_TRADE_RESULT=exec.trade_result
KAFKA_CONSUMER_GROUP_DUMMY=dummy-reactor-v1
KAFKA_CONSUMER_GROUP_EXECUTOR=dummy-executor-v1
EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
```
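
A minimal sketch of reading these variables into a config object is shown below. It is not the repo's `src/lib/config.mjs`; `loadConfig` is hypothetical, and treating the sample values above as defaults and `KAFKA_BROKERS` as a comma-separated list are both assumptions.

```javascript
// Hypothetical config loader; fallback values mirror the sample env above.
function loadConfig(env) {
  if (!env.NEAR_INTENTS_API_KEY) {
    throw new Error('Missing required env var NEAR_INTENTS_API_KEY');
  }
  return {
    apiKey: env.NEAR_INTENTS_API_KEY,
    wsUrl: env.NEAR_INTENTS_WS_URL || 'wss://solver-relay-v2.chaindefuser.com/ws',
    // Assumed format: one or more host:port entries separated by commas.
    brokers: (env.KAFKA_BROKERS || 'redpanda:9092')
      .split(',')
      .map((broker) => broker.trim())
      .filter(Boolean),
    clientId: env.KAFKA_CLIENT_ID || 'unrip',
    topics: {
      rawQuote: env.KAFKA_TOPIC_RAW_NEAR_INTENTS_QUOTE || 'raw.near_intents.quote',
      swapDemand: env.KAFKA_TOPIC_NORM_SWAP_DEMAND || 'norm.swap_demand',
      executeTrade: env.KAFKA_TOPIC_CMD_EXECUTE_TRADE || 'cmd.execute_trade',
      tradeResult: env.KAFKA_TOPIC_EXEC_TRADE_RESULT || 'exec.trade_result',
    },
    executorStateDir: env.EXECUTOR_STATE_DIR || '/var/lib/unrip/executor-state',
  };
}

const config = loadConfig({
  NEAR_INTENTS_API_KEY: 'your_solver_jwt',
  KAFKA_BROKERS: 'redpanda-0:9092, redpanda-1:9092',
});
console.log(config.brokers, config.topics.swapDemand);
```

Failing fast on the missing API key keeps a misconfigured ingest from connecting without credentials; the other values fall back so local Compose runs work with a nearly empty `.env`.
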