# Hetzner single-node bootstrap (Terraform + cloud-init + k3s)

This is the canonical first-production deployment path for the repo. A local operator workstation drives the first deployment end to end:

- Terraform provisions Hetzner infrastructure
- cloud-init installs k3s automatically on first boot
- the workstation waits for the public Kubernetes API
- the workstation creates the initial Kubernetes Secrets
- the workstation applies the repo-managed Kubernetes manifests
- the workstation performs the first image/bootstrap delivery attempt
- once Forgejo and its runner are healthy, routine app deploys are intended to move to self-hosted CI

Compose remains available for local development, but it is not the primary production deployment model.

## Scope of this layer

The foundation under `infra/terraform/hetzner` provisions:

- one Hetzner Cloud server
- one SSH key resource based on your local public key
- firewall rules for SSH, the Kubernetes API, and HTTP/HTTPS ingress
- a private network attachment for future growth
- cloud-init user-data for unattended k3s installation and host preparation

The repo bootstrap then applies the Hetzner single-node overlay under `deploy/k8s/overlays/hetzner-single-node`, which composes Kubernetes resources under `deploy/k8s/` for:

- shared platform namespaces and services
- Redpanda
- unrip workloads
- Forgejo
- Forgejo runner
- private registry
- ingress/TLS-related resources
- Redpanda topic bootstrap job

## Prerequisites

Install on the operator workstation:

- Terraform `>= 1.6`
- `kubectl`
- `docker`
- `curl`

You also need:

- a Hetzner Cloud API token
- an SSH keypair already present locally
- access to DNS for your chosen domains
- admin CIDR blocks from which you will reach the new server on `22/tcp` and `6443/tcp`
- a local checkout of this repo

## Required bootstrap secrets and inputs

Prepare the operator env file:

```bash
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
${EDITOR:-vi} scripts/hetzner/bootstrap-secrets.env
```

Set at least:

- `HCLOUD_TOKEN`
- `SSH_PUBLIC_KEY_PATH`
- `TF_ADMIN_CIDR_BLOCKS`
- `BASE_DOMAIN`
- `FORGEJO_DOMAIN`
- `FORGEJO_ROOT_URL`
- `NEAR_INTENTS_API_KEY`
- `FORGEJO_RUNNER_REGISTRATION_TOKEN`

Load it into the current shell:

```bash
source scripts/hetzner/bootstrap-secrets.env
```

## Canonical bootstrap sequence

Run from the repo root:

```bash
bash scripts/hetzner/bootstrap.sh
```

The script currently:

1. validates local tooling
2. runs `terraform init` and `terraform apply` in `infra/terraform/hetzner`
3. reads Terraform outputs such as the server IP and `k3s_api_url`
4. waits for the k3s API readiness endpoint
5. writes a local workstation kubeconfig to `.state/hetzner/kubeconfig.yaml`
6. writes the overlay secret env input files and creates these Secrets (`namespace/name`):
   - `unrip/unrip-secrets`
   - `unrip/unrip-registry-creds`
   - `forgejo/forgejo-secrets`
   - `registry/registry-secrets`
7. applies `deploy/k8s/platform/base/namespace.yaml`, then `deploy/k8s/overlays/hetzner-single-node`
8. builds the repo bootstrap image locally
9. pushes it through the temporary local registry bridge, using the active project name
10. updates the deployments in the active project namespace and waits for their rollout status

After the script finishes:

```bash
export KUBECONFIG="$PWD/.state/hetzner/kubeconfig.yaml"
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n unrip get deploy,pods,jobs
kubectl -n forgejo get deploy,pods,svc
kubectl -n registry get pods,svc
```

## Current manifest target

Important current-state details:

- `scripts/hetzner/bootstrap.sh` applies `deploy/k8s/platform/base/namespace.yaml` first
- it then applies `deploy/k8s/overlays/hetzner-single-node`
- bootstrap naming no longer assumes the legacy `trading-system` kubeconfig contexts, image tags, or rollout namespaces

## Executor persistence in k3s

The dummy executor persists durable idempotency state.
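To make "durable idempotency state" concrete, here is one common way to implement duplicate suppression: write a marker file per processed ID into a state directory. This is a hypothetical illustration, not the executor's real code. `STATE_DIR`, `process_once`, and the `.done` marker layout are all assumptions. The sketch also shows why the state has to outlive the process: if the directory is lost, previously seen IDs would be executed again.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of durable duplicate suppression using marker files.
# STATE_DIR and process_once are illustrative names; this is NOT the
# executor's actual implementation.
set -euo pipefail

# Stand-in for a persistent state directory. Defaults to a throwaway temp
# dir so the sketch is safe to run anywhere.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

# process_once ID: performs the side effect only the first time ID is seen.
# The marker file is the durable record. If STATE_DIR is lost, the history
# is lost too, and the same ID would be executed again.
process_once() {
  local id="$1"
  local marker="$STATE_DIR/$id.done"

  if [[ -e "$marker" ]]; then
    echo "skipped $id"
    return 0
  fi

  # ... perform the real side effect here ...

  # Record completion only after the side effect succeeds.
  # Not safe against concurrent callers; a real implementation would need
  # an atomic create or a lock.
  : > "$marker"
  echo "executed $id"
}

process_once order-1   # prints "executed order-1"
process_once order-1   # prints "skipped order-1"
```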
Current persistence model:

- application path: `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
- cloud-init prepares the host side of the executor storage on first boot
- Kubernetes mounts storage at that same path for the executor workload
- the Hetzner single-node overlay pins the PVC to the k3s `local-path` storage class

Operational consequences:

- executor duplicate-suppression state lives on node-backed persistent storage
- replacing the node, or deleting the PVC without migrating its data, loses that history
- treat executor state as required operational data, even though the executor is still a dummy implementation

## Failure recovery runbook

### A. Bootstrap fails before infrastructure exists

Typical causes:

- invalid `HCLOUD_TOKEN`
- wrong `SSH_PUBLIC_KEY_PATH`
- malformed `TF_ADMIN_CIDR_BLOCKS`

Fix the input and rerun:

```bash
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/bootstrap.sh
```

To destroy partially created infrastructure:

```bash
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
```

### B. Terraform succeeds but cluster access is not usable

Verify the generated kubeconfig and cluster health:

```bash
export KUBECONFIG="$PWD/.state/hetzner/kubeconfig.yaml"
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
```

What to suspect first:

- cloud-init is still running
- k3s is still starting
- the bootstrap kubeconfig/auth is not fully aligned yet
- the public API is reachable, but workloads are not yet healthy

### C. Secrets were wrong or missing

The current bootstrap depends on:

- `${PROJECT_NAME:-unrip}-secrets`, containing:
  - `NEAR_INTENTS_API_KEY`
- `forgejo-secrets`, containing:
  - `root_url`
  - `domain`
  - `runner_registration_token`

Verify:

```bash
kubectl -n unrip get secret unrip-secrets
kubectl -n unrip get secret unrip-registry-creds
kubectl -n forgejo get secret forgejo-secrets
kubectl -n registry get secret registry-secrets
```

If needed, recreate them from the workstation, then restart the affected deployments.

### D. Workloads are present but not healthy

Inspect by namespace:

```bash
kubectl -n unrip get pods
# Without a pod name, this describes every pod in the namespace
kubectl -n unrip describe pod
kubectl -n unrip logs deploy/dummy-executor --tail=100
kubectl -n forgejo logs deploy/forgejo --tail=100
kubectl -n forgejo logs deploy/forgejo-runner --tail=100
```

Useful rollout checks:

```bash
kubectl -n unrip rollout status deployment/near-intents-ingest --timeout=300s
kubectl -n unrip rollout status deployment/dummy-reactor --timeout=300s
kubectl -n unrip rollout status deployment/dummy-executor --timeout=300s
kubectl -n unrip rollout status deployment/dummy-consumer --timeout=300s
kubectl -n forgejo rollout status deployment/forgejo --timeout=300s
kubectl -n forgejo rollout status deployment/forgejo-runner --timeout=300s
```

### E. Need to inspect Terraform outputs directly

```bash
cd infra/terraform/hetzner
terraform output
terraform output server_ipv4
terraform output server_private_ipv4
terraform output k3s_api_url
terraform output kubeconfig_strategy
```

## Self-hosted CI handoff

After the cluster is reachable and workloads are up:

1. reach Forgejo at the configured domain or through a port-forward
2. perform the initial admin/bootstrap steps in Forgejo
3. create the target repository in Forgejo
4. push or mirror this repo into that Forgejo instance
5. confirm the runner is registered and healthy
6. move routine application deploys to the self-hosted pipeline, which now derives image naming and rollout targets from Forgejo repository variables instead of hard-coding the legacy project name

Known caveats in the current repo state:

- the first bootstrap is repo-driven from the workstation
- the bootstrap path no longer relies on SSH/scp transport in its control flow
- the kubeconfig/auth result is not yet fully production-hardened
- the first rollout still uses a temporary local registry bridge; routine CI deploys are intended to be registry-native. The Forgejo workflow now defaults to `unrip` while allowing per-repo overrides for image name, namespace, and deployment list
- Forgejo admin creation, repo creation, and Actions configuration still require operator action after cluster bring-up
- DNS automation is currently wired for Cloudflare, and only runs when Cloudflare credentials are supplied during bootstrap
- TLS is expected to come from cert-manager + Let's Encrypt once the ingress hostnames resolve publicly

## Terraform-only usage

If you only want the infrastructure layer:

```bash
cd infra/terraform/hetzner
# Fill in your Hetzner Cloud API token
export TF_VAR_hcloud_token=""
export TF_VAR_ssh_public_key="$(cat ~/.ssh/id_ed25519.pub)"
export TF_VAR_admin_cidr_blocks='["203.0.113.10/32"]'
terraform init
terraform apply
```

Useful outputs:

- `server_ipv4`
- `server_private_ipv4`
- `server_name`
- `server_fqdn`
- `k3s_api_url`
- `kubeconfig_strategy`

For CI/CD details, see also:

- `docs/hetzner-k3s-bootstrap.md`
- `docs/hetzner-self-hosted-ci-runbook.md`

## Compose status

Compose is still useful for:

- local development
- fast topology debugging
- non-production single-machine testing

Treat it as optional development runtime support, not as the primary production deployment path.