Hetzner single-node bootstrap (Terraform + cloud-init + k3s)
This is the canonical first-production deployment path for the repo.
A local operator workstation drives the first deployment end to end:
- Terraform provisions Hetzner infrastructure
- cloud-init installs k3s automatically on first boot
- the workstation waits for the public Kubernetes API
- the workstation creates initial Kubernetes Secrets
- the workstation applies repo-managed Kubernetes manifests
- the workstation performs the first image/bootstrap delivery attempt
- once Forgejo + runner are alive, routine app deploys are intended to move to self-hosted CI
Compose remains available for local development, but it is not the primary production deployment model.
Scope of this layer
The foundation under infra/terraform/hetzner provisions:
- one Hetzner Cloud server
- one SSH key resource based on your local public key
- firewall rules for SSH, Kubernetes API, and HTTP/HTTPS ingress
- a private network attachment for future growth
- cloud-init user-data for unattended k3s installation and host preparation
The repo bootstrap then applies the Hetzner single-node overlay under deploy/k8s/overlays/hetzner-single-node, which composes Kubernetes resources under deploy/k8s/ for:
- shared platform namespaces and services
- Redpanda
- unrip workloads
- Forgejo
- Forgejo runner
- private registry
- ingress/TLS-related resources
- Redpanda topic bootstrap job
Prerequisites
Install on the operator workstation:
- Terraform >= 1.6
- kubectl
- docker
- curl
You also need:
- a Hetzner Cloud API token
- an SSH keypair already present locally
- access to DNS for your chosen domains
- admin CIDRs that can reach the future server on 22/tcp and 6443/tcp
- this repo checked out locally
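A quick way to confirm the workstation tooling before starting is sketched below. The helper name missing_tools is hypothetical and not part of the repo; the bootstrap script performs its own validation, which may differ. The sketch does not check the Terraform version.

```shell
# Hypothetical helper: print every listed tool that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report anything from the prerequisite list that is still missing.
missing="$(missing_tools terraform kubectl docker curl)"
if [ -n "$missing" ]; then
  printf 'missing tools:\n%s\n' "$missing" >&2
fi
```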
Required bootstrap secrets and inputs
Prepare the operator env file:
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
${EDITOR:-vi} scripts/hetzner/bootstrap-secrets.env
Set at least:
- HCLOUD_TOKEN
- SSH_PUBLIC_KEY_PATH
- TF_ADMIN_CIDR_BLOCKS
- BASE_DOMAIN
- FORGEJO_DOMAIN
- FORGEJO_ROOT_URL
- NEAR_INTENTS_API_KEY
- FORGEJO_RUNNER_REGISTRATION_TOKEN
Load it into the current shell:
source scripts/hetzner/bootstrap-secrets.env
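After sourcing the file, it can help to confirm that none of the required variables is empty. The following is a minimal sketch; the helper name unset_vars is hypothetical, and the bootstrap script may already perform an equivalent check.

```shell
# Hypothetical helper: print the name of every listed variable that is unset or empty.
# The body runs in a subshell so the loop variables do not leak into the caller.
unset_vars() (
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
)

# Check the variables listed in this section.
missing="$(unset_vars HCLOUD_TOKEN SSH_PUBLIC_KEY_PATH TF_ADMIN_CIDR_BLOCKS BASE_DOMAIN \
  FORGEJO_DOMAIN FORGEJO_ROOT_URL NEAR_INTENTS_API_KEY FORGEJO_RUNNER_REGISTRATION_TOKEN)"
if [ -n "$missing" ]; then
  printf 'unset or empty:\n%s\n' "$missing" >&2
fi
```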
Canonical bootstrap sequence
Run from repo root:
bash scripts/hetzner/bootstrap.sh
Current behavior of the script:
- validates local tooling
- runs terraform init and terraform apply in infra/terraform/hetzner
- reads Terraform outputs such as server IP and k3s_api_url
- waits for the k3s API readiness endpoint
- writes a local workstation kubeconfig to .state/hetzner/kubeconfig.yaml
- writes overlay secret env input files and creates:
  - unrip/unrip-secrets
  - unrip/unrip-registry-creds
  - forgejo/forgejo-secrets
  - registry/registry-secrets
- applies deploy/k8s/platform/base/namespace.yaml and deploy/k8s/overlays/hetzner-single-node
- builds the repo bootstrap image locally
- pushes it through the temporary local registry bridge using the active project name
- updates and waits for rollout status in the active project namespace
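The wait-for-API step can be approximated by hand with a generic retry loop. This is a sketch under stated assumptions, not the script's actual implementation: wait_until is a hypothetical helper, K3S_API_URL is assumed to hold the k3s_api_url Terraform output, and /readyz is the standard Kubernetes API server readiness path. Whether that path answers without credentials depends on the API server's anonymous-auth setting, so the commented example treats any HTTP response as "reachable".

```shell
# Hypothetical helper: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds; gives up with status 1 after ATTEMPTS tries.
# The body runs in a subshell so its variables do not leak into the caller.
wait_until() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    [ "$n" -lt "$attempts" ] || exit 1
    n=$((n + 1))
    sleep "$delay"
  done
)

# Example (not executed here): poll the API for up to 60 tries, 5 seconds apart.
# -k skips TLS verification because the cluster CA is not trusted locally yet.
# wait_until 60 5 curl -ksS -o /dev/null "$K3S_API_URL/readyz"
```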
After the script finishes:
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n unrip get deploy,pods,jobs
kubectl -n forgejo get deploy,pods,svc
kubectl -n registry get pods,svc
Current manifest target
Important current-state detail:
- scripts/hetzner/bootstrap.sh now applies deploy/k8s/platform/base/namespace.yaml
- it then applies deploy/k8s/overlays/hetzner-single-node
- bootstrap naming no longer assumes legacy trading-system kubeconfig contexts, image tags, or rollout namespaces
Executor persistence in k3s
The dummy executor persists durable idempotency state.
Current persistence model:
- application path: EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
- cloud-init prepares the host boundary for executor storage on first boot
- Kubernetes mounts storage at that same path for the executor workload
- the Hetzner single-node overlay pins PVC-backed storage to k3s local-path
Operational consequence:
- executor duplicate-suppression state lives on node-backed persistent storage
- replacing the node or deleting the PVC without migration loses that history
- treat executor state as required operational data, even though the executor is still a dummy implementation
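Before replacing the node or deleting storage, it is worth listing the claims that back this state. The sketch below assumes the executor's PVC lives in the unrip namespace alongside the dummy-executor deployment; the helper name executor_pvcs is hypothetical.

```shell
# Hypothetical helper: list PVCs in the unrip namespace with their storage class and phase.
# Assumes the executor's claim is created in this namespace.
executor_pvcs() {
  kubectl -n unrip get pvc \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName,STATUS:.status.phase
}

# Example, with KUBECONFIG pointing at the bootstrap kubeconfig:
# executor_pvcs
```

A claim whose CLASS column reads local-path is tied to the node's disk, which is why node replacement needs a migration step first.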
Failure recovery runbook
A. Bootstrap fails before infrastructure exists
Typical causes:
- invalid HCLOUD_TOKEN
- wrong SSH_PUBLIC_KEY_PATH
- malformed TF_ADMIN_CIDR_BLOCKS
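A malformed CIDR is easy to catch locally before Terraform runs. The following sketch validates a single IPv4 CIDR entry; is_ipv4_cidr is a hypothetical helper, and it assumes the admin entries are IPv4. How TF_ADMIN_CIDR_BLOCKS packs multiple entries is defined by the env example file, so this checks one entry at a time.

```shell
# Hypothetical helper: succeed only if $1 looks like a.b.c.d/n with octets 0-255 and n 0-32.
# The body runs in a subshell, so the IFS change and variables do not leak to the caller.
is_ipv4_cidr() (
  case "$1" in
    */*) ;;                       # must contain a slash
    *) exit 1 ;;
  esac
  addr=${1%%/*}                   # text before the first slash
  prefix=${1#*/}                  # text after the first slash
  # Prefix length: one or two digits, at most 32.
  case "$prefix" in
    '' | *[!0-9]* | ???*) exit 1 ;;
  esac
  [ "$prefix" -le 32 ] || exit 1
  # Address: digits and dots only, no empty fields.
  case "$addr" in
    '' | *[!0-9.]* | .* | *. | *..*) exit 1 ;;
  esac
  IFS=.
  set -- $addr                    # split the address into its octets
  [ "$#" -eq 4 ] || exit 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || exit 1
  done
  exit 0
)

# Example: the documentation address used elsewhere in this guide.
is_ipv4_cidr "203.0.113.10/32" && echo "valid CIDR"
```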
Fix the input and rerun:
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/bootstrap.sh
If you need to destroy partially created infrastructure:
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
B. Terraform succeeds but cluster access is not usable
Verify the generated kubeconfig and cluster health:
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
What to suspect first:
- cloud-init still running
- k3s still starting
- bootstrap kubeconfig/auth not fully aligned yet
- public API reachable, but workloads not yet healthy
C. Secrets were wrong or missing
The current bootstrap depends on:
- ${PROJECT_NAME:-unrip}-secrets
  - NEAR_INTENTS_API_KEY
- forgejo-secrets
  - root_url
  - domain
  - runner_registration_token
Verify:
kubectl -n unrip get secret unrip-secrets
kubectl -n unrip get secret unrip-registry-creds
kubectl -n forgejo get secret forgejo-secrets
kubectl -n registry get secret registry-secrets
If needed, recreate them from the workstation before restarting the affected deployments.
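One way to recreate a Secret without first deleting it is to render it client-side and pipe it to kubectl apply. The sketch below does this for unrip-secrets and assumes that Secret holds only the NEAR_INTENTS_API_KEY key listed above; any other keys would be dropped. The helper name recreate_unrip_secret is hypothetical, and the deployment in the restart example is an assumption about which workload consumes the key.

```shell
# Hypothetical helper: rebuild unrip-secrets from the current shell environment.
# --dry-run=client renders the manifest locally; kubectl apply then creates or updates it.
recreate_unrip_secret() {
  : "${NEAR_INTENTS_API_KEY:?NEAR_INTENTS_API_KEY must be set}"
  kubectl -n unrip create secret generic unrip-secrets \
    --from-literal=NEAR_INTENTS_API_KEY="$NEAR_INTENTS_API_KEY" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example (not executed here): recreate, then restart a consumer so it picks up the new value.
# recreate_unrip_secret
# kubectl -n unrip rollout restart deployment/near-intents-ingest
```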
D. Workloads are present but not healthy
Inspect by namespace:
kubectl -n unrip get pods
kubectl -n unrip describe pod <pod-name>
kubectl -n unrip logs deploy/dummy-executor --tail=100
kubectl -n forgejo logs deploy/forgejo --tail=100
kubectl -n forgejo logs deploy/forgejo-runner --tail=100
Useful rollout checks:
kubectl -n unrip rollout status deployment/near-intents-ingest --timeout=300s
kubectl -n unrip rollout status deployment/dummy-reactor --timeout=300s
kubectl -n unrip rollout status deployment/dummy-executor --timeout=300s
kubectl -n unrip rollout status deployment/dummy-consumer --timeout=300s
kubectl -n forgejo rollout status deployment/forgejo --timeout=300s
kubectl -n forgejo rollout status deployment/forgejo-runner --timeout=300s
E. Need to inspect Terraform outputs directly
cd infra/terraform/hetzner
terraform output
terraform output server_ipv4
terraform output server_private_ipv4
terraform output k3s_api_url
terraform output kubeconfig_strategy
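For scripting, terraform output -raw prints a single value without quotes, and -chdir avoids changing directory. The sketch below assumes the named outputs are plain strings (-raw rejects lists and maps); tf_output is a hypothetical helper.

```shell
# Hypothetical helper: read one Terraform output as a bare string, from the repo root.
tf_output() {
  terraform -chdir=infra/terraform/hetzner output -raw "$1"
}

# Example (not executed here), after a successful apply:
# K3S_API_URL="$(tf_output k3s_api_url)"
# SERVER_IPV4="$(tf_output server_ipv4)"
```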
Self-hosted CI handoff
After the cluster is reachable and workloads are up:
- reach Forgejo at the configured domain or by port-forward
- perform the initial admin/bootstrap steps in Forgejo
- create the target repository in Forgejo
- push or mirror this repo into that Forgejo instance
- confirm the runner is registered and healthy
- move routine application deploys to the self-hosted pipeline, which now derives image naming and rollout targets from Forgejo repository variables instead of hard-coding the legacy project
Known caveats of the current repo state:
- first bootstrap is repo-driven from the workstation
- the bootstrap path no longer relies on SSH/scp transport in control flow
- the kubeconfig/auth result is not yet fully production-hardened
- first rollout still uses a temporary local registry bridge; routine CI deploys are intended to be registry-native, and the Forgejo workflow now defaults to unrip while allowing per-repo overrides for image name, namespace, and deployment list
- Forgejo admin creation, repo creation, and Actions configuration still require operator action after cluster bring-up
- DNS automation is currently wired for Cloudflare when credentials are supplied during bootstrap
- TLS is expected to come from cert-manager + Let's Encrypt once ingress hostnames resolve publicly
Terraform-only usage
If you only want the infra layer:
cd infra/terraform/hetzner
export TF_VAR_hcloud_token="<your-hetzner-token>"
export TF_VAR_ssh_public_key="$(cat ~/.ssh/id_ed25519.pub)"
export TF_VAR_admin_cidr_blocks='["203.0.113.10/32"]'
terraform init
terraform apply
Useful outputs:
- server_ipv4
- server_private_ipv4
- server_name
- server_fqdn
- k3s_api_url
- kubeconfig_strategy
For CI/CD details, also see:
- docs/hetzner-k3s-bootstrap.md
- docs/hetzner-self-hosted-ci-runbook.md
Compose status
Compose is still useful for:
- local development
- fast topology debugging
- non-production single-machine testing
But it should be treated as optional/dev runtime support, not as the primary production deployment path.