Hetzner + k3s + self-hosted Git/CI bootstrap
Goal: provision and deploy everything from this repo to a single Hetzner machine with no manual server login.
Stack
- Terraform provisions the Hetzner Cloud VM, private network, and firewall
- cloud-init installs Tailscale first when configured, then installs k3s automatically
- cloud-init leaves only a bootstrap marker on the node; it does not clone this repo or apply Kubernetes assets
- Kubernetes manifests deploy:
  - Redpanda
  - trading system services
  - private registry
  - Forgejo
  - Loki + Promtail + Grafana + Headlamp observability
  - k3s-bundled Traefik ingress resources
  - cert-manager
  - ACME issuers
- local bootstrap script:
  - runs Terraform
  - optionally creates DNS records via Cloudflare or Porkbun
  - fetches the real kubeconfig from the node
  - writes overlay secrets/host patches from local env
  - renders .state/hetzner/generated-overlay/ from the checked-in Hetzner overlay template plus deploy/k8s/platform/base/kustomization.yaml
  - applies that generated overlay from the operator workstation checkout
  - builds the current app image locally
  - imports the bootstrap image into k3s for the first rollout
Files
- infra/terraform/hetzner/
- deploy/k8s/base/
- deploy/k8s/overlays/hetzner-single-node/
- scripts/hetzner/bootstrap.sh
- scripts/hetzner/configure-cloudflare-dns.sh
- scripts/hetzner/destroy.sh
- scripts/k8s/logs.sh
- .forgejo/workflows/deploy.yml
Required local tools
Always required:
- terraform
- kubectl
- curl
- python3
- ssh
- git
- base64
- realpath
- pass, when using any *_PASS mapping
Conditionally required:
- docker, only for BOOTSTRAP_DELIVERY_MODE=local-image-import, or as a fallback when no native htpasswd binary is available locally
- htpasswd is preferred for registry secret generation and avoids the docker fallback
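The htpasswd preference and docker fallback described above could look roughly like the following. This is a minimal sketch, not the code in bootstrap.sh: the function name registry_htpasswd_line is hypothetical, and the use of the public httpd:2 image for the fallback is an assumption.

```shell
# Hypothetical helper: print one bcrypt htpasswd line for USER and PASSWORD.
# Prefers a native htpasswd binary and falls back to a container image.
registry_htpasswd_line() {
  user=$1
  password=$2
  if command -v htpasswd >/dev/null 2>&1; then
    # -B bcrypt, -b password from the command line, -n print to stdout
    htpasswd -Bbn "$user" "$password"
  elif command -v docker >/dev/null 2>&1; then
    # Assumption: the public httpd:2 image, which ships htpasswd.
    docker run --rm --entrypoint htpasswd httpd:2 -Bbn "$user" "$password"
  else
    echo "need htpasswd or docker to generate the registry secret" >&2
    return 1
  fi
}
```

Note that -b passes the password as a process argument, so it is briefly visible in the local process list on a shared workstation.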
Required local Python modules:
- PyYAML (python3 -m pip install PyYAML) for kubeconfig rendering during bootstrap
- PyNaCl (python3 -m pip install PyNaCl) only when BOOTSTRAP_DELIVERY_MODE=forgejo-actions, so bootstrap can encrypt Forgejo Actions secrets
Required local env
Start from:
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
${EDITOR:-vi} scripts/hetzner/bootstrap-secrets.env
source scripts/hetzner/bootstrap-secrets.env
The mapping file should contain non-secret config plus pass entry references for secrets. Bootstrap and destroy load the first line from each configured pass entry without echoing it. Explicit env exports still override pass lookups.
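The lookup order described above (explicit export first, then the first line of the configured pass entry) can be sketched as a small shell helper. This is an illustration under stated assumptions rather than the actual bootstrap code: the function name resolve_secret is hypothetical, and it assumes the standard pass show command is available.

```shell
# Hypothetical helper: write the value for NAME to stdout so the caller can
# capture it. An explicit export of NAME wins; otherwise the first line of
# the pass entry named by NAME_PASS is used.
resolve_secret() {
  name=$1
  # Refuse anything that is not a plain variable name before using eval.
  case $name in
    ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid variable name" >&2; return 2 ;;
  esac
  eval "explicit=\${$name:-}"
  eval "entry=\${${name}_PASS:-}"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  elif [ -n "$entry" ]; then
    # Only the first line of the pass entry is used.
    pass show "$entry" | head -n 1
  else
    echo "neither $name nor ${name}_PASS is set" >&2
    return 1
  fi
}

# Example use (captures the value instead of echoing it):
#   HCLOUD_TOKEN=$(resolve_secret HCLOUD_TOKEN) || exit 1
```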
When you run scripts/hetzner/bootstrap.sh, it uses this file to materialize local Kubernetes inputs before apply:
- overwrites deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env with NEAR_INTENTS_API_KEY
- overwrites deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env with Forgejo root_url and domain
- overwrites deploy/k8s/overlays/hetzner-single-node/secrets/observability.env with Grafana bootstrap credentials and root URL
- renders .state/hetzner/generated-overlay/ as the bootstrap-time source of truth
- copies the checked-in overlay patch behavior into that generated overlay
- imports platform resources from deploy/k8s/platform/base/kustomization.yaml, so newly added platform modules such as observability manifests are included automatically
- creates registry-secrets in namespace registry from REGISTRY_USERNAME and REGISTRY_PASSWORD
- creates the project docker-registry pull secret in PROJECT_NAMESPACE from the same registry credentials
This differs from running kubectl apply -k deploy/k8s/overlays/hetzner-single-node manually. A plain Kustomize apply consumes only the checked-in overlay files: it does not read scripts/hetzner/bootstrap-secrets.env and does not create the imperative registry auth secrets. Bootstrap applies the generated overlay copy instead.
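To see what the generated overlay changes relative to the checked-in one, the two rendered manifest sets can be compared locally. The sketch below assumes both directories build with kubectl kustomize; the function name overlay_diff is hypothetical.

```shell
# Hypothetical helper: diff the rendered checked-in overlay against the
# rendered generated overlay. Returns diff's status (0 when identical).
overlay_diff() {
  checked_in=${1:-deploy/k8s/overlays/hetzner-single-node}
  generated=${2:-.state/hetzner/generated-overlay}
  left=$(mktemp)
  right=$(mktemp)
  kubectl kustomize "$checked_in" > "$left"
  kubectl kustomize "$generated" > "$right"
  diff -u "$left" "$right"
  status=$?
  # Rendered output can contain secret values, so remove it right away.
  rm -f "$left" "$right"
  return "$status"
}
```

Because the rendered manifests may include secret material, avoid redirecting the diff into shared logs.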
Required values:
- HCLOUD_TOKEN_PASS or HCLOUD_TOKEN
- SSH_PUBLIC_KEY_PATH
- PUBLIC_DOMAIN
- BASE_DOMAIN
- recommended Tailscale values:
  - TAILSCALE_AUTH_KEY_PASS or TAILSCALE_AUTH_KEY
  - optional TAILSCALE_CONTROL_PLANE_HOSTNAME to force a stable Tailscale DNS name for kube access
  - if TAILSCALE_CONTROL_PLANE_HOSTNAME is left empty, bootstrap auto-discovers the node via local tailscale status --json
- FORGEJO_DOMAIN
- FORGEJO_ROOT_URL
- REGISTRY_DOMAIN
- GRAFANA_DOMAIN
- GRAFANA_ROOT_URL
- HEADLAMP_DOMAIN
- LETSENCRYPT_EMAIL
- REGISTRY_USERNAME
- REGISTRY_PASSWORD_PASS or REGISTRY_PASSWORD
- NEAR_INTENTS_API_KEY_PASS or NEAR_INTENTS_API_KEY
- FORGEJO_ADMIN_USERNAME
- FORGEJO_ADMIN_EMAIL
- FORGEJO_ADMIN_PASSWORD_PASS or FORGEJO_ADMIN_PASSWORD
- GRAFANA_ADMIN_USERNAME (defaults to admin)
- GRAFANA_ADMIN_PASSWORD_PASS or GRAFANA_ADMIN_PASSWORD
- optional HEADLAMP_ADMIN_TOKEN_PASS for storing the generated Headlamp login token back into pass
- optional repo settings: FORGEJO_REPO_OWNER, FORGEJO_REPO_NAME, FORGEJO_REPO_PRIVATE
Optional for automatic DNS:
- Cloudflare:
  - CLOUDFLARE_API_TOKEN_PASS or CLOUDFLARE_API_TOKEN
  - CLOUDFLARE_ZONE_ID_PASS or CLOUDFLARE_ZONE_ID
- Porkbun:
  - PORKBUN_API_KEY_PASS or PORKBUN_API_KEY
  - PORKBUN_SECRET_API_KEY_PASS or PORKBUN_SECRET_API_KEY
Bootstrap
bash scripts/hetzner/bootstrap.sh
Outputs:
- Hetzner VM created
- Tailscale joined if configured
- k3s installed
- cloud-init writes /opt/unrip/bootstrap/README.txt as a marker that node-local repo bootstrap is not active yet
- kubeconfig written to .state/hetzner/kubeconfig.yaml
- CI kubeconfig written to .state/hetzner/kubeconfig.incluster.yaml
- overlay secrets and ingress host patches rendered from local env / pass
- .state/hetzner/generated-overlay/ rendered and applied as the canonical bootstrap manifest set for that run
- namespaces, Redpanda, app deployments, Forgejo, registry, Traefik-targeted ingress resources, cert-manager, issuers, and any additional platform resources referenced by deploy/k8s/platform/base/kustomization.yaml applied
- Headlamp is deployed and wired to the configured public hostname model
- bootstrap stores the generated Headlamp service-account token in pass when HEADLAMP_ADMIN_TOKEN_PASS is configured
- Forgejo admin account created automatically if missing
- Forgejo runner registration is generated automatically from inside the Forgejo pod and the resulting /data/.runner config is stored under the shared forgejo-data persistent volume used by the runner deployment
- Forgejo repository created automatically in either the admin user's namespace or a pre-existing organization named by FORGEJO_REPO_OWNER
- Forgejo Actions secrets and variables configured automatically
- repo pushed to Forgejo automatically in the default forgejo-actions delivery mode via authenticated HTTPS Git push
- first deployment triggered from Forgejo Actions by default
Tailscale-first admin access
Recommended mode:
- public firewall exposes only 80/443
- admin access uses Tailscale
- Kubernetes API uses the Tailscale hostname when TAILSCALE_CONTROL_PLANE_HOSTNAME is set
TF_ADMIN_CIDR_BLOCKS remains only as a fallback if you intentionally want public admin/API exposure.
DNS and TLS
If DNS provider credentials are present, bootstrap updates:
- ${PUBLIC_DOMAIN}
- git.${PUBLIC_DOMAIN}
- registry.${PUBLIC_DOMAIN}
- grafana.${PUBLIC_DOMAIN}
- headlamp.${PUBLIC_DOMAIN}
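The record set above follows directly from PUBLIC_DOMAIN. As a minimal sketch (the function name dns_hostnames is hypothetical), the list can be expanded like this, for example to check each name with a resolver before expecting certificates to be issued:

```shell
# Hypothetical helper: print the hostnames bootstrap manages for a domain,
# one per line, in the order listed in this document.
dns_hostnames() {
  domain=$1
  for prefix in "" git. registry. grafana. headlamp.; do
    printf '%s%s\n' "$prefix" "$domain"
  done
}

# Example use:
#   dns_hostnames "$PUBLIC_DOMAIN"
```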
Supported scripted providers:
- Cloudflare
- Porkbun
TLS is handled in-cluster by cert-manager using Let's Encrypt issuers and the rendered ingress hosts.
Grafana and Headlamp are both wired into the public hostname model by default. Keep Grafana authenticated, and treat the Headlamp token as an operator credential.
The platform base assumes the default k3s Traefik ingress controller is present; it does not install ingress-nginx.
For clean-cluster applies, the base kustomization now includes cert-manager before the ClusterIssuer resources so the issuer CRs can be created in the same bootstrap flow.
Observe the cluster
KUBECONFIG=.state/hetzner/kubeconfig.yaml kubectl get pods -A
bash scripts/k8s/logs.sh
For the web log UI and observability stack, see docs/k8s-observability.md.
Self-hosted CI/CD handoff
Default bootstrap now automates the Forgejo handoff:
- create the Forgejo repo in the admin namespace or in a pre-existing organization named by FORGEJO_REPO_OWNER
- configure the repository Actions secrets:
  - KUBECONFIG_B64
  - REGISTRY_USERNAME
  - REGISTRY_PASSWORD
- configure the repository Actions variables:
  - REGISTRY_HOST=${REGISTRY_DOMAIN}
  - PROJECT_NAME
  - PROJECT_NAMESPACE
  - PROJECT_DEPLOYMENTS
- push the current repo to main
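For reference, a value shaped like KUBECONFIG_B64 can be produced by base64-encoding a kubeconfig onto a single line. This is a sketch only: the function name encode_kubeconfig is hypothetical, and which kubeconfig file bootstrap actually encodes is an assumption (the CI kubeconfig at .state/hetzner/kubeconfig.incluster.yaml seems the likely source).

```shell
# Hypothetical helper: base64-encode a file as one line.
encode_kubeconfig() {
  # Strip newlines so the result is a single line regardless of whether
  # the local base64 wraps its output.
  base64 < "$1" | tr -d '\n'
}

# Example use (assumed input path):
#   encode_kubeconfig .state/hetzner/kubeconfig.incluster.yaml
```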
The workflow then:
- starts a Kubernetes Job in the target namespace
- checks out the repo inside that Job using the Forgejo job token via Authorization: Bearer ... HTTP auth
- uses Kaniko plus the Kubernetes registry auth secret to build and push ${REGISTRY_DOMAIN}/${PROJECT_NAME}:${GIT_SHA}
- updates the app deployments in PROJECT_NAMESPACE
- waits for rollout
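The last two workflow steps can be pictured as one image update and one rollout wait per deployment. The sketch below only prints the kubectl commands it would run. It is not the content of .forgejo/workflows/deploy.yml: the function names are hypothetical, PROJECT_DEPLOYMENTS is assumed to be a space-separated list, and updating every container with '*=' and a 300s timeout are illustrative choices.

```shell
# Build the image reference from the documented variables.
image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY_DOMAIN" "$PROJECT_NAME" "$GIT_SHA"
}

# Hypothetical dry run: print the update and wait command per deployment.
plan_rollout() {
  image=$(image_ref)
  # Assumption: PROJECT_DEPLOYMENTS is space-separated, so it is left
  # unquoted on purpose to split into words.
  for deployment in $PROJECT_DEPLOYMENTS; do
    printf "kubectl -n %s set image deployment/%s '*=%s'\n" \
      "$PROJECT_NAMESPACE" "$deployment" "$image"
    printf 'kubectl -n %s rollout status deployment/%s --timeout=300s\n' \
      "$PROJECT_NAMESPACE" "$deployment"
  done
}
```

Printing the commands first makes it easy to review the exact image tag before anything touches the cluster.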
Legacy local-image bootstrap remains available with:
BOOTSTRAP_DELIVERY_MODE=local-image-import bash scripts/hetzner/bootstrap.sh
Destroy everything
Default destroy only removes Terraform-managed Hetzner infrastructure:
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
Opt-in flags make destructive cleanup of bootstrap-managed leftovers explicit:
source scripts/hetzner/bootstrap-secrets.env
DESTROY_DNS=true \
DESTROY_LOCAL_STATE=true \
DESTROY_FORGEJO_REPO=true \
bash scripts/hetzner/destroy.sh
destroy.sh reads HCLOUD_TOKEN, optional TAILSCALE_AUTH_KEY, optional DNS provider credentials, and optional Forgejo admin credentials via the same *_PASS mapping mechanism as bootstrap.
It uses the same Terraform inputs as bootstrap for the infrastructure resources, then can optionally:
- delete the scripted DNS records for ${PUBLIC_DOMAIN}, git.${PUBLIC_DOMAIN}, registry.${PUBLIC_DOMAIN}, grafana.${PUBLIC_DOMAIN}, and headlamp.${PUBLIC_DOMAIN}
- remove local bootstrap artifacts under .state/hetzner/, deploy/k8s/overlays/hetzner-single-node/generated/, and the local Terraform working/state files in infra/terraform/hetzner/
- delete the bootstrap-managed Forgejo repository via the Forgejo API
Supported scripted DNS cleanup providers:
- Cloudflare
- Porkbun
Cleanup defaults are intentionally conservative:
- DESTROY_DNS=false keeps provider records unless you explicitly opt in
- DESTROY_LOCAL_STATE=false keeps the last kubeconfigs and generated manifests for inspection
- DESTROY_FORGEJO_REPO=false keeps the remote Git repository unless you explicitly opt in
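The conservative defaults can be summarized as "everything beyond Terraform is off unless a flag is set". The following sketch prints what a given combination of flags would remove. It is illustrative only: the function names are hypothetical, and treating only the literal value true as opt-in is an assumption about how destroy.sh parses the flags.

```shell
# Hypothetical helper: succeed only when the named flag is exactly "true".
flag_enabled() {
  eval "value=\${$1:-false}"
  [ "$value" = "true" ]
}

# Print one line per category that would be removed.
cleanup_plan() {
  echo "terraform-managed Hetzner infrastructure"
  if flag_enabled DESTROY_DNS; then echo "scripted DNS records"; fi
  if flag_enabled DESTROY_LOCAL_STATE; then echo "local bootstrap artifacts"; fi
  if flag_enabled DESTROY_FORGEJO_REPO; then echo "Forgejo repository"; fi
}
```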
If any optional cleanup step is enabled but cannot run because credentials are missing, destroy.sh prints a skip message describing what was not removed.
If DNS cleanup or Forgejo repo deletion fails after Terraform teardown, rerun the same cleanup flags or remove the remaining resources manually.
Current limitations
- organization-owned repo bootstrap works only when FORGEJO_REPO_OWNER names a pre-existing organization that the configured admin can create repositories in; bootstrap does not create the organization itself
- unattended repo seeding now uses an authenticated HTTPS remote built from the configured Forgejo admin credentials, so operators should replace that local remote with a token, SSH, or credential-helper-backed remote after bootstrap if they do not want credentials stored in .git/config
- cloud-init no longer clones a bootstrap repository onto the node; Kubernetes asset delivery is still workstation-driven after Terraform
- bootstrap_repo_path in Terraform is only a reserved marker for a future node-local bootstrap/GitOps flow
- bootstrap requires either a local htpasswd binary or local docker as a fallback to generate the registry htpasswd secret
- bootstrap and CI authentication paths should still be hardened before production use
- runner identity is persisted under the shared forgejo-data PVC, so deleting the forgejo-runner pod is safe but deleting that PVC forces re-registration on the next bootstrap run