Hetzner + k3s + self-hosted Git/CI bootstrap

Goal: provision and deploy everything from this repo to a single Hetzner machine with no manual server login.

Stack

  • Terraform provisions the Hetzner Cloud VM, private network, and firewall
  • cloud-init installs Tailscale first when configured, then installs k3s automatically
  • cloud-init leaves only a bootstrap marker on the node; it does not clone this repo or apply Kubernetes assets
  • Kubernetes manifests deploy:
    • Redpanda
    • trading system services
    • private registry
    • Forgejo
    • Loki + Promtail + Grafana observability
    • k3s-bundled Traefik ingress resources
    • cert-manager
    • ACME issuers
  • local bootstrap script:
    • runs Terraform
    • optionally creates DNS records via Cloudflare or Porkbun
    • fetches the real kubeconfig from the node
    • writes overlay secrets/host patches from local env
    • renders .state/hetzner/generated-overlay/ from the checked-in Hetzner overlay template plus deploy/k8s/platform/base/kustomization.yaml
    • applies that generated overlay from the operator workstation checkout
    • in the legacy local-image-import delivery mode, builds the current app image locally
    • in that same mode, imports the bootstrap image into k3s for the first rollout

Files

  • infra/terraform/hetzner/
  • deploy/k8s/base/
  • deploy/k8s/overlays/hetzner-single-node/
  • scripts/hetzner/bootstrap.sh
  • scripts/hetzner/configure-cloudflare-dns.sh
  • scripts/hetzner/destroy.sh
  • scripts/k8s/logs.sh
  • .forgejo/workflows/deploy.yml

Required local tools

Always required:

  • terraform
  • kubectl
  • curl
  • python3
  • ssh
  • git
  • base64
  • realpath

Conditionally required:

  • pass when using any *_PASS mapping
  • docker only for BOOTSTRAP_DELIVERY_MODE=local-image-import, or as a fallback when no native htpasswd binary is available locally
  • htpasswd, preferred for registry secret generation because it avoids the docker fallback

Required local Python modules:

  • PyYAML (python3 -m pip install PyYAML) for kubeconfig rendering during bootstrap
  • PyNaCl (python3 -m pip install PyNaCl) only when BOOTSTRAP_DELIVERY_MODE=forgejo-actions so bootstrap can encrypt Forgejo Actions secrets
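
A preflight check can catch a missing tool before Terraform starts. The sketch below is not part of the repo: missing_tools, python_module_available, and REQUIRED_TOOLS are hypothetical names, and the tool list simply mirrors the "Always required" list above. It assumes PyYAML imports as yaml and PyNaCl imports as nacl.

```shell
# Hypothetical preflight helpers; not taken from scripts/hetzner/bootstrap.sh.

# Prints each missing command on its own line; returns non-zero if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Succeeds when python3 can import the named module (yaml for PyYAML, nacl for PyNaCl).
python_module_available() {
  python3 -c "import $1" >/dev/null 2>&1
}

# Tools listed above as always required.
REQUIRED_TOOLS="terraform kubectl curl python3 ssh git base64 realpath"

# Example use before running bootstrap:
#   missing_tools $REQUIRED_TOOLS || exit 1
#   python_module_available yaml || echo "PyYAML is missing" >&2
```

The conditional tools (pass, docker, htpasswd) are left out of REQUIRED_TOOLS on purpose, since whether they are needed depends on the delivery mode and on which *_PASS mappings are configured.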

Required local env

Start from:

cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
${EDITOR:-vi} scripts/hetzner/bootstrap-secrets.env
source scripts/hetzner/bootstrap-secrets.env

The mapping file should contain non-secret config plus pass entry references for secrets. Bootstrap and destroy load the first line from each configured pass entry without echoing it. Explicit env exports still override pass lookups.
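
The lookup order described above (explicit env value first, then the first line of the mapped pass entry) could look roughly like this. resolve_secret is a hypothetical helper written for illustration; the real scripts may resolve values differently. The only pass invocation used is pass show.

```shell
# Hypothetical helper illustrating the lookup order described above.
# The real bootstrap.sh may implement this differently.
resolve_secret() {
  local name="$1" value entry
  # Only accept plain variable names, since the name is passed to eval.
  case "$name" in
    ''|*[!A-Za-z0-9_]*) echo "invalid name: $name" >&2; return 2 ;;
  esac
  eval "value=\${$name:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"            # explicit env export wins
    return 0
  fi
  eval "entry=\${${name}_PASS:-}"
  if [ -n "$entry" ]; then
    pass show "$entry" | head -n 1    # only the first line of the entry is used
    return 0
  fi
  echo "neither $name nor ${name}_PASS is set" >&2
  return 1
}

# Example: capture the value in a variable rather than printing it to the terminal.
#   HCLOUD_TOKEN="$(resolve_secret HCLOUD_TOKEN)" || exit 1
```

Capturing the output with command substitution keeps the secret off the terminal, which matches the "without echoing it" behaviour described above.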

When you run scripts/hetzner/bootstrap.sh, it uses this file to materialize local Kubernetes inputs before apply:

  • overwrites deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env with NEAR_INTENTS_API_KEY
  • overwrites deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env with Forgejo root_url and domain
  • overwrites deploy/k8s/overlays/hetzner-single-node/secrets/observability.env with Grafana bootstrap credentials and root URL
  • renders .state/hetzner/generated-overlay/ as the bootstrap-time source of truth
  • copies the checked-in overlay patch behavior into that generated overlay
  • imports platform resources from deploy/k8s/platform/base/kustomization.yaml, so newly added platform modules such as observability manifests are included automatically
  • creates registry-secrets in namespace registry from REGISTRY_USERNAME and REGISTRY_PASSWORD
  • creates the project docker-registry pull secret in PROJECT_NAMESPACE from the same registry credentials
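
As a rough illustration of the last two items, the registry credentials can be turned into an htpasswd line and a pull secret as sketched below. This is not the repo's implementation: the secret key name htpasswd, the pull-secret name registry-pull, and both function names are assumptions. The docker fallback uses the public httpd:2 image, which ships an htpasswd binary.

```shell
# Sketch only. The key name "htpasswd" and the pull-secret name "registry-pull"
# are assumptions, not taken from the repo.

# Prints a bcrypt htpasswd line for the given user and password.
registry_htpasswd_line() {
  local user="$1" password="$2"
  if command -v htpasswd >/dev/null 2>&1; then
    htpasswd -Bbn "$user" "$password"
  else
    # Fallback: run htpasswd from the public httpd image.
    docker run --rm --entrypoint htpasswd httpd:2 -Bbn "$user" "$password"
  fi
}

# Creates or updates both secrets; "create --dry-run | apply" keeps reruns idempotent.
create_registry_secrets() {
  local user="$1" password="$2" registry_host="$3" project_ns="$4" tmp
  tmp="$(mktemp)" || return 1
  registry_htpasswd_line "$user" "$password" > "$tmp"
  kubectl -n registry create secret generic registry-secrets \
    --from-file=htpasswd="$tmp" --dry-run=client -o yaml | kubectl apply -f -
  rm -f "$tmp"
  kubectl -n "$project_ns" create secret docker-registry registry-pull \
    --docker-server="$registry_host" \
    --docker-username="$user" \
    --docker-password="$password" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example:
#   create_registry_secrets "$REGISTRY_USERNAME" "$REGISTRY_PASSWORD" \
#     "$REGISTRY_DOMAIN" "$PROJECT_NAMESPACE"
```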

This is different from running kubectl apply -k deploy/k8s/overlays/hetzner-single-node manually. A plain Kustomize apply consumes only the checked-in overlay files: it does not read scripts/hetzner/bootstrap-secrets.env and does not create the imperative registry auth secrets. Bootstrap instead applies the generated overlay copy under .state/hetzner/generated-overlay/.
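
To see what the two paths would actually apply, the checked-in overlay and the generated overlay can be rendered side by side. The sketch below assumes bootstrap has already run and that the generated directory has a kustomization.yaml at its root; diff_overlays is a hypothetical helper. Rendered Secret data shows up in the diff, so avoid pasting the output anywhere shared.

```shell
# Sketch: compare what a manual `kubectl apply -k` would render against the
# bootstrap-generated overlay. Assumes the generated directory is itself a
# valid Kustomize root, which this document does not state explicitly.
diff_overlays() {
  local checked_in="deploy/k8s/overlays/hetzner-single-node"
  local generated=".state/hetzner/generated-overlay"
  local a b status=0
  a="$(mktemp)" || return 2
  b="$(mktemp)" || return 2
  kubectl kustomize "$checked_in" > "$a"
  kubectl kustomize "$generated" > "$b"
  # diff exits 1 when the renders differ; keep that status for the caller.
  diff -u "$a" "$b" || status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Example (run from the repo root):
#   diff_overlays | less
```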

Required values:

  • HCLOUD_TOKEN_PASS or HCLOUD_TOKEN
  • SSH_PUBLIC_KEY_PATH
  • PUBLIC_DOMAIN
  • BASE_DOMAIN
  • recommended Tailscale values:
    • TAILSCALE_AUTH_KEY_PASS or TAILSCALE_AUTH_KEY
    • optional TAILSCALE_CONTROL_PLANE_HOSTNAME to force a stable Tailscale DNS name for kube access
    • if TAILSCALE_CONTROL_PLANE_HOSTNAME is left empty, bootstrap auto-discovers the node via local tailscale status --json
  • FORGEJO_DOMAIN
  • FORGEJO_ROOT_URL
  • REGISTRY_DOMAIN
  • GRAFANA_DOMAIN
  • GRAFANA_ROOT_URL
  • LETSENCRYPT_EMAIL
  • REGISTRY_USERNAME
  • REGISTRY_PASSWORD_PASS or REGISTRY_PASSWORD
  • NEAR_INTENTS_API_KEY_PASS or NEAR_INTENTS_API_KEY
  • FORGEJO_ADMIN_USERNAME
  • FORGEJO_ADMIN_EMAIL
  • FORGEJO_ADMIN_PASSWORD_PASS or FORGEJO_ADMIN_PASSWORD
  • GRAFANA_ADMIN_USERNAME (defaults to admin)
  • GRAFANA_ADMIN_PASSWORD_PASS or GRAFANA_ADMIN_PASSWORD
  • optional repo settings: FORGEJO_REPO_OWNER, FORGEJO_REPO_NAME, FORGEJO_REPO_PRIVATE
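
For the Tailscale auto-discovery mentioned above, the JSON from tailscale status --json contains a Peer map whose entries carry HostName and DNSName fields. How bootstrap actually picks the node is not documented here, so the hostname-prefix match below is purely an assumption, and find_tailscale_dns_name is a hypothetical helper.

```shell
# Hypothetical sketch: matching peers by hostname prefix is an assumption,
# not the documented behaviour of bootstrap.sh.
# Reads `tailscale status --json` output on stdin; $1 is a hostname prefix.
find_tailscale_dns_name() {
  python3 -c '
import json, sys

prefix = sys.argv[1]
status = json.load(sys.stdin)
for peer in (status.get("Peer") or {}).values():
    if peer.get("HostName", "").startswith(prefix):
        # DNSName carries a trailing dot; strip it before use in a kubeconfig.
        print(peer.get("DNSName", "").rstrip("."))
        break
else:
    sys.exit(1)  # no peer matched the prefix
' "$1"
}

# Example ("my-node" is a placeholder prefix):
#   tailscale status --json | find_tailscale_dns_name my-node
```

Setting TAILSCALE_CONTROL_PLANE_HOSTNAME explicitly avoids depending on any such matching rule.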

Optional for automatic DNS:

  • Cloudflare:
    • CLOUDFLARE_API_TOKEN_PASS or CLOUDFLARE_API_TOKEN
    • CLOUDFLARE_ZONE_ID_PASS or CLOUDFLARE_ZONE_ID
  • Porkbun:
    • PORKBUN_API_KEY_PASS or PORKBUN_API_KEY
    • PORKBUN_SECRET_API_KEY_PASS or PORKBUN_SECRET_API_KEY

Bootstrap

bash scripts/hetzner/bootstrap.sh

Outputs:

  • Hetzner VM created
  • Tailscale joined if configured
  • k3s installed
  • cloud-init writes /opt/unrip/bootstrap/README.txt as a marker that node-local repo bootstrap is not active yet
  • kubeconfig written to .state/hetzner/kubeconfig.yaml
  • CI kubeconfig written to .state/hetzner/kubeconfig.incluster.yaml
  • overlay secrets and ingress host patches rendered from local env / pass
  • .state/hetzner/generated-overlay/ rendered and applied as the canonical bootstrap manifest set for that run
  • namespaces, Redpanda, app deployments, Forgejo, registry, Traefik-targeted ingress resources, cert-manager, issuers, and any additional platform resources referenced by deploy/k8s/platform/base/kustomization.yaml applied
  • Forgejo admin account created automatically if missing
  • Forgejo runner registration generated automatically from inside the Forgejo pod; the resulting /data/.runner config is stored on the shared forgejo-data persistent volume, which the runner deployment also mounts
  • Forgejo repository created automatically in either the admin user's namespace or a pre-existing organization named by FORGEJO_REPO_OWNER
  • Forgejo Actions secrets and variables configured automatically
  • repo pushed to Forgejo automatically in the default forgejo-actions delivery mode via authenticated HTTPS Git push
  • first deployment triggered from Forgejo Actions by default

Tailscale-first admin access

Recommended mode:

  • public firewall exposes only 80/443
  • admin access uses Tailscale
  • Kubernetes API uses the Tailscale hostname when TAILSCALE_CONTROL_PLANE_HOSTNAME is set

TF_ADMIN_CIDR_BLOCKS remains only as a fallback if you intentionally want public admin/API exposure.

DNS and TLS

If DNS provider credentials are present, bootstrap updates:

  • ${PUBLIC_DOMAIN}
  • git.${PUBLIC_DOMAIN}
  • registry.${PUBLIC_DOMAIN}
  • grafana.${PUBLIC_DOMAIN}

Supported scripted providers:

  • Cloudflare
  • Porkbun
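
The record set above can be derived from a single domain value, and each record maps to one call against the DNS provider. The sketch below is not the repo's configure-cloudflare-dns.sh: both function names are hypothetical, TTL 300 and proxied=false are assumptions, and the call only creates a record, whereas the real script presumably also updates existing ones. The endpoint is the Cloudflare v4 DNS records API.

```shell
# Prints the four hostnames listed above for a given domain.
dns_hostnames() {
  local domain="$1"
  printf '%s\n' "$domain" "git.$domain" "registry.$domain" "grafana.$domain"
}

# Sketch of creating one A record through the Cloudflare v4 API.
# TTL 300 and proxied=false are assumptions; existing records are not updated.
cloudflare_create_a_record() {
  local name="$1" ip="$2"
  curl -fsS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CLOUDFLARE_ZONE_ID}/dns_records" \
    -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "{\"type\":\"A\",\"name\":\"${name}\",\"content\":\"${ip}\",\"ttl\":300,\"proxied\":false}"
}

# Example (SERVER_IP is a placeholder for the VM's public address):
#   dns_hostnames "$PUBLIC_DOMAIN" | while read -r host; do
#     cloudflare_create_a_record "$host" "$SERVER_IP"
#   done
```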

TLS is handled in-cluster by cert-manager using Let's Encrypt issuers and the rendered ingress hosts. The platform base assumes the default k3s Traefik ingress controller is present; it does not install ingress-nginx. For clean-cluster applies, the base kustomization includes cert-manager before the ClusterIssuer resources so that the issuer CRs can be created in the same bootstrap flow.

Grafana is the default observability UI wired into the public hostname model. Keep Grafana authenticated.

Observe the cluster

KUBECONFIG=.state/hetzner/kubeconfig.yaml kubectl get pods -A
bash scripts/k8s/logs.sh

For the web log UI and observability stack, see docs/k8s-observability.md.

Self-hosted CI/CD handoff

Default bootstrap now automates the Forgejo handoff:

  1. create the Forgejo repo in the admin namespace or in a pre-existing organization named by FORGEJO_REPO_OWNER
  2. configure the repository Actions secrets:
    • KUBECONFIG_B64
    • REGISTRY_USERNAME
    • REGISTRY_PASSWORD
  3. configure the repository Actions variables:
    • REGISTRY_HOST=${REGISTRY_DOMAIN}
    • PROJECT_NAME
    • PROJECT_NAMESPACE
    • PROJECT_DEPLOYMENTS
  4. push the current repo to main
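
The KUBECONFIG_B64 secret in step 2 is a base64-encoded kubeconfig. A minimal sketch of producing such a value follows; it assumes the CI kubeconfig at .state/hetzner/kubeconfig.incluster.yaml is the file that gets encoded, which this document implies but does not state. encode_kubeconfig is a hypothetical helper.

```shell
# Prints the file as single-line base64 so it fits in one secret value.
# GNU base64 wraps long output by default, so newlines are stripped.
encode_kubeconfig() {
  base64 < "$1" | tr -d '\n'
}

# Example:
#   KUBECONFIG_B64="$(encode_kubeconfig .state/hetzner/kubeconfig.incluster.yaml)"
#
# A consumer can restore the file with (GNU coreutils syntax):
#   printf '%s' "$KUBECONFIG_B64" | base64 -d > kubeconfig.yaml
```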

The workflow then:

  • starts a Kubernetes Job in the target namespace
  • checks out the repo inside that Job using the Forgejo job token via Authorization: Bearer ... HTTP auth
  • uses Kaniko plus the Kubernetes registry auth secret to build and push ${REGISTRY_DOMAIN}/${PROJECT_NAME}:${GIT_SHA}
  • updates the app deployments in PROJECT_NAMESPACE
  • waits for rollout
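
The last two workflow steps can be pictured as a kubectl set image followed by kubectl rollout status for each deployment. The sketch below is not copied from .forgejo/workflows/deploy.yml: rollout_image is a hypothetical name, the 300s timeout is arbitrary, and it assumes PROJECT_DEPLOYMENTS is a space-separated list of Deployment names, which may not match the real format.

```shell
# Sketch of the update-and-wait steps. Assumes PROJECT_DEPLOYMENTS is a
# space-separated list of Deployment names (the real format may differ).
rollout_image() {
  local image="${REGISTRY_DOMAIN}/${PROJECT_NAME}:${GIT_SHA}"
  local deployment
  for deployment in $PROJECT_DEPLOYMENTS; do
    # "*=IMAGE" updates every container in the Deployment.
    kubectl -n "$PROJECT_NAMESPACE" set image "deployment/$deployment" "*=$image" || return 1
    kubectl -n "$PROJECT_NAMESPACE" rollout status "deployment/$deployment" --timeout=300s || return 1
  done
}
```

Tagging the image with the commit SHA means each rollout refers to an immutable image reference, so a failed rollout can be traced back to one commit.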

Legacy local-image bootstrap remains available with:

BOOTSTRAP_DELIVERY_MODE=local-image-import bash scripts/hetzner/bootstrap.sh

Destroy everything

Default destroy only removes Terraform-managed Hetzner infrastructure:

source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh

Opt-in flags make destructive cleanup of bootstrap-managed leftovers explicit:

source scripts/hetzner/bootstrap-secrets.env
DESTROY_DNS=true \
DESTROY_LOCAL_STATE=true \
DESTROY_FORGEJO_REPO=true \
bash scripts/hetzner/destroy.sh

destroy.sh reads HCLOUD_TOKEN, optional TAILSCALE_AUTH_KEY, optional DNS provider credentials, and optional Forgejo admin credentials via the same *_PASS mapping mechanism as bootstrap. It uses the same Terraform inputs as bootstrap for the infrastructure resources, then can optionally:

  • delete the scripted DNS records for ${BASE_DOMAIN}, git.${BASE_DOMAIN}, registry.${BASE_DOMAIN}, and grafana.${BASE_DOMAIN}
  • remove local bootstrap artifacts under .state/hetzner/, deploy/k8s/overlays/hetzner-single-node/generated/, and the local Terraform working/state files in infra/terraform/hetzner/
  • delete the bootstrap-managed Forgejo repository via the Forgejo API

Supported scripted DNS cleanup providers:

  • Cloudflare
  • Porkbun

Cleanup defaults are intentionally conservative:

  • DESTROY_DNS=false keeps provider records unless you explicitly opt in
  • DESTROY_LOCAL_STATE=false keeps the last kubeconfigs and generated manifests for inspection
  • DESTROY_FORGEJO_REPO=false keeps the remote Git repository unless you explicitly opt in

If any optional cleanup step is enabled but cannot run because credentials are missing, destroy.sh prints a skip message describing what was not removed. If DNS cleanup or Forgejo repo deletion fails after Terraform teardown, rerun the same cleanup flags or remove the remaining resources manually.
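
The opt-in behaviour can be summarised as: Terraform teardown always runs, and each extra cleanup runs only when its flag is set. The sketch below is a hypothetical illustration, not code from destroy.sh. It assumes only the literal value true enables a flag, and the order of the printed steps is not meant to reflect the script's actual execution order.

```shell
# Hypothetical helper: only the literal string "true" counts as opt-in.
flag_enabled() {
  [ "${1:-false}" = "true" ]
}

# Prints the cleanup steps that the current environment would enable.
plan_cleanup() {
  echo "terraform"   # default destroy always removes Terraform-managed infrastructure
  if flag_enabled "${DESTROY_DNS:-false}"; then echo "dns"; fi
  if flag_enabled "${DESTROY_LOCAL_STATE:-false}"; then echo "local-state"; fi
  if flag_enabled "${DESTROY_FORGEJO_REPO:-false}"; then echo "forgejo-repo"; fi
}

# Example:
#   DESTROY_DNS=true plan_cleanup
```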

Current limitations

  • organization-owned repo bootstrap works only when FORGEJO_REPO_OWNER names a pre-existing organization that the configured admin can create repositories in; bootstrap does not create the organization itself
  • unattended repo seeding uses an authenticated HTTPS remote built from the configured Forgejo admin credentials, which leaves those credentials in .git/config; operators who do not want that should replace the local remote with a token, SSH, or credential-helper-backed remote after bootstrap
  • cloud-init no longer clones a bootstrap repository onto the node; Kubernetes asset delivery is still workstation-driven after Terraform
  • bootstrap_repo_path in Terraform is only a reserved marker for a future node-local bootstrap/GitOps flow
  • bootstrap requires either a local htpasswd binary or local docker as a fallback to generate the registry htpasswd secret
  • bootstrap and CI authentication paths should still be hardened before production use
  • runner identity is persisted under the shared forgejo-data PVC, so deleting the forgejo-runner pod is safe but deleting that PVC forces re-registration on the next bootstrap run
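
For the credentials-in-.git/config limitation above, one way to clean up after bootstrap is to rewrite the remote URL without the embedded user and password. strip_url_credentials is a hypothetical helper, and the remote name forgejo in the usage comment is a placeholder, since this document does not say which remote name bootstrap uses.

```shell
# Removes "user:password@" (or "user@") from an http(s) URL.
# Assumes special characters in the credentials are percent-encoded.
strip_url_credentials() {
  printf '%s\n' "$1" | sed -E 's#^(https?://)[^/@]*@#\1#'
}

# Example ("forgejo" is a placeholder remote name):
#   git remote set-url forgejo "$(strip_url_credentials "$(git remote get-url forgejo)")"
```

After this, pushes need another authentication path, such as an SSH remote or a Git credential helper.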