Hetzner rebuild pipeline map

This document summarizes the currently intended rebuild flow for the repo-driven Hetzner single-node cluster.

It is a companion to the operator runbooks, not a competing source of truth. Consult these first for exact commands and required environment variables:

- docs/hetzner-k3s-bootstrap.md
- docs/hetzner-self-hosted-ci-runbook.md
- docs/k8s-observability.md

High-level rebuild sequence

1. prepare scripts/hetzner/bootstrap-secrets.env
2. source it so *_PASS mappings resolve through pass
3. optionally run scripts/hetzner/destroy.sh
4. run scripts/hetzner/bootstrap.sh
5. let bootstrap:
   - provision/update Hetzner infra with Terraform
   - configure DNS when provider credentials are present
   - fetch the real kubeconfig from the node
   - render .state/hetzner/generated-overlay/
   - apply platform + project manifests
   - bootstrap Forgejo admin, runner, repo, and Actions configuration
   - seed the repo into Forgejo
   - trigger the normal Forgejo Actions build/push/deploy path
6. verify public/operator surfaces:
   - Forgejo
   - registry
   - Grafana
   - Headlamp
7. verify workload health and CI success
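
The operator-driven part of the sequence above can be sketched as a small wrapper. This is only an illustration, not part of the repo: the script paths come from this document, while the `rebuild` and `run_step` functions and the `DRY_RUN` and `DESTROY_FIRST` switches are hypothetical names introduced here. It assumes it is run from the repository root.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild, run_step, DRY_RUN and DESTROY_FIRST are hypothetical.

run_step() {
  # Print the command, then execute it unless DRY_RUN=1.
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

rebuild() {
  local env_file="scripts/hetzner/bootstrap-secrets.env"

  # Source in the current shell so the resolved values are visible
  # to the scripts started afterwards.
  printf '+ . %s\n' "$env_file"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    . "$env_file" || return 1
  fi

  # The destroy step is optional, so it is opt-in here.
  if [ "${DESTROY_FIRST:-0}" = "1" ]; then
    run_step scripts/hetzner/destroy.sh || return 1
  fi

  run_step scripts/hetzner/bootstrap.sh
}
```

Calling `DRY_RUN=1 DESTROY_FIRST=1 rebuild` prints the three steps without running them, which is a cheap way to confirm the order before a real destroy/rebuild.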

Ownership boundaries

Terraform owns

- Hetzner VM
- network
- firewall
- cloud-init user data

Cloud-init owns

- OS package prep
- optional Tailscale join
- k3s installation
- a marker file at /opt/unrip/bootstrap/README.txt

Cloud-init does not clone this repo or apply Kubernetes manifests.

Bootstrap script owns

- pass-resolved secret loading
- DNS automation
- kubeconfig retrieval/rendering
- generated overlay rendering under .state/hetzner/generated-overlay/
- imperative registry auth secret creation
- Forgejo bootstrap API calls
- repo seeding
- Headlamp token export to pass

Kubernetes manifests own

- platform services
- project services
- ingress/TLS resources
- observability stack
- persistent volume claims and workload specs

Current default runtime model

Platform services:

- Forgejo
- Forgejo runner
- registry
- cert-manager
- Grafana
- Loki
- Promtail
- Headlamp

Project services:

- Redpanda
- near-intents-ingest
- dummy-reactor
- dummy-executor
- dummy-consumer

Ingress/controller model:

- Traefik bundled with k3s
- no ingress-nginx in the active path

Rebuild verification checklist

After bootstrap, verify:

export KUBECONFIG="$PWD/.state/hetzner/kubeconfig.yaml"
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n observability get deploy,ds,pods,svc,ingress,secrets
kubectl -n forgejo get deploy,pods,svc,ingress
kubectl -n registry get deploy,pods,svc,ingress
kubectl -n unrip get deploy,pods
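
The `get` commands above give a snapshot for manual inspection. If a blocking check is preferred, one possible sketch uses `kubectl wait` across the same four namespaces. The function name `wait_for_deployments` and the default timeout are assumptions made here, not something the repo provides, and `KUBECONFIG` is expected to be exported as shown above.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_deployments and its 180s default are hypothetical.

wait_for_deployments() {
  local timeout="${1:-180s}" ns rc=0

  # Namespaces taken from the checklist in this document.
  for ns in observability forgejo registry unrip; do
    # Blocks until every Deployment in the namespace reports Available,
    # or fails once the timeout expires.
    kubectl -n "$ns" wait --for=condition=Available deployment --all \
      --timeout="$timeout" || rc=1
  done

  # Non-zero if any namespace failed, so the result can gate a script.
  return "$rc"
}
```

This covers Deployments only. DaemonSets such as Promtail would still need a separate check, and a namespace without any Deployment makes `kubectl wait --all` report an error.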

Public/operator surfaces should respond:

https://git.<public-domain>/
https://registry.<public-domain>/v2/
https://grafana.<public-domain>/
https://headlamp.<public-domain>/
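
A quick way to probe these four surfaces is a loop around `curl`. The sketch below is illustrative: `surface_urls` and `check_surfaces` are hypothetical helper names, and counting HTTP 401 as "responding" is an assumption made here because a registry that requires authentication answers unauthenticated `/v2/` requests with 401.

```shell
#!/usr/bin/env bash
# Sketch only: surface_urls and check_surfaces are hypothetical helpers.

surface_urls() {
  # Expand the URL patterns listed in this document for one public domain.
  local domain="$1"
  printf 'https://git.%s/\n' "$domain"
  printf 'https://registry.%s/v2/\n' "$domain"
  printf 'https://grafana.%s/\n' "$domain"
  printf 'https://headlamp.%s/\n' "$domain"
}

check_surfaces() {
  local domain="$1" url code rc=0
  for url in $(surface_urls "$domain"); do
    # -m 10 caps each request at 10 seconds; a failed connection counts as 000.
    code="$(curl -sS -o /dev/null -m 10 -w '%{http_code}' "$url")" || code="000"
    printf '%s %s\n' "$code" "$url"
    case "$code" in
      2??|3??|401) ;;   # assumption: 401 still proves the endpoint is up
      *) rc=1 ;;
    esac
  done
  return "$rc"
}
```

`check_surfaces <public-domain>` prints one status line per URL and returns non-zero if any surface fails to answer with an accepted status.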

CI should show a successful deploy workflow in Forgejo Actions.

Current caveat

The core Hetzner/k3s/Forgejo path has been rebuilt successfully before. Headlamp was added afterward and validated live on the rebuilt cluster, but a brand-new destroy/rebuild rehearsal with Headlamp included has not yet been re-run from zero.

So the rebuild story is repo-driven and close to fully reproducible. One validation step remains: a final clean-room rebuild after the latest Headlamp/docs cleanup.
