# Hetzner rebuild pipeline map

This document summarizes the currently intended rebuild flow for the repo-driven Hetzner single-node cluster.

It is a companion to the operator runbooks, not a competing source of truth.
Use these first for exact commands and required env:

- `docs/hetzner-k3s-bootstrap.md`
- `docs/hetzner-self-hosted-ci-runbook.md`
- `docs/k8s-observability.md`

## High-level rebuild sequence

1. prepare `scripts/hetzner/bootstrap-secrets.env`
2. source it so `*_PASS` mappings resolve through `pass`
3. optionally run `scripts/hetzner/destroy.sh`
4. run `scripts/hetzner/bootstrap.sh`
5. let bootstrap:
   - provision/update Hetzner infra with Terraform
   - configure DNS when provider credentials are present
   - fetch the real kubeconfig from the node
   - render `.state/hetzner/generated-overlay/`
   - apply platform + project manifests
   - bootstrap Forgejo admin, runner, repo, and Actions configuration
   - seed the repo into Forgejo
   - trigger the normal Forgejo Actions build/push/deploy path
6. verify public/operator surfaces:
   - Forgejo
   - registry
   - Grafana
   - Headlamp
7. verify workload health and CI success

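
Step 2 depends on `*_PASS` variables being resolved through `pass`. The runbooks define the real mechanism; the block below is only a minimal sketch of how such a mapping could work. It assumes each exported `<NAME>_PASS` variable holds a `pass` entry path and that the resolved secret should be exported as `<NAME>`. The function name `resolve_pass_mappings` and that naming rule are illustrative assumptions, not taken from the repo.

```bash
# Hypothetical helper: for every exported variable named <NAME>_PASS,
# look up the pass entry it points to and export the secret as <NAME>.
# Assumption: <NAME>_PASS holds a pass entry path; only exported
# variables are visible to `env`, so unexported ones are skipped.
resolve_pass_mappings() {
  for mapping in $(env | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)_PASS=.*/\1/p'); do
    # Indirect lookup of the entry path stored in <NAME>_PASS.
    eval "entry=\${${mapping}_PASS}"
    # Fail early if the entry cannot be read.
    value=$(pass show "$entry") || return 1
    # By pass convention the first line of an entry is the secret itself.
    value=$(printf '%s\n' "$value" | head -n 1)
    export "$mapping=$value"
  done
}
```

In the real flow, sourcing `scripts/hetzner/bootstrap-secrets.env` is what performs the resolution; see `docs/hetzner-k3s-bootstrap.md` for the actual variable names.
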
## Ownership boundaries

### Terraform owns
- Hetzner VM
- network
- firewall
- cloud-init user data

### Cloud-init owns
- OS package prep
- optional Tailscale join
- k3s installation
- a marker file under `/opt/unrip/bootstrap/README.txt`

Cloud-init does **not** clone this repo or apply Kubernetes manifests.

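
One way to confirm that cloud-init has finished its part is to check its status and the marker file on the node. This is a minimal sketch, assuming it runs on the node itself (for example over SSH); the function name `cloud_init_finished` and the overridable `MARKER` variable are illustrative.

```bash
# Hypothetical check, meant to run on the node.
# MARKER defaults to the marker path listed above.
MARKER="${MARKER:-/opt/unrip/bootstrap/README.txt}"

cloud_init_finished() {
  # `cloud-init status --wait` blocks until cloud-init is done and
  # exits non-zero if cloud-init reported an error.
  cloud-init status --wait >/dev/null 2>&1 || return 1
  # Cloud-init is expected to have written the marker file by this point.
  [ -f "$MARKER" ]
}
```
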
### Bootstrap script owns
- `pass`-resolved secret loading
- DNS automation
- kubeconfig retrieval/rendering
- generated overlay rendering under `.state/hetzner/generated-overlay/`
- imperative registry auth secret creation
- Forgejo bootstrap API calls
- repo seeding
- Headlamp token export to `pass`

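
Two of the imperative steps above map onto standard `kubectl` and `pass` commands. The sketch below is not the bootstrap script's actual code: the helper names are made up, and the namespace, secret name, service account, and `pass` entry path are placeholders supplied by the caller. The real script may, for instance, read a long-lived token from a Secret instead of minting one with `kubectl create token`.

```bash
# Hypothetical helper: create or update a registry auth secret imperatively.
# Arguments: namespace, secret name, registry host, username, password.
create_registry_secret() {
  # Rendering with --dry-run=client and piping to `kubectl apply`
  # keeps the call idempotent across repeated bootstrap runs.
  kubectl -n "$1" create secret docker-registry "$2" \
    --docker-server="$3" \
    --docker-username="$4" \
    --docker-password="$5" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Hypothetical helper: mint a service account token and store it in pass.
# Arguments: namespace, service account name, pass entry path.
export_token_to_pass() {
  token=$(kubectl -n "$1" create token "$2") || return 1
  # -m reads the entry from stdin, -f overwrites an existing entry.
  printf '%s\n' "$token" | pass insert -m -f "$3"
}
```
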
### Kubernetes manifests own
- platform services
- project services
- ingress/TLS resources
- observability stack
- persistent volume claims and workload specs

## Current default runtime model

Platform services:
- Forgejo
- Forgejo runner
- registry
- cert-manager
- Grafana
- Loki
- Promtail
- Headlamp

Project services:
- Redpanda
- `near-intents-ingest`
- `dummy-reactor`
- `dummy-executor`
- `dummy-consumer`

Ingress/controller model:
- Traefik bundled with k3s
- no ingress-nginx in the active path

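
The ingress model can be sanity-checked from the cluster's IngressClass objects. The sketch below assumes that k3s registers its bundled Traefik under an IngressClass named `traefik` and that any class containing `nginx` in its name indicates ingress-nginx; both are assumptions, and the function name is illustrative.

```bash
# Hypothetical sanity check for the ingress model described above.
check_ingress_model() {
  # -o name prints one "ingressclass.networking.k8s.io/<name>" line per class.
  classes=$(kubectl get ingressclass -o name) || return 1
  case "$classes" in
    *nginx*) echo "unexpected: nginx ingress class present"; return 1 ;;
  esac
  case "$classes" in
    */traefik*) echo "ok: traefik ingress class present" ;;
    *) echo "unexpected: no traefik ingress class"; return 1 ;;
  esac
}
```
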
## Rebuild verification checklist

After bootstrap, verify:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n observability get deploy,ds,pods,svc,ingress,secrets
kubectl -n forgejo get deploy,pods,svc,ingress
kubectl -n registry get deploy,pods,svc,ingress
kubectl -n unrip get deploy,pods
```

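
The commands above only list state. For a scripted gate, `kubectl wait` can block until resources report healthy. This is a minimal sketch that assumes `KUBECONFIG` is exported as in the checklist; the function name and the timeouts are arbitrary choices, and only Deployments are covered (DaemonSets and StatefulSets do not expose an `Available` condition).

```bash
# Namespaces taken from the checklist above.
NAMESPACES="observability forgejo registry unrip"

# Block until every node is Ready and every Deployment in the listed
# namespaces reports Available. Timeout values are arbitrary.
wait_for_cluster() {
  kubectl wait --for=condition=Ready nodes --all --timeout=120s || return 1
  for ns in $NAMESPACES; do
    kubectl -n "$ns" wait --for=condition=Available deployment --all \
      --timeout=300s || return 1
  done
}
```
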
Public/operator surfaces should respond:
- `https://git.<public-domain>/`
- `https://registry.<public-domain>/v2/`
- `https://grafana.<public-domain>/`
- `https://headlamp.<public-domain>/`

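
These checks can be scripted with `curl`. The sketch below is an assumption-laden example rather than part of the repo: the function names are made up, and it treats any HTTP status below 500 as "responding", because an auth-protected registry typically answers `/v2/` with 401.

```bash
# Print the four surface URLs for a given public domain.
surface_urls() {
  for host in git registry grafana headlamp; do
    if [ "$host" = "registry" ]; then
      echo "https://$host.$1/v2/"
    else
      echo "https://$host.$1/"
    fi
  done
}

# Probe each URL and print "<status> <url>".
# Returns non-zero if any surface is unreachable or answers with 5xx.
check_surfaces() {
  failed=0
  for url in $(surface_urls "$1"); do
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url") || code=000
    echo "$code $url"
    if [ "$code" = "000" ] || [ "$code" -ge 500 ]; then failed=1; fi
  done
  return "$failed"
}
```

Usage would look like `check_surfaces example.com`, with the real public domain substituted.
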
CI should show a successful deploy workflow in Forgejo Actions.

## Current caveat

The core Hetzner/k3s/Forgejo path has been rebuilt successfully before.
Headlamp was added afterward and validated live on the rebuilt cluster, but a full destroy/rebuild rehearsal that includes Headlamp has not yet been run from scratch.

So the rebuild story is repo-driven and operationally close to fully reproducible, with one validation step remaining: a final clean-room rebuild after the latest Headlamp/docs cleanup.