# Hetzner + k3s + self-hosted Git/CI bootstrap
Goal: provision and deploy everything from this repo to a single Hetzner machine with no manual server login.
## Stack
- Terraform provisions the Hetzner Cloud VM, private network, and firewall
- cloud-init installs Tailscale first when configured, then installs k3s automatically
- Kubernetes manifests deploy:
  - Redpanda
  - trading system services
  - private registry
  - Forgejo
  - ingress-nginx
  - cert-manager
  - ACME issuers
- local bootstrap script:
  - runs Terraform
  - optionally creates DNS records via Cloudflare or Porkbun
  - writes overlay secrets/host patches from local env
  - applies the Hetzner single-node k8s overlay
  - builds the current app image locally
  - fetches the real kubeconfig from the node
  - imports the bootstrap image into k3s for the first rollout
## Files
- `infra/terraform/hetzner/`
- `deploy/k8s/base/`
- `deploy/k8s/overlays/hetzner-single-node/`
- `scripts/hetzner/bootstrap.sh`
- `scripts/hetzner/configure-cloudflare-dns.sh`
- `scripts/hetzner/destroy.sh`
- `scripts/k8s/logs.sh`
- `.forgejo/workflows/deploy.yml`
## Required local tools
- `terraform`
- `kubectl`
- `docker`
- `curl`
- `python3`
- `git`
- `pass`
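
A quick preflight check can confirm the tools above are on `PATH` before running bootstrap. This is a minimal sketch; `missing_tools` and `check_required_tools` are hypothetical helper names, not functions taken from `bootstrap.sh`.

```bash
# Print each named tool that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report and return non-zero if any tool from the list above is missing.
check_required_tools() {
  local missing
  missing="$(missing_tools terraform kubectl docker curl python3 git pass)"
  if [ -n "$missing" ]; then
    printf 'missing required tools:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

Calling `check_required_tools` before `bash scripts/hetzner/bootstrap.sh` fails fast instead of partway through provisioning.
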
## Required local env
Start from:
```bash
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
${EDITOR:-vi} scripts/hetzner/bootstrap-secrets.env
source scripts/hetzner/bootstrap-secrets.env
```
The mapping file should contain non-secret config plus `pass` entry references for secrets. Bootstrap and destroy load the first line from each configured pass entry without echoing it. Explicit env exports still override `pass` lookups.
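
The lookup order described above might be sketched as follows. This is an illustration, not the code in `bootstrap.sh`: `resolve_secret` is a hypothetical helper name, and it assumes the secret sits on the first line of the `pass` entry, as stated above.

```bash
# Hypothetical helper: print the value for NAME, preferring an explicit
# env value and falling back to the pass entry named by NAME_PASS.
resolve_secret() {
  local name="$1" value entry
  # Indirect lookup via eval; safe only because callers pass fixed names.
  eval "value=\${${name}:-}"
  eval "entry=\${${name}_PASS:-}"
  if [ -n "$value" ]; then
    # An explicit export overrides the pass lookup.
    printf '%s\n' "$value"
  elif [ -n "$entry" ]; then
    # pass prints the whole entry; keep only the first line.
    pass show "$entry" | head -n 1
  else
    printf 'set %s or %s_PASS\n' "$name" "$name" >&2
    return 1
  fi
}
```

For example, `HCLOUD_TOKEN="$(resolve_secret HCLOUD_TOKEN)"` assigns the token without printing it to the terminal.
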
Required values:
- `HCLOUD_TOKEN_PASS` or `HCLOUD_TOKEN`
- `SSH_PUBLIC_KEY_PATH`
- `PUBLIC_DOMAIN`
- `BASE_DOMAIN`
- recommended Tailscale values:
  - `TAILSCALE_AUTH_KEY_PASS` or `TAILSCALE_AUTH_KEY`
  - `TAILSCALE_CONTROL_PLANE_HOSTNAME`
- `FORGEJO_DOMAIN`
- `FORGEJO_ROOT_URL`
- `REGISTRY_DOMAIN`
- `LETSENCRYPT_EMAIL`
- `REGISTRY_USERNAME`
- `REGISTRY_PASSWORD_PASS` or `REGISTRY_PASSWORD`
- `NEAR_INTENTS_API_KEY_PASS` or `NEAR_INTENTS_API_KEY`
- `FORGEJO_ADMIN_USERNAME`
- `FORGEJO_ADMIN_EMAIL`
- `FORGEJO_ADMIN_PASSWORD_PASS` or `FORGEJO_ADMIN_PASSWORD`
- optional generated-secret target: `FORGEJO_RUNNER_REGISTRATION_TOKEN_PASS`
- optional repo settings: `FORGEJO_REPO_OWNER`, `FORGEJO_REPO_NAME`, `FORGEJO_REPO_PRIVATE`
Optional for automatic DNS:
- Cloudflare:
  - `CLOUDFLARE_API_TOKEN_PASS` or `CLOUDFLARE_API_TOKEN`
  - `CLOUDFLARE_ZONE_ID_PASS` or `CLOUDFLARE_ZONE_ID`
- Porkbun:
  - `PORKBUN_API_KEY_PASS` or `PORKBUN_API_KEY`
  - `PORKBUN_SECRET_API_KEY_PASS` or `PORKBUN_SECRET_API_KEY`
## Bootstrap
```bash
bash scripts/hetzner/bootstrap.sh
```
Outputs:
- Hetzner VM created
- Tailscale joined if configured
- k3s installed
- kubeconfig written to `.state/hetzner/kubeconfig.yaml`
- CI kubeconfig written to `.state/hetzner/kubeconfig.incluster.yaml`
- overlay secrets and ingress host patches rendered from local env / `pass`
- namespaces, Redpanda, app deployments, Forgejo, registry, ingress, cert-manager, and issuers applied
- Forgejo admin account created automatically if missing
- Forgejo runner registration token generated automatically and stored in the live Kubernetes secret
- Forgejo repository created automatically
- Forgejo Actions secrets and variables configured automatically
- repo pushed to Forgejo automatically in the default `forgejo-actions` delivery mode
- first deployment triggered from Forgejo Actions by default
## Tailscale-first admin access
Recommended mode:
- public firewall exposes only `80/443`
- admin access uses Tailscale
- Kubernetes API uses the Tailscale hostname when `TAILSCALE_CONTROL_PLANE_HOSTNAME` is set
`TF_ADMIN_CIDR_BLOCKS` remains only as a fallback if you intentionally want public admin/API exposure.
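
To confirm that the fetched kubeconfig points at the Tailscale hostname rather than a public address, a check along these lines may help. It is a sketch under assumptions: the helper names are hypothetical, and it assumes the kubeconfig's server URL has the form `https://<hostname>:<port>`.

```bash
# Print the API server URL recorded in the kubeconfig (default: local state file).
api_server_url() {
  kubectl --kubeconfig "${1:-.state/hetzner/kubeconfig.yaml}" \
    config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Succeed only if the API server URL uses TAILSCALE_CONTROL_PLANE_HOSTNAME.
uses_tailscale_api() {
  local host="${TAILSCALE_CONTROL_PLANE_HOSTNAME:-unset}"
  case "$(api_server_url "${1:-}")" in
    "https://$host" | "https://$host:"* | "https://$host/"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Running `uses_tailscale_api || echo "API is not using the Tailscale hostname"` after bootstrap gives a quick signal before relying on the locked-down firewall.
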
## DNS and TLS
If DNS provider credentials are present, bootstrap updates:
- `${BASE_DOMAIN}`
- `git.${BASE_DOMAIN}`
- `registry.${BASE_DOMAIN}`
Supported scripted providers:
- Cloudflare
- Porkbun
TLS is handled in-cluster by cert-manager using Let's Encrypt issuers and the rendered ingress hosts.
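
Once DNS has propagated and cert-manager has issued certificates, the three hostnames listed above can be probed over HTTPS. The following is a rough sketch with hypothetical helper names; it only reports status codes and does not inspect the certificates themselves.

```bash
# Print the three hostnames bootstrap manages for a base domain.
managed_hosts() {
  local base="$1"
  printf '%s\n' "$base" "git.$base" "registry.$base"
}

# Print "<host> <http status>" per host. curl reports 000 when it cannot
# connect or the TLS handshake fails (e.g. no valid certificate yet).
check_hosts() {
  local host code
  for host in $(managed_hosts "$1"); do
    code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$host/")" || true
    printf '%s %s\n' "$host" "$code"
  done
}
```

For example, `check_hosts "$BASE_DOMAIN"` prints one line per host; a `000` a few minutes after bootstrap usually means DNS or certificate issuance is still pending.
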
## Observe the cluster
```bash
KUBECONFIG=.state/hetzner/kubeconfig.yaml kubectl get pods -A
bash scripts/k8s/logs.sh
```
## Self-hosted CI/CD handoff
By default, bootstrap automates the Forgejo handoff:
1. create the Forgejo repo
2. configure the repository Actions secrets:
   - `KUBECONFIG_B64`
   - `REGISTRY_USERNAME`
   - `REGISTRY_PASSWORD`
3. configure the repository Actions variables:
   - `REGISTRY_HOST=${REGISTRY_DOMAIN}`
   - `PROJECT_NAME`
   - `PROJECT_NAMESPACE`
   - `PROJECT_DEPLOYMENTS`
4. push the current repo to `main`
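
If the `KUBECONFIG_B64` secret ever needs to be set again by hand, its value can presumably be produced as below. This is an assumption-laden sketch: it assumes the secret is the base64 encoding of the CI kubeconfig on a single line, which this document does not confirm, and `encode_kubeconfig` is a hypothetical helper name.

```bash
# Base64-encode a kubeconfig file onto a single line (no wrapping newlines).
encode_kubeconfig() {
  base64 < "$1" | tr -d '\n'
}
```

For example, `encode_kubeconfig .state/hetzner/kubeconfig.incluster.yaml` prints a value suitable for pasting into the secret, and a workflow step could restore the file with `printf '%s' "$KUBECONFIG_B64" | base64 -d > kubeconfig`.
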
The workflow then:
- starts a Kubernetes Job in the target namespace
- uses Kaniko plus the Kubernetes registry auth secret to build and push `${REGISTRY_DOMAIN}/${PROJECT_NAME}:${GIT_SHA}`
- updates the app deployments in `PROJECT_NAMESPACE`
- waits for rollout
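
The last two workflow steps can be approximated with plain `kubectl`. This sketch is not the contents of `.forgejo/workflows/deploy.yml`: the function names are hypothetical, and it assumes `PROJECT_DEPLOYMENTS` is a space-separated list and that every container in each deployment should move to the new image.

```bash
# Build the image reference in the form used by the workflow.
image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY_DOMAIN" "$PROJECT_NAME" "$GIT_SHA"
}

# Point each deployment at the new image, then wait for it to roll out.
# Assumption: PROJECT_DEPLOYMENTS is space-separated; "*" targets all containers.
roll_out() {
  local image deployment
  image="$(image_ref)"
  for deployment in $PROJECT_DEPLOYMENTS; do
    kubectl -n "$PROJECT_NAMESPACE" set image "deployment/$deployment" "*=$image"
    kubectl -n "$PROJECT_NAMESPACE" rollout status "deployment/$deployment" --timeout=300s
  done
}
```

If a deployment has sidecar containers that must keep their own image, replace the `*` wildcard with the specific container name.
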
Legacy local-image bootstrap remains available with:
```bash
BOOTSTRAP_DELIVERY_MODE=local-image-import bash scripts/hetzner/bootstrap.sh
```
## Destroy everything
```bash
source scripts/hetzner/bootstrap-secrets.env
bash scripts/hetzner/destroy.sh
```
`destroy.sh` resolves `HCLOUD_TOKEN` and the optional `TAILSCALE_AUTH_KEY` through the same `*_PASS` mapping mechanism as bootstrap.
## Current limitations
- automated repo creation currently assumes `FORGEJO_REPO_OWNER == FORGEJO_ADMIN_USERNAME`
- bootstrap still uses local `docker` to generate the registry htpasswd secret
- bootstrap and CI authentication paths should still be hardened before production use