# Hetzner self-hosted CI/CD runbook
This is the operator runbook for the handoff from local bootstrap to self-hosted Forgejo-based deployment.
## Bootstrap prerequisites
From your workstation:
```bash
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
source scripts/hetzner/bootstrap-secrets.env
python3 -c 'import nacl' # verify PyNaCl is installed for Actions secret encryption
bash scripts/hetzner/bootstrap.sh
```
`scripts/hetzner/bootstrap-secrets.env` should contain non-secret bootstrap settings plus `pass` entry mappings such as `HCLOUD_TOKEN_PASS`, `REGISTRY_PASSWORD_PASS`, and `FORGEJO_ADMIN_PASSWORD_PASS`. If you explicitly export a raw variable (for example `FORGEJO_ADMIN_PASSWORD`), its value overrides the corresponding `pass` lookup.
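The override rule can be sketched as a small resolver. This is a minimal sketch, not the code in `bootstrap.sh`: `resolve_secret` is a hypothetical helper, and it assumes that the raw variable name is the mapping name without the `_PASS` suffix and that the first line of the `pass` entry is the secret.

```bash
# Hypothetical helper: print the value for NAME, preferring a directly set
# variable over the `pass` entry named by NAME_PASS.
resolve_secret() {
  name="$1"
  # Indirect lookup via eval keeps the sketch POSIX-compatible;
  # only call this with trusted, literal variable names.
  eval "value=\${${name}:-}"
  eval "entry=\${${name}_PASS:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  elif [ -n "$entry" ]; then
    # Assumption: the first line of the pass entry is the secret itself.
    pass show "$entry" | head -n 1
  else
    echo "no value or pass mapping for $name" >&2
    return 1
  fi
}

# Example: HCLOUD_TOKEN="$(resolve_secret HCLOUD_TOKEN)"
```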
After that you should have:
- `.state/hetzner/kubeconfig.yaml`
- `.state/hetzner/kubeconfig.incluster.yaml`
- Forgejo reachable at `https://${FORGEJO_DOMAIN}`
- the target Forgejo repo created automatically
- repository Actions secrets/variables populated for CI
- the current repo pushed to Forgejo automatically in default mode
- Registry reachable at `https://${REGISTRY_DOMAIN}`
- private admin/control-plane access over Tailscale if configured
Bootstrap repo automation requires `FORGEJO_ADMIN_USERNAME` and `FORGEJO_ADMIN_PASSWORD` to be set and the Python package `PyNaCl` to be installed locally, so the script can encrypt Forgejo Actions secrets before upload. The same bootstrap flow also creates the initial Forgejo admin account and generates the one-time runner registration token once Forgejo is up.
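A preflight check along these lines can catch a missing requirement before bootstrap starts. This is a hedged sketch: `missing_vars` and `preflight` are hypothetical helpers, and it assumes any `pass` lookups have already populated the raw variables.

```bash
# Hypothetical preflight: print the name of every listed variable that is
# unset or empty, and return non-zero if any are missing.
missing_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup; only pass trusted, literal variable names.
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

preflight() {
  missing_vars FORGEJO_ADMIN_USERNAME FORGEJO_ADMIN_PASSWORD >&2 || return 1
  # Same PyNaCl import check as in the bootstrap commands.
  python3 -c 'import nacl' 2>/dev/null || {
    echo "PyNaCl is not installed" >&2
    return 1
  }
}
```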
## Verify the cluster
```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n forgejo get deploy,pods,svc,ingress
kubectl -n registry get deploy,pods,svc,ingress
kubectl -n unrip get deploy,pods
```
## Seed the repo into Forgejo
Default bootstrap already seeds the repo with:
```bash
bash scripts/hetzner/seed-forgejo-repo.sh
```
You only need to run it manually if you skipped seeding during bootstrap or want to push again after local changes.
## Configure Forgejo Actions secrets and variables
Bootstrap upserts these repository secrets automatically:
- `KUBECONFIG_B64`
- `REGISTRY_USERNAME`
- `REGISTRY_PASSWORD`
Bootstrap upserts these repository variables automatically:
- `REGISTRY_HOST=${REGISTRY_DOMAIN}`
- `PROJECT_NAME=${PROJECT_NAME}`
- `PROJECT_NAMESPACE=${PROJECT_NAMESPACE}`
- `PROJECT_DEPLOYMENTS` as a comma-separated version of the bootstrap deployment list
The Forgejo repo configuration step is idempotent, so rerunning bootstrap updates the same repo secrets/variables in place.
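Two of these values are derived rather than copied, and the derivation might look like the sketch below. Both helpers are hypothetical: the sketch assumes the bootstrap deployment list is whitespace-separated and that `KUBECONFIG_B64` is a single-line base64 encoding of a kubeconfig file.

```bash
# Join a whitespace-separated deployment list into comma-separated form.
# The input format is an assumption about the bootstrap settings.
join_deployments() {
  # Unquoted $1 is intentional: word splitting drops extra whitespace.
  set -- $1
  joined=""
  for name in "$@"; do
    joined="${joined:+$joined,}$name"
  done
  printf '%s\n' "$joined"
}

# Base64-encode a kubeconfig file as one line, suitable for a secret value.
encode_kubeconfig() {
  base64 < "$1" | tr -d '\n'
}

# Example:
# PROJECT_DEPLOYMENTS="$(join_deployments 'near-intents-ingest dummy-reactor')"
# KUBECONFIG_B64="$(encode_kubeconfig .state/hetzner/kubeconfig.incluster.yaml)"
```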
## Workflow behavior
The workflow in `.forgejo/workflows/deploy.yml` now:
1. installs `kubectl` on the Forgejo runner
2. loads kubeconfig from `KUBECONFIG_B64`
3. computes `IMAGE=${REGISTRY_HOST}/${PROJECT_NAME}:${GIT_SHA}`
4. creates an in-cluster Kubernetes Job in `PROJECT_NAMESPACE`
5. has that Job check out the repo with the Forgejo job token in an init container
6. has Kaniko build and push the image using the Kubernetes registry auth secret
7. updates each deployment listed in `PROJECT_DEPLOYMENTS` inside `PROJECT_NAMESPACE`
8. waits for rollout after each image update
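Steps 2, 3, 7, and 8 above can be sketched in shell as follows. This is an illustration under stated assumptions, not the contents of `deploy.yml`: the function names are hypothetical, the `*=` wildcard assumes every container in a deployment should receive the new image, and the 300s timeout is borrowed from the rollout commands in this runbook.

```bash
# Step 2: decode the KUBECONFIG_B64 secret into a file kubectl can use.
load_kubeconfig() {
  target="$1"
  printf '%s' "$KUBECONFIG_B64" | base64 -d > "$target"
  chmod 600 "$target"
}

# Step 3: compute the image reference for this commit.
compute_image() {
  printf '%s/%s:%s\n' "$REGISTRY_HOST" "$PROJECT_NAME" "$GIT_SHA"
}

# Steps 7 and 8: update each listed deployment, then wait for its rollout.
roll_out() {
  image="$1"
  # Split the comma-separated list into positional parameters.
  old_ifs="$IFS"
  IFS=','
  set -- $PROJECT_DEPLOYMENTS
  IFS="$old_ifs"
  for deploy in "$@"; do
    # "*=" sets the image on every container in the deployment (assumption).
    kubectl -n "$PROJECT_NAMESPACE" set image "deployment/$deploy" "*=$image" || return 1
    kubectl -n "$PROJECT_NAMESPACE" rollout status "deployment/$deploy" --timeout=300s || return 1
  done
}

# Example:
# load_kubeconfig "$PWD/kubeconfig.yaml" && export KUBECONFIG="$PWD/kubeconfig.yaml"
# roll_out "$(compute_image)"
```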
Defaults that apply when the project variables are not set:
- `PROJECT_NAME=unrip`
- `PROJECT_NAMESPACE=unrip`
- `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
- `PROJECT_REGISTRY_SECRET_NAME=unrip-registry-creds`
For a future project, reuse the same workflow by changing only the Forgejo repository variables instead of copying the workflow.
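In shell terms, this defaulting behaves like the sketch below. `apply_project_defaults` is a hypothetical helper; the real workflow may resolve its defaults differently, for example in the workflow YAML.

```bash
# Assign each project variable its documented default only when it is
# unset or empty; values set by repository variables are left untouched.
apply_project_defaults() {
  : "${PROJECT_NAME:=unrip}"
  : "${PROJECT_NAMESPACE:=unrip}"
  : "${PROJECT_DEPLOYMENTS:=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer}"
  : "${PROJECT_REGISTRY_SECRET_NAME:=unrip-registry-creds}"
}
```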
Default bootstrap now uses the same routine CI path for the first deploy:
- bootstrap fetches the real kubeconfig from the node
- bootstrap derives an in-cluster kubeconfig for the runner
- bootstrap creates the Forgejo repo and Actions config
- bootstrap pushes to `main`
- Forgejo Actions builds the image in-cluster and deploys it
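One plausible way to derive the in-cluster kubeconfig is to rewrite the API server address so that it resolves from inside the cluster. The sketch below is an assumption about that step, not the bootstrap script's actual logic: `to_incluster_kubeconfig` is hypothetical, and the target URL presumes the API server certificate is valid for `kubernetes.default.svc`.

```bash
# Rewrite every `server:` entry in a kubeconfig read from stdin so that it
# points at the in-cluster API service. The target URL is an assumption.
to_incluster_kubeconfig() {
  sed -E 's#^([[:space:]]*server:[[:space:]]*).*$#\1https://kubernetes.default.svc:443#'
}

# Example:
# to_incluster_kubeconfig < .state/hetzner/kubeconfig.yaml \
#   > .state/hetzner/kubeconfig.incluster.yaml
```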
Legacy mode still exists if you explicitly set:
```bash
BOOTSTRAP_DELIVERY_MODE=local-image-import
```
## Trigger deploys
Push to `main` in Forgejo:
```bash
git push forgejo main
```
## Observe deploys
```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl -n unrip rollout status deployment/near-intents-ingest --timeout=300s
kubectl -n unrip rollout status deployment/dummy-reactor --timeout=300s
kubectl -n unrip rollout status deployment/dummy-executor --timeout=300s
kubectl -n unrip rollout status deployment/dummy-consumer --timeout=300s
kubectl -n unrip get pods -o wide
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
```
## DNS and TLS
If DNS automation was enabled during bootstrap, the A records for the base, Forgejo, and registry hosts are already managed by the repo-side bootstrap.
Currently supported DNS providers:
- Cloudflare
- Porkbun
TLS certificates are issued by cert-manager using the rendered Let's Encrypt email and ingress hosts.
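To confirm that DNS and TLS are in place, a check like the following may help. It is a sketch under assumptions: `has_a_record` is a hypothetical helper, `NODE_IP` is a hypothetical variable holding the public ingress IP, and the comparison only works for records that are not proxied by the DNS provider.

```bash
# Return success when HOST has an A record exactly equal to EXPECTED_IP.
has_a_record() {
  host="$1"
  expected_ip="$2"
  dig +short "$host" A | grep -Fxq "$expected_ip"
}

# Example (NODE_IP is hypothetical):
# for host in "$FORGEJO_DOMAIN" "$REGISTRY_DOMAIN"; do
#   has_a_record "$host" "$NODE_IP" || echo "A record missing or stale: $host" >&2
# done
# kubectl get certificate -A   # cert-manager Certificate resources and readiness
```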
## Current limitations
- the bootstrap path now creates the initial admin account and one-time runner registration token automatically from inside the Forgejo pod, but it still depends on the operator supplying the intended admin credentials up front
- runner registration no longer needs a pre-seeded Kubernetes secret, but the runner config still lives on `emptyDir`, so bootstrap must recreate `/data/.runner` after a runner pod replacement
- automated repo creation currently assumes `FORGEJO_REPO_OWNER == FORGEJO_ADMIN_USERNAME`
- the runner currently uses host-mode jobs and installs `kubectl` at job start; the image build itself runs in-cluster via Kaniko, which is functional but not yet optimized
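Until the repo-owner limitation is lifted, a guard like this sketch can fail fast before bootstrap runs. `check_repo_owner` is a hypothetical helper and is not part of the bootstrap scripts.

```bash
# Fail fast when the repo owner differs from the bootstrap admin user,
# since automated repo creation currently assumes they are the same.
check_repo_owner() {
  if [ -z "${FORGEJO_ADMIN_USERNAME:-}" ]; then
    echo "FORGEJO_ADMIN_USERNAME is not set" >&2
    return 1
  fi
  if [ "${FORGEJO_REPO_OWNER:-}" != "$FORGEJO_ADMIN_USERNAME" ]; then
    echo "FORGEJO_REPO_OWNER must equal FORGEJO_ADMIN_USERNAME for automated repo creation" >&2
    return 1
  fi
}
```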