chore: reconcile hetzner bootstrap docs and state
parent 63975a9e7a
commit 15ec32bece
22 changed files with 415 additions and 733 deletions
|
|
@ -36,4 +36,4 @@ EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
|
||||||
# - optional DNS provider creds via *_PASS or direct env vars
|
# - optional DNS provider creds via *_PASS or direct env vars
|
||||||
#
|
#
|
||||||
# Future k3s deployment should source the app values from Kubernetes Secret/ConfigMap.
|
# Future k3s deployment should source the app values from Kubernetes Secret/ConfigMap.
|
||||||
# Hetzner bootstrap path clones the repo to /opt/unrip/repo for later deploy/k8s assets.
|
# Hetzner provisioning is workstation-driven after Terraform; cloud-init no longer clones this repo onto the node.
|
||||||
|
|
|
||||||
266
README.md
|
|
@ -38,7 +38,6 @@ src/
|
||||||
compose.yml
|
compose.yml
|
||||||
Dockerfile
|
Dockerfile
|
||||||
docs/contracts.md
|
docs/contracts.md
|
||||||
deploy/hetzner/README.md
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Event flow
|
## Event flow
|
||||||
|
|
@ -69,136 +68,65 @@ Current topics:
|
||||||
- `cmd.execute_trade`
|
- `cmd.execute_trade`
|
||||||
- `exec.trade_result`
|
- `exec.trade_result`
|
||||||
|
|
||||||
## Primary deployment path: repo-driven Hetzner bootstrap
|
## Canonical deployment path
|
||||||
|
|
||||||
The primary production path is no longer a Compose-only VM workflow.
|
The canonical production path is the repo-driven Hetzner + k3s bootstrap flow.
|
||||||
|
Compose still exists for local development and optional single-machine testing, but it is not the primary production story.
|
||||||
|
|
||||||
The intended operating model is:
|
Current single-node cluster stack includes:
|
||||||
- Terraform provisions a Hetzner single-node environment
|
|
||||||
- cloud-init installs k3s automatically on first boot
|
|
||||||
- a local operator workstation performs the first repo-driven bootstrap
|
|
||||||
- Kubernetes manifests install Redpanda, the app workloads, Forgejo, runner, registry, and ingress-related components
|
|
||||||
- once the in-cluster Git + CI stack is alive, routine app deploys move to self-hosted CI
|
|
||||||
|
|
||||||
This is a two-phase model:
|
|
||||||
- **Phase 0:** local workstation bootstrap of a brand-new cluster
|
|
||||||
- **Phase 1:** self-hosted Forgejo + runner takes over app delivery
|
|
||||||
|
|
||||||
Compose still exists for local development and optional single-machine testing, but it is not the canonical production story.
|
|
||||||
|
|
||||||
## Prerequisites for first deployment
|
|
||||||
|
|
||||||
Install locally on the operator workstation:
|
|
||||||
- Terraform `>= 1.6`
|
|
||||||
- `kubectl`
|
|
||||||
- `docker`
|
|
||||||
- `curl`
|
|
||||||
|
|
||||||
You also need:
|
|
||||||
- a Hetzner Cloud API token
|
|
||||||
- a local SSH public key file for Terraform node provisioning
|
|
||||||
- DNS control for your chosen base domain and Forgejo hostname
|
|
||||||
- preferably a Tailscale tailnet and auth key for private admin/control-plane access
|
|
||||||
- the repo checked out locally
|
|
||||||
|
|
||||||
## Required bootstrap secrets and inputs
|
|
||||||
|
|
||||||
Create the bootstrap env file:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
|
|
||||||
```
|
|
||||||
|
|
||||||
Set at least:
|
|
||||||
- `HCLOUD_TOKEN`
|
|
||||||
- `SSH_PUBLIC_KEY_PATH`
|
|
||||||
- `PUBLIC_DOMAIN`
|
|
||||||
- recommended:
|
|
||||||
- `TAILSCALE_AUTH_KEY`
|
|
||||||
- `TAILSCALE_CONTROL_PLANE_HOSTNAME`
|
|
||||||
- optional fallback:
|
|
||||||
- `TF_ADMIN_CIDR_BLOCKS`
|
|
||||||
- `BASE_DOMAIN`
|
|
||||||
- `FORGEJO_DOMAIN`
|
|
||||||
- `FORGEJO_ROOT_URL`
|
|
||||||
- `REGISTRY_DOMAIN`
|
|
||||||
- `LETSENCRYPT_EMAIL`
|
|
||||||
- `REGISTRY_USERNAME`
|
|
||||||
- `REGISTRY_PASSWORD`
|
|
||||||
- `NEAR_INTENTS_API_KEY`
|
|
||||||
- `FORGEJO_RUNNER_REGISTRATION_TOKEN`
|
|
||||||
- optional DNS automation:
|
|
||||||
- Cloudflare:
|
|
||||||
- `CLOUDFLARE_API_TOKEN`
|
|
||||||
- `CLOUDFLARE_ZONE_ID`
|
|
||||||
- Porkbun:
|
|
||||||
- `PORKBUN_API_KEY`
|
|
||||||
- `PORKBUN_SECRET_API_KEY`
|
|
||||||
|
|
||||||
Then load them:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
source scripts/hetzner/bootstrap-secrets.env
|
|
||||||
```
|
|
||||||
|
|
||||||
## First bootstrap sequence
|
|
||||||
|
|
||||||
Run the end-to-end bootstrap from repo root:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash scripts/hetzner/bootstrap.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
Current repo behavior of that script:
|
|
||||||
1. runs Terraform in `infra/terraform/hetzner`
|
|
||||||
2. optionally creates DNS records for the base, Forgejo, and registry hosts via Cloudflare or Porkbun
|
|
||||||
3. if configured, joins the node to Tailscale and prefers the Tailscale control-plane hostname for Kubernetes API access
|
|
||||||
4. waits for SSH and the k3s API endpoint to become ready
|
|
||||||
5. fetches the real k3s kubeconfig from the node and writes it to `.state/hetzner/kubeconfig.yaml`
|
|
||||||
6. renders the Hetzner single-node overlay from local operator inputs
|
|
||||||
7. creates registry pull/auth secrets
|
|
||||||
8. applies the Kubernetes bootstrap manifests
|
|
||||||
9. builds the app image locally and imports it into k3s on the node
|
|
||||||
10. performs the first rollout using the imported bootstrap image
|
|
||||||
|
|
||||||
Use the generated kubeconfig afterward:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
|
|
||||||
kubectl get nodes -o wide
|
|
||||||
kubectl get pods -A
|
|
||||||
kubectl -n unrip get deploy,pods
|
|
||||||
kubectl -n forgejo get deploy,pods,svc
|
|
||||||
```
|
|
||||||
|
|
||||||
## What is deployed into k3s
|
|
||||||
|
|
||||||
The repo-managed Kubernetes assets are under `deploy/k8s/`.
|
|
||||||
|
|
||||||
Current single-node target includes resources for:
|
|
||||||
- `unrip` workloads in namespace `unrip`
|
- `unrip` workloads in namespace `unrip`
|
||||||
- Redpanda
|
- Redpanda
|
||||||
- Forgejo
|
- Forgejo
|
||||||
- Forgejo runner
|
- Forgejo runner
|
||||||
- private registry
|
- private registry
|
||||||
- ingress-nginx namespace/resources
|
- cert-manager
|
||||||
- cert-manager namespace/resources
|
- Traefik via the k3s bundled ingress controller
|
||||||
- ACME issuers and ingress definitions
|
- Grafana
|
||||||
- a bootstrap job for Redpanda topic creation
|
- Loki
|
||||||
|
- Promtail
|
||||||
|
- Headlamp
|
||||||
|
|
||||||
Shared platform namespaces:
|
### Bootstrap entrypoint
|
||||||
- `forgejo`
|
|
||||||
- `registry`
|
|
||||||
- `ingress-nginx`
|
|
||||||
- `cert-manager`
|
|
||||||
|
|
||||||
Project-specific namespaces:
|
```bash
|
||||||
- `unrip`
|
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
|
||||||
- future projects should get their own namespace rather than sharing `unrip`
|
source scripts/hetzner/bootstrap-secrets.env
|
||||||
|
bash scripts/hetzner/bootstrap.sh
|
||||||
|
```
|
||||||
|
|
||||||
Important current-state nuance:
|
The bootstrap script now:
|
||||||
- the bootstrap script currently applies `deploy/k8s/base`
|
1. provisions or updates Hetzner infra with Terraform
|
||||||
- the longer-term intended target is `deploy/k8s/overlays/hetzner-single-node`
|
2. optionally manages DNS via Cloudflare or Porkbun
|
||||||
|
3. prefers Tailscale for admin/control-plane access when configured
|
||||||
|
4. fetches kubeconfig from the node into `.state/hetzner/kubeconfig.yaml`
|
||||||
|
5. renders `.state/hetzner/generated-overlay/` from repo manifests plus local secrets
|
||||||
|
6. applies platform and project resources to k3s
|
||||||
|
7. bootstraps Forgejo admin, runner, repo, and Actions configuration
|
||||||
|
8. seeds this repo into Forgejo
|
||||||
|
9. lets Forgejo Actions perform the default build/push/deploy path
|
||||||
|
10. stores the generated Headlamp login token in `pass` when `HEADLAMP_ADMIN_TOKEN_PASS` is configured
|
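Once the bootstrap steps listed above have finished, the generated kubeconfig can be used for a quick sanity check. The following is a minimal sketch, not part of the repo: the function name `check_cluster` is hypothetical, and it assumes `bootstrap.sh` has already written the kubeconfig to `.state/hetzner/kubeconfig.yaml` as described in step 4.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): verify the cluster using the
# kubeconfig that bootstrap.sh is documented to write in step 4.
check_cluster() {
  local kubeconfig="${1:-$PWD/.state/hetzner/kubeconfig.yaml}"

  # Fail early with a readable message if bootstrap has not produced the file.
  if [ ! -f "$kubeconfig" ]; then
    echo "kubeconfig not found at $kubeconfig; has bootstrap.sh completed?" >&2
    return 1
  fi

  # Scope KUBECONFIG to each call so the caller's shell is left untouched.
  KUBECONFIG="$kubeconfig" kubectl get nodes -o wide
  KUBECONFIG="$kubeconfig" kubectl get pods -A
}
```

Run `check_cluster` from the repo root, or pass an explicit path as the first argument.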
||||||
|
|
||||||
|
Detailed bootstrap and destroy documentation lives in:
|
||||||
|
- `docs/hetzner-k3s-bootstrap.md`
|
||||||
|
- `docs/hetzner-self-hosted-ci-runbook.md`
|
||||||
|
- `docs/k8s-observability.md`
|
||||||
|
- `deploy/hetzner/README.md`
|
||||||
|
- `deploy/k8s/README.md`
|
||||||
|
- `deploy/k8s/overlays/hetzner-single-node/README.md`
|
||||||
|
|
||||||
|
### Runtime surfaces
|
||||||
|
|
||||||
|
- Forgejo: `https://git.doran.133011.xyz/`
|
||||||
|
- Registry: `https://registry.doran.133011.xyz/`
|
||||||
|
- Grafana: `https://grafana.doran.133011.xyz/`
|
||||||
|
- Headlamp: `https://headlamp.doran.133011.xyz/`
|
||||||
|
|
||||||
|
### Operator notes
|
||||||
|
|
||||||
|
- Ingress is Traefik-based. The old ingress-nginx path is obsolete.
|
||||||
|
- Grafana is for historical log search.
|
||||||
|
- Headlamp is for browsing workloads, pods, events, and pod logs.
|
||||||
|
- Use `pass`-backed `*_PASS` variables for secrets whenever possible.
|
||||||
|
|
||||||
## Executor persistence in k3s
|
## Executor persistence in k3s
|
||||||
|
|
||||||
|
|
@ -208,106 +136,12 @@ Current persistence boundary:
|
||||||
- app env uses `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
|
- app env uses `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
|
||||||
- in Kubernetes, the executor deployment mounts storage at that path
|
- in Kubernetes, the executor deployment mounts storage at that path
|
||||||
- the Hetzner single-node overlay pins storage to the k3s `local-path` storage class
|
- the Hetzner single-node overlay pins storage to the k3s `local-path` storage class
|
||||||
- cloud-init also prepares the host directory boundary for executor state on first boot
|
|
||||||
|
|
||||||
Operational meaning:
|
Operational meaning:
|
||||||
- executor state lives on node-backed storage in the single-node k3s environment
|
- executor state lives on node-backed storage in the single-node k3s environment
|
||||||
- if that PVC or underlying node storage is lost, duplicate-suppression history is lost too
|
- if that PVC or underlying node storage is lost, duplicate-suppression history is lost too
|
||||||
- treat executor persistence as part of the minimal durable state of the cluster
|
- treat executor persistence as part of the minimal durable state of the cluster
|
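To confirm the persistence boundary described above, an operator could list the PersistentVolumeClaims in the project namespace together with their storage class. This is a sketch only: the function name `show_executor_pvcs` is hypothetical, the namespace defaults to `unrip` as used elsewhere in this document, and the actual PVC name is not stated here, so all claims are listed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): list PVCs with their storage class
# and phase so the `local-path` pinning can be checked by eye.
show_executor_pvcs() {
  local ns="${1:-unrip}"
  kubectl -n "$ns" get pvc \
    -o custom-columns='NAME:.metadata.name,CLASS:.spec.storageClassName,STATUS:.status.phase'
}
```

A claim whose `CLASS` column is not `local-path`, or whose `STATUS` is not `Bound`, would be worth investigating before relying on the duplicate-suppression history.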
||||||
|
|
||||||
## Failure recovery and operator checks
|
|
||||||
|
|
||||||
### If bootstrap fails before Terraform completes
|
|
||||||
Re-run after fixing the local input problem:
|
|
||||||
- missing token
|
|
||||||
- invalid CIDRs
|
|
||||||
- invalid SSH public key path
|
|
||||||
|
|
||||||
If the infrastructure must be torn down:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
source scripts/hetzner/bootstrap-secrets.env
|
|
||||||
bash scripts/hetzner/destroy.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### If Terraform succeeds but Kubernetes is not ready
|
|
||||||
Check the public API and cluster state from the workstation:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
|
|
||||||
kubectl get nodes -o wide
|
|
||||||
kubectl get pods -A
|
|
||||||
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
|
|
||||||
```
|
|
||||||
|
|
||||||
Typical next checks:
|
|
||||||
- cloud-init may still be finishing
|
|
||||||
- k3s may still be starting
|
|
||||||
- a workload may be crash-looping due to missing secret values or image-delivery issues
|
|
||||||
|
|
||||||
### If workloads do not roll out
|
|
||||||
Inspect the affected namespace:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n unrip get pods
|
|
||||||
kubectl -n unrip describe pod <pod-name>
|
|
||||||
kubectl -n unrip logs deploy/dummy-executor --tail=100
|
|
||||||
kubectl -n forgejo logs deploy/forgejo --tail=100
|
|
||||||
```
|
|
||||||
|
|
||||||
### If you need to recreate secrets
|
|
||||||
The workstation bootstrap creates these Secrets:
|
|
||||||
- `unrip/unrip-secrets`
|
|
||||||
- `forgejo/forgejo-secrets`
|
|
||||||
|
|
||||||
Verify them:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n unrip get secret unrip-secrets
|
|
||||||
kubectl -n forgejo get secret forgejo-secrets
|
|
||||||
```
|
|
||||||
|
|
||||||
### Current known limitations
|
|
||||||
Current colony state already identified an important gap:
|
|
||||||
- bootstrap and CI are not yet fully production-hardened, even though the first deploy path now fetches the real kubeconfig and imports the bootstrap image directly into k3s
|
|
||||||
|
|
||||||
Treat the current bootstrap as a repo-driven first-deploy path suitable for testing, with hardening still pending.
|
|
||||||
|
|
||||||
## Self-hosted CI handoff
|
|
||||||
|
|
||||||
After cluster bootstrap:
|
|
||||||
- open Forgejo at `https://${FORGEJO_DOMAIN}`
|
|
||||||
- seed or push this repo into Forgejo
|
|
||||||
- create Forgejo repository secrets:
|
|
||||||
- `KUBECONFIG_B64`
|
|
||||||
- `REGISTRY_USERNAME`
|
|
||||||
- `REGISTRY_PASSWORD`
|
|
||||||
- create Forgejo repository variables:
|
|
||||||
- `REGISTRY_HOST=${REGISTRY_DOMAIN}`
|
|
||||||
- optional: `PROJECT_NAME=unrip`
|
|
||||||
- optional: `PROJECT_NAMESPACE=unrip`
|
|
||||||
- optional: `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
|
|
||||||
- push to `main`
|
|
||||||
|
|
||||||
Routine application deploys then follow `.forgejo/workflows/deploy.yml`:
|
|
||||||
- build image as `REGISTRY_HOST/PROJECT_NAME:${GIT_SHA}`
|
|
||||||
- push to the private registry
|
|
||||||
- `kubectl set image` for each deployment listed in `PROJECT_DEPLOYMENTS` inside `PROJECT_NAMESPACE`
|
|
||||||
- wait for rollout
|
|
||||||
|
|
||||||
If project variables are omitted, the workflow defaults to the current repo project:
|
|
||||||
- `PROJECT_NAME=unrip`
|
|
||||||
- `PROJECT_NAMESPACE=unrip`
|
|
||||||
- `PROJECT_DEPLOYMENTS=near-intents-ingest,dummy-reactor,dummy-executor,dummy-consumer`
|
|
||||||
|
|
||||||
Infrastructure changes remain Terraform-driven from the operator workstation unless and until that responsibility is also automated.
|
|
||||||
|
|
||||||
For the detailed operator runbooks, see:
|
|
||||||
- `docs/hetzner-k3s-bootstrap.md`
|
|
||||||
- `docs/hetzner-self-hosted-ci-runbook.md`
|
|
||||||
- `deploy/k8s/projects/README.md`
|
|
||||||
- `docs/next-session-architecture.md`
|
|
||||||
|
|
||||||
## Local development with Compose
|
## Local development with Compose
|
||||||
|
|
||||||
Compose remains available for local development and debugging.
|
Compose remains available for local development and debugging.
|
||||||
|
|
@ -365,4 +199,4 @@ KAFKA_TOPIC_EXEC_TRADE_RESULT=exec.trade_result
|
||||||
KAFKA_CONSUMER_GROUP_DUMMY=dummy-reactor-v1
|
KAFKA_CONSUMER_GROUP_DUMMY=dummy-reactor-v1
|
||||||
KAFKA_CONSUMER_GROUP_EXECUTOR=dummy-executor-v1
|
KAFKA_CONSUMER_GROUP_EXECUTOR=dummy-executor-v1
|
||||||
EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
|
EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state
|
||||||
```
|
```
|
||||||
|
|
|
||||||
|
|
@ -1,275 +1,105 @@
|
||||||
# Hetzner single-node bootstrap (Terraform + cloud-init + k3s)
|
# Hetzner single-node bootstrap
|
||||||
|
|
||||||
This is the canonical first-production deployment path for the repo.
|
This repo’s canonical infrastructure path is:
|
||||||
|
|
||||||
A local operator workstation drives the first deployment end to end:
|
1. provision one Hetzner VM with Terraform
|
||||||
- Terraform provisions Hetzner infrastructure
|
2. let cloud-init install k3s (and optionally Tailscale)
|
||||||
- cloud-init installs k3s automatically on first boot
|
3. run `scripts/hetzner/bootstrap.sh` from the operator workstation
|
||||||
- the workstation waits for the public Kubernetes API
|
4. apply repo-managed platform + project manifests
|
||||||
- the workstation creates initial Kubernetes Secrets
|
5. bootstrap Forgejo, the runner, repo secrets/variables, and the first CI-driven deploy
|
||||||
- the workstation applies repo-managed Kubernetes manifests
|
|
||||||
- the workstation performs the first image/bootstrap delivery attempt
|
|
||||||
- once Forgejo + runner are alive, routine app deploys are intended to move to self-hosted CI
|
|
||||||
|
|
||||||
Compose remains available for local development, but it is not the primary production deployment model.
|
## Source of truth
|
||||||
|
|
||||||
## Scope of this layer
|
Use these docs first:
|
||||||
|
|
||||||
The foundation under `infra/terraform/hetzner` provisions:
|
- `docs/hetzner-k3s-bootstrap.md` — bootstrap + destroy + required env
|
||||||
- one Hetzner Cloud server
|
- `docs/hetzner-self-hosted-ci-runbook.md` — Forgejo/runner/CI flow
|
||||||
- one SSH key resource based on your local public key
|
- `docs/k8s-observability.md` — Grafana, Loki, Promtail, Headlamp
|
||||||
- firewall rules for SSH, Kubernetes API, and HTTP/HTTPS ingress
|
- `deploy/k8s/README.md` — Kubernetes layout
|
||||||
- a private network attachment for future growth
|
- `deploy/k8s/overlays/hetzner-single-node/README.md` — overlay details
|
||||||
- cloud-init user-data for unattended k3s installation and host preparation
|
|
||||||
|
|
||||||
The repo bootstrap then applies the Hetzner single-node overlay under `deploy/k8s/overlays/hetzner-single-node`, which composes Kubernetes resources under `deploy/k8s/` for:
|
## Current architecture
|
||||||
- shared platform namespaces and services
|
|
||||||
- Redpanda
|
Infrastructure under `infra/terraform/hetzner/` provisions:
|
||||||
- unrip workloads
|
- one Hetzner VM
|
||||||
|
- one firewall
|
||||||
|
- one private network attachment
|
||||||
|
- cloud-init for unattended k3s install
|
||||||
|
|
||||||
|
Kubernetes platform services deployed from this repo:
|
||||||
- Forgejo
|
- Forgejo
|
||||||
- Forgejo runner
|
- Forgejo runner
|
||||||
- private registry
|
- private registry
|
||||||
- ingress/TLS-related resources
|
- cert-manager
|
||||||
- Redpanda topic bootstrap job
|
- Traefik via k3s bundled ingress controller
|
||||||
|
- Grafana
|
||||||
|
- Loki
|
||||||
|
- Promtail
|
||||||
|
- Headlamp
|
||||||
|
|
||||||
## Prerequisites
|
Project services deployed from this repo:
|
||||||
|
- Redpanda
|
||||||
|
- `near-intents-ingest`
|
||||||
|
- `dummy-reactor`
|
||||||
|
- `dummy-executor`
|
||||||
|
- `dummy-consumer`
|
||||||
|
|
||||||
Install on the operator workstation:
|
## Bootstrap model
|
||||||
- Terraform `>= 1.6`
|
|
||||||
- `kubectl`
|
|
||||||
- `docker`
|
|
||||||
- `curl`
|
|
||||||
|
|
||||||
You also need:
|
The current bootstrap is workstation-driven after Terraform.
|
||||||
- a Hetzner Cloud API token
|
cloud-init does **not** clone this repo onto the node.
|
||||||
- an SSH keypair already present locally
|
|
||||||
- access to DNS for your chosen domains
|
|
||||||
- admin CIDRs that can reach the future server on `22/tcp` and `6443/tcp`
|
|
||||||
- this repo checked out locally
|
|
||||||
|
|
||||||
## Required bootstrap secrets and inputs
|
`scripts/hetzner/bootstrap.sh` now:
|
||||||
|
- loads config and secrets from `scripts/hetzner/bootstrap-secrets.env`
|
||||||
|
- resolves `*_PASS` values through `pass`
|
||||||
|
- runs Terraform
|
||||||
|
- configures DNS through Cloudflare or Porkbun when credentials are present
|
||||||
|
- fetches kubeconfig from the node
|
||||||
|
- renders `.state/hetzner/generated-overlay/`
|
||||||
|
- applies platform + project manifests
|
||||||
|
- bootstraps Forgejo admin/user/repo/runner state
|
||||||
|
- seeds the repo into Forgejo
|
||||||
|
- lets Forgejo Actions perform the routine image build + deploy path by default
|
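The `*_PASS` resolution step in the list above can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in `bootstrap.sh`: the function name `resolve_pass_var`, the `PASS_BIN` override, and the example variable `HCLOUD_TOKEN_PASS` are all hypothetical. The only external call is `pass show <entry>`, and the first line of its output is treated as the secret.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): if NAME is empty and NAME_PASS is set,
# fill NAME from the first line of `pass show "$NAME_PASS"`.
resolve_pass_var() {
  local name="$1"
  local ref_var="${name}_PASS"
  local ref="${!ref_var:-}"   # bash indirect expansion: value of NAME_PASS

  # An explicitly set raw value wins; only resolve when it is empty.
  if [ -z "${!name:-}" ] && [ -n "$ref" ]; then
    local value
    # PASS_BIN is an assumed override that allows stubbing in tests.
    value="$("${PASS_BIN:-pass}" show "$ref" | head -n 1)" || return 1
    export "$name=$value"
  fi
}

# Example (assumed variable names):
#   HCLOUD_TOKEN_PASS="infra/hetzner/token"
#   resolve_pass_var HCLOUD_TOKEN
```

Keeping only a `pass` entry name in the env file means the file itself holds no raw secret, which is the motivation behind the "`pass`-backed secret refs" recommendation.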
||||||
|
|
||||||
Prepare the operator env file:
|
Legacy local-image bootstrap still exists, but the default/steady-state path is Forgejo Actions.
|
||||||
|
|
||||||
|
## Required operator inputs
|
||||||
|
|
||||||
|
Create and source:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
|
cp scripts/hetzner/bootstrap-secrets.env.example scripts/hetzner/bootstrap-secrets.env
|
||||||
${EDITOR:-vi} scripts/hetzner/bootstrap-secrets.env
|
|
||||||
```
|
|
||||||
|
|
||||||
Set at least:
|
|
||||||
- `HCLOUD_TOKEN`
|
|
||||||
- `SSH_PUBLIC_KEY_PATH`
|
|
||||||
- `TF_ADMIN_CIDR_BLOCKS`
|
|
||||||
- `BASE_DOMAIN`
|
|
||||||
- `FORGEJO_DOMAIN`
|
|
||||||
- `FORGEJO_ROOT_URL`
|
|
||||||
- `NEAR_INTENTS_API_KEY`
|
|
||||||
- `FORGEJO_RUNNER_REGISTRATION_TOKEN`
|
|
||||||
|
|
||||||
Load it into the current shell:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
source scripts/hetzner/bootstrap-secrets.env
|
source scripts/hetzner/bootstrap-secrets.env
|
||||||
```
|
```
|
||||||
|
|
||||||
## Canonical bootstrap sequence
|
At minimum you need:
|
||||||
|
- Hetzner credentials
|
||||||
|
- SSH public key path
|
||||||
|
- public domain settings
|
||||||
|
- registry credentials
|
||||||
|
- app secret(s)
|
||||||
|
- Forgejo admin credentials
|
||||||
|
- Grafana admin credentials
|
||||||
|
|
||||||
Run from repo root:
|
Recommended:
|
||||||
|
- Tailscale auth key for private admin/control-plane access
|
||||||
|
- DNS provider credentials
|
||||||
|
- `pass`-backed secret refs instead of raw env values
|
||||||
|
|
||||||
```bash
|
## Current live/public surfaces
|
||||||
bash scripts/hetzner/bootstrap.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
Current behavior of the script:
|
- Forgejo: `https://git.doran.133011.xyz/`
|
||||||
1. validates local tooling
|
- Registry: `https://registry.doran.133011.xyz/`
|
||||||
2. runs `terraform init` and `terraform apply` in `infra/terraform/hetzner`
|
- Grafana: `https://grafana.doran.133011.xyz/`
|
||||||
3. reads Terraform outputs such as server IP and `k3s_api_url`
|
- Headlamp: `https://headlamp.doran.133011.xyz/`
|
||||||
4. waits for the k3s API readiness endpoint
|
|
||||||
5. writes a local workstation kubeconfig to `.state/hetzner/kubeconfig.yaml`
|
|
||||||
6. writes overlay secret env input files and creates:
|
|
||||||
- `unrip/unrip-secrets`
|
|
||||||
- `unrip/unrip-registry-creds`
|
|
||||||
- `forgejo/forgejo-secrets`
|
|
||||||
- `registry/registry-secrets`
|
|
||||||
7. applies `deploy/k8s/platform/base/namespace.yaml` and `deploy/k8s/overlays/hetzner-single-node`
|
|
||||||
8. builds the repo bootstrap image locally
|
|
||||||
9. pushes it through the temporary local registry bridge using the active project name
|
|
||||||
10. updates and waits for rollout status in the active project namespace
|
|
||||||
|
|
||||||
After the script finishes:
|
## Notes
|
||||||
|
|
||||||
```bash
|
- The Forgejo runner no longer reads a pre-seeded `runner_registration_token` from a secret. Bootstrap generates a one-time token in-cluster and persists the runner config on the Forgejo PVC.
|
||||||
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
|
- Registry auth is created imperatively during bootstrap from `REGISTRY_USERNAME` and `REGISTRY_PASSWORD`; manual overlay applies must provide `registry.htpasswd` themselves.
|
||||||
kubectl get nodes -o wide
|
- Headlamp login uses a generated Kubernetes service-account token; bootstrap stores it in `pass` when `HEADLAMP_ADMIN_TOKEN_PASS` is configured.
|
||||||
kubectl get pods -A
|
- Ingress is Traefik-based. The old `ingress-nginx` path is obsolete.
|
||||||
kubectl -n unrip get deploy,pods,jobs
|
|
||||||
kubectl -n forgejo get deploy,pods,svc
|
|
||||||
kubectl -n registry get pods,svc
|
|
||||||
```
|
|
||||||
|
|
||||||
## Current manifest target
|
## Status
|
||||||
|
|
||||||
Important current-state detail:
|
This path has been rebuilt successfully and the cluster is operational, but if you want the strongest reproducibility guarantee after any new platform addition, do one more full destroy/rebuild rehearsal.
|
||||||
- `scripts/hetzner/bootstrap.sh` now applies `deploy/k8s/platform/base/namespace.yaml`
|
|
||||||
- it then applies `deploy/k8s/overlays/hetzner-single-node`
|
|
||||||
- bootstrap naming no longer assumes legacy `trading-system` kubeconfig contexts, image tags, or rollout namespaces
|
|
||||||
|
|
||||||
## Executor persistence in k3s
|
|
||||||
|
|
||||||
The dummy executor persists durable idempotency state.
|
|
||||||
|
|
||||||
Current persistence model:
|
|
||||||
- application path: `EXECUTOR_STATE_DIR=/var/lib/unrip/executor-state`
|
|
||||||
- cloud-init prepares the host boundary for executor storage on first boot
|
|
||||||
- Kubernetes mounts storage at that same path for the executor workload
|
|
||||||
- the Hetzner single-node overlay pins PVC-backed storage to k3s `local-path`
|
|
||||||
|
|
||||||
Operational consequence:
|
|
||||||
- executor duplicate-suppression state lives on node-backed persistent storage
|
|
||||||
- replacing the node or deleting the PVC without migration loses that history
|
|
||||||
- treat executor state as required operational data, even though the executor is still a dummy implementation
|
|
||||||
|
|
||||||
## Failure recovery runbook
|
|
||||||
|
|
||||||
### A. Bootstrap fails before infrastructure exists
|
|
||||||
Typical causes:
|
|
||||||
- invalid `HCLOUD_TOKEN`
|
|
||||||
- wrong `SSH_PUBLIC_KEY_PATH`
|
|
||||||
- malformed `TF_ADMIN_CIDR_BLOCKS`
|
|
||||||
|
|
||||||
Fix the input and rerun:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
source scripts/hetzner/bootstrap-secrets.env
|
|
||||||
bash scripts/hetzner/bootstrap.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
If you need to destroy partially created infrastructure:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
source scripts/hetzner/bootstrap-secrets.env
|
|
||||||
bash scripts/hetzner/destroy.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### B. Terraform succeeds but cluster access is not usable
|
|
||||||
Verify the generated kubeconfig and cluster health:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
|
|
||||||
kubectl get nodes -o wide
|
|
||||||
kubectl get pods -A
|
|
||||||
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
|
|
||||||
```
|
|
||||||
|
|
||||||
What to suspect first:
|
|
||||||
- cloud-init still running
|
|
||||||
- k3s still starting
|
|
||||||
- bootstrap kubeconfig/auth not fully aligned yet
|
|
||||||
- public API reachable, but workloads not yet healthy
|
|
||||||
|
|
||||||
### C. Secrets were wrong or missing
|
|
||||||
The current bootstrap depends on:
|
|
||||||
- `${PROJECT_NAME:-unrip}-secrets`
|
|
||||||
- `NEAR_INTENTS_API_KEY`
|
|
||||||
- `forgejo-secrets`
|
|
||||||
- `root_url`
|
|
||||||
- `domain`
|
|
||||||
- `runner_registration_token`
|
|
||||||
|
|
||||||
Verify:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n unrip get secret unrip-secrets
|
|
||||||
kubectl -n unrip get secret unrip-registry-creds
|
|
||||||
kubectl -n forgejo get secret forgejo-secrets
|
|
||||||
kubectl -n registry get secret registry-secrets
|
|
||||||
```
|
|
||||||
|
|
||||||
If needed, recreate them from the workstation before restarting the affected deployments.
|
|
||||||
|
|
||||||
### D. Workloads are present but not healthy
|
|
||||||
Inspect by namespace:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n unrip get pods
|
|
||||||
kubectl -n unrip describe pod <pod-name>
|
|
||||||
kubectl -n unrip logs deploy/dummy-executor --tail=100
|
|
||||||
kubectl -n forgejo logs deploy/forgejo --tail=100
|
|
||||||
kubectl -n forgejo logs deploy/forgejo-runner --tail=100
|
|
||||||
```
|
|
||||||
|
|
||||||
Useful rollout checks:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n unrip rollout status deployment/near-intents-ingest --timeout=300s
|
|
||||||
kubectl -n unrip rollout status deployment/dummy-reactor --timeout=300s
|
|
||||||
kubectl -n unrip rollout status deployment/dummy-executor --timeout=300s
|
|
||||||
kubectl -n unrip rollout status deployment/dummy-consumer --timeout=300s
|
|
||||||
kubectl -n forgejo rollout status deployment/forgejo --timeout=300s
|
|
||||||
kubectl -n forgejo rollout status deployment/forgejo-runner --timeout=300s
|
|
||||||
```
|
|
||||||
|
|
||||||
### E. Need to inspect Terraform outputs directly
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd infra/terraform/hetzner
|
|
||||||
terraform output
|
|
||||||
terraform output server_ipv4
|
|
||||||
terraform output server_private_ipv4
|
|
||||||
terraform output k3s_api_url
|
|
||||||
terraform output kubeconfig_strategy
|
|
||||||
```
|
|
||||||
|
|
||||||
## Self-hosted CI handoff
|
|
||||||
|
|
||||||
After the cluster is reachable and workloads are up:
|
|
||||||
1. reach Forgejo at the configured domain or by port-forward
|
|
||||||
2. perform the initial admin/bootstrap steps in Forgejo
|
|
||||||
3. create the target repository in Forgejo
|
|
||||||
4. push or mirror this repo into that Forgejo instance
|
|
||||||
5. confirm the runner is registered and healthy
|
|
||||||
6. move routine application deploys to the self-hosted pipeline, which now derives image naming and rollout targets from Forgejo repository variables instead of hard-coding the legacy project
|
|
||||||
|
|
||||||
Current repo-state caveats already known:
|
|
||||||
- first bootstrap is repo-driven from the workstation
|
|
||||||
- the bootstrap path no longer relies on SSH/scp transport in control flow
|
|
||||||
- the kubeconfig/auth result is not yet fully production-hardened
|
|
||||||
- first rollout still uses a temporary local registry bridge; routine CI deploys are intended to be registry-native and the Forgejo workflow now defaults to `unrip` while allowing per-repo overrides for image name, namespace, and deployment list
|
|
||||||
- Forgejo admin creation, repo creation, and Actions configuration still require operator action after cluster bring-up
|
|
||||||
- DNS automation is currently wired for Cloudflare when credentials are supplied during bootstrap
|
|
||||||
- TLS is expected to come from cert-manager + Let's Encrypt once ingress hostnames resolve publicly
|
|
||||||
|
|
||||||
## Terraform-only usage
|
|
||||||
|
|
||||||
If you only want the infra layer:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd infra/terraform/hetzner
|
|
||||||
export TF_VAR_hcloud_token="<your-hetzner-token>"
|
|
||||||
export TF_VAR_ssh_public_key="$(cat ~/.ssh/id_ed25519.pub)"
|
|
||||||
export TF_VAR_admin_cidr_blocks='["203.0.113.10/32"]'
|
|
||||||
|
|
||||||
terraform init
|
|
||||||
terraform apply
|
|
||||||
```
|
|
||||||
|
|
||||||
Useful outputs:
|
|
||||||
- `server_ipv4`
|
|
||||||
- `server_private_ipv4`
|
|
||||||
- `server_name`
|
|
||||||
- `server_fqdn`
|
|
||||||
- `k3s_api_url`
|
|
||||||
- `kubeconfig_strategy`
|
|
||||||
|
|
||||||
For CI/CD details, also see:
|
|
||||||
- `docs/hetzner-k3s-bootstrap.md`
|
|
||||||
- `docs/hetzner-self-hosted-ci-runbook.md`
|
|
||||||
|
|
||||||
## Compose status
|
|
||||||
|
|
||||||
Compose is still useful for:
|
|
||||||
- local development
|
|
||||||
- fast topology debugging
|
|
||||||
- non-production single-machine testing
|
|
||||||
|
|
||||||
But it should be treated as optional/dev runtime support, not as the primary production deployment path.
|
|
||||||
|
|
|
||||||
|
|
@ -4,7 +4,6 @@ package_upgrade: true
|
||||||
packages:
|
packages:
|
||||||
- ca-certificates
|
- ca-certificates
|
||||||
- curl
|
- curl
|
||||||
- git
|
|
||||||
- gnupg
|
- gnupg
|
||||||
- jq
|
- jq
|
||||||
- nfs-common
|
- nfs-common
|
||||||
|
|
@ -58,17 +57,11 @@ write_files:
|
||||||
BOOTSTRAP_PROJECT_NAME=unrip
|
BOOTSTRAP_PROJECT_NAME=unrip
|
||||||
BOOTSTRAP_PROJECT_NAMESPACE=unrip
|
BOOTSTRAP_PROJECT_NAMESPACE=unrip
|
||||||
K3S_KUBECONFIG=/opt/bootstrap/kubeconfig-internal.yaml
|
K3S_KUBECONFIG=/opt/bootstrap/kubeconfig-internal.yaml
|
||||||
BOOTSTRAP_REPO_DIR=/opt/unrip/repo
|
BOOTSTRAP_MANIFEST_SOURCE=operator-workstation
|
||||||
BOOTSTRAP_MANIFEST_DIR=/opt/unrip/repo/deploy/k8s
|
|
||||||
GITOPS_HANDOFF=seed-self-hosted-git-and-runner
|
GITOPS_HANDOFF=seed-self-hosted-git-and-runner
|
||||||
EOF
|
EOF
|
||||||
chmod 0644 /usr/local/share/unrip/bootstrap-metadata.env
|
chmod 0644 /usr/local/share/unrip/bootstrap-metadata.env
|
||||||
|
|
||||||
install -d -m 0755 /opt/unrip
|
|
||||||
if [ ! -d /opt/unrip/repo/.git ]; then
|
|
||||||
git clone --depth 1 ${BOOTSTRAP_REPO_URL:-https://example.invalid/bootstrap-repo.git} /opt/unrip/repo || true
|
|
||||||
fi
|
|
||||||
|
|
||||||
install -d -m 0755 /opt/bootstrap
|
install -d -m 0755 /opt/bootstrap
|
||||||
cp /etc/rancher/k3s/k3s.yaml /opt/bootstrap/kubeconfig-internal.yaml
|
cp /etc/rancher/k3s/k3s.yaml /opt/bootstrap/kubeconfig-internal.yaml
|
||||||
chmod 0640 /opt/bootstrap/kubeconfig-internal.yaml
|
chmod 0640 /opt/bootstrap/kubeconfig-internal.yaml
|
||||||
|
|
@ -79,7 +72,7 @@ write_files:
|
||||||
This node was provisioned by Terraform + cloud-init.
|
This node was provisioned by Terraform + cloud-init.
|
||||||
Use /opt/bootstrap/kubeconfig-internal.yaml for automation.
|
Use /opt/bootstrap/kubeconfig-internal.yaml for automation.
|
||||||
Bootstrap metadata lives at /usr/local/share/unrip/bootstrap-metadata.env.
|
Bootstrap metadata lives at /usr/local/share/unrip/bootstrap-metadata.env.
|
||||||
Future Kubernetes bootstrap assets should live under /opt/unrip/repo/deploy/k8s.
|
Kubernetes bootstrap assets are applied from the operator workstation after provisioning.
|
||||||
EOF
|
EOF
|
||||||
chmod 0644 /opt/bootstrap/README.txt
|
chmod 0644 /opt/bootstrap/README.txt
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -13,9 +13,10 @@ Shared platform namespaces:
|
||||||
- `forgejo`
|
- `forgejo`
|
||||||
- `registry`
|
- `registry`
|
||||||
- `observability` (`grafana`, `loki`, `promtail`, `headlamp`)
|
- `observability` (`grafana`, `loki`, `promtail`, `headlamp`)
|
||||||
- `ingress-nginx`
|
|
||||||
- `cert-manager`
|
- `cert-manager`
|
||||||
|
|
||||||
|
Ingress is provided by the Traefik controller bundled with k3s. Base and overlay manifests therefore target `ingressClassName: traefik` instead of installing ingress-nginx.
|
||||||
|
|
||||||
Project-specific namespaces:
|
Project-specific namespaces:
|
||||||
- `unrip`
|
- `unrip`
|
||||||
- future projects should get their own namespace instead of sharing `unrip`
|
- future projects should get their own namespace instead of sharing `unrip`
|
||||||
|
|
@ -27,7 +28,9 @@ After Terraform/cloud-init has produced a working kubeconfig, the canonical path
|
||||||
bash scripts/hetzner/bootstrap.sh
|
bash scripts/hetzner/bootstrap.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
That script renders the Hetzner overlay inputs, creates platform and project registry auth secrets using the active project naming, and applies:
|
That script renders the Hetzner overlay inputs, creates platform and project registry auth secrets using the active project naming, and applies the generated bootstrap overlay under `.state/hetzner/generated-overlay/`.
|
||||||
|
|
||||||
|
For a manual, fully checked-in apply path, use:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
kubectl apply -k deploy/k8s/overlays/hetzner-single-node
|
kubectl apply -k deploy/k8s/overlays/hetzner-single-node
|
||||||
|
|
@ -41,4 +44,4 @@ The overlay intentionally references generated or pre-created Secrets instead of
|
||||||
- `observability/observability-secrets`
|
- `observability/observability-secrets`
|
||||||
- `registry/registry-secrets`
|
- `registry/registry-secrets`
|
||||||
|
|
||||||
The bootstrap script creates them from local environment variables. By default it targets the `unrip` project, but its kubeconfig context name, bootstrap image tag, project secret env filename, project namespace, and project registry secret name are derived from `PROJECT_NAME`, `PROJECT_NAMESPACE`, and `CLUSTER_NAME` instead of hard-coding legacy `trading-system` values.
|
The bootstrap script creates them from local environment variables and `pass`-resolved secrets. By default it targets the `unrip` project, but project secret env filenames, namespaces, image names, rollout targets, and registry pull-secret names are derived from `PROJECT_NAME` and `PROJECT_NAMESPACE` instead of hard-coding legacy `trading-system` values.
|
||||||
|
|
|
||||||
|
|
@ -2,34 +2,106 @@
|
||||||
|
|
||||||
This overlay turns the shared platform and `unrip` project bases into a concrete first-node bootstrap target for the Terraform-provisioned k3s VM.
|
This overlay turns the shared platform and `unrip` project bases into a concrete first-node bootstrap target for the Terraform-provisioned k3s VM.
|
||||||
|
|
||||||
## Before apply
|
The checked-in overlay is the declarative template. For first-cluster bootstrap, `scripts/hetzner/bootstrap.sh` renders a generated overlay under `.state/hetzner/generated-overlay/` and applies that generated copy as the source of truth for the run.
|
||||||
Create real secret material from the examples:
|
|
||||||
|
## Two ways to use this overlay
|
||||||
|
|
||||||
|
### 1. Recommended: `scripts/hetzner/bootstrap.sh`
|
||||||
|
This is the intended operator workflow for a fresh Hetzner cluster. The bootstrap script renders secret and patch inputs from local env and `pass`, creates imperative registry secrets, and applies a generated Kustomize overlay.
|
||||||
|
|
||||||
|
That generated overlay now imports the platform resources from `deploy/k8s/platform/base/kustomization.yaml`, so new checked-in platform components such as observability manifests are included automatically during bootstrap instead of being silently skipped by a hard-coded file list.
|
||||||
|
|
||||||
|
Bootstrap overwrites these operator-worktree files on each run:
|
||||||
|
- `deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env`
|
||||||
|
- `deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env`
|
||||||
|
- `deploy/k8s/overlays/hetzner-single-node/secrets/observability.env`
|
||||||
|
|
||||||
|
Bootstrap also renders and applies generated copies of these patch files under `.state/hetzner/generated-overlay/` instead of modifying the checked-in overlay files directly:
|
||||||
|
- `ingress-hosts.patch.yaml`
|
||||||
|
- `issuer-email.patch.yaml`
|
||||||
|
- `storage-class.patch.yaml`
|
||||||
|
|
||||||
|
Secret/config sources when using bootstrap:
|
||||||
|
- from `pass` or direct env overrides via `scripts/hetzner/bootstrap-secrets.env`:
|
||||||
|
- `HCLOUD_TOKEN`
|
||||||
|
- `TAILSCALE_AUTH_KEY`
|
||||||
|
- `CLOUDFLARE_API_TOKEN`
|
||||||
|
- `CLOUDFLARE_ZONE_ID`
|
||||||
|
- `PORKBUN_API_KEY`
|
||||||
|
- `PORKBUN_SECRET_API_KEY`
|
||||||
|
- `REGISTRY_PASSWORD`
|
||||||
|
- `NEAR_INTENTS_API_KEY`
|
||||||
|
- `FORGEJO_ADMIN_PASSWORD`
|
||||||
|
- optional `GRAFANA_ADMIN_PASSWORD` (bootstrap generates one if omitted)
|
||||||
|
- from plain env/non-secret config in `scripts/hetzner/bootstrap-secrets.env`:
|
||||||
|
- `PUBLIC_DOMAIN`, `BASE_DOMAIN`, `FORGEJO_DOMAIN`, `FORGEJO_ROOT_URL`, `REGISTRY_DOMAIN`, `GRAFANA_DOMAIN`, `GRAFANA_ROOT_URL`, `HEADLAMP_DOMAIN`
|
||||||
|
- default hostname model under `PUBLIC_DOMAIN`: `git.${PUBLIC_DOMAIN}`, `registry.${PUBLIC_DOMAIN}`, `grafana.${PUBLIC_DOMAIN}`, `headlamp.${PUBLIC_DOMAIN}`
|
||||||
|
- `LETSENCRYPT_EMAIL`
|
||||||
|
- `REGISTRY_USERNAME`
|
||||||
|
- `FORGEJO_ADMIN_USERNAME`, `FORGEJO_ADMIN_EMAIL`
|
||||||
|
- optional `GRAFANA_ADMIN_USERNAME` (defaults to `admin`)
|
||||||
|
- optional project overrides such as `PROJECT_NAME`, `PROJECT_NAMESPACE`, and `PROJECT_SECRET_ENV_BASENAME`
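
The default hostname model in the list above can be pictured with a small shell sketch. This is only an illustration, assuming that an explicit `*_DOMAIN` value wins over the `PUBLIC_DOMAIN`-derived default; the `resolve_host` helper is hypothetical and is not taken from `scripts/hetzner/bootstrap.sh`.

```bash
# Hypothetical helper (not part of the repo): return the explicit override
# when one is given, otherwise fall back to <prefix>.<public domain>.
resolve_host() {
  local override="$1" prefix="$2" public_domain="$3"
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    printf '%s.%s\n' "$prefix" "$public_domain"
  fi
}

# example.com is a placeholder domain used only for this sketch.
PUBLIC_DOMAIN="${PUBLIC_DOMAIN:-example.com}"
FORGEJO_DOMAIN="$(resolve_host "${FORGEJO_DOMAIN:-}" git "$PUBLIC_DOMAIN")"
REGISTRY_DOMAIN="$(resolve_host "${REGISTRY_DOMAIN:-}" registry "$PUBLIC_DOMAIN")"
GRAFANA_DOMAIN="$(resolve_host "${GRAFANA_DOMAIN:-}" grafana "$PUBLIC_DOMAIN")"
HEADLAMP_DOMAIN="$(resolve_host "${HEADLAMP_DOMAIN:-}" headlamp "$PUBLIC_DOMAIN")"

printf '%s\n' "$FORGEJO_DOMAIN" "$REGISTRY_DOMAIN" "$GRAFANA_DOMAIN" "$HEADLAMP_DOMAIN"
```

Keeping the fallback in one function means every service hostname follows the same rule, so a single explicit override does not affect the other three.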
|
||||||
|
|
||||||
|
Bootstrap materializes Kubernetes inputs like this:
|
||||||
|
- `secrets/unrip.env` gets `NEAR_INTENTS_API_KEY`
|
||||||
|
- `secrets/forgejo.env` gets only `root_url` and `domain`
|
||||||
|
- `secrets/observability.env` gets `grafana_admin_user`, `grafana_admin_password`, and `grafana_root_url`
|
||||||
|
- generated overlay Kustomize secret generators create `observability-secrets` in namespace `observability` alongside the project and Forgejo secrets
|
||||||
|
- `registry-secrets` in namespace `registry` is created imperatively from `REGISTRY_USERNAME` and `REGISTRY_PASSWORD`
|
||||||
|
- `<project>-registry-creds` image pull secret is created imperatively in the project namespace from the same registry credentials
|
||||||
|
|
||||||
|
Note: the Forgejo runner no longer reads `runner_registration_token` from `forgejo-secrets`. `scripts/hetzner/bootstrap.sh` generates a one-time runner token in-cluster, registers the runner, and writes `/data/forgejo-runner/.runner` on the shared Forgejo PVC before restarting the runner deployment.
|
||||||
|
|
||||||
|
### 2. Manual: `kubectl apply -k`
|
||||||
|
Use this only if you intentionally want to manage the checked-in overlay inputs yourself. In manual mode, the checked-in overlay remains the source of truth; in bootstrap mode, the generated overlay is the source of truth for what gets applied.
|
||||||
|
|
||||||
|
Before apply, create or edit real local input files:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cp deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env.example deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env
|
cp deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env.example deploy/k8s/overlays/hetzner-single-node/secrets/unrip.env
|
||||||
cp deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env.example deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env
|
cp deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env.example deploy/k8s/overlays/hetzner-single-node/secrets/forgejo.env
|
||||||
|
cp deploy/k8s/overlays/hetzner-single-node/secrets/observability.env.example deploy/k8s/overlays/hetzner-single-node/secrets/observability.env
|
||||||
cp deploy/k8s/overlays/hetzner-single-node/secrets/registry.htpasswd.example deploy/k8s/overlays/hetzner-single-node/secrets/registry.htpasswd
|
cp deploy/k8s/overlays/hetzner-single-node/secrets/registry.htpasswd.example deploy/k8s/overlays/hetzner-single-node/secrets/registry.htpasswd
|
||||||
```
|
```
|
||||||
|
|
||||||
Update:
|
Then update:
|
||||||
- ingress hosts in `ingress-hosts.patch.yaml`
|
- ingress hosts in `ingress-hosts.patch.yaml` for Forgejo, Registry, Grafana, and Headlamp
|
||||||
- ACME email in `issuer-email.patch.yaml`
|
- ACME email in `issuer-email.patch.yaml`
|
||||||
- project secret values in `secrets/unrip.env`
|
- project secret values in `secrets/unrip.env`
|
||||||
- Forgejo secret values in `secrets/forgejo.env`
|
- Forgejo secret values in `secrets/forgejo.env` (`root_url` and `domain` only)
|
||||||
- registry htpasswd in `secrets/registry.htpasswd`
|
- observability secret values in `secrets/observability.env` (`grafana_admin_user`, `grafana_admin_password`, `grafana_root_url`)
|
||||||
|
|
||||||
|
Important manual-mode caveat:
|
||||||
|
- `kubectl apply -k deploy/k8s/overlays/hetzner-single-node` creates only the Kustomize-managed secrets from the checked-in files (`unrip-secrets`, `forgejo-secrets`, `observability-secrets`, and `registry-secrets` when `secrets/registry.htpasswd` exists)
|
||||||
|
- it does **not** create the project docker-registry pull secret
|
||||||
|
- if you skip `scripts/hetzner/bootstrap.sh`, you must create that pull secret separately before expecting image pulls or CI builds to work
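
For the manual path, the missing pull secret could be created with `kubectl create secret docker-registry`. The sketch below only prints the command instead of running it. The `pull_secret_command` helper, the registry hostname, and the username are illustrative assumptions; only the `<project>-registry-creds` naming comes from this README.

```bash
# Hypothetical helper (not part of the repo): print the kubectl command that
# would create the project image pull secret. Nothing is executed against a
# cluster; the password is left as a literal "$REGISTRY_PASSWORD" reference
# so it never appears in the printed output.
pull_secret_command() {
  local project="$1" namespace="$2" registry="$3" username="$4"
  printf 'kubectl -n %s create secret docker-registry %s-registry-creds --docker-server=%s --docker-username=%s --docker-password="$REGISTRY_PASSWORD"\n' \
    "$namespace" "$project" "$registry" "$username"
}

# Placeholder values; substitute your real registry host and username.
pull_secret_command unrip unrip registry.example.com registry-user
```

Review the printed command, export `REGISTRY_PASSWORD` in the same shell, and then run it yourself against the intended kubeconfig context.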
|
||||||
|
|
||||||
## Apply
|
## Apply
|
||||||
|
Bootstrap path:
|
||||||
|
```bash
|
||||||
|
bash scripts/hetzner/bootstrap.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Manual path:
|
||||||
```bash
|
```bash
|
||||||
kubectl apply -k deploy/k8s/overlays/hetzner-single-node
|
kubectl apply -k deploy/k8s/overlays/hetzner-single-node
|
||||||
```
|
```
|
||||||
|
|
||||||
## What gets installed
|
## What gets installed
|
||||||
- shared platform namespaces for registry, ingress, cert-manager, and Forgejo
|
- shared platform namespaces for registry, ingress, cert-manager, Forgejo, and observability
|
||||||
- project namespace `unrip`
|
- project namespace `unrip`
|
||||||
- Redpanda plus a topic bootstrap job inside `unrip`
|
- Redpanda plus a topic bootstrap job inside `unrip`
|
||||||
- app worker deployments referencing `unrip-secrets`
|
- app worker deployments referencing `unrip-secrets`
|
||||||
- Forgejo and Forgejo runner referencing `forgejo-secrets`
|
- Forgejo and Forgejo runner referencing `forgejo-secrets`
|
||||||
- private registry protected by htpasswd from `registry-secrets`
|
- private registry workload, which still requires the imperative `registry-secrets` auth secret to be created separately unless you used `scripts/hetzner/bootstrap.sh`
|
||||||
- nginx ingress and ACME issuers for TLS
|
- nginx ingress and ACME issuers for TLS
|
||||||
|
- observability ingress for Grafana and Headlamp, plus local-path PVC overrides for Grafana and Loki
|
||||||
|
|
||||||
|
## Observability UI exposure policy
|
||||||
|
- Grafana and Headlamp are both wired into the Hetzner ingress/domain model.
|
||||||
|
- Use `grafana.${PUBLIC_DOMAIN}` / `headlamp.${PUBLIC_DOMAIN}` or explicit `GRAFANA_DOMAIN` / `HEADLAMP_DOMAIN` values.
|
||||||
|
- Grafana is the historical log search UI backed by Loki.
|
||||||
|
- Headlamp is the Kubernetes cluster UI for workloads, events, and pod logs.
|
||||||
|
- Grafana is authenticated through `observability-secrets`; Headlamp is authenticated with the generated Kubernetes service-account token that bootstrap stores in `pass` when `HEADLAMP_ADMIN_TOKEN_PASS` is configured.
|
||||||
|
|
||||||
For future projects, do not reuse `unrip`; create a new project namespace and matching `<project>-config`, `<project>-secrets`, and `<project>-registry-creds` resources.
|
For future projects, do not reuse `unrip`; create a new project namespace and matching `<project>-config`, `<project>-secrets`, and `<project>-registry-creds` resources.
|
||||||
|
|
|
||||||
|
|
@ -1,3 +1,2 @@
|
||||||
root_url=https://git.unrip-bootstrap.example.com/
|
root_url=https://git.unrip-bootstrap.example.com/
|
||||||
domain=git.unrip-bootstrap.example.com
|
domain=git.unrip-bootstrap.example.com
|
||||||
runner_registration_token=replace-me
|
|
||||||
|
|
|
||||||
|
|
@ -15,6 +15,20 @@ spec:
|
||||||
spec:
|
spec:
|
||||||
serviceAccountName: forgejo-runner
|
serviceAccountName: forgejo-runner
|
||||||
restartPolicy: Always
|
restartPolicy: Always
|
||||||
|
initContainers:
|
||||||
|
- name: wait-for-runner-config
|
||||||
|
image: busybox:1.36
|
||||||
|
command: ["/bin/sh", "-ec"]
|
||||||
|
args:
|
||||||
|
- >-
|
||||||
|
until [ -s /data/.runner ]; do
|
||||||
|
echo "waiting for bootstrap to write /data/.runner";
|
||||||
|
sleep 5;
|
||||||
|
done
|
||||||
|
volumeMounts:
|
||||||
|
- name: forgejo-data
|
||||||
|
mountPath: /data
|
||||||
|
subPath: forgejo-runner
|
||||||
containers:
|
containers:
|
||||||
- name: runner
|
- name: runner
|
||||||
image: code.forgejo.org/forgejo/runner:6.3.1
|
image: code.forgejo.org/forgejo/runner:6.3.1
|
||||||
|
|
@ -22,26 +36,18 @@ spec:
|
||||||
runAsUser: 0
|
runAsUser: 0
|
||||||
runAsGroup: 0
|
runAsGroup: 0
|
||||||
env:
|
env:
|
||||||
- name: FORGEJO_INSTANCE_URL
|
- name: FORGEJO_RUNNER_CONFIG
|
||||||
valueFrom:
|
value: /data/.runner
|
||||||
secretKeyRef:
|
|
||||||
name: forgejo-secrets
|
|
||||||
key: root_url
|
|
||||||
- name: FORGEJO_RUNNER_REGISTRATION_TOKEN
|
|
||||||
valueFrom:
|
|
||||||
secretKeyRef:
|
|
||||||
name: forgejo-secrets
|
|
||||||
key: runner_registration_token
|
|
||||||
command: ["/bin/sh", "-lc"]
|
command: ["/bin/sh", "-lc"]
|
||||||
args:
|
args:
|
||||||
- >-
|
- >-
|
||||||
if [ ! -f /data/.runner ]; then
|
test -s "$FORGEJO_RUNNER_CONFIG" &&
|
||||||
forgejo-runner register --no-interactive --name k3s-runner --instance "$FORGEJO_INSTANCE_URL" --token "$FORGEJO_RUNNER_REGISTRATION_TOKEN" --labels "linux-amd64:host";
|
forgejo-runner daemon --config "$FORGEJO_RUNNER_CONFIG"
|
||||||
fi &&
|
|
||||||
forgejo-runner daemon --config /data/.runner
|
|
||||||
volumeMounts:
|
volumeMounts:
|
||||||
- name: runner-data
|
- name: forgejo-data
|
||||||
mountPath: /data
|
mountPath: /data
|
||||||
|
subPath: forgejo-runner
|
||||||
volumes:
|
volumes:
|
||||||
- name: runner-data
|
- name: forgejo-data
|
||||||
emptyDir: {}
|
persistentVolumeClaim:
|
||||||
|
claimName: forgejo-data
|
||||||
|
|
|
||||||
|
|
@ -1,73 +0,0 @@
|
||||||
apiVersion: apps/v1
|
|
||||||
kind: Deployment
|
|
||||||
metadata:
|
|
||||||
name: ingress-nginx-controller
|
|
||||||
namespace: ingress-nginx
|
|
||||||
spec:
|
|
||||||
replicas: 1
|
|
||||||
selector:
|
|
||||||
matchLabels:
|
|
||||||
app.kubernetes.io/name: ingress-nginx
|
|
||||||
app.kubernetes.io/component: controller
|
|
||||||
template:
|
|
||||||
metadata:
|
|
||||||
labels:
|
|
||||||
app.kubernetes.io/name: ingress-nginx
|
|
||||||
app.kubernetes.io/component: controller
|
|
||||||
spec:
|
|
||||||
serviceAccountName: default
|
|
||||||
containers:
|
|
||||||
- name: controller
|
|
||||||
image: registry.k8s.io/ingress-nginx/controller:v1.12.1
|
|
||||||
args:
|
|
||||||
- /nginx-ingress-controller
|
|
||||||
- --ingress-class=nginx
|
|
||||||
- --controller-class=k8s.io/ingress-nginx
|
|
||||||
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
|
|
||||||
- --election-id=ingress-nginx-leader
|
|
||||||
- --enable-ssl-passthrough
|
|
||||||
env:
|
|
||||||
- name: POD_NAME
|
|
||||||
valueFrom:
|
|
||||||
fieldRef:
|
|
||||||
fieldPath: metadata.name
|
|
||||||
- name: POD_NAMESPACE
|
|
||||||
valueFrom:
|
|
||||||
fieldRef:
|
|
||||||
fieldPath: metadata.namespace
|
|
||||||
ports:
|
|
||||||
- name: http
|
|
||||||
containerPort: 80
|
|
||||||
- name: https
|
|
||||||
containerPort: 443
|
|
||||||
securityContext:
|
|
||||||
allowPrivilegeEscalation: true
|
|
||||||
capabilities:
|
|
||||||
add: ["NET_BIND_SERVICE"]
|
|
||||||
drop: ["ALL"]
|
|
||||||
readinessProbe:
|
|
||||||
httpGet:
|
|
||||||
path: /healthz
|
|
||||||
port: 10254
|
|
||||||
livenessProbe:
|
|
||||||
httpGet:
|
|
||||||
path: /healthz
|
|
||||||
port: 10254
|
|
||||||
---
|
|
||||||
apiVersion: v1
|
|
||||||
kind: Service
|
|
||||||
metadata:
|
|
||||||
name: ingress-nginx-controller
|
|
||||||
namespace: ingress-nginx
|
|
||||||
spec:
|
|
||||||
type: LoadBalancer
|
|
||||||
selector:
|
|
||||||
app.kubernetes.io/name: ingress-nginx
|
|
||||||
app.kubernetes.io/component: controller
|
|
||||||
ports:
|
|
||||||
- name: http
|
|
||||||
port: 80
|
|
||||||
targetPort: 80
|
|
||||||
- name: https
|
|
||||||
port: 443
|
|
||||||
targetPort: 443
|
|
||||||
|
|
@ -22,18 +22,13 @@ metadata:
|
||||||
---
|
---
|
||||||
apiVersion: v1
|
apiVersion: v1
|
||||||
kind: Namespace
|
kind: Namespace
|
||||||
metadata:
|
|
||||||
name: ingress-nginx
|
|
||||||
labels:
|
|
||||||
project.pi.io/type: platform
|
|
||||||
---
|
|
||||||
apiVersion: v1
|
|
||||||
kind: Namespace
|
|
||||||
metadata:
|
metadata:
|
||||||
name: cert-manager
|
name: cert-manager
|
||||||
labels:
|
labels:
|
||||||
project.pi.io/type: platform
|
project.pi.io/type: platform
|
||||||
---
|
---
|
||||||
|
# Ingress is provided by the Traefik controller bundled with k3s.
|
||||||
|
# No separate ingress-nginx namespace is created by this base.
|
||||||
apiVersion: v1
|
apiVersion: v1
|
||||||
kind: Namespace
|
kind: Namespace
|
||||||
metadata:
|
metadata:
|
||||||
|
|
|
||||||
|
|
@ -6,14 +6,17 @@ This cluster is intended to host multiple independent projects.
|
||||||
- shared platform namespaces:
|
- shared platform namespaces:
|
||||||
- `forgejo`
|
- `forgejo`
|
||||||
- `registry`
|
- `registry`
|
||||||
- `ingress-nginx`
|
- `observability`
|
||||||
- `cert-manager`
|
- `cert-manager`
|
||||||
|
- shared ingress model:
|
||||||
|
- use the k3s-bundled Traefik controller
|
||||||
|
- project Ingress resources should set `ingressClassName: traefik`
|
||||||
- per-project namespaces:
|
- per-project namespaces:
|
||||||
- `unrip`
|
- `unrip`
|
||||||
- future examples: `project-foo`, `project-bar`
|
- future examples: `project-foo`, `project-bar`
|
||||||
|
|
||||||
## How to add another project
|
## How to add another project
|
||||||
For each new project, create a project manifest set similar to `deploy/k8s/base/unrip.yaml`:
|
For each new project, create a project manifest set similar to `deploy/k8s/projects/unrip/base/`:
|
||||||
- one namespace
|
- one namespace
|
||||||
- one project config map
|
- one project config map
|
||||||
- one secret name unique to the project
|
- one secret name unique to the project
|
||||||
|
|
@ -32,4 +35,4 @@ Recommended naming convention:
|
||||||
## Current project in this repo
|
## Current project in this repo
|
||||||
- project name: `unrip`
|
- project name: `unrip`
|
||||||
- namespace: `unrip`
|
- namespace: `unrip`
|
||||||
- project manifest: `deploy/k8s/base/unrip.yaml`
|
- project manifest: `deploy/k8s/projects/unrip/base/`
|
||||||
|
|
|
||||||
|
|
@ -1,105 +1,18 @@
|
||||||
Status: partially successful, not fully healthy yet.
|
# Historical bootstrap status report
|
||||||
|
|
||||||
What worked
|
This file is retained only as an archive of an early, partially successful bootstrap attempt.
|
||||||
- Hetzner VM provisioned
|
It does **not** describe the current cluster state or the current canonical bootstrap flow.
|
||||||
- k3s installed and running
|
|
||||||
- node is `Ready`
|
|
||||||
- namespaces created
|
|
||||||
- Forgejo is up
|
|
||||||
- registry is up
|
|
||||||
- Redpanda is up
|
|
||||||
- `near-intents-ingest` is up
|
|
||||||
|
|
||||||
What is still broken
|
For current operator documentation, use:
|
||||||
- `dummy-reactor`, `dummy-executor`, `dummy-consumer` are failing because Kafka/Redpanda topic metadata is not healthy yet:
|
- `docs/hetzner-k3s-bootstrap.md`
|
||||||
- `This server does not host this topic-partition`
|
- `docs/hetzner-self-hosted-ci-runbook.md`
|
||||||
- ingress-nginx is crashing
|
- `docs/k8s-observability.md`
|
||||||
- cert-manager webhook/cainjector are crashing
|
- `docs/hetzner-rebuild-pipeline.md`
|
||||||
- so public HTTPS ingress is not ready
|
|
||||||
- therefore Git/registry/CI are not yet usable via domain names
|
|
||||||
|
|
||||||
So the honest report is:
|
Current reality has moved past the failures described in the old report:
|
||||||
- cluster bootstrap succeeded
|
- Traefik is the active ingress path
|
||||||
- platform/app stack is only partially healthy
|
- cert-manager is healthy
|
||||||
- we still need another fix pass before calling this “working”
|
- Forgejo, registry, Grafana, and Headlamp are reachable
|
||||||
|
- Forgejo Actions is the default deployment path
|
||||||
|
|
||||||
How to interact with it right now
|
If you need a historical failure log, use Git history for earlier revisions of this file.
|
||||||
|
|
||||||
1. Use kubectl
|
|
||||||
```bash
|
|
||||||
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
|
|
||||||
kubectl get nodes -o wide
|
|
||||||
kubectl get pods -A
|
|
||||||
kubectl -n unrip get pods
|
|
||||||
kubectl -n forgejo get pods,svc
|
|
||||||
kubectl -n registry get pods,svc
|
|
||||||
```
|
|
||||||
|
|
||||||
2. Access Forgejo right now
|
|
||||||
Since ingress is broken, use port-forward:
|
|
||||||
```bash
|
|
||||||
kubectl -n forgejo port-forward svc/forgejo 3000:3000
|
|
||||||
```
|
|
||||||
Then open:
|
|
||||||
```text
|
|
||||||
http://127.0.0.1:3000
|
|
||||||
```
|
|
||||||
|
|
||||||
3. Access the registry right now
|
|
||||||
Also via port-forward:
|
|
||||||
```bash
|
|
||||||
kubectl -n registry port-forward svc/registry 5000:5000
|
|
||||||
```
|
|
||||||
Then from your machine:
|
|
||||||
```bash
|
|
||||||
docker login 127.0.0.1:5000 -u unrip
|
|
||||||
```
|
|
||||||
And push/pull like:
|
|
||||||
```bash
|
|
||||||
docker tag unrip:bootstrap 127.0.0.1:5000/unrip:test
|
|
||||||
docker push 127.0.0.1:5000/unrip:test
|
|
||||||
```
|
|
||||||
|
|
||||||
4. Watch logs
|
|
||||||
```bash
|
|
||||||
kubectl -n unrip logs deploy/near-intents-ingest -f
|
|
||||||
kubectl -n unrip logs deploy/dummy-reactor -f
|
|
||||||
kubectl -n unrip logs deploy/dummy-executor -f
|
|
||||||
kubectl -n unrip logs deploy/dummy-consumer -f
|
|
||||||
kubectl -n forgejo logs deploy/forgejo -f
|
|
||||||
kubectl -n registry logs deploy/registry -f
|
|
||||||
```
|
|
||||||
|
|
||||||
How Git would work once Forgejo is usable
|
|
||||||
After port-forward or later ingress:
|
|
||||||
```bash
|
|
||||||
git remote add forgejo http://127.0.0.1:3000/<owner>/<repo>.git
|
|
||||||
git push forgejo main
|
|
||||||
```
|
|
||||||
|
|
||||||
How CI/CD is supposed to work
|
|
||||||
Intended flow:
|
|
||||||
1. code lives in Forgejo
|
|
||||||
2. Forgejo runner executes `.forgejo/workflows/deploy.yml`
|
|
||||||
3. workflow builds image
|
|
||||||
4. pushes image to registry
|
|
||||||
5. updates `unrip` deployments in Kubernetes
|
|
||||||
|
|
||||||
Current reality:
|
|
||||||
- not ready yet
|
|
||||||
- because ingress/cert-manager are unhealthy
|
|
||||||
- and we haven’t verified a full Forgejo runner deploy cycle
|
|
||||||
|
|
||||||
Bottom line
|
|
||||||
- Kubernetes cluster: yes
|
|
||||||
- server provisioning: yes
|
|
||||||
- basic platform pieces: partially
|
|
||||||
- usable Git/CI/CD stack: not yet
|
|
||||||
- unrip app pipeline: not yet
|
|
||||||
|
|
||||||
Most important next fixes
|
|
||||||
1. fix k3s manifest/platform issues:
|
|
||||||
- ingress-nginx RBAC/crash
|
|
||||||
- cert-manager install/CRDs/RBAC
|
|
||||||
2. fix Redpanda/topic metadata issue so reactor/executor/consumer run
|
|
||||||
3. only then wire Forgejo + registry + CI as usable
|
|
||||||
|
|
|
||||||
|
|
@ -102,6 +102,7 @@ Required values:
|
||||||
- `FORGEJO_ADMIN_PASSWORD_PASS` or `FORGEJO_ADMIN_PASSWORD`
|
- `FORGEJO_ADMIN_PASSWORD_PASS` or `FORGEJO_ADMIN_PASSWORD`
|
||||||
- `GRAFANA_ADMIN_USERNAME` (defaults to `admin`)
|
- `GRAFANA_ADMIN_USERNAME` (defaults to `admin`)
|
||||||
- `GRAFANA_ADMIN_PASSWORD_PASS` or `GRAFANA_ADMIN_PASSWORD`
|
- `GRAFANA_ADMIN_PASSWORD_PASS` or `GRAFANA_ADMIN_PASSWORD`
|
||||||
|
- optional `HEADLAMP_ADMIN_TOKEN_PASS` for storing the generated Headlamp login token back into `pass`
|
||||||
- optional repo settings: `FORGEJO_REPO_OWNER`, `FORGEJO_REPO_NAME`, `FORGEJO_REPO_PRIVATE`
|
- optional repo settings: `FORGEJO_REPO_OWNER`, `FORGEJO_REPO_NAME`, `FORGEJO_REPO_PRIVATE`
|
||||||
|
|
||||||
Optional for automatic DNS:
|
Optional for automatic DNS:
|
||||||
|
|
@ -127,6 +128,8 @@ Outputs:
|
||||||
- overlay secrets and ingress host patches rendered from local env / `pass`
|
- overlay secrets and ingress host patches rendered from local env / `pass`
|
||||||
- `.state/hetzner/generated-overlay/` rendered and applied as the canonical bootstrap manifest set for that run
|
- `.state/hetzner/generated-overlay/` rendered and applied as the canonical bootstrap manifest set for that run
|
||||||
- namespaces, Redpanda, app deployments, Forgejo, registry, Traefik-targeted ingress resources, cert-manager, issuers, and any additional platform resources referenced by `deploy/k8s/platform/base/kustomization.yaml` applied
|
- namespaces, Redpanda, app deployments, Forgejo, registry, Traefik-targeted ingress resources, cert-manager, issuers, and any additional platform resources referenced by `deploy/k8s/platform/base/kustomization.yaml` applied
|
||||||
|
- Headlamp is deployed and wired to the configured public hostname model
|
||||||
|
- bootstrap stores the generated Headlamp service-account token in `pass` when `HEADLAMP_ADMIN_TOKEN_PASS` is configured
|
||||||
- Forgejo admin account created automatically if missing
|
- Forgejo admin account created automatically if missing
|
||||||
- Forgejo runner registration is generated automatically from inside the Forgejo pod and the resulting `/data/.runner` config is stored under the shared `forgejo-data` persistent volume used by the runner deployment
|
- Forgejo runner registration is generated automatically from inside the Forgejo pod and the resulting `/data/.runner` config is stored under the shared `forgejo-data` persistent volume used by the runner deployment
|
||||||
- Forgejo repository created automatically in either the admin user's namespace or a pre-existing organization named by `FORGEJO_REPO_OWNER`
|
- Forgejo repository created automatically in either the admin user's namespace or a pre-existing organization named by `FORGEJO_REPO_OWNER`
|
||||||
|
|
@ -155,7 +158,7 @@ Supported scripted providers:
|
||||||
- Porkbun
|
- Porkbun
|
||||||
|
|
||||||
TLS is handled in-cluster by cert-manager using Let's Encrypt issuers and the rendered ingress hosts.
|
TLS is handled in-cluster by cert-manager using Let's Encrypt issuers and the rendered ingress hosts.
|
||||||
Grafana is the default observability UI wired into the public hostname model. Keep Grafana authenticated.
|
Grafana and Headlamp are both wired into the public hostname model by default. Keep Grafana authenticated, and treat the Headlamp token as an operator credential.
|
||||||
The platform base assumes the default k3s Traefik ingress controller is present; it does not install ingress-nginx.
|
The platform base assumes the default k3s Traefik ingress controller is present; it does not install ingress-nginx.
|
||||||
For clean-cluster applies, the base kustomization now includes cert-manager before the `ClusterIssuer` resources so the issuer CRs can be created in the same bootstrap flow.
|
For clean-cluster applies, the base kustomization now includes cert-manager before the `ClusterIssuer` resources so the issuer CRs can be created in the same bootstrap flow.
|
||||||
|
|
||||||
|
|
@ -214,7 +217,7 @@ bash scripts/hetzner/destroy.sh
|
||||||
|
|
||||||
`destroy.sh` reads `HCLOUD_TOKEN`, optional `TAILSCALE_AUTH_KEY`, optional DNS provider credentials, and optional Forgejo admin credentials via the same `*_PASS` mapping mechanism as bootstrap.
|
`destroy.sh` reads `HCLOUD_TOKEN`, optional `TAILSCALE_AUTH_KEY`, optional DNS provider credentials, and optional Forgejo admin credentials via the same `*_PASS` mapping mechanism as bootstrap.
|
||||||
It uses the same Terraform inputs as bootstrap for the infrastructure resources, then can optionally:
|
It uses the same Terraform inputs as bootstrap for the infrastructure resources, then can optionally:
|
||||||
- delete the scripted DNS records for `${BASE_DOMAIN}`, `git.${BASE_DOMAIN}`, `registry.${BASE_DOMAIN}`, and `grafana.${BASE_DOMAIN}`
|
- delete the scripted DNS records for `${PUBLIC_DOMAIN}`, `git.${PUBLIC_DOMAIN}`, `registry.${PUBLIC_DOMAIN}`, `grafana.${PUBLIC_DOMAIN}`, and `headlamp.${PUBLIC_DOMAIN}`
|
||||||
- remove local bootstrap artifacts under `.state/hetzner/`, `deploy/k8s/overlays/hetzner-single-node/generated/`, and the local Terraform working/state files in `infra/terraform/hetzner/`
|
- remove local bootstrap artifacts under `.state/hetzner/`, `deploy/k8s/overlays/hetzner-single-node/generated/`, and the local Terraform working/state files in `infra/terraform/hetzner/`
|
||||||
- delete the bootstrap-managed Forgejo repository via the Forgejo API
|
- delete the bootstrap-managed Forgejo repository via the Forgejo API
|
||||||
|
|
||||||
|
|
|
||||||
117
docs/hetzner-rebuild-pipeline.md
Normal file
|
|
@ -0,0 +1,117 @@
# Hetzner rebuild pipeline map

This document summarizes the currently intended rebuild flow for the repo-driven Hetzner single-node cluster.

It is a companion to the operator runbooks, not a competing source of truth.
Use these first for exact commands and required env:

- `docs/hetzner-k3s-bootstrap.md`
- `docs/hetzner-self-hosted-ci-runbook.md`
- `docs/k8s-observability.md`

## High-level rebuild sequence

1. prepare `scripts/hetzner/bootstrap-secrets.env`
2. source it so `*_PASS` mappings resolve through `pass`
3. optionally run `scripts/hetzner/destroy.sh`
4. run `scripts/hetzner/bootstrap.sh`
5. let bootstrap:
   - provision/update Hetzner infra with Terraform
   - configure DNS when provider credentials are present
   - fetch the real kubeconfig from the node
   - render `.state/hetzner/generated-overlay/`
   - apply platform + project manifests
   - bootstrap Forgejo admin, runner, repo, and Actions configuration
   - seed the repo into Forgejo
   - trigger the normal Forgejo Actions build/push/deploy path
6. verify public/operator surfaces:
   - Forgejo
   - registry
   - Grafana
   - Headlamp
7. verify workload health and CI success
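
The operator-side part of this sequence (steps 1 to 4) can be previewed as a dry-run plan before anything touches infrastructure. The sketch below only prints the commands in order and runs nothing. The function name `rebuild_steps` and its `yes`/`no` argument are hypothetical, introduced here for illustration; only the three script paths come from the steps above.

```bash
# Hypothetical helper: print the operator-side rebuild commands in order.
# Only the three paths below come from this document; the function name and
# its argument are assumptions made for illustration.
rebuild_steps() {
  local with_destroy="${1:-no}"   # pass "yes" to include the optional destroy step

  # Steps 1-2: the secrets env is sourced so *_PASS mappings resolve through pass.
  echo "source scripts/hetzner/bootstrap-secrets.env"

  # Step 3 (optional): tear down the previous environment first.
  if [ "$with_destroy" = "yes" ]; then
    echo "scripts/hetzner/destroy.sh"
  fi

  # Step 4: bootstrap then drives Terraform, DNS, kubeconfig, overlays, and Forgejo.
  echo "scripts/hetzner/bootstrap.sh"
}
```

Calling `rebuild_steps yes` lists the destroy step between sourcing the secrets env and running bootstrap; calling it with no argument omits it.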

## Ownership boundaries

### Terraform owns
- Hetzner VM
- network
- firewall
- cloud-init user data

### Cloud-init owns
- OS package prep
- optional Tailscale join
- k3s installation
- a marker file under `/opt/unrip/bootstrap/README.txt`

Cloud-init does **not** clone this repo or apply Kubernetes manifests.
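
One way to confirm this boundary on a node is to check that the marker file exists and that the old repo checkout at `/opt/unrip/repo` does not. The following is a minimal sketch, not part of the repo: `node_matches_cloud_init` and its optional root-prefix argument are hypothetical, and the prefix exists only so the check can be tried against a scratch directory.

```bash
# Hypothetical check: does a filesystem match what cloud-init is described as
# leaving behind? The optional root prefix is an assumption added so the check
# can run against a scratch directory instead of a real node.
node_matches_cloud_init() {
  local root="${1:-}"

  # The marker file is written by cloud-init.
  [ -f "${root}/opt/unrip/bootstrap/README.txt" ] || return 1

  # The repo checkout is no longer created by cloud-init.
  [ ! -e "${root}/opt/unrip/repo" ] || return 1
}
```

A node provisioned before this change would still carry `/opt/unrip/repo`, so the check returning non-zero there is expected rather than an error.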

### Bootstrap script owns
- `pass`-resolved secret loading
- DNS automation
- kubeconfig retrieval/rendering
- generated overlay rendering under `.state/hetzner/generated-overlay/`
- imperative registry auth secret creation
- Forgejo bootstrap API calls
- repo seeding
- Headlamp token export to `pass`

### Kubernetes manifests own
- platform services
- project services
- ingress/TLS resources
- observability stack
- persistent volume claims and workload specs

## Current default runtime model

Platform services:
- Forgejo
- Forgejo runner
- registry
- cert-manager
- Grafana
- Loki
- Promtail
- Headlamp

Project services:
- Redpanda
- `near-intents-ingest`
- `dummy-reactor`
- `dummy-executor`
- `dummy-consumer`

Ingress/controller model:
- Traefik bundled with k3s
- no ingress-nginx in the active path

## Rebuild verification checklist

After bootstrap, verify:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl get nodes -o wide
kubectl get pods -A
kubectl -n observability get deploy,ds,pods,svc,ingress,secrets
kubectl -n forgejo get deploy,pods,svc,ingress
kubectl -n registry get deploy,pods,svc,ingress
kubectl -n unrip get deploy,pods
```

Public/operator surfaces should respond:
- `https://git.<public-domain>/`
- `https://registry.<public-domain>/v2/`
- `https://grafana.<public-domain>/`
- `https://headlamp.<public-domain>/`

CI should show a successful deploy workflow in Forgejo Actions.
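
The surface check can be scripted by deriving the four URLs from the public domain and probing each one. This is a sketch under assumptions: `surface_urls` and `check_surfaces` are hypothetical helper names, and treating any HTTP status as "responding" is a choice made here, not something the runbooks specify. A registry that requires authentication will usually answer `/v2/` with 401 rather than 200, which still shows the ingress path is alive.

```bash
# Derive the four public/operator URLs listed above from the public domain.
surface_urls() {
  local domain="$1"
  echo "https://git.${domain}/"
  echo "https://registry.${domain}/v2/"
  echo "https://grafana.${domain}/"
  echo "https://headlamp.${domain}/"
}

# Hypothetical probe: print "<status> <url>" for each surface.
# curl typically reports 000 when no HTTP response was received at all.
check_surfaces() {
  local domain="$1" url code
  surface_urls "$domain" | while IFS= read -r url; do
    code="$(curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$url" || true)"
    echo "${code} ${url}"
  done
}
```

Running `check_surfaces "$PUBLIC_DOMAIN"` after bootstrap gives a quick per-surface status line; TLS or DNS failures show up as curl errors on stderr.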

## Current caveat

The core Hetzner/k3s/Forgejo path has been rebuilt successfully before.
Headlamp was added afterward and validated live on the rebuilt cluster, but a brand-new destroy/rebuild rehearsal with Headlamp included has not yet been re-run from zero.

So the rebuild story is repo-driven and operationally close to fully reproducible, with one remaining value-add validation step: a final clean-room rebuild after the latest Headlamp/docs cleanup.
@ -19,20 +19,17 @@ write_files:
 #!/usr/bin/env bash
 set -euo pipefail
 
-install -d -m 0755 /opt/unrip
-if [ ! -d /opt/unrip/repo/.git ]; then
-git clone --branch ${bootstrap_repo_branch} ${bootstrap_repo_url} /opt/unrip/repo
-else
-git -C /opt/unrip/repo fetch --all --prune
-git -C /opt/unrip/repo checkout ${bootstrap_repo_branch}
-git -C /opt/unrip/repo pull --ff-only origin ${bootstrap_repo_branch}
-fi
-
 install -d -m 0755 /opt/unrip/bootstrap
 cat >/opt/unrip/bootstrap/README.txt <<'EOF'
 This node was provisioned by Terraform + cloud-init.
-Future Kubernetes bootstrap assets should live in:
-/opt/unrip/repo/${bootstrap_repo_path}
+This cloud-init step no longer clones a bootstrap repository.
+The current Hetzner flow remains workstation-driven after Terraform:
+- scripts/hetzner/bootstrap.sh fetches kubeconfig from the node
+- scripts/hetzner/bootstrap.sh renders secrets/overlays locally
+- scripts/hetzner/bootstrap.sh applies Kubernetes manifests from the operator workstation
+
+Reserved for future node-local bootstrap/GitOps assets:
+/opt/unrip/bootstrap/${bootstrap_repo_path}
 EOF
 - path: /etc/rancher/k3s/config.yaml
 permissions: '0644'
@ -38,8 +38,6 @@ resource "hcloud_server" "trading_system" {
 node_name = var.name
 private_ipv4_address = var.private_ipv4_address
 public_domain = var.public_domain
-bootstrap_repo_url = var.bootstrap_repo_url
-bootstrap_repo_branch = var.bootstrap_repo_branch
 bootstrap_repo_path = var.bootstrap_repo_path
 tailscale_enabled = var.tailscale_enabled
 tailscale_auth_key = var.tailscale_auth_key
@ -26,10 +26,6 @@ output "kubeconfig_strategy" {
 value = var.tailscale_enabled ? "Use Tailscale for private Kubernetes API access; avoid public SSH/Kubernetes exposure in the canonical flow." : "Use the public Kubernetes API endpoint with an operator-supplied bootstrap credential; avoid SSH/scp kubeconfig retrieval in the canonical flow."
 }
 
-output "bootstrap_repo_checkout" {
-value = "/opt/unrip/repo"
-}
-
 output "bootstrap_marker_file" {
 value = "/opt/unrip/bootstrap/README.txt"
 }
@ -93,19 +93,8 @@ variable "public_domain" {
 type = string
 }
 
-variable "bootstrap_repo_url" {
-description = "Git repository URL cloned onto the node for GitOps/bootstrap assets"
-type = string
-}
-
-variable "bootstrap_repo_branch" {
-description = "Branch checked out for the bootstrap repository"
-type = string
-default = "main"
-}
-
 variable "bootstrap_repo_path" {
-description = "Repository subdirectory expected to contain future Kubernetes bootstrap manifests/scripts"
+description = "Reserved repository subdirectory name for a future node-local bootstrap/GitOps flow; current provisioning still applies manifests from the operator workstation"
 type = string
 default = "deploy/k8s"
 }
@ -57,7 +57,6 @@ export FORGEJO_ROOT_URL="${FORGEJO_ROOT_URL:-https://${FORGEJO_DOMAIN}/}"
 export REGISTRY_DOMAIN="${REGISTRY_DOMAIN:-registry.${PUBLIC_DOMAIN}}"
 export GRAFANA_DOMAIN="${GRAFANA_DOMAIN:-grafana.${PUBLIC_DOMAIN}}"
 export GRAFANA_ROOT_URL="${GRAFANA_ROOT_URL:-https://${GRAFANA_DOMAIN}/}"
-export HEADLAMP_DOMAIN="${HEADLAMP_DOMAIN:-headlamp.${PUBLIC_DOMAIN}}"
 export LETSENCRYPT_EMAIL="${LETSENCRYPT_EMAIL:-ops@example.com}"
 
 # Optional DNS automation: choose one provider
@ -85,10 +84,13 @@ export FORGEJO_ADMIN_PASSWORD_PASS="${FORGEJO_ADMIN_PASSWORD_PASS:-$(pass_ref fo
 export GRAFANA_ADMIN_USERNAME="${GRAFANA_ADMIN_USERNAME:-admin}"
 export GRAFANA_ADMIN_PASSWORD_PASS="${GRAFANA_ADMIN_PASSWORD_PASS:-$(pass_ref grafana/admin-password)}"
 
-# Optional storage path for the generated Headlamp admin login token.
-# Bootstrap writes the in-cluster token here after Headlamp is available.
 export HEADLAMP_ADMIN_TOKEN_PASS="${HEADLAMP_ADMIN_TOKEN_PASS:-$(pass_ref headlamp/admin-token)}"
 
+# Headlamp bootstrap token handling:
+# - bootstrap stores the generated token in HEADLAMP_ADMIN_TOKEN_PASS when set
+# - the current default public hostname is HEADLAMP_DOMAIN
+# - for a stricter posture, you can still keep Headlamp private behind Tailscale or another admin path
+
 # Optional explicit overrides for CI/testing:
 # export HCLOUD_TOKEN="..."
 # export REGISTRY_PASSWORD="..."
@ -395,7 +395,7 @@ for attempt in $(seq 1 60); do
 sleep 2
 done
 if [[ -z "$HEADLAMP_ADMIN_TOKEN" ]]; then
-echo "warning: headlamp admin token not available yet; rerun bootstrap or read secret headlamp-admin-token manually" >&2
+echo "warning: headlamp admin token not available yet; read secret headlamp-admin-token manually if needed" >&2
 elif [[ -n "${HEADLAMP_ADMIN_TOKEN_PASS:-}" ]]; then
 store_secret_to_pass "$HEADLAMP_ADMIN_TOKEN_PASS" "$HEADLAMP_ADMIN_TOKEN"
 echo "stored headlamp admin token in pass: $HEADLAMP_ADMIN_TOKEN_PASS"
@ -62,22 +62,28 @@ records=(
 "headlamp.$PUBLIC_DOMAIN"
 )
 
+ROOT_RECORD="${records[0]}"
+GIT_RECORD="${records[1]}"
+REGISTRY_RECORD="${records[2]}"
+GRAFANA_RECORD="${records[3]}"
+HEADLAMP_RECORD="${records[4]}"
+
 case "$DNS_MODE" in
 upsert)
 : "${SERVER_IP:?set SERVER_IP}"
-upsert_record A "${records[0]}" "$SERVER_IP" false
-upsert_record A "${records[1]}" "$SERVER_IP" false
-upsert_record A "${records[2]}" "$SERVER_IP" false
-upsert_record A "${records[3]}" "$SERVER_IP" false
-upsert_record A "${records[4]}" "$SERVER_IP" false
+upsert_record A "$ROOT_RECORD" "$SERVER_IP" false
+upsert_record A "$GIT_RECORD" "$SERVER_IP" false
+upsert_record A "$REGISTRY_RECORD" "$SERVER_IP" false
+upsert_record A "$GRAFANA_RECORD" "$SERVER_IP" false
+upsert_record A "$HEADLAMP_RECORD" "$SERVER_IP" false
 echo "cloudflare dns updated for ${records[*]}"
 ;;
 delete)
-delete_record A "${records[0]}"
-delete_record A "${records[1]}"
-delete_record A "${records[2]}"
-delete_record A "${records[3]}"
-delete_record A "${records[4]}"
+delete_record A "$ROOT_RECORD"
+delete_record A "$GIT_RECORD"
+delete_record A "$REGISTRY_RECORD"
+delete_record A "$GRAFANA_RECORD"
+delete_record A "$HEADLAMP_RECORD"
 echo "cloudflare dns cleanup finished for ${records[*]}"
 ;;
 *)
@ -38,7 +38,6 @@ TF_VARS=(
 -var "hcloud_token=$HCLOUD_TOKEN"
 -var "ssh_public_key=$SSH_PUBLIC_KEY"
 -var "public_domain=$PUBLIC_DOMAIN"
--var "bootstrap_repo_url=local-bootstrap"
 -var "tailscale_auth_key=${TAILSCALE_AUTH_KEY:-}"
 -var "tailscale_control_plane_hostname=$TAILSCALE_CONTROL_PLANE_HOSTNAME"
 )