2026-03-29 10:28:09 +02:00


# Kubernetes observability on the Hetzner single-node cluster
This cluster now includes a reproducible ops/observability stack in the `observability` namespace:
- `loki` for log storage and querying
- `promtail` as a DaemonSet that ships pod stdout/stderr logs from every node
- `grafana` for log search and historical exploration
- `headlamp` for a Kubernetes web UI with pods, workloads, events, and pod logs
## What gets collected
Promtail tails Kubernetes container log files under `/var/log/pods` on each node.
That means any container writing logs to stdout/stderr automatically shows up in Loki/Grafana.
This fits the current app setup in this repo because the services already log to stdout/stderr.
What is **not** collected automatically:
- arbitrary log files written somewhere else inside a container filesystem
- logs from external services that are not running as Kubernetes pods on this cluster
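On the node, the kubelet lays these files out as `/var/log/pods/<namespace>_<pod>_<pod-uid>/<container>/<restart-count>.log`. The helper below is a hypothetical debugging aid (it is not in the repo) that splits such a path back into namespace, pod, and container, which can help when inspecting a node directly. It only illustrates the directory layout. In the usual configuration, promtail takes its labels from Kubernetes service discovery rather than from parsing the path.

```bash
# Hypothetical helper: split a kubelet pod-log path into its parts.
# Layout: /var/log/pods/<namespace>_<pod>_<pod-uid>/<container>/<N>.log
# Namespace and pod names cannot contain "_", so splitting on it is safe.
pod_log_path_parts() {
  local path="$1" pod_dir container
  pod_dir=$(basename "$(dirname "$(dirname "$path")")")
  container=$(basename "$(dirname "$path")")
  local namespace="${pod_dir%%_*}"   # text before the first "_"
  local rest="${pod_dir#*_}"         # "<pod>_<pod-uid>"
  local pod="${rest%_*}"             # drop the trailing "_<pod-uid>"
  printf '%s %s %s\n' "$namespace" "$pod" "$container"
}

# On a node, list every container log file that promtail can tail:
#   find /var/log/pods -name '*.log' | while read -r f; do pod_log_path_parts "$f"; done
```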
## Access
Grafana is exposed through Traefik + cert-manager at:
- `https://${GRAFANA_DOMAIN}` when bootstrapped from `scripts/hetzner/bootstrap-secrets.env`
- in the current live environment: `https://grafana.doran.133011.xyz/`
Grafana credentials come from:
- `GRAFANA_ADMIN_USERNAME`
- `GRAFANA_ADMIN_PASSWORD_PASS` or `GRAFANA_ADMIN_PASSWORD`
Prefer `GRAFANA_ADMIN_PASSWORD_PASS`, so the password is read from `pass` rather than kept in plaintext.
In the current live setup the password is stored at:
- `api/hetznerk3s/grafana-admin-password`
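Here is a minimal sketch of resolving these credentials in a shell. It assumes that `GRAFANA_ADMIN_PASSWORD_PASS` holds a `pass` entry name (like the live `api/hetznerk3s/grafana-admin-password`) and that it takes precedence over `GRAFANA_ADMIN_PASSWORD`. The bootstrap script may resolve them differently, and both function names are hypothetical. The login check calls Grafana's `GET /api/user` endpoint, which returns the signed-in user when the basic-auth credentials are valid.

```bash
# Hypothetical helper: print the Grafana admin password.
# Assumption: the pass entry (if set) wins over the plaintext variable.
grafana_admin_password() {
  if [[ -n "${GRAFANA_ADMIN_PASSWORD_PASS:-}" ]]; then
    # By pass convention, the secret is the first line of the entry
    pass show "$GRAFANA_ADMIN_PASSWORD_PASS" | head -n 1
  elif [[ -n "${GRAFANA_ADMIN_PASSWORD:-}" ]]; then
    printf '%s\n' "$GRAFANA_ADMIN_PASSWORD"
  else
    echo "set GRAFANA_ADMIN_PASSWORD_PASS or GRAFANA_ADMIN_PASSWORD" >&2
    return 1
  fi
}

# Check the credentials against Grafana's HTTP API (GET /api/user).
# The "admin" fallback username is an assumption.
grafana_login_check() {
  local url="${1:-https://grafana.doran.133011.xyz}"
  curl -fsS -u "${GRAFANA_ADMIN_USERNAME:-admin}:$(grafana_admin_password)" \
    "$url/api/user"
}
```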
Headlamp is exposed at:
- `https://${HEADLAMP_DOMAIN}` when bootstrapped from `scripts/hetzner/bootstrap-secrets.env`
- in the current live environment: `https://headlamp.doran.133011.xyz/`
Headlamp uses a Kubernetes service-account token for login. Bootstrap stores the generated token in `pass` when `HEADLAMP_ADMIN_TOKEN_PASS` is set.
In the current live setup it is stored at:
- `api/hetznerk3s/headlamp-admin-token`
## Reproducible bootstrap path
The observability stack is part of the repo-managed platform layer:
- `deploy/k8s/platform/base/observability.yaml`
- `deploy/k8s/platform/base/headlamp.yaml`
- `deploy/k8s/platform/base/kustomization.yaml`
- `deploy/k8s/platform/base/namespace.yaml`
- `deploy/k8s/overlays/hetzner-single-node/storage-class.patch.yaml`
- `deploy/k8s/overlays/hetzner-single-node/kustomization.yaml`
- `deploy/k8s/overlays/hetzner-single-node/ingress-hosts.patch.yaml`
- `deploy/k8s/overlays/hetzner-single-node/secrets/observability.env.example`
Bootstrap materializes the Grafana secret from the local env or `pass`. When configured, it also stores the generated Headlamp login token back into `pass`. Specifically, it:
- writes `deploy/k8s/overlays/hetzner-single-node/secrets/observability.env`
- copies it into `.state/hetzner/generated-overlay/`
- applies the generated overlay
- waits for `headlamp-admin-token`
- stores that token via `HEADLAMP_ADMIN_TOKEN_PASS`
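The token in `pass` can get out of sync, for example after the cluster is recreated. In that case, the last two steps can be repeated by hand. This sketch assumes that `headlamp-admin-token` is a service-account token Secret in the `observability` namespace and that its `token` key holds the login token. The function names are hypothetical. `pass insert --multiline --force` reads the value from stdin and overwrites any existing entry.

```bash
# Hypothetical helper. Assumption: headlamp-admin-token is a Secret of type
# kubernetes.io/service-account-token in the observability namespace.
headlamp_token_from_cluster() {
  kubectl -n observability get secret headlamp-admin-token \
    -o jsonpath='{.data.token}' | base64 -d
}

# Copy the token into pass, defaulting to the live entry name.
sync_headlamp_token_to_pass() {
  local entry="${HEADLAMP_ADMIN_TOKEN_PASS:-api/hetznerk3s/headlamp-admin-token}"
  local token
  token=$(headlamp_token_from_cluster) || return 1
  [[ -n "$token" ]] || { echo "empty token from secret" >&2; return 1; }
  # --multiline reads from stdin; --force overwrites an existing entry
  printf '%s\n' "$token" | pass insert --multiline --force "$entry"
}
```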
## Verify the stack
```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl -n observability get pods
kubectl -n observability get pvc
kubectl -n observability get ingress
kubectl -n observability rollout status deployment/loki --timeout=300s
kubectl -n observability rollout status deployment/grafana --timeout=300s
kubectl -n observability rollout status deployment/headlamp --timeout=300s
kubectl -n observability rollout status daemonset/promtail --timeout=300s
```
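The four rollout checks can be folded into a single function that keeps going after a failure and reports every component that is not ready. This is a convenience sketch, not a script that exists in the repo. The resource names are the ones used above.

```bash
# Hypothetical helper: wait for each observability workload and report
# every one that does not become ready. Returns the number of failures.
check_observability_rollouts() {
  local failed=0 res
  for res in deployment/loki deployment/grafana deployment/headlamp daemonset/promtail; do
    if ! kubectl -n observability rollout status "$res" --timeout=300s >/dev/null; then
      echo "NOT READY: $res" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Usage:
#   check_observability_rollouts && echo "observability stack ready"
```

The timeouts run one after another, so a completely broken stack can take up to 20 minutes to report.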
## Verify logs are arriving
Generate some app logs, then query Loki directly:
```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl -n observability port-forward svc/loki 3100:3100
```
In another shell:
```bash
curl -sS 'http://127.0.0.1:3100/loki/api/v1/labels' | jq
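# Log (non-metric) queries go to query_range; current Loki versions reject
# them on the instant /query endpoint.
curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={namespace="unrip"}' \
  --data-urlencode 'limit=20' | jq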
```
If those queries return labels/streams, pod logs are reaching Loki.
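For an end-to-end check that does not depend on app traffic, a throwaway pod can print a unique marker and Loki can be polled for it. This sketch makes several assumptions: the port-forward above is still running, the cluster can pull `busybox`, and pods in the `default` namespace are scraped like any other. The function names and the `log-smoke` pod name are hypothetical. The query adds a LogQL line filter (`|=`) on top of the namespace selector.

```bash
LOKI_URL="${LOKI_URL:-http://127.0.0.1:3100}"

# LogQL: all streams in a namespace, keeping only lines containing the marker.
logql_marker_query() {
  printf '{namespace="%s"} |= "%s"' "$1" "$2"
}

# Succeeds once Loki returns at least one stream matching the query.
loki_has_marker() {
  curl -G -sS "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$(logql_marker_query "$1" "$2")" \
    --data-urlencode 'limit=1' \
    | jq -e '.data.result | length > 0' >/dev/null
}

# Hypothetical smoke test: emit a marker from a one-shot pod, poll Loki
# for up to ~60s, then clean the pod up either way.
log_smoke_test() {
  local marker found=1
  marker="log-smoke-$(date +%s)"
  kubectl -n default run log-smoke --image=busybox --restart=Never \
    --command -- sh -c "echo $marker" >/dev/null || return 1
  for _ in $(seq 1 12); do
    if loki_has_marker default "$marker"; then found=0; break; fi
    sleep 5
  done
  kubectl -n default delete pod log-smoke --wait=false >/dev/null
  if [[ $found -eq 0 ]]; then
    echo "OK: $marker reached Loki"
  else
    echo "FAIL: $marker not in Loki after ~60s" >&2
  fi
  return "$found"
}
```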
## Use Headlamp
1. open `https://headlamp.doran.133011.xyz/`
2. fetch the login token with:
   ```bash
   pass show api/hetznerk3s/headlamp-admin-token
   ```
3. paste that token into the Headlamp login form
4. browse namespaces, workloads, pods, and use the built-in pod log view
For this disposable cluster the generated Headlamp token is bound to `cluster-admin` so the UI can show everything. For a production setup, replace that with narrower RBAC.
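A narrower option is a dedicated read-only service account bound to Kubernetes' built-in `view` ClusterRole, with a short-lived token minted on demand by `kubectl create token` (kubectl 1.24 or newer). `view` can read most namespaced resources, including pod logs, but not Secrets or cluster-scoped objects such as nodes. The `headlamp-viewer` name and the 8h lifetime are illustrative choices, not part of the repo's manifests.

```bash
# Hypothetical read-only Headlamp login: ServiceAccount + binding to "view".
# The dry-run | apply pattern keeps re-runs idempotent.
headlamp_viewer_token() {
  local ns=observability sa=headlamp-viewer
  kubectl -n "$ns" create serviceaccount "$sa" --dry-run=client -o yaml \
    | kubectl apply -f - >/dev/null
  kubectl create clusterrolebinding "$sa" --clusterrole=view \
    --serviceaccount="$ns:$sa" --dry-run=client -o yaml \
    | kubectl apply -f - >/dev/null
  # Short-lived token; the API server may cap the requested duration.
  kubectl -n "$ns" create token "$sa" --duration=8h
}
```

Paste the printed token into the Headlamp login form in place of the `cluster-admin` one.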
## Use Grafana
After logging into Grafana:
1. open **Explore**
2. choose the default **Loki** datasource
3. run queries like:
- `{namespace="unrip"}`
- `{namespace="forgejo"}`
- `{namespace="registry"}`
- `{pod=~"near-intents-ingest.*"}`
- `{container="app"}`
Useful labels added by promtail:
- `namespace`
- `pod`
- `container`
- `app`
- selected `app.kubernetes.io/*` labels
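Stream selectors can be combined with LogQL line filters and metric queries. The sketch below assumes the Loki port-forward from the verification section is running, and the function names are hypothetical. The first helper builds a query that counts log lines per pod over a time window, optionally keeping only lines that match a case-insensitive regex. The second sends that query to Loki's instant-query endpoint, which accepts metric queries. The generated query strings can also be pasted into Grafana Explore.

```bash
LOKI_URL="${LOKI_URL:-http://127.0.0.1:3100}"

# Hypothetical helper. LogQL metric query: log lines per pod over a window.
# Optional third argument: case-insensitive regex line filter, e.g. "error".
logql_lines_per_pod() {
  local namespace="$1" window="${2:-5m}" pattern="${3:-}"
  local selector="{namespace=\"$namespace\"}"
  if [[ -n "$pattern" ]]; then
    selector="$selector |~ \"(?i)$pattern\""
  fi
  printf 'sum by (pod) (count_over_time(%s [%s]))' "$selector" "$window"
}

# Hypothetical helper: run it as an instant query, print "pod<TAB>count".
loki_lines_per_pod() {
  curl -G -sS "$LOKI_URL/loki/api/v1/query" \
    --data-urlencode "query=$(logql_lines_per_pod "$@")" \
    | jq -r '.data.result[] | "\(.metric.pod)\t\(.value[1])"'
}

# Usage:
#   loki_lines_per_pod unrip 1h error
```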
## Day-to-day ops
CLI remains useful for fast debugging:
```bash
kubectl get pods -A
kubectl -n unrip logs deploy/near-intents-ingest -f
kubectl -n forgejo logs deploy/forgejo -f
bash scripts/k8s/logs.sh
```
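A few standard `kubectl logs` flags cover most quick checks. `--previous` shows the last terminated container of a crash-looping pod. `-l` with `--prefix` follows every pod behind a label selector. `--since` and `--tail` bound the output. The wrappers below are hypothetical. The selector in the usage line assumes the deployment's pods carry an `app=near-intents-ingest` label, which may not match the actual manifests.

```bash
# Hypothetical wrapper: recent logs from all pods matching a label selector,
# each line prefixed with pod/container so interleaved output stays readable.
recent_logs() {
  local ns="$1" selector="$2" since="${3:-15m}"
  kubectl -n "$ns" logs -l "$selector" --all-containers --prefix \
    --since="$since" --tail=200
}

# Hypothetical wrapper: logs of the previous (crashed) container instance.
previous_logs() {
  kubectl -n "$1" logs "deploy/$2" --previous --tail=100
}

# Usage:
#   recent_logs unrip app=near-intents-ingest 1h
#   previous_logs unrip near-intents-ingest
```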
Use Headlamp when you want:
- a web UI listing workloads and pods
- click-through pod inspection
- built-in pod log viewing
- events and resource browsing
Use Grafana when you want:
- historical log search
- cross-pod filtering
- LogQL queries
- easier multi-namespace log exploration
## Security notes
Grafana and Headlamp are admin/operator surfaces.
For this cluster both are publicly reachable, protected only by the Grafana login and the Headlamp token login respectively.
That is acceptable for this disposable single-node setup, but for a harder production posture prefer one of:
- Tailscale-only access
- ingress auth in front of Grafana and Headlamp
- SSO/OIDC
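Whichever option is chosen, a quick external check can confirm that the admin API refuses anonymous requests. This sketch only covers Grafana's `/api/user` endpoint, which answers `401` without credentials unless anonymous access is enabled. The function name is hypothetical.

```bash
# Hypothetical check: succeed only if the URL rejects an anonymous request.
expect_auth_required() {
  local url="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  case "$code" in
    401|403) echo "OK ($code): $url" ;;
    000) echo "unreachable: $url" >&2; return 2 ;;
    *) echo "EXPOSED? got HTTP $code from $url" >&2; return 1 ;;
  esac
}

# Usage:
#   expect_auth_required https://grafana.doran.133011.xyz/api/user
```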
## Add a new app and have its logs show up in Loki
Nothing special is required as long as the new pod logs to stdout/stderr.
Promtail runs on every node and tails all pod log files there, so any app deployed through the usual manifests is picked up automatically. Exposing it through an Ingress is not required for log collection.
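To confirm that a newly deployed app is being collected, check whether its namespace appears among the recent values of Loki's `namespace` label (`GET /loki/api/v1/label/namespace/values`). This is a coarse sketch that assumes the Loki port-forward from the verification section is running. The function name is hypothetical.

```bash
LOKI_URL="${LOKI_URL:-http://127.0.0.1:3100}"

# Hypothetical check: does Loki have recent streams for this namespace?
# Loki answers with {"status":"success","data":["ns1","ns2",...]}.
loki_has_namespace() {
  curl -G -sS "$LOKI_URL/loki/api/v1/label/namespace/values" \
    | grep -qF "\"$1\""
}

# Usage:
#   loki_has_namespace my-new-app && echo "collected"
```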