doran/docs/k8s-observability.md
2026-03-29 10:28:09 +02:00

Kubernetes observability on the Hetzner single-node cluster

This cluster now includes a reproducible ops/observability stack in the observability namespace:

  • loki for log storage and querying
  • promtail as a DaemonSet that ships pod stdout/stderr logs from every node
  • grafana for log search and historical exploration
  • headlamp for a Kubernetes web UI with pods, workloads, events, and pod logs

What gets collected

Promtail tails the Kubernetes container log files under /var/log/pods on each node, so any container that writes to stdout/stderr shows up in Loki and Grafana without extra configuration.

This fits the current app setup in this repo because the services already log to stdout/stderr.
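To illustrate what Promtail sees on disk: the kubelet lays pod logs out as /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log. The sketch below splits such a path into its parts. parse_pod_log_path is a hypothetical helper, not something in this repo, and the example path is made up.

```shell
# Hypothetical helper: split a kubelet pod log path into namespace, pod, container.
# Assumes the layout /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
parse_pod_log_path() {
  local path="$1"
  local rel="${path#/var/log/pods/}"     # <namespace>_<pod>_<uid>/<container>/<n>.log
  local pod_dir="${rel%%/*}"             # <namespace>_<pod>_<uid>
  local rest="${rel#*/}"                 # <container>/<n>.log
  local container="${rest%%/*}"
  local namespace="${pod_dir%%_*}"       # everything before the first underscore
  local pod_and_uid="${pod_dir#*_}"      # <pod>_<uid>
  local pod="${pod_and_uid%_*}"          # drop the trailing _<uid>
  printf '%s %s %s\n' "$namespace" "$pod" "$container"
}

# Example (made-up path):
#   parse_pod_log_path /var/log/pods/unrip_near-intents-ingest-abc12_0f5e/app/0.log
#   prints: unrip near-intents-ingest-abc12 app
```

Namespace and pod names cannot contain underscores, which is why splitting on "_" is safe here.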

What is not collected automatically:

  • arbitrary log files written somewhere else inside a container filesystem
  • logs from external services that are not running as Kubernetes pods on this cluster

Access

Grafana is exposed through Traefik + cert-manager at:

  • https://${GRAFANA_DOMAIN} when bootstrapped from scripts/hetzner/bootstrap-secrets.env
  • in the current live environment: https://grafana.doran.133011.xyz/

Grafana credentials come from:

  • GRAFANA_ADMIN_USERNAME
  • GRAFANA_ADMIN_PASSWORD_PASS or GRAFANA_ADMIN_PASSWORD

The recommended option is to keep the password in pass (GRAFANA_ADMIN_PASSWORD_PASS). In the current live setup the password is stored at:

  • api/hetznerk3s/grafana-admin-password
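A minimal sketch of how the two password variables could be resolved. This is an assumption about the bootstrap logic, not a copy of it: resolve_grafana_password is a hypothetical name, GRAFANA_ADMIN_PASSWORD_PASS is assumed to hold a pass entry name, and the pass entry is assumed to win when both variables are set.

```shell
# Hypothetical helper: print the Grafana admin password.
# Assumption: GRAFANA_ADMIN_PASSWORD_PASS names a pass entry and takes precedence.
resolve_grafana_password() {
  if [ -n "${GRAFANA_ADMIN_PASSWORD_PASS:-}" ]; then
    # By pass convention the first line of an entry is the secret itself.
    pass show "$GRAFANA_ADMIN_PASSWORD_PASS" | head -n 1
  elif [ -n "${GRAFANA_ADMIN_PASSWORD:-}" ]; then
    printf '%s\n' "$GRAFANA_ADMIN_PASSWORD"
  else
    echo "no Grafana admin password configured" >&2
    return 1
  fi
}

# Example:
#   GRAFANA_ADMIN_PASSWORD_PASS=api/hetznerk3s/grafana-admin-password resolve_grafana_password
```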

Headlamp is exposed at:

  • https://${HEADLAMP_DOMAIN} when bootstrapped from scripts/hetzner/bootstrap-secrets.env
  • in the current live environment: https://headlamp.doran.133011.xyz/

Headlamp uses a Kubernetes service-account token for login. Bootstrap stores the generated token in pass when HEADLAMP_ADMIN_TOKEN_PASS is set. In the current live setup it is stored at:

  • api/hetznerk3s/headlamp-admin-token

Reproducible bootstrap path

The observability stack is part of the repo-managed platform layer:

  • deploy/k8s/platform/base/observability.yaml
  • deploy/k8s/platform/base/headlamp.yaml
  • deploy/k8s/platform/base/kustomization.yaml
  • deploy/k8s/platform/base/namespace.yaml
  • deploy/k8s/overlays/hetzner-single-node/storage-class.patch.yaml
  • deploy/k8s/overlays/hetzner-single-node/kustomization.yaml
  • deploy/k8s/overlays/hetzner-single-node/ingress-hosts.patch.yaml
  • deploy/k8s/overlays/hetzner-single-node/secrets/observability.env.example

Bootstrap materializes the Grafana secret from local env / pass and also stores the generated Headlamp login token back into pass when configured:

  • writes deploy/k8s/overlays/hetzner-single-node/secrets/observability.env
  • copies it into .state/hetzner/generated-overlay/
  • applies the generated overlay
  • waits for headlamp-admin-token
  • stores that token via HEADLAMP_ADMIN_TOKEN_PASS
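The last two steps could look roughly like the sketch below. It is not the repo's bootstrap script: store_headlamp_token is a hypothetical name, and it assumes headlamp-admin-token is a service-account token Secret in the observability namespace with the token under .data.token.

```shell
# Hypothetical sketch: wait for the Headlamp token Secret, then store it in pass.
# Assumptions: Secret name headlamp-admin-token, namespace observability, key "token".
store_headlamp_token() {
  local namespace="${1:-observability}"
  local token=""
  local tries=0
  while [ "$tries" -lt 30 ]; do
    # Secret data is base64-encoded; an empty result means the token is not populated yet.
    token="$(kubectl -n "$namespace" get secret headlamp-admin-token \
      -o jsonpath='{.data.token}' 2>/dev/null | base64 --decode)"
    [ -n "$token" ] && break
    tries=$((tries + 1))
    sleep 2
  done
  if [ -z "$token" ]; then
    echo "headlamp-admin-token was not populated in time" >&2
    return 1
  fi
  # Only write to pass when a target entry is configured.
  if [ -n "${HEADLAMP_ADMIN_TOKEN_PASS:-}" ]; then
    printf '%s\n' "$token" | pass insert --multiline --force "$HEADLAMP_ADMIN_TOKEN_PASS" >/dev/null
  fi
}

# Example:
#   HEADLAMP_ADMIN_TOKEN_PASS=api/hetznerk3s/headlamp-admin-token store_headlamp_token
```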

Verify the stack

export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml

kubectl -n observability get pods
kubectl -n observability get pvc
kubectl -n observability get ingress
kubectl -n observability rollout status deployment/loki --timeout=300s
kubectl -n observability rollout status deployment/grafana --timeout=300s
kubectl -n observability rollout status deployment/headlamp --timeout=300s
kubectl -n observability rollout status daemonset/promtail --timeout=300s
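
If you run these checks often, the four rollout checks can be wrapped in one function that reports each workload and returns non-zero when any of them fails. check_observability_rollouts is a hypothetical helper, not a script in this repo.

```shell
# Hypothetical wrapper around the rollout checks above.
# Returns 0 only if every workload finished rolling out within the timeout.
check_observability_rollouts() {
  local failed=0
  local workload
  for workload in deployment/loki deployment/grafana deployment/headlamp daemonset/promtail; do
    if kubectl -n observability rollout status "$workload" --timeout=300s >/dev/null 2>&1; then
      echo "ok   $workload"
    else
      echo "FAIL $workload"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_observability_rollouts || echo "observability stack is not healthy"
```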

Verify logs are arriving

Generate some app logs, then query Loki directly:

export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl -n observability port-forward svc/loki 3100:3100

In another shell:

curl -sS 'http://127.0.0.1:3100/loki/api/v1/labels' | jq
curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query' \
  --data-urlencode 'query={namespace="unrip"}' | jq

If those queries return labels/streams, pod logs are reaching Loki.
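
To turn that check into a pass/fail test, a small jq filter over the response is enough. loki_has_streams is a hypothetical helper. It assumes the standard Loki response shape with status and data.result. The example uses the query_range endpoint, which is the one intended for log-stream selectors.

```shell
# Hypothetical helper: read a Loki query response on stdin and
# succeed only if the query succeeded and returned at least one stream.
loki_has_streams() {
  jq -e '.status == "success" and (.data.result | length > 0)' >/dev/null
}

# Example (with the port-forward from above still running):
#   curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query_range' \
#     --data-urlencode 'query={namespace="unrip"}' \
#     | loki_has_streams && echo "logs are arriving"
```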

Use Headlamp

  1. open https://headlamp.doran.133011.xyz/
  2. fetch the login token with:
     pass show api/hetznerk3s/headlamp-admin-token
  3. paste that token into the Headlamp login form
  4. browse namespaces, workloads, pods, and use the built-in pod log view

For this disposable cluster the generated Headlamp token is bound to cluster-admin so the UI can show everything. For a production setup, replace that with narrower RBAC.

Use Grafana

After logging into Grafana:

  1. open Explore
  2. choose the default Loki datasource
  3. run queries like:
    • {namespace="unrip"}
    • {namespace="forgejo"}
    • {namespace="registry"}
    • {pod=~"near-intents-ingest.*"}
    • {container="app"}

Useful labels added by promtail:

  • namespace
  • pod
  • container
  • app
  • selected app.kubernetes.io/* labels
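
These labels can be combined in one LogQL selector, separated by commas. The sketch below builds such a selector from key=value pairs, which is handy when scripting queries against the Loki API. logql_selector is a hypothetical helper and only produces exact-match (=) matchers.

```shell
# Hypothetical helper: build a LogQL stream selector from key=value arguments.
# Only exact-match matchers are produced; values are not escaped.
logql_selector() {
  local selector=""
  local pair key value
  for pair in "$@"; do
    key="${pair%%=*}"      # text before the first "="
    value="${pair#*=}"     # text after the first "="
    selector="${selector:+$selector,}$key=\"$value\""
  done
  printf '{%s}\n' "$selector"
}

# Example:
#   logql_selector namespace=unrip container=app
#   prints: {namespace="unrip",container="app"}
```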

Day-to-day ops

CLI remains useful for fast debugging:

kubectl get pods -A
kubectl -n unrip logs deploy/near-intents-ingest -f
kubectl -n forgejo logs deploy/forgejo -f
bash scripts/k8s/logs.sh

Use Headlamp when you want:

  • a web UI listing workloads and pods
  • click-through pod inspection
  • built-in pod log viewing
  • events and resource browsing

Use Grafana when you want:

  • historical log search
  • cross-pod filtering
  • LogQL queries
  • easier multi-namespace log exploration

Security notes

Grafana and Headlamp are admin/operator surfaces. On this cluster Grafana is publicly reachable behind the Grafana login, and Headlamp is exposed behind its token login. That is acceptable for this disposable single-node setup, but for a hardened production posture prefer one of:

  • Tailscale-only access
  • ingress auth in front of Grafana and Headlamp
  • SSO/OIDC

Add a new app and have logs show up there

Nothing special is required as long as the new pod logs to stdout/stderr. Once a new app is deployed on this cluster through the usual manifests/Ingress flow, promtail scrapes its pod logs automatically, with no per-app logging configuration.
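
If an app insists on writing to a log file, one common workaround is to symlink that file to /dev/stdout in the container entrypoint so the output lands in the pod log anyway. This is a sketch, not something this repo does: redirect_log_to_stdout and the example path are hypothetical, and it only works when the app's user is allowed to open /dev/stdout for writing.

```shell
# Hypothetical entrypoint helper: make a file-based log end up on stdout.
# Assumption: the app opens the file by path and its user may write to /dev/stdout.
redirect_log_to_stdout() {
  local log_file="$1"
  mkdir -p "$(dirname "$log_file")"
  # -f replaces any existing file or link at that path.
  ln -sf /dev/stdout "$log_file"
}

# Example, before exec'ing the app in an entrypoint script:
#   redirect_log_to_stdout /var/log/myapp/app.log
```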