Kubernetes observability on the Hetzner single-node cluster

This cluster now includes a minimal reproducible log stack in the observability namespace:

  • loki for log storage and querying
  • promtail as a DaemonSet that ships pod stdout/stderr logs from every node
  • grafana as the web UI

What gets collected

Promtail tails the Kubernetes container log files under /var/log/pods on each node. That means the output of any container that logs to stdout/stderr automatically shows up in Loki/Grafana.

This fits the current app setup in this repo because the services already log to stdout/stderr.

What is not collected automatically:

  • arbitrary log files written somewhere else inside a container filesystem
  • logs from external services that are not running as Kubernetes pods on this cluster
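
If an app insists on writing to a log file, one common workaround is to point that file at the container's stdout so the node-level collector can see it. The helper below is a minimal sketch of that idea, not something taken from this repo: the function name and the example path are hypothetical, and the trick only works for apps that simply open the file and write to it (no rotation or seeking) and that are allowed to write to /dev/stdout.

```bash
# Hypothetical helper for a container entrypoint: replace a file-based log
# with a symlink to /dev/stdout so writes end up in the pod's stdout stream.
link_log_to_stdout() {
  local log_path="$1"   # e.g. /var/log/myapp/app.log (hypothetical path)
  mkdir -p "$(dirname "$log_path")"
  # -s creates a symbolic link, -f replaces an existing file or link
  ln -sf /dev/stdout "$log_path"
}

# Usage inside an entrypoint script, before starting the app:
#   link_log_to_stdout /var/log/myapp/app.log
#   exec myapp
```

The same symlink approach is used by the official nginx container image for its access and error logs.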

Access

Grafana is exposed through Traefik + cert-manager at:

  • https://${GRAFANA_DOMAIN} when bootstrapped from scripts/hetzner/bootstrap-secrets.env
  • in the current live environment: https://grafana.doran.133011.xyz/

Admin credentials come from:

  • GRAFANA_ADMIN_USERNAME
  • GRAFANA_ADMIN_PASSWORD_PASS or GRAFANA_ADMIN_PASSWORD

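The recommended option is to keep the password in pass (the command-line password manager) rather than in a plain environment variable. In the current live setup it is stored in the pass entry: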

  • api/hetznerk3s/grafana-admin-password
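
The lookup order could be sketched as follows. This is an illustration only: it assumes GRAFANA_ADMIN_PASSWORD_PASS holds the name of a pass entry and that the first line of that entry is the password (the usual pass convention). The function name is hypothetical and the real bootstrap script may behave differently.

```bash
# Sketch: resolve the Grafana admin password, preferring pass over a plain variable.
# Assumption: GRAFANA_ADMIN_PASSWORD_PASS is the name of a pass entry.
resolve_grafana_admin_password() {
  local secret
  if [ -n "${GRAFANA_ADMIN_PASSWORD_PASS:-}" ]; then
    # `pass show` prints the decrypted entry; fail if the lookup fails.
    secret=$(pass show "$GRAFANA_ADMIN_PASSWORD_PASS") || return 1
    # Keep only the first line of the entry.
    printf '%s\n' "${secret%%$'\n'*}"
  elif [ -n "${GRAFANA_ADMIN_PASSWORD:-}" ]; then
    printf '%s\n' "$GRAFANA_ADMIN_PASSWORD"
  else
    echo "set GRAFANA_ADMIN_PASSWORD_PASS or GRAFANA_ADMIN_PASSWORD" >&2
    return 1
  fi
}

# Usage:
#   GRAFANA_ADMIN_PASSWORD_PASS=api/hetznerk3s/grafana-admin-password \
#     resolve_grafana_admin_password
```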

Reproducible bootstrap path

The observability stack is part of the repo-managed platform layer:

  • deploy/k8s/platform/base/observability.yaml
  • deploy/k8s/platform/base/kustomization.yaml
  • deploy/k8s/platform/base/namespace.yaml
  • deploy/k8s/overlays/hetzner-single-node/storage-class.patch.yaml
  • deploy/k8s/overlays/hetzner-single-node/kustomization.yaml
  • deploy/k8s/overlays/hetzner-single-node/ingress-hosts.patch.yaml
  • deploy/k8s/overlays/hetzner-single-node/secrets/observability.env.example

Bootstrap materializes the Grafana secret from local environment variables or pass:

  • writes deploy/k8s/overlays/hetzner-single-node/secrets/observability.env
  • copies it into .state/hetzner/generated-overlay/
  • applies the generated overlay
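
As a rough sketch, the three steps could look like the function below. Several details are assumptions rather than facts from the repo: the function name, the key names written to the env file (Grafana's own GF_SECURITY_ADMIN_USER / GF_SECURITY_ADMIN_PASSWORD variables are used here; the real keys are whatever observability.env.example defines), and the secrets/ subdirectory inside the generated overlay.

```bash
# Sketch of the bootstrap steps. Key names and the mirrored secrets/ layout
# in the generated overlay are assumptions, not taken from the repo.
materialize_observability_env() {
  local overlay_dir="$1" generated_dir="$2" user="$3" password="$4"
  local env_file="$overlay_dir/secrets/observability.env"

  mkdir -p "$overlay_dir/secrets" "$generated_dir/secrets"

  # 1. Write the env file; umask 077 makes a newly created file owner-only.
  (
    umask 077
    printf 'GF_SECURITY_ADMIN_USER=%s\nGF_SECURITY_ADMIN_PASSWORD=%s\n' \
      "$user" "$password" > "$env_file"
  )

  # 2. Copy it into the generated overlay.
  cp "$env_file" "$generated_dir/secrets/observability.env"

  # 3. Apply the generated overlay using kubectl's built-in kustomize support.
  kubectl apply -k "$generated_dir"
}

# Usage:
#   materialize_observability_env deploy/k8s/overlays/hetzner-single-node \
#     .state/hetzner/generated-overlay "$GRAFANA_ADMIN_USERNAME" "$password"
```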

Verify the stack

export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml

kubectl -n observability get pods
kubectl -n observability get pvc
kubectl -n observability get ingress
kubectl -n observability rollout status deployment/loki --timeout=300s
kubectl -n observability rollout status deployment/grafana --timeout=300s
kubectl -n observability rollout status daemonset/promtail --timeout=300s
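
The three rollout checks can also be wrapped in one function that reports every failing workload and returns a single exit code, which is handy in scripts. This is a small sketch; the function name is hypothetical and the resource names are the ones used above.

```bash
# Wait for all three observability workloads; return non-zero if any is not ready.
wait_for_observability() {
  local timeout="${1:-300s}" failed=0 resource
  for resource in deployment/loki deployment/grafana daemonset/promtail; do
    if ! kubectl -n observability rollout status "$resource" --timeout="$timeout"; then
      echo "rollout not complete: $resource" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
#   wait_for_observability 120s && echo "observability stack is ready"
```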

Verify logs are arriving

Generate some app logs, then query Loki directly:

export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl -n observability port-forward svc/loki 3100:3100

In another shell:

curl -sS 'http://127.0.0.1:3100/loki/api/v1/labels' | jq
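# query_range defaults to the last hour; newer Loki releases may reject
# plain log selectors on the instant /query endpoint.
curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={namespace="unrip"}' | jq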

If those queries return labels/streams, pod logs are reaching Loki.
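
To turn that eyeball check into a pass/fail test, the response can be piped through jq. The sketch below assumes the standard Loki response shape (a top-level status field and the streams under data.result); the function name is hypothetical.

```bash
# Succeeds only if a Loki query response on stdin reports success
# and contains at least one result entry.
loki_has_results() {
  jq -e '.status == "success" and (.data.result | length > 0)' > /dev/null
}

# Usage (with the port-forward still running in the other shell):
#   curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query_range' \
#     --data-urlencode 'query={namespace="unrip"}' | loki_has_results \
#     && echo "logs are arriving" || echo "no streams yet"
```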

Use Grafana

After logging into Grafana:

  1. open Explore
  2. choose the default Loki datasource
  3. run queries like:
    • {namespace="unrip"}
    • {namespace="forgejo"}
    • {namespace="registry"}
    • {pod=~"near-intents-ingest.*"}
    • {container="app"}

Useful labels added by promtail:

  • namespace
  • pod
  • container
  • app
  • selected app.kubernetes.io/* labels
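
These labels can be combined in one selector and followed by a line filter, and the same LogQL works from the command line. The helper below is a sketch that assumes a Loki port-forward on 127.0.0.1:3100; the function name and the LOKI_URL variable are hypothetical.

```bash
# Run a LogQL query against Loki and print only the matching log lines.
# LOKI_URL is a hypothetical override; the default is the port-forward address.
loki_lines() {
  local query="$1"
  curl -G -sS "${LOKI_URL:-http://127.0.0.1:3100}/loki/api/v1/query_range" \
    --data-urlencode "query=$query" \
    --data-urlencode "limit=50" \
    | jq -r '.data.result[].values[][1]'   # each value is [timestamp, line]
}

# Usage:
#   loki_lines '{namespace="unrip", container="app"}'          # two labels combined
#   loki_lines '{namespace="unrip"} |= "error"'                # lines containing "error"
#   loki_lines '{pod=~"near-intents-ingest.*"} != "healthz"'   # exclude matching lines
```

The same query strings can be pasted into Grafana Explore unchanged.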

Day-to-day ops

CLI remains useful for fast debugging:

kubectl get pods -A
kubectl -n unrip logs deploy/near-intents-ingest -f
kubectl -n forgejo logs deploy/forgejo -f
bash scripts/k8s/logs.sh
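
For a bounded snapshot instead of a live tail, kubectl's own flags are often enough. The wrapper below is a convenience sketch (the function name and the defaults are arbitrary choices, not part of the repo).

```bash
# Print recent logs for a workload: a time window, capped line count,
# every container in the pod, and timestamps on each line.
recent_logs() {
  local namespace="$1" target="$2" since="${3:-15m}"
  kubectl -n "$namespace" logs "$target" \
    --since="$since" --tail=200 --all-containers --timestamps
}

# Usage:
#   recent_logs unrip deploy/near-intents-ingest 30m
#
# Logs of the previous container instance, useful after a crash or restart:
#   kubectl -n unrip logs <pod-name> --previous
```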

Use Grafana when you want:

  • a browser UI
  • historical log search
  • multi-namespace filtering
  • easier cross-pod inspection

Security notes

Grafana is an admin/operator surface. On this cluster it is publicly reachable and protected only by the Grafana login. That is acceptable for this disposable single-node setup, but for a hardened production posture prefer one of:

  • Tailscale-only access
  • ingress auth in front of Grafana
  • SSO/OIDC
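
As one possible shape for the ingress-auth option, the function below prints a Traefik basicAuth Middleware manifest. It is a sketch under several assumptions: the Traefik CRDs are installed, the Middleware and Secret name grafana-basic-auth is hypothetical, and the Secret (with htpasswd-style entries under a users key) has to be created separately. Older Traefik v2 installs use apiVersion traefik.containo.us/v1alpha1 instead.

```bash
# Print a Traefik basicAuth Middleware manifest; nothing is applied here.
# The name "grafana-basic-auth" and the referenced Secret are hypothetical.
grafana_basic_auth_middleware() {
  local namespace="${1:-observability}"
  cat <<EOF
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: grafana-basic-auth
  namespace: ${namespace}
spec:
  basicAuth:
    secret: grafana-basic-auth
    # Strip the Authorization header so Grafana does not treat the
    # ingress credentials as its own basic-auth login.
    removeHeader: true
EOF
}

# Usage:
#   grafana_basic_auth_middleware | kubectl apply -f -
# Then reference it from the Grafana Ingress with the annotation
#   traefik.ingress.kubernetes.io/router.middlewares: observability-grafana-basic-auth@kubernetescrd
```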

Add a new app and have its logs show up in Grafana

Nothing special is required as long as the new pod logs to stdout/stderr. Once you deploy a new app through the usual manifests flow, promtail picks up its pod logs automatically. Log collection does not depend on the app having an Ingress.
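
A quick way to confirm this for a freshly deployed app is to check that its namespace appears among Loki's known values for the namespace label. The sketch below assumes a Loki port-forward on 127.0.0.1:3100; the function name and the my-new-app namespace are hypothetical.

```bash
# Succeeds if the Loki label-values response on stdin lists the given namespace.
loki_has_namespace() {
  local namespace="$1"
  jq -e --arg ns "$namespace" 'any(.data[]?; . == $ns)' > /dev/null
}

# Usage:
#   curl -sS 'http://127.0.0.1:3100/loki/api/v1/label/namespace/values' \
#     | loki_has_namespace my-new-app && echo "logs from my-new-app are in Loki"
```

Label values only appear after the app has written at least one log line, so an empty result right after deployment is not necessarily a failure.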