# Kubernetes observability on the Hetzner single-node cluster

This cluster now includes a minimal reproducible log stack in the `observability` namespace:

- `loki` for log storage and querying
- `promtail` as a DaemonSet that ships pod stdout/stderr logs from every node
- `grafana` as the web UI

## What gets collected

Promtail tails Kubernetes container log files under `/var/log/pods` on each node.
That means any container writing logs to stdout/stderr automatically shows up in Loki/Grafana.

This fits the current app setup in this repo because the services already log to stdout/stderr.

What is **not** collected automatically:

- arbitrary log files written somewhere else inside a container filesystem
- logs from external services that are not running as Kubernetes pods on this cluster

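To make the `/var/log/pods` layout concrete, here is a small sketch that splits a kubelet log path into the pieces that end up as labels. It assumes the standard kubelet layout `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log`; `pod_log_labels` is a hypothetical helper for illustration only, and promtail's real relabeling lives in `deploy/k8s/platform/base/observability.yaml`, which is not reproduced here.

```bash
# Split a kubelet pod log path into namespace, pod, and container.
# Assumed layout: /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
pod_log_labels() (
  path=$1
  rel=${path#/var/log/pods/}      # <namespace>_<pod>_<uid>/<container>/<n>.log
  dir=${rel%%/*}                  # <namespace>_<pod>_<uid>
  rest=${rel#*/}                  # <container>/<n>.log
  container=${rest%%/*}
  namespace=${dir%%_*}            # namespaces and pod names cannot contain "_"
  pod=${dir#*_}                   # drop the namespace prefix
  pod=${pod%_*}                   # drop the UID suffix
  printf 'namespace=%s pod=%s container=%s\n' "$namespace" "$pod" "$container"
)
```

For example, `pod_log_labels /var/log/pods/unrip_near-intents-ingest-abc12_1f2e3d4c/app/0.log` (pod name and UID made up) prints the namespace, pod, and container for that file.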
## Access

Grafana is exposed through Traefik + cert-manager at:

- `https://${GRAFANA_DOMAIN}` when bootstrapped from `scripts/hetzner/bootstrap-secrets.env`
- in the current live environment: `https://grafana.doran.133011.xyz/`

Admin credentials come from:

- `GRAFANA_ADMIN_USERNAME`
- `GRAFANA_ADMIN_PASSWORD_PASS` or `GRAFANA_ADMIN_PASSWORD`

The recommended path is `pass`.
In the current live setup the password is stored at:

- `api/hetznerk3s/grafana-admin-password`

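A rough sketch of how the two password variables could be resolved by hand, for example when logging in from a fresh shell. Two assumptions are made here that the bootstrap script may not share: `GRAFANA_ADMIN_PASSWORD_PASS` is taken to hold a `pass` entry name, and it is given precedence over the plain `GRAFANA_ADMIN_PASSWORD` variable. `grafana_admin_password` is a hypothetical helper, not something shipped in this repo.

```bash
# Print the Grafana admin password.
# Assumption: GRAFANA_ADMIN_PASSWORD_PASS names a `pass` entry and wins over
# the plain GRAFANA_ADMIN_PASSWORD variable when both are set.
grafana_admin_password() {
  if [ -n "${GRAFANA_ADMIN_PASSWORD_PASS:-}" ]; then
    # By convention the first line of a pass entry is the password itself.
    pass show "$GRAFANA_ADMIN_PASSWORD_PASS" | head -n 1
  elif [ -n "${GRAFANA_ADMIN_PASSWORD:-}" ]; then
    printf '%s\n' "$GRAFANA_ADMIN_PASSWORD"
  else
    echo "no Grafana admin password configured" >&2
    return 1
  fi
}
```

With the live setup this would be called as `GRAFANA_ADMIN_PASSWORD_PASS=api/hetznerk3s/grafana-admin-password grafana_admin_password`.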
## Reproducible bootstrap path

The observability stack is part of the repo-managed platform layer:

- `deploy/k8s/platform/base/observability.yaml`
- `deploy/k8s/platform/base/kustomization.yaml`
- `deploy/k8s/platform/base/namespace.yaml`
- `deploy/k8s/overlays/hetzner-single-node/storage-class.patch.yaml`
- `deploy/k8s/overlays/hetzner-single-node/kustomization.yaml`
- `deploy/k8s/overlays/hetzner-single-node/ingress-hosts.patch.yaml`
- `deploy/k8s/overlays/hetzner-single-node/secrets/observability.env.example`

Bootstrap materializes the Grafana secret from local env / `pass`:

- writes `deploy/k8s/overlays/hetzner-single-node/secrets/observability.env`
- copies it into `.state/hetzner/generated-overlay/`
- applies the generated overlay

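When a bootstrap run produces a Grafana login that does not work, one cheap thing to rule out is an empty value in the materialized env file. The sketch below assumes the file uses plain `KEY=VALUE` lines; `check_env_file` is a hypothetical helper and not part of the repo's bootstrap scripts.

```bash
# Fail if an env file is missing or contains KEY= lines with no value.
# Exit status: 0 = ok, 1 = empty values found, 2 = file missing/unreadable.
check_env_file() (
  file=$1
  if [ ! -r "$file" ]; then
    echo "missing or unreadable: $file" >&2
    exit 2
  fi
  # Only key names with empty values are printed, so no secret is echoed.
  if grep -nE '^[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$file" >&2; then
    echo "empty values found in $file" >&2
    exit 1
  fi
)
```

Usage: `check_env_file deploy/k8s/overlays/hetzner-single-node/secrets/observability.env`.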
## Verify the stack

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml

kubectl -n observability get pods
kubectl -n observability get pvc
kubectl -n observability get ingress
kubectl -n observability rollout status deployment/loki --timeout=300s
kubectl -n observability rollout status deployment/grafana --timeout=300s
kubectl -n observability rollout status daemonset/promtail --timeout=300s
```

## Verify logs are arriving

Generate some app logs, then query Loki directly:

```bash
export KUBECONFIG=$PWD/.state/hetzner/kubeconfig.yaml
kubectl -n observability port-forward svc/loki 3100:3100
```

In another shell:

```bash
curl -sS 'http://127.0.0.1:3100/loki/api/v1/labels' | jq
curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query' \
  --data-urlencode 'query={namespace="unrip"}' | jq
```

If those queries return labels/streams, pod logs are reaching Loki.

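If the instant `query` call comes back with HTTP 400 instead of streams, the range endpoint is worth trying: some Loki releases document `/loki/api/v1/query` as metric-only, and the Loki version deployed here is not stated in this document. The sketch below uses `/loki/api/v1/query_range` with a nanosecond `start` timestamp and a `limit`, and assumes the port-forward on `127.0.0.1:3100` is still running. `ns_ago` and `loki_tail` are hypothetical helper names.

```bash
# Nanosecond Unix timestamp for "N seconds ago".
ns_ago() {
  echo "$(( ($(date +%s) - $1) * 1000000000 ))"
}

# Print up to $2 raw log lines (default 20) from the last hour
# for the LogQL selector given as $1.
loki_tail() {
  curl -G -sS 'http://127.0.0.1:3100/loki/api/v1/query_range' \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$(ns_ago 3600)" \
    --data-urlencode "limit=${2:-20}" \
    | jq -r '.data.result[].values[][1]'   # each value is [timestamp, line]
}
```

Usage: `loki_tail '{namespace="unrip"}' 50`.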
## Use Grafana

After logging into Grafana:

1. open **Explore**
2. choose the default **Loki** datasource
3. run queries like:
   - `{namespace="unrip"}`
   - `{namespace="forgejo"}`
   - `{namespace="registry"}`
   - `{pod=~"near-intents-ingest.*"}`
   - `{container="app"}`

Useful labels added by promtail:

- `namespace`
- `pod`
- `container`
- `app`
- selected `app.kubernetes.io/*` labels

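The labels above can be combined in one selector and narrowed with a LogQL line filter. As a small sketch, the hypothetical helpers below assemble such queries from the shell so they can be pasted into Explore or passed to `curl`; they do no escaping, so label values containing double quotes are out of scope.

```bash
# Build a LogQL stream selector from label/value pairs, e.g.
#   logql_selector namespace unrip container app
logql_selector() (
  sel=
  while [ "$#" -ge 2 ]; do
    sel="${sel:+$sel,}$1=\"$2\""   # comma-join label="value" matchers
    shift 2
  done
  printf '{%s}\n' "$sel"
)

# Append a case-sensitive "line contains" filter (|=) to a selector.
logql_contains() {
  printf '%s |= "%s"\n' "$1" "$2"
}
```

For instance, `logql_contains "$(logql_selector namespace unrip container app)" error` yields a query that matches only lines containing `error` from `app` containers in the `unrip` namespace.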
## Day-to-day ops

CLI remains useful for fast debugging:

```bash
kubectl get pods -A
kubectl -n unrip logs deploy/near-intents-ingest -f
kubectl -n forgejo logs deploy/forgejo -f
bash scripts/k8s/logs.sh
```

Use Grafana when you want:

- a browser UI
- historical log search
- multi-namespace filtering
- easier cross-pod inspection

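For the CLI path, a bounded look at recent output is often more useful than following the stream. The wrapper below is a sketch built only on standard `kubectl logs` flags (`--since`, `--tail`, `--all-containers`, `--prefix`, `--previous`); `recent_logs` and its defaults of 15 minutes and 200 lines are illustrative choices, not something defined in this repo.

```bash
# Show recent logs for a workload.
# Usage: recent_logs <namespace> <workload> [since] [--previous]
#   --previous shows the prior container instance, useful after a crash.
recent_logs() {
  kubectl -n "$1" logs "$2" --all-containers --prefix \
    --since="${3:-15m}" --tail=200 ${4:+"$4"}
}
```

Usage: `recent_logs unrip deploy/near-intents-ingest 1h` or `recent_logs forgejo deploy/forgejo 15m --previous`.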
## Security notes

Grafana is an admin/operator surface.
For this cluster it is publicly reachable, protected only by the Grafana login.
That is acceptable for this disposable single-node setup, but for a hardened production posture prefer one of:

- Tailscale-only access
- ingress auth in front of Grafana
- SSO/OIDC

## Add a new app and have logs show up there

Nothing special is required as long as the new pod logs to stdout/stderr.
If you deploy a new app under Kubernetes and expose it through the usual manifests/Ingress flow, promtail will scrape its pod logs automatically.
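A quick way to confirm this end to end is a throwaway pod that prints a marker line to stdout. The sketch below is hedged: the pod name `log-smoke`, the `busybox` image, and the 30-second sleep are arbitrary choices, and the pod is left in place after it exits rather than removed immediately, on the assumption that promtail needs a moment to discover it.

```bash
# Start a short-lived pod that writes one marker line to stdout.
# Usage: log_smoke_start <namespace> <marker>
log_smoke_start() {
  kubectl -n "$1" run log-smoke --image=busybox --restart=Never -- \
    sh -c "echo '$2'; sleep 30"
}

# Remove the test pod once the marker has been found in Grafana.
log_smoke_cleanup() {
  kubectl -n "$1" delete pod log-smoke --ignore-not-found
}
```

After `log_smoke_start default hello-loki`, the query `{namespace="default", pod="log-smoke"}` in Explore should show the marker line; run `log_smoke_cleanup default` afterwards.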