# Architecture
This page covers the architecture of the observability stack, including component roles, data flow, and resource requirements.
**Observability stack series**
- Observability stack
- Architecture (you are here)
- Manifests
- Flux integration
- Operations
## Overview
The stack provides four layers of observability: metrics (Prometheus), logs (Grafana Alloy and Loki), visualisation (Grafana), and external uptime monitoring (Uptime Kuma). Alertmanager sits alongside these layers, routing alerts from the metrics and logs pipelines to Slack.
## Components
### Prometheus
Prometheus is the core metrics collection and time-series database. It scrapes metrics from exporters and evaluates alerting rules.
| Aspect | Configuration |
|---|---|
| Retention | 15 days |
| Storage | 50Gi PVC |
| Scrape targets | kube-state-metrics, node-exporter, application metrics |
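As a sketch, these settings might be expressed as kube-prometheus-stack Helm values along the following lines (the chart and the storage class default are assumptions about how Prometheus is deployed here):

```yaml
# Sketch of kube-prometheus-stack values implementing the table above.
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
```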
### Alertmanager
Alertmanager handles alert routing, deduplication, and notification delivery. It reads the Slack webhook from a Kubernetes secret.
| Aspect | Configuration |
|---|---|
| Grouping | By alertname and namespace |
| Group wait | 30 seconds |
| Repeat interval | 4 hours |
| Notification | Slack webhook |
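The routing behaviour above might be expressed in an Alertmanager configuration like this sketch (the channel name and the secret mount path are illustrative assumptions):

```yaml
# Sketch of an Alertmanager configuration matching the table.
# The Slack channel and the webhook file path are assumptions.
route:
  group_by: ["alertname", "namespace"]
  group_wait: 30s
  repeat_interval: 4h
  receiver: slack
receivers:
  - name: slack
    slack_configs:
      - channel: "#alerts"
        api_url_file: /etc/alertmanager/secrets/slack-webhook/url
```

Reading the webhook via `api_url_file` keeps the URL out of the rendered config, which matches the "reads the Slack webhook from a Kubernetes secret" approach described above.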
### Grafana
Grafana provides dashboards for visualising metrics and logs. It comes pre-configured with Prometheus and Loki as data sources.
| Aspect | Configuration |
|---|---|
| Data sources | Prometheus, Loki |
| Storage | 5Gi PVC |
| Access | https://grafana.example.local |
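A datasource provisioning file wiring up these two sources could look like the following sketch (the Prometheus service URL is an assumption about in-cluster DNS names):

```yaml
# Sketch of Grafana datasource provisioning for the two sources.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus-operated:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
```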
### Loki
Loki is the log aggregation system. It stores logs in a single-binary deployment mode suitable for small to medium clusters.
| Aspect | Configuration |
|---|---|
| Mode | SingleBinary |
| Storage | 20Gi PVC (filesystem) |
| Schema | v13 with TSDB store |
| Replication | 1 (single node) |
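These settings map onto grafana/loki Helm chart values roughly as follows (assuming that chart is in use; the schema start date is illustrative):

```yaml
# Sketch of grafana/loki chart values matching the table.
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    size: 20Gi
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"   # illustrative schema start date
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: index_
          period: 24h
```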
### Grafana Alloy
Grafana Alloy is the log collection agent that replaces the deprecated Promtail. It runs as a DaemonSet to collect logs from all nodes.
| Aspect | Configuration |
|---|---|
| Deployment | DaemonSet (one per node) |
| Targets | All running pods |
| Labels | namespace, pod, container, node, app |
| Output | Loki at http://loki:3100 |
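A minimal Alloy pipeline implementing this could look like the sketch below (component labels and the exact relabel rules are illustrative):

```alloy
// Sketch: discover all pods, attach the listed labels, push to Loki.
discovery.kubernetes "pods" {
  role = "pod"
}

discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_node_name"]
    target_label  = "node"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app"]
    target_label  = "app"
  }
}

loki.source.kubernetes "pods" {
  targets    = discovery.relabel.pods.output
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```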
### Uptime Kuma
Uptime Kuma provides external HTTP monitoring with its own web UI for configuration. It probes endpoints from inside the cluster and sends notifications directly to Slack.
| Aspect | Configuration |
|---|---|
| Storage | 1Gi PVC |
| Access | https://uptime.example.local |
| Monitors | Configured via UI |
### Supporting components
| Component | Purpose |
|---|---|
| kube-state-metrics | Exports Kubernetes object metrics (pods, deployments, etc.) |
| node-exporter | Exports host-level metrics (CPU, memory, disk, NFS) |
| Prometheus Operator | Manages Prometheus, Alertmanager, and PrometheusRule CRDs |
## Data flow
### Metrics path
- Exporters expose metrics on `/metrics` endpoints
- Prometheus scrapes metrics at configured intervals
- Prometheus evaluates alerting rules against time-series data
- Firing alerts are sent to Alertmanager
- Alertmanager groups, deduplicates, and sends to Slack
Applications integrate by creating a ServiceMonitor (for scraping) and PrometheusRule (for alerts) in their namespace. See Email relay for an example.
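A minimal sketch of the two objects an application creates (the names, selector, port, and alert expression are illustrative):

```yaml
# Sketch of the per-application integration objects.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp
  namespace: myapp
spec:
  groups:
    - name: myapp
      rules:
        - alert: MyAppDown
          expr: up{job="myapp"} == 0
          for: 5m
          labels:
            severity: critical
```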
### Logs path
- Pods write to stdout/stderr
- Alloy collects logs from all pods and adds labels
- Alloy pushes logs to Loki
- Loki stores logs and serves queries
- Loki's ruler evaluates log-based alerting rules against stored logs
- Alerts flow through Alertmanager to Slack
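One common way to implement the log-based alerting step is a Loki ruler rule; a minimal sketch (the stream selector and threshold are illustrative):

```yaml
# Sketch of a Loki ruler rule for log-based alerting.
groups:
  - name: log-alerts
    rules:
      - alert: HighErrorLogRate
        expr: sum(rate({namespace="myapp"} |= "error" [5m])) > 1
        for: 10m
        labels:
          severity: warning
```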
### External monitoring path
- Uptime Kuma probes configured endpoints
- On failure or threshold breach, sends directly to Slack
- Independent of Prometheus/Alertmanager pipeline
## Resource requirements
| Component | Memory | Storage | Replicas |
|---|---|---|---|
| Prometheus | ~2GB | 50Gi | 1 |
| Alertmanager | ~256MB | 5Gi | 1 |
| Grafana | ~256MB | 5Gi | 1 |
| Loki | ~1GB | 20Gi | 1 |
| Loki caches | ~512MB | - | 2 |
| Alloy | ~128MB | - | 1 per node |
| Uptime Kuma | ~256MB | 1Gi | 1 |
| kube-state-metrics | ~128MB | - | 1 |
| node-exporter | ~64MB | - | 1 per node |
Total estimates:
- Memory: ~5GB base + ~200MB per node
- Storage: ~81Gi
## HelmRepository placement
HelmRepositories are created in the monitoring namespace alongside the HelmReleases, following the same pattern as Gatekeeper in infra-trust.
This avoids cross-namespace references and keeps all monitoring resources together.
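As a sketch, a HelmRepository in this layout might look like the following (the repository name and URL are examples, not this stack's actual sources):

```yaml
# Sketch of a HelmRepository co-located with the HelmReleases.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: monitoring
spec:
  interval: 1h
  url: https://prometheus-community.github.io/helm-charts
```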