Architecture

This page covers the architecture of the observability stack including component roles, data flow, and resource requirements.

Observability stack series

  1. Observability stack
  2. Architecture - You are here
  3. Manifests
  4. Flux integration
  5. Operations

Overview

The stack provides four layers of observability:

  • Metrics collection and alerting (Prometheus, Alertmanager)
  • Log aggregation (Loki, Alloy)
  • Visualisation (Grafana)
  • External uptime monitoring (Uptime Kuma)

Components

Prometheus

Prometheus is the core metrics collection and time-series database. It scrapes metrics from exporters and evaluates alerting rules.

Aspect           Configuration
Retention        15 days
Storage          50Gi PVC
Scrape targets   kube-state-metrics, node-exporter, application metrics
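The retention and storage settings above can be sketched as Helm values in the kube-prometheus-stack layout (the chart name and value paths are assumptions; the Manifests page covers the actual deployment):

```yaml
# Sketch: Prometheus settings from the table, as kube-prometheus-stack values.
# Chart name and exact value paths are assumptions.
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
```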

Alertmanager

Alertmanager handles alert routing, deduplication, and notification delivery. It reads the Slack webhook from a Kubernetes secret.

Aspect            Configuration
Grouping          By alertname and namespace
Group wait        30 seconds
Repeat interval   4 hours
Notification      Slack webhook
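The routing behaviour in the table maps onto an Alertmanager configuration roughly like this (the channel name and secret mount path are illustrative; the webhook itself comes from a Kubernetes secret, as noted above):

```yaml
# Sketch of the Alertmanager route implied by the table.
route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  repeat_interval: 4h
  receiver: slack
receivers:
  - name: slack
    slack_configs:
      - channel: '#alerts'  # channel name is an assumption
        # webhook read from a mounted secret; path is illustrative
        api_url_file: /etc/alertmanager/secrets/slack-webhook
```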

Grafana

Grafana provides dashboards for visualising metrics and logs. It comes pre-configured with Prometheus and Loki as data sources.

Aspect         Configuration
Data sources   Prometheus, Loki
Storage        5Gi PVC
Access         https://grafana.example.local
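The two pre-configured data sources can be expressed as a Grafana provisioning file; the Loki URL matches the Alloy output below, while the Prometheus service name is an assumption (the Operator's default service):

```yaml
# Sketch: Grafana datasource provisioning for the two pre-configured sources.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus-operated:9090  # service name is an assumption
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
```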

Loki

Loki is the log aggregation system. It stores logs in a single-binary deployment mode suitable for small to medium clusters.

Aspect        Configuration
Mode          SingleBinary
Storage       20Gi PVC (filesystem)
Schema        v13 with TSDB store
Replication   1 (single node)
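A sketch of these settings as Helm values, following the grafana/loki chart's layout (exact key paths vary by chart version; the schema rollout date is illustrative):

```yaml
# Sketch: Loki Helm values matching the table above.
deploymentMode: SingleBinary
loki:
  commonConfig:
    replication_factor: 1
  schemaConfig:
    configs:
      - from: "2024-01-01"  # rollout date is illustrative
        store: tsdb
        schema: v13
        object_store: filesystem
        index:
          prefix: index_
          period: 24h
singleBinary:
  replicas: 1
  persistence:
    size: 20Gi
```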

Grafana Alloy

Grafana Alloy is the log collection agent that replaces the deprecated Promtail. It runs as a DaemonSet to collect logs from all nodes.

Aspect       Configuration
Deployment   DaemonSet (one per node)
Targets      All running pods
Labels       namespace, pod, container, node, app
Output       Loki at http://loki:3100
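The pipeline in the table can be sketched in Alloy's configuration language: discover pods, tail their logs, and push to Loki. The relabel rules that produce the exact label set above are omitted here for brevity:

```
// Sketch of the Alloy pipeline implied by the table.
// Relabel rules for the namespace/pod/container/node/app labels are omitted.
discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```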

Uptime Kuma

Uptime Kuma provides external HTTP monitoring with its own web UI for configuration. It probes endpoints from inside the cluster and sends notifications directly to Slack.

Aspect     Configuration
Storage    1Gi PVC
Access     https://uptime.example.local
Monitors   Configured via UI

Supporting components

Component             Purpose
kube-state-metrics    Exports Kubernetes object metrics (pods, deployments, etc.)
node-exporter         Exports host-level metrics (CPU, memory, disk, NFS)
Prometheus Operator   Manages Prometheus, Alertmanager, and PrometheusRule CRDs

Data flow

Metrics path

  1. Exporters expose metrics on /metrics endpoints
  2. Prometheus scrapes metrics at configured intervals
  3. Prometheus evaluates alerting rules against time-series data
  4. Firing alerts are sent to Alertmanager
  5. Alertmanager groups, deduplicates, and sends to Slack

Applications integrate by creating a ServiceMonitor (for scraping) and PrometheusRule (for alerts) in their namespace. See Email relay for an example.
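As a sketch, the two objects an application creates look like this; the names, label selector, port, and alert expression are all illustrative:

```yaml
# Sketch: ServiceMonitor + PrometheusRule for a hypothetical app "my-app".
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics  # named service port exposing /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app
  namespace: my-app
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppDown
          expr: up{job="my-app"} == 0
          for: 5m
```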

Logs path

  1. Pods write to stdout/stderr
  2. Alloy collects logs from all pods and adds labels
  3. Alloy pushes logs to Loki
  4. Loki stores logs and serves queries
  5. Loki's ruler evaluates log-based alerting rules against stored logs
  6. Alerts flow through Alertmanager to Slack
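The labels Alloy attaches in step 2 make logs queryable in Grafana with LogQL; the label values here are illustrative:

```
{namespace="monitoring", container="grafana"} |= "error"
```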

External monitoring path

  1. Uptime Kuma probes configured endpoints
  2. On failure or threshold breach, sends directly to Slack
  3. Independent of Prometheus/Alertmanager pipeline

Resource requirements

Component            Memory    Storage   Replicas
Prometheus           ~2GB      50Gi      1
Alertmanager         ~256MB    5Gi       1
Grafana              ~256MB    5Gi       1
Loki                 ~1GB      20Gi      1
Loki caches          ~512MB    -         2
Alloy                ~128MB    -         1 per node
Uptime Kuma          ~256MB    1Gi       1
kube-state-metrics   ~128MB    -         1
node-exporter        ~64MB     -         1 per node

Total estimates:

  • Memory: ~5GB base + ~200MB per node
  • Storage: ~81Gi (50 + 5 + 5 + 20 + 1)

HelmRepository placement

HelmRepositories are created in the monitoring namespace alongside the HelmReleases, following the same pattern as Gatekeeper in infra-trust.

This avoids cross-namespace references and keeps all monitoring resources together.
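A sketch of a co-located HelmRepository (the repository URL is the real Grafana chart repo; the resource name and API version may differ with the Flux release in use):

```yaml
# Sketch: HelmRepository alongside the HelmReleases in monitoring.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: grafana
  namespace: monitoring
spec:
  interval: 1h
  url: https://grafana.github.io/helm-charts
```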