Rebuilding My Monitoring Stack
I discovered my website was down by chance today.
The connection looked healthy from the outside, but the real issue was buried deep in the database logs: a configuration problem that my existing monitoring didn't detect. The fix was a one-line change once I could actually see what was happening.

I was not happy with myself, so I rebuilt my monitoring system from the ground up.
In a couple of hours I deployed a full stack that collects performance metrics, aggregates logs from every service, monitors my site from outside the network, and sends alerts to Slack when something goes wrong. All automated, all version-controlled, all designed so I can reproduce the exact same setup if I ever need to.
The stack:
- External monitoring: Uptime Kuma
- Kubernetes metrics: Prometheus + kube-state-metrics
- Application logs: Loki + Grafana Alloy
- Alerting: Alertmanager to Slack
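
As a rough sketch of the last piece, an Alertmanager configuration that routes everything to Slack looks something like this (the webhook URL, channel, and grouping labels here are illustrative placeholders, not my actual config):

```yaml
# Minimal Alertmanager config: route all alerts to one Slack channel.
route:
  receiver: slack-alerts
  group_by: [alertname, namespace]   # batch related alerts together
  group_wait: 30s                    # wait briefly for more alerts before sending
  repeat_interval: 4h                # re-notify if still firing

receivers:
  - name: slack-alerts
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ  # placeholder webhook
        channel: "#alerts"
        send_resolved: true          # also notify when an alert clears
```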
Together these answer the questions that matter: Is the site reachable from the internet? Are all the services running properly? What are the applications actually logging? How are the servers themselves performing? If the database has problems again, I will know within minutes rather than discovering it by accident.
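
To make that last point concrete, a Prometheus alerting rule along these lines would catch a database that stops responding (the job name, threshold, and labels are examples I made up for illustration, not my real rules):

```yaml
# Fire a critical alert if the database exporter target has been
# unreachable for two minutes straight.
groups:
  - name: database
    rules:
      - alert: DatabaseDown
        expr: up{job="postgres-exporter"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Database exporter target has been down for 2 minutes"
```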
Within minutes of deployment, alerts started arriving for a development environment I had not touched in weeks. A silent failure I would never have noticed. The monitoring proved its value immediately.
I run my production infrastructure on old computers recovered from my shed. A full monitoring stack felt like overkill, but having visibility into what is actually happening is worth the effort.
The photo was taken while I built the stack on my MacBook in the garden, watching a flock of birds come down for a bath. There were nearly 30 of them at one point.
I'm Andy, I may suck at many things but that doesn't stop me from trying.
Read more: Observability Stack