K8s cluster state (etcd)
Cluster data store overview, checks, and impact.
Kubernetes cluster state (etcd)
The system of record that all components consult for cluster state
- etcd is a distributed, consistent, highly-available key-value store developed by the CoreOS team (now under the CNCF)
- It implements the Raft consensus algorithm to manage replication and provide strong consistency
- It’s designed for storing small amounts of critical data (metadata, configuration, state), not for large blobs or files
Losing etcd = losing the cluster state. Everything that isn’t running in memory or stored externally (e.g. in persistent volumes) is lost.
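Because of that, regular etcd snapshots are the usual safeguard. A minimal sketch, reusing the endpoint and certificate paths from the dev-cluster checks below (the backup path is an assumption, adjust for your environment):

```bash
# Take a point-in-time snapshot of the etcd keyspace
# (endpoint/cert paths reused from the dev cluster below; backup path is an assumption)
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
  --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
  snapshot save /var/backups/etcd-$(date +%F).db

# Sanity-check the snapshot file (hash, revision, total keys, size)
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db --write-out=table
```

(`snapshot status` is deprecated in favour of `etcdutl` in newer releases, but still works on the 3.5 series with a warning.)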
etcd — Kubernetes Cluster Data Storage Overview
| Type of Data | Examples |
|---|---|
| Cluster configuration & objects | Deployments, Services, ConfigMaps, Secrets, Namespaces |
| Node and Pod status / conditions | Which nodes are Ready, which pods are up, etc. |
| Leader election / component leases | API Server, controllers, scheduler, etc. |
| RBAC, policies, custom resources (CRDs) | Access rules, custom controllers’ state |
| Kubernetes metadata | Annotations, labels, object metadata |
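All of these objects live under the `/registry` prefix in etcd (values are stored in a binary/protobuf form, so they are not directly readable). A quick way to see what is there, sketched with the dev-cluster endpoint and certificates from the next section:

```bash
# List (keys only) the first few Kubernetes objects stored under /registry
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
  --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
  get /registry --prefix --keys-only | head -n 20
```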
Checking etcd on the dev cluster
- Health and basic stats:
```bash
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
  --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
  endpoint status --write-out=table
```

```
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.30.203:2379 | 66aedcdc4903153b | 3.5.22 | 9.1 MB | true | false | 3 | 7994 | 7994 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```

Healthy single-member etcd:
- Leader: true (as expected with 1 node)
- Version: 3.5.22 matches kubeadm defaults
- DB size: ~9 MB (tiny, normal on a fresh cluster)
- Raft term/index/applied: in lock-step (no lag), no errors
- Endpoint health:
```bash
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
  --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
  endpoint health
```

```
https://192.168.30.203:2379 is healthy: successfully committed proposal: took = 8.765833ms
```
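Two more quick checks are worth running alongside the above, sketched with the same endpoint and certificates:

```bash
# Show cluster membership (member IDs, peer/client URLs, learner status)
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
  --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
  member list --write-out=table

# List active alarms (e.g. NOSPACE once the database hits its quota)
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
  --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
  alarm list
```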
Checking etcd on a different cluster
- This output is from a three-master-node cluster suffering network issues caused by an incorrect MTU. Note how it affects the etcd state: the Raft indices drift out of sync, and the leader node eventually drops out, forcing a new election (a sketch of a polling loop for capturing output like this follows below).
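Captures like the ones below can be taken by simply re-running `endpoint status` against all three members every few seconds; a rough sketch (the certificate file names here are placeholders, this cluster's real paths will differ):

```bash
# Poll all three members every 5 seconds to watch the Raft indices and leadership
# (cert/key file names below are placeholders for this cluster)
ENDPOINTS=https://192.168.30.153:2379,https://192.168.30.154:2379,https://192.168.30.155:2379
while true; do
  ETCDCTL_API=3 etcdctl --endpoints=$ENDPOINTS \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-m1.pem \
    --key=/etc/ssl/etcd/ssl/node-m1-key.pem \
    endpoint status --write-out=table
  sleep 5
done
```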
First capture (all three members in sync, 192.168.30.154 is the leader):

```
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.30.153:2379 | d9c4159e72350017 | 3.5.21 | 24 MB | false | false | 11 | 483121 | 483121 | |
| https://192.168.30.154:2379 | b0856bd284415d1c | 3.5.21 | 23 MB | true | false | 11 | 483121 | 483121 | |
| https://192.168.30.155:2379 | c151e437be1e717f | 3.5.21 | 23 MB | false | false | 11 | 483121 | 483121 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```

A little later (the Raft indices have started to drift apart):

```
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.30.153:2379 | d9c4159e72350017 | 3.5.21 | 24 MB | false | false | 11 | 483365 | 483365 | |
| https://192.168.30.154:2379 | b0856bd284415d1c | 3.5.21 | 23 MB | true | false | 11 | 483368 | 483368 | |
| https://192.168.30.155:2379 | c151e437be1e717f | 3.5.21 | 23 MB | false | false | 11 | 483346 | 483346 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```

Finally, the leader (192.168.30.154) stops responding within the timeout:

```
Failed to get the status of endpoint https://192.168.30.154:2379 (context deadline exceeded)
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.30.153:2379 | d9c4159e72350017 | 3.5.21 | 24 MB | false | false | 11 | 483638 | 483638 | |
| https://192.168.30.155:2379 | c151e437be1e717f | 3.5.21 | 23 MB | false | false | 11 | 483653 | 483653 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
Summary: etcd Network Impact
This example shows how network instability, particularly due to incorrect MTU settings, can directly affect etcd’s health and synchronisation across control-plane nodes.
When the MTU was misconfigured, etcd communication between the three master nodes degraded, resulting in out-of-sync Raft indices and eventual leader loss, which triggered a new election.
Key Observations
- Initially, all three members reported identical Raft indices (`483121`), meaning the cluster was healthy and synchronised
- As the network degraded, the indices started to diverge slightly, showing write lag between members (`483365`, `483368`, `483346`)
- Finally, one node (`192.168.30.154`) dropped out completely, leading to a `context deadline exceeded` error, a sign that etcdctl could no longer reach that endpoint within the timeout
- The remaining two nodes stayed in the cluster and continued operating independently until the election resolved
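When the suspicion is the network rather than etcd itself, etcd's peer-level metrics give useful confirmation. A hedged sketch, assuming the metrics are served on the client port with the same client certificates (the cert/key names are placeholders, and some deployments expose a separate plain-HTTP metrics listener instead):

```bash
# Peer round-trip time histogram: rising values point at latency or packet loss between members
# (cert/key file names are placeholders)
curl -s --cacert /etc/ssl/etcd/ssl/ca.pem \
  --cert /etc/ssl/etcd/ssl/node-m1.pem \
  --key /etc/ssl/etcd/ssl/node-m1-key.pem \
  https://192.168.30.153:2379/metrics | grep etcd_network_peer_round_trip_time_seconds

# Failed peer sends/receives climb when peer traffic is being dropped
curl -s --cacert /etc/ssl/etcd/ssl/ca.pem \
  --cert /etc/ssl/etcd/ssl/node-m1.pem \
  --key /etc/ssl/etcd/ssl/node-m1-key.pem \
  https://192.168.30.153:2379/metrics | grep -E 'etcd_network_peer_(sent|received)_failures_total'
```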
Conclusion
Incorrect MTU and the resulting packet fragmentation disrupted etcd peer-to-peer communication, causing replication delays, leader instability, and a risk of quorum loss in the Kubernetes control plane. Ensuring a consistent MTU, low latency, and full bidirectional connectivity on ports 2379/2380 between masters is critical for etcd health.
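A quick way to validate those conditions from one master towards a peer, as a rough sketch (the interface name and target IP are examples):

```bash
# MTU configured on the interface (should be identical on every master)
ip link show eth0 | grep -o 'mtu [0-9]*'

# Send a non-fragmentable ICMP payload sized for a 1500-byte MTU
# (1472 bytes + 28 bytes of IP/ICMP headers); failures indicate a path-MTU problem
ping -M do -s 1472 -c 3 192.168.30.154

# Verify the etcd client (2379) and peer (2380) ports are reachable
nc -zv 192.168.30.154 2379
nc -zv 192.168.30.154 2380
```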