

Kubernetes cluster state (etcd)

The system of record that all components consult for cluster state

  • etcd is a distributed, consistent, highly-available key-value store developed by the CoreOS team (now under the CNCF)
  • It implements the Raft consensus algorithm to manage replication and provide strong consistency
  • It’s designed for storing small amounts of critical data (metadata, configuration, state), not for large blobs or files
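
The "distributed" part is easy to see with etcdctl itself: the member list shows every Raft peer in the cluster. A minimal sketch, reusing the dev-cluster endpoint and certificate paths shown further down this page (adjust for your environment):

    # List the etcd members (the Raft peers). On the single-node dev cluster this shows one member.
    ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
    --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
    member list --write-out=table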

Losing etcd = losing the cluster state. Everything that isn’t running in memory or stored externally (e.g. in persistent volumes) is lost.
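
Which is why regular snapshots matter. A minimal backup sketch, assuming the same dev-cluster endpoint and cert paths used in the checks below (the output path is just an example):

    # Take a point-in-time snapshot of the whole keyspace to a local file.
    ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
    --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
    snapshot save /var/backups/etcd-snapshot.db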

etcd — Kubernetes Cluster Data Storage Overview

  • Cluster configuration & objects: Deployments, Services, ConfigMaps, Secrets, Namespaces
  • Node and Pod status / conditions: which nodes are Ready, which pods are up, etc.
  • Leader election / component leases: API Server, controllers, scheduler, etc.
  • RBAC, policies, custom resources (CRDs): access rules, custom controllers' state
  • Kubernetes metadata: annotations, labels, object metadata
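
All of these objects are stored as small key-value entries under the API server's etcd prefix (by default /registry). A read-only sketch of what that looks like, reusing the dev-cluster endpoint and certs from the checks below (values are stored as protobuf, so only the keys are human-readable):

    # List the first few keys the API server has written (keys only, no values).
    ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
    --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
    get /registry --prefix --keys-only | head -n 20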

Checking etcd on the dev cluster

  • Health and basic stats:

    ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
    --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
    endpoint status --write-out=table
    ---
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://192.168.30.203:2379 | 66aedcdc4903153b |  3.5.22 |  9.1 MB |      true |      false |         3 |       7994 |               7994 |        |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    • Healthy single-member etcd:
      • Leader: true (as expected with 1 node)
      • Version: 3.5.22 matches kubeadm defaults
      • DB size: ~9 MB (tiny, normal on a fresh cluster)
      • Raft term/index/applied: in lock-step (no lag), no errors
  • Endpoint health:

    ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
    --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
    endpoint health
    https://192.168.30.203:2379 is healthy: successfully committed proposal: took = 8.765833ms
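
  • A further check worth adding (same endpoint and certs as above): active alarms, for example NOSPACE when the database hits its quota, are reported by alarm list; on a healthy cluster it prints nothing:

    ETCDCTL_API=3 etcdctl --endpoints=https://192.168.30.203:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node-dev-m-v1.pem \
    --key=/etc/ssl/etcd/ssl/node-dev-m-v1-key.pem \
    alarm list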

Checking etcd on a different cluster

  • This output is from a three-master-node cluster that is suffering network issues caused by an incorrect MTU. Note how it affects the etcd state: the Raft indices fall out of sync, and the leader node eventually drops out, which forces a new election
    ---
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://192.168.30.153:2379 | d9c4159e72350017 |  3.5.21 |   24 MB |     false |      false |        11 |     483121 |             483121 |        |
    | https://192.168.30.154:2379 | b0856bd284415d1c |  3.5.21 |   23 MB |      true |      false |        11 |     483121 |             483121 |        |
    | https://192.168.30.155:2379 | c151e437be1e717f |  3.5.21 |   23 MB |     false |      false |        11 |     483121 |             483121 |        |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://192.168.30.153:2379 | d9c4159e72350017 |  3.5.21 |   24 MB |     false |      false |        11 |     483365 |             483365 |        |
    | https://192.168.30.154:2379 | b0856bd284415d1c |  3.5.21 |   23 MB |      true |      false |        11 |     483368 |             483368 |        |
    | https://192.168.30.155:2379 | c151e437be1e717f |  3.5.21 |   23 MB |     false |      false |        11 |     483346 |             483346 |        |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

    Failed to get the status of endpoint https://192.168.30.154:2379 (context deadline exceeded)
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://192.168.30.153:2379 | d9c4159e72350017 |  3.5.21 |   24 MB |     false |      false |        11 |     483638 |             483638 |        |
    | https://192.168.30.155:2379 | c151e437be1e717f |  3.5.21 |   23 MB |     false |      false |        11 |     483653 |             483653 |        |
    +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
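
    These snapshots were captured by re-running endpoint status by hand. A small sketch for watching the divergence in real time (the cert file names for this cluster are placeholders, substitute the real ones):

    while true; do
      ETCDCTL_API=3 etcdctl \
        --endpoints=https://192.168.30.153:2379,https://192.168.30.154:2379,https://192.168.30.155:2379 \
        --cacert=/etc/ssl/etcd/ssl/ca.pem \
        --cert=/etc/ssl/etcd/ssl/<node>.pem \
        --key=/etc/ssl/etcd/ssl/<node>-key.pem \
        endpoint status --write-out=table
      sleep 2
    done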

Summary: etcd Network Impact

This example shows how network instability, particularly due to incorrect MTU settings, can directly affect etcd’s health and synchronisation across control-plane nodes.

When the MTU was misconfigured, etcd communication between the three master nodes degraded, resulting in out-of-sync Raft indices and, eventually, loss of the leader, which forced a new election.

Key Observations

  • Initially, all three members reported identical Raft indices (483121), meaning the cluster was healthy and synchronised
  • As the network degraded, the indices started to diverge slightly, showing write lag between members (483365, 483368, 483346)
  • Finally, one node (192.168.30.154, the leader at the time) dropped out completely, producing a context deadline exceeded error, meaning etcdctl could no longer reach that endpoint within its timeout
  • The remaining two nodes stayed in the cluster and still formed a quorum (2 of 3), but neither reported itself as leader in that snapshot, so a new election was still in progress

Conclusion

An incorrect MTU, and the packet loss or fragmentation it causes, disrupted etcd peer-to-peer communication, causing replication lag, leader loss and re-elections, and instability in the Kubernetes control plane; had a second member been lost, quorum (and with it all writes through the API server) would have gone too. Ensuring consistent MTU, low latency, and full bidirectional connectivity on ports 2379/2380 between masters is critical for etcd health.
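
A quick way to verify those conditions from one master towards another. The interface name, target IP, and 1500-byte MTU below are examples; adjust them to your network:

    # MTU actually configured on the node's interface (should match on every master)
    ip link show eth0 | grep -o 'mtu [0-9]*'

    # Path MTU test: 1472 bytes of ICMP payload + 28 bytes of headers = 1500.
    # -M do sets the Don't Fragment bit, so this fails if anything on the path has a smaller MTU.
    ping -M do -s 1472 -c 3 192.168.30.154

    # etcd client (2379) and peer/Raft (2380) ports reachable from this master
    nc -zv 192.168.30.154 2379
    nc -zv 192.168.30.154 2380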