Etcd
Etcd Architecture: The “Memory” and “Truth” of the Cluster
In the complex ecosystem of Kubernetes, Etcd plays the role of the brain’s hippocampus: it is the dedicated memory center. While the Control Plane components (like the API Server, Scheduler, and Controller Manager) act as the decision-makers and workers, Etcd is the notebook where every decision, every change, and every state is recorded.
Only the API Server is allowed to talk directly to Etcd. All other components must ask the API Server for information.
Quick Reference
| Component / Concept | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- |
| Etcd | The Memory | A Shared Notebook | “If Etcd dies, the cluster dies.” |
| Key-Value Store | Data Structure | A Phonebook | Look up a name (Key), get a number (Value). |
| Raft Algorithm | Consistency Logic | Voting in a Democracy | The majority rules. If 2 out of 3 agree, it’s the truth. |
| Snapshot | Backup | Photocopying the Notebook | The only way to save your cluster from disaster. |
Etcd is a consistent and highly-available key-value store. It stores the absolute “Truth” of the cluster.
How Etcd Works
The Vault (Storage)
Etcd stores absolutely every piece of configuration data required by the cluster. This includes:
- Secrets: Passwords, tokens, and keys.
- ConfigMaps: Configuration files for apps.
- Cluster State: Which nodes are healthy, which pods are running.
- Kubernetes Objects: Deployments, Services, DaemonSets.
The Consensus (Raft Algorithm)
Etcd uses the Raft Consensus Algorithm to maintain order. In a distributed system, you can’t just write data to one hard drive; you must replicate it. Raft ensures:
- Leader Election: One node is elected the “Leader.” All writes must go to the Leader.
- Log Replication: The Leader sends the data to “Followers.”
- Commit: Once a majority (Quorum) confirms they have received the data, the Leader “commits” it, and the write is successful.
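The majority arithmetic behind quorum is worth internalizing. A minimal sketch (a toy illustration of the math, not the real etcd implementation):

```python
# Raft commits a write once a majority (quorum) of members acknowledge it.
# These helpers show why odd-sized clusters are preferred.

def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster still accepts writes."""
    return members - quorum(members)

for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum={quorum(n)}, can lose {fault_tolerance(n)}")
```

Note that going from 3 to 4 members does not improve fault tolerance (both survive one failure), which is why etcd clusters are typically sized 3, 5, or 7.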
The Data Structure (Key-Value)
It is not a spreadsheet with rows and columns. It is a dictionary.
- Key: /registry/pods/default/nginx-pod
- Value: { "kind": "Pod", "apiVersion": "v1", ... }
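A hypothetical, simplified model of this layout: a flat mapping from hierarchical-looking keys to serialized values, where "listing a namespace" is just a key-prefix scan.

```python
import json

# Toy stand-in for etcd's keyspace: flat keys, serialized JSON values.
store = {
    "/registry/pods/default/nginx-pod": json.dumps(
        {"kind": "Pod", "apiVersion": "v1"}
    ),
}

# Lookup by exact key, like a phonebook entry:
value = json.loads(store["/registry/pods/default/nginx-pod"])
print(value["kind"])  # Pod

# A prefix scan is how clients list "all pods in the default namespace":
pods = [k for k in store if k.startswith("/registry/pods/default/")]
print(pods)
```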
Strong Consistency & Versioning
Unlike many NoSQL databases that are “Eventually Consistent” (meaning data might take a few seconds to sync across nodes), Etcd is Strongly Consistent.
- Guarantee: If you write a value to Etcd, the very next read request is guaranteed to return that new value. This is critical for Kubernetes; you cannot have the Scheduler thinking a node is empty when the API Server just filled it with a Pod.
- Versioning: Etcd stores a revision number for every change. This allows “Time Travel” (reading past state), which is conceptually how kubectl rollout undo works.
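A toy, single-node sketch of revision-based reads (assumption: real etcd keeps a global revision counter and lets clients read at any past revision until the history is compacted):

```python
class ToyMvccStore:
    """Minimal multi-version store: every write bumps a global revision."""

    def __init__(self):
        self.revision = 0
        self.history = {}  # key -> list of (revision, value)

    def put(self, key, value):
        self.revision += 1
        self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, revision=None):
        """Return the latest value at or before `revision` (default: newest)."""
        for rev, value in reversed(self.history.get(key, [])):
            if revision is None or rev <= revision:
                return value
        return None

store = ToyMvccStore()
store.put("/registry/pods/default/nginx-pod", "v1")
store.put("/registry/pods/default/nginx-pod", "v2")
print(store.get("/registry/pods/default/nginx-pod"))              # v2
print(store.get("/registry/pods/default/nginx-pod", revision=1))  # v1
```

Keeping every revision is also why the database fills up over time and needs periodic compaction (see Common Issues below).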
---
Deployment Topologies: Stacked vs. External
Deciding where Etcd lives is a major architectural decision.
1. Stacked Etcd
- Setup: Etcd runs on the same servers (nodes) as the Kubernetes Control Plane components.
- Pros: Easy to set up (the default in kubeadm), requires fewer servers, lower cost.
- Cons: If a Control Plane node goes down, you lose both the manager and a database replica. High resource contention (CPU/memory) between Kubernetes components and Etcd.
- Best For: Small to medium clusters, development environments.
2. External Etcd
- Setup: Etcd runs on its own dedicated cluster of servers, separate from Kubernetes Control Plane nodes.
- Pros: Maximum resilience. If the Control Plane crashes, data is safe. Dedicated resources ensure disk I/O is not stolen by other processes.
- Cons: More expensive (more servers), significantly more complex to configure and manage (certs, networking).
- Best For: Large production enterprises, mission-critical systems.
---
Use Case
- Service Discovery: Storing the IP addresses and ports of services so others can find them.
- Distributed Locking: Ensuring that two Schedulers don’t try to schedule the same pod at the same time.
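Distributed locking rests on an atomic "create this key only if it does not exist" operation, so exactly one client wins. A hypothetical sketch of that idea (etcd provides it via transactions; the class and names here are illustrative, not etcd's API):

```python
class ToyLockServer:
    """Toy stand-in for a store offering atomic create-if-absent."""

    def __init__(self):
        self.keys = {}

    def create_if_absent(self, key, owner):
        # Atomic on the server: exactly one caller can create the key.
        if key in self.keys:
            return False
        self.keys[key] = owner
        return True

server = ToyLockServer()
print(server.create_if_absent("/locks/scheduler", "scheduler-a"))  # True: lock acquired
print(server.create_if_absent("/locks/scheduler", "scheduler-b"))  # False: already held
```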
Benefits
- Reliability: Designed to survive hardware failures without data loss (as long as Quorum exists).
- Simplicity: Uses a simple HTTP/gRPC API.
Limitations
- Storage Limit: Etcd is designed for metadata, not big data. The default storage quota is 2GB (configurable up to a recommended 8GB), and individual write requests are capped at roughly 1.5MB by default. Large files do not belong in ConfigMaps; Etcd will reject them or degrade badly.
- Network Sensitivity: Since it replicates data instantly, high network latency between Etcd nodes can break the consensus.
Common Issues & Solutions
| Issue | Problem | Solution |
| --- | --- | --- |
| Database Full | Error: etcdserver: mvcc: database space exceeded. The cluster stops accepting writes. | You must compact the revision history and then defragment the database using the etcdctl command-line tool. |
| Member Failure | One Etcd node dies permanently. | You must manually remove the dead member from the cluster list (etcdctl member remove) and add a new one. Etcd does not auto-heal like a Kubernetes Pod. |
Further Reading
- Encrypting Secret Data at Rest
- Official Etcd Documentation
- Operating Etcd Clusters for Kubernetes
- Backing up an Etcd Cluster