Etcd
Etcd Architecture: The “Memory” and “Truth” of the Cluster
In the complex ecosystem of Kubernetes, Etcd plays the role of the brain’s hippocampus: it is the dedicated memory center. While the Control Plane components (like the API Server, Scheduler, and Controller Manager) act as the decision-makers and workers, Etcd is the notebook where every decision, every change, and every state is recorded.
Key Characteristics to Remember
- The Vault: It stores all cluster data (Secrets, ConfigMaps, Pods, etc.).
- The Consensus: It uses the Raft Algorithm to ensure all copies of the database agree on the truth.
- The Key-Value: It’s not a table (like Excel); it’s a dictionary (Key = Name, Value = Data).
- The Sensitivity: It is extremely sensitive to disk speed. Slow disks = Broken cluster.
| Component | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- |
| Etcd | The Memory | A Shared Notebook | “If Etcd dies, the cluster dies.” |
| Key-Value Store | Data Structure | A Phonebook | Look up a name (Key), get a number (Value). |
| Raft Algorithm | Consistency Logic | Voting in a Democracy | The majority rules. If 2 out of 3 agree, it’s truth. |
| Snapshot | Backup | Photocopying the Notebook | The only way to save your cluster from disaster (see the example below). |
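Since snapshots are the disaster-recovery mechanism, here is a minimal sketch of taking one with `etcdctl`. The endpoint and certificate paths are assumptions based on a typical `kubeadm` layout; adjust them to your own cluster.

```bash
# Take a point-in-time snapshot of Etcd (paths below are typical kubeadm
# defaults and are assumptions -- adjust to your environment).
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot file is readable and see its size/revision.
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db --write-out=table
```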
Etcd is a consistent and highly available key-value store. It stores the “Truth” of the cluster.
The Vault (Storage)
Etcd stores absolutely every piece of configuration data required by the cluster. This includes:
- Secrets: Passwords, tokens, and keys.
- ConfigMaps: Configuration files for apps.
- Cluster State: Which nodes are healthy, which pods are running.
- Kubernetes Objects: Deployments, Services, DaemonSets.
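To see what is actually in the vault, you can list the keys the Kubernetes API Server writes under its `/registry` prefix. This is a read-only sketch; the TLS flags from the snapshot example are omitted for brevity and would be needed on a secured cluster.

```bash
# List (keys only) everything the API Server has stored under /registry.
ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only | head -20

# Narrow down to a specific object type, for example Secrets or ConfigMaps:
ETCDCTL_API=3 etcdctl get /registry/secrets --prefix --keys-only
ETCDCTL_API=3 etcdctl get /registry/configmaps --prefix --keys-only
```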
The Consensus (Raft Algorithm)
Etcd uses the Raft Consensus Algorithm to maintain order. In a distributed system, you can’t just write data to one hard drive; you must replicate it. Raft ensures:
- Leader Election: One node is elected the “Leader.” All writes must go to the Leader.
- Log Replication: The Leader sends the data to “Followers.”
- Commit: Once a majority (Quorum) confirms they have received the data, the Leader “commits” it, and the write is successful.
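You can observe this Leader/Follower arrangement from the command line. The sketch below assumes a reachable Etcd cluster and again omits TLS flags; the “IS LEADER” column in the status output shows which member is currently coordinating writes.

```bash
# List all members of the Etcd cluster.
ETCDCTL_API=3 etcdctl member list --write-out=table

# Show per-member status; the "IS LEADER" column marks the elected Leader.
ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=table
```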
The Data Structure (Key-Value)
It is not a spreadsheet with rows and columns. It is a dictionary.
- Key: `/registry/pods/default/nginx-pod`
- Value: `{ "kind": "Pod", "apiVersion": "v1", ... }`
Strong Consistency
Unlike many NoSQL databases that are “Eventually Consistent” (meaning data might take a few seconds to sync), Etcd is Strongly Consistent.
- Guarantee: If you write a value to Etcd, the very next read request is guaranteed to return that new value. This is critical for Kubernetes; you cannot have the Scheduler thinking a node is empty when the API Server just filled it with a Pod.
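A small illustration of the guarantee, again with a hypothetical demo key. `etcdctl` reads are linearizable by default, which is what makes the read-after-write return the new value even if the read lands on a different member:

```bash
# Write a new value...
ETCDCTL_API=3 etcdctl put /demo/desired-replicas "3"

# ...and the very next read is guaranteed to return it (linearizable read, the default).
ETCDCTL_API=3 etcdctl get /demo/desired-replicas
```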
---
Deployment Topologies: Stacked vs. External
Deciding where Etcd lives is a major architectural decision.
A. Stacked Etcd
- Setup: Etcd runs on the same servers (nodes) as the Kubernetes Control Plane components.
- Pros: Easy to set up (the default in `kubeadm`), requires fewer servers, lower cost.
- Cons: If a Control Plane node goes down, you lose both the manager and a database replica. High resource contention (CPU/memory) between the Kubernetes components and Etcd.
- Best For: Small to medium clusters, development environments.
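On a stacked `kubeadm` cluster you can see Etcd running as a static Pod right next to the other Control Plane components. A quick check (the `component=etcd` label used here is the one `kubeadm` applies to its static Pods, so this assumes a kubeadm-built cluster):

```bash
# On a stacked topology, Etcd shows up as a static Pod on each control-plane node.
kubectl get pods -n kube-system -l component=etcd -o wide
```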
B. External Etcd
- Setup: Etcd runs on its own dedicated cluster of servers, separate from Kubernetes Control Plane nodes.
- Pros: Maximum resilience. If the Control Plane crashes, data is safe. Dedicated resources ensure disk I/O is not stolen by other processes.
- Cons: More expensive (more servers), significantly more complex to configure and manage (certs, networking).
- Best For: Large production enterprises, mission-critical systems.
---
Key Characteristics
- Strong Consistency: Unlike some NoSQL DBs (which are “Eventually Consistent”), Etcd is “Strongly Consistent.” If you write data, the next read is guaranteed to see it.
- Versioning: It stores a revision number for every change. This allows “time travel” (undoing changes), which is the logical basis for how `kubectl rollout undo` works.
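A sketch of revision-based reads with a hypothetical demo key. Each write bumps the cluster-wide revision, and `get --rev` lets you read the value as it was at an older revision:

```bash
# Two writes to the same key create two revisions of its value.
ETCDCTL_API=3 etcdctl put /demo/app-version "v1"
ETCDCTL_API=3 etcdctl put /demo/app-version "v2"

# Read the current value, then "time travel" to an older revision
# (substitute a real revision number observed from the first write).
ETCDCTL_API=3 etcdctl get /demo/app-version
ETCDCTL_API=3 etcdctl get /demo/app-version --rev=<older-revision-number>
```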
Use Case
- Service Discovery: Storing the IP addresses and ports of services so others can find them.
- Distributed Locking: Ensuring that two Schedulers don’t try to schedule the same pod at the same time.
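Etcd exposes both primitives directly through `etcdctl`. Below is a minimal sketch with hypothetical lock and election names; the lock is held while the wrapped command runs and is released afterwards.

```bash
# Distributed lock: only one holder of "demo-lock" at a time.
# The lock is held for the duration of the wrapped command, then released.
ETCDCTL_API=3 etcdctl lock demo-lock sleep 10

# Leader election: campaign as a candidate for the "demo-election" key.
ETCDCTL_API=3 etcdctl elect demo-election candidate-1
```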
Benefits
- Reliability: Designed to survive hardware failures without data loss (as long as Quorum exists).
- Simplicity: Uses a simple HTTP/gRPC API.
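The “simple API” point is easy to verify: besides the gRPC interface used by clients, Etcd answers plain HTTP(S) on its client port. A trivial check (endpoint and port are the defaults; on clusters with client-certificate auth you would also pass `--cert`/`--key` to curl):

```bash
# The /version endpoint answers over HTTP(S) on the client port (2379 by default).
curl -k https://127.0.0.1:2379/version
# Returns a small JSON document with the etcd server and cluster versions.
```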
Limitations
- Storage Limit: Etcd is designed for metadata, not big data. The default storage quota is 2 GB (configurable, with a suggested maximum of 8 GB), and individual requests are capped at roughly 1.5 MB. If you try to store a 1 GB file in a ConfigMap, the write will be rejected.
- Network Sensitivity: Since every write is replicated synchronously, high network latency between Etcd nodes slows writes and can break consensus.
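Two quick checks related to these limits. Both are sketches assuming a reachable endpoint (TLS flags omitted); note that `check perf` runs a short load test, so avoid pointing it at an already struggling production cluster.

```bash
# Current database size per endpoint (the "DB SIZE" column).
ETCDCTL_API=3 etcdctl endpoint status --write-out=table

# Rough benchmark of whether disk and network can keep up with Etcd's demands.
ETCDCTL_API=3 etcdctl check perf
```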
Common Issues & Solutions
- Issue: Database full.
  - Problem: Error `etcdserver: mvcc: database space exceeded`.
  - Solution: The cluster stops accepting writes. You must compact the revision history and defragment the database using `etcdctl` (see the command sketch after this list).
- Issue: Member failure.
  - Problem: One Etcd node dies permanently.
  - Solution: You must manually remove the dead member from the cluster list (`etcdctl member remove`) and add a new one. It doesn’t auto-heal like a Kubernetes Pod.
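A hedged sketch of both recovery procedures. The revision numbers, member IDs, and peer URLs below are placeholders you would substitute from your own cluster; run these only with a recent snapshot in hand.

```bash
# --- Database full: compact, defragment, then clear the space alarm ---
# Find the current revision, then compact away all older revisions.
ETCDCTL_API=3 etcdctl endpoint status --write-out=json   # note the "revision" field
ETCDCTL_API=3 etcdctl compaction <current-revision>

# Defragment to return the freed space to the filesystem, then clear the NOSPACE alarm.
ETCDCTL_API=3 etcdctl defrag
ETCDCTL_API=3 etcdctl alarm disarm

# --- Member failure: remove the dead member, then add a replacement ---
ETCDCTL_API=3 etcdctl member list --write-out=table      # find the dead member's ID
ETCDCTL_API=3 etcdctl member remove <member-id>
ETCDCTL_API=3 etcdctl member add etcd-new --peer-urls=https://<new-node-ip>:2380
```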