Etcd
Etcd Architecture: The “Memory” and “Truth” of the Cluster
In the complex ecosystem of Kubernetes, Etcd plays the role of the brain’s hippocampus: it is the dedicated memory center. While the Control Plane components (like the API Server, Scheduler, and Controller Manager) act as the decision-makers and workers, Etcd is the notebook where every decision, every change, and every state is recorded.
Only the API Server is allowed to talk directly to Etcd. All other components must ask the API Server for information.
Quick Reference
| Component / Concept | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- |
| Etcd | The Memory | A Shared Notebook | “If Etcd dies, the cluster dies.” |
| Key-Value Store | Data Structure | A Phonebook | Look up a name (Key), get a number (Value). |
| Raft Algorithm | Consistency Logic | Voting in a Democracy | The majority rules. If 2 out of 3 agree, it’s the truth. |
| Snapshot | Backup | Photocopying the Notebook | The only way to save your cluster from disaster. |
Etcd is a consistent and highly-available key-value store. It stores the absolute “Truth” of the cluster.
How Etcd Works
The Vault (Storage)
Etcd stores absolutely every piece of configuration data required by the cluster. This includes:
- Secrets: Passwords, tokens, and keys.
- ConfigMaps: Configuration files for apps.
- Cluster State: Which nodes are healthy, which pods are running.
- Kubernetes Objects: Deployments, Services, DaemonSets.
The Consensus (Raft Algorithm)
Etcd uses the Raft Consensus Algorithm to maintain order. In a distributed system, you can’t just write data to one hard drive; you must replicate it. Raft ensures:
- Leader Election: One node is elected the “Leader.” All writes must go to the Leader.
- Log Replication: The Leader sends the data to “Followers.”
- Commit: Once a majority (Quorum) confirms they have received the data, the Leader “commits” it, and the write is successful.
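The majority arithmetic behind quorum is worth internalizing. A minimal sketch (a toy illustration of the math, not the real etcd implementation):

```python
# Raft commits a write once a majority (quorum) of members acknowledge it.
# These helpers show why odd-sized clusters are preferred.

def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster still accepts writes."""
    return members - quorum(members)

for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum={quorum(n)}, can lose {fault_tolerance(n)}")
```

Note that going from 3 to 4 members does not improve fault tolerance (both survive one failure), which is why etcd clusters are typically sized 3, 5, or 7.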
The Data Structure (Key-Value)
It is not a spreadsheet with rows and columns. It is a dictionary.
- Key: /registry/pods/default/nginx-pod
- Value: { "kind": "Pod", "apiVersion": "v1", ... }
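A hypothetical, simplified model of this layout: a flat mapping from hierarchical-looking keys to serialized values, where "listing a namespace" is just a key-prefix scan.

```python
import json

# Toy stand-in for etcd's keyspace: flat keys, serialized JSON values.
store = {
    "/registry/pods/default/nginx-pod": json.dumps(
        {"kind": "Pod", "apiVersion": "v1"}
    ),
}

# Lookup by exact key, like a phonebook entry:
value = json.loads(store["/registry/pods/default/nginx-pod"])
print(value["kind"])  # Pod

# A prefix scan is how clients list "all pods in the default namespace":
pods = [k for k in store if k.startswith("/registry/pods/default/")]
print(pods)
```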
Strong Consistency & Versioning
Unlike many NoSQL databases that are “Eventually Consistent” (meaning data might take a few seconds to sync across nodes), Etcd is Strongly Consistent.
- Guarantee: If you write a value to Etcd, the very next read request is guaranteed to return that new value. This is critical for Kubernetes; you cannot have the Scheduler thinking a node is empty when the API Server just filled it with a Pod.
- Versioning: Etcd stores a revision number for every change. This allows “Time Travel” (reading past state), which is conceptually how kubectl rollout undo works.
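A toy, single-node sketch of revision-based reads (assumption: real etcd keeps a global revision counter and lets clients read at any past revision until the history is compacted):

```python
class ToyMvccStore:
    """Minimal multi-version store: every write bumps a global revision."""

    def __init__(self):
        self.revision = 0
        self.history = {}  # key -> list of (revision, value)

    def put(self, key, value):
        self.revision += 1
        self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, revision=None):
        """Return the latest value at or before `revision` (default: newest)."""
        for rev, value in reversed(self.history.get(key, [])):
            if revision is None or rev <= revision:
                return value
        return None

store = ToyMvccStore()
store.put("/registry/pods/default/nginx-pod", "v1")
store.put("/registry/pods/default/nginx-pod", "v2")
print(store.get("/registry/pods/default/nginx-pod"))              # v2
print(store.get("/registry/pods/default/nginx-pod", revision=1))  # v1
```

Keeping every revision is also why the database fills up over time and needs periodic compaction (see Common Issues below).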
---
Deployment Topologies: Stacked vs. External
Deciding where Etcd lives is a major architectural decision.
1. Stacked Etcd
- Setup: Etcd runs on the same servers (nodes) as the Kubernetes Control Plane components.
- Pros: Easy to set up (the default in kubeadm), requires fewer servers, lower cost.
- Cons: If a Control Plane node goes down, you lose both the manager and a database replica. High resource contention (CPU/memory) between Kubernetes components and Etcd.
- Best For: Small to medium clusters, development environments.
2. External Etcd
- Setup: Etcd runs on its own dedicated cluster of servers, separate from Kubernetes Control Plane nodes.
- Pros: Maximum resilience. If the Control Plane crashes, data is safe. Dedicated resources ensure disk I/O is not stolen by other processes.
- Cons: More expensive (more servers), significantly more complex to configure and manage (certs, networking).
- Best For: Large production enterprises, mission-critical systems.
---
Use Case
- Service Discovery: Storing the IP addresses and ports of services so others can find them.
- Distributed Locking: Ensuring that two Schedulers don’t try to schedule the same pod at the same time.
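Distributed locking rests on an atomic "create this key only if it does not exist" operation, so exactly one client wins. A hypothetical sketch of that idea (etcd provides it via transactions; the class and names here are illustrative, not etcd's API):

```python
class ToyLockServer:
    """Toy stand-in for a store offering atomic create-if-absent."""

    def __init__(self):
        self.keys = {}

    def create_if_absent(self, key, owner):
        # Atomic on the server: exactly one caller can create the key.
        if key in self.keys:
            return False
        self.keys[key] = owner
        return True

server = ToyLockServer()
print(server.create_if_absent("/locks/scheduler", "scheduler-a"))  # True: lock acquired
print(server.create_if_absent("/locks/scheduler", "scheduler-b"))  # False: already held
```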
Benefits
- Reliability: Designed to survive hardware failures without data loss (as long as Quorum exists).
- Simplicity: Uses a simple HTTP/gRPC API.
Limitations
- Storage Limit: Etcd is designed for metadata, not big data. The default storage quota is 2GB (configurable up to a recommended 8GB), and individual write requests are capped at roughly 1.5MB by default. Large files do not belong in ConfigMaps; Etcd will reject them or degrade badly.
- Network Sensitivity: Since it replicates data instantly, high network latency between Etcd nodes can break the consensus.
Common Issues & Solutions
| Issue | Problem | Solution |
| --- | --- | --- |
| Database Full | Error: etcdserver: mvcc: database space exceeded. The cluster stops accepting writes. | You must compact the revision history and then defragment the database using the etcdctl command-line tool. |
| Member Failure | One Etcd node dies permanently. | You must manually remove the dead member from the cluster list (etcdctl member remove) and add a new one. Etcd does not auto-heal like a Kubernetes Pod. |
Further Reading
- Encrypting Secret Data at Rest
- Official Etcd Documentation
- Operating Etcd Clusters for Kubernetes
- Backing up an Etcd Cluster