Kubernetes Architecture
Welcome to Chapter 3! This is the most important chapter for interviews. If you don’t understand the architecture, you cannot debug a broken cluster.
Kubernetes follows a Client-Server Architecture. We divide the cluster into two main parts:
- The Control Plane (Master Node): The “Brain” that makes decisions.
- The Data Plane (Worker Nodes): The “Body” that does the actual work.
Kubernetes Components
The Control Plane
Kubernetes Control Plane Architecture: The “Brain” of the Cluster
Imagine a busy airport. You have planes (containers) carrying passengers (applications). You have runways and gates (Worker Nodes) where these planes operate. But who manages the schedule? Who decides which plane lands where? Who tracks flight statuses? That is the Air Traffic Control tower.
In Kubernetes, the Control Plane is that Air Traffic Control tower. It is the absolute “Brain” of the operation. It does not carry the luggage (run the apps) itself; instead, it constantly makes decisions to ensure the airport runs smoothly. If the Control Plane goes down, the planes might keep flying for a bit, but no new orders can be given, and chaos will eventually strike.
Cheat Sheet
| Component | Scope | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- | --- |
| Kube-API Server | Master | Validates and configures data for API objects. The "hub" of the cluster. | Receptionist | The only component that talks to the Etcd database. |
| Etcd | Master | Distributed Key-Value store. | The Source of Truth | If it isn't in Etcd, it doesn't exist in the cluster. |
| Kube-Scheduler | Master | Watches for newly created Pods with no assigned node. | The Matchmaker | Finds the best "home" for a Pod based on resources. |
| Kube-Controller Manager | Master | Watches the state and makes changes to reach the "Desired State." | The Thermostat | Notices if the "room" is too cold (pod down) and turns on the heat. |
| Cloud Controller Manager | Master | Manages cloud-specific integrations (LB, Storage, Routes). | The Liaison | Translates K8s requests into AWS/GCP/Azure commands. |
| Kubelet | Master & Worker | The primary "node agent." Runs the Control Plane Pods as Static Pods and reports back to the API server. | The Foreman | Takes the "blueprint" from the API and ensures the containers run. |
| Container Runtime | Master & Worker | The software that pulls images and runs containers (including the Control Plane's own containers). | The Engine | The actual worker (containerd, CRI-O) that starts the process. |
| Kube-Proxy | Master & Worker | Maintains network rules on every node so traffic sent to a Service reaches the right Pods. | The Traffic Cop | Manages the iptables/IPVS rules so Pods can talk to each other. |
The Control Plane manages the state of the cluster. It rarely runs applications itself; its job is to manage the workers. It consists of five key components (the fifth, the Cloud Controller Manager, is only present on cloud-hosted clusters):
1. Kube-API Server (The Front Desk)
Think of a hotel's front desk: nobody gets a room key, a delivery, or information without going through it first. In Kubernetes, the Kube-API Server is that Front Desk. It is the central management entity. Whether you are a human using kubectl, a robot (CI/CD pipeline), or a worker node reporting status, everyone must talk to the API Server first. It is the only component that ever touches the database (Etcd).
Key Characteristics to Remember
- The Hub: It is the central meeting point for all cluster communications.
- The Guard: It validates every request before processing it.
- The Messenger: It is the only component allowed to write to the Etcd database.
- The Scaler: Unlike other control plane components, it acts like a web server and scales horizontally (you can run many of them).
| Component | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- |
| Kube-API Server | The Gatekeeper | Front Desk / Customs Officer | "All roads lead to the API Server." |
| Authentication | ID Check | Showing your Passport | "Who are you?" |
| Authorization | Access Check | Checking your Visa/Ticket | "Are you allowed here?" |
| Admission Control | Safety Check | Metal Detector / Security Scan | "Is your request safe?" |
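You can see these checks from the command line. A quick sketch (the `dev-user` name is purely illustrative and does not exist by default):

```bash
# Every kubectl command is an HTTP request to the API Server.
# -v=8 prints the underlying REST calls, so you can watch the "Front Desk" at work.
kubectl get pods -v=8

# Authorization in action: "Am I allowed to do this?"
kubectl auth can-i create pods

# Same question, asked on behalf of a hypothetical user.
kubectl auth can-i delete nodes --as=dev-user
```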
To learn more:
Kube-API Server
2. Etcd (The Memory)
Etcd Architecture: The “Memory” and “Truth” of the Cluster
Imagine a shared notebook in which every decision made at the airport is written down, so anyone can check what the current plan is. In Kubernetes, Etcd is that notebook. It is the permanent storage for the cluster. Unlike a traditional database (like SQL) which is complex and heavy, Etcd is simple, fast, and designed to never lose data, even if a server crashes. It is the "Single Source of Truth." If it's not written in Etcd, it didn't happen.
Key Characteristics to Remember
- The Vault: It stores all cluster data (Secrets, ConfigMaps, Pods, etc.).
- The Consensus: It uses the Raft Algorithm to ensure all copies of the database agree on the truth.
- The Key-Value: It’s not a table (like Excel); it’s a dictionary (Key = Name, Value = Data).
- The Sensitivity: It is extremely sensitive to disk speed. Slow disks = Broken cluster.
| Component | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- |
| Etcd | The Memory | A Shared Notebook | "If Etcd dies, the cluster dies." |
| Key-Value Store | Data Structure | A Phonebook | Look up a name (Key), get a number (Value). |
| Raft Algorithm | Consistency Logic | Voting in a Democracy | The majority rules. If 2 out of 3 agree, it's truth. |
| Snapshot | Backup | Photocopying the Notebook | The only way to save your cluster from disaster. |
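Since a snapshot is your main line of defense, it is worth knowing the command. A minimal sketch, assuming a kubeadm-style cluster where Etcd listens on 127.0.0.1:2379 and its certificates live under /etc/kubernetes/pki/etcd (adjust the paths for your setup):

```bash
# Back up the "notebook" - run this on a control plane node.
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Peek at the key-value layout: Kubernetes stores objects under the /registry prefix.
ETCDCTL_API=3 etcdctl get /registry/pods --prefix --keys-only \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```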
Read more:
Etcd
3. Kube-Scheduler
Kube-Scheduler: The “Decision Maker” & Cluster Planner
Imagine a hotel manager who assigns arriving guests to rooms based on availability, room size, and special requests. In Kubernetes, the Kube-Scheduler is that manager. It never touches the actual containers. It simply watches for new Pods that are "homeless" (have no node assigned) and runs a complex algorithm to find the perfect home for them based on resources, rules, and restrictions.
Key Characteristics to Remember
- The Matchmaker: Matches Pods to Nodes.
- The Observer: It watches for Pods whose `spec.nodeName` field is empty.
- The Two-Step Logic: First it Filters (Can it fit?), then it Scores (Is it the best fit?).
- The Hands-Off Leader: It assigns the node but does not start the pod.
| Feature | Description | Simple Analogy |
| --- | --- | --- |
| Filtering (Predicates) | Eliminating unsuitable nodes. | "This shirt is too small, discard it." |
| Scoring (Priorities) | Ranking the remaining nodes. | "These 3 shirts fit, but the red one looks best." |
| Taints & Tolerations | Repelling pods from nodes. | "This seat is 'Reserved'. You can't sit here unless you have a VIP ticket." |
| Node Affinity | Attracting pods to nodes. | "I prefer to sit near the window." |
The Scheduling Loop
- The Trigger: The Scheduler constantly watches the API Server. When you run `kubectl run nginx`, a Pod object is created in Etcd, but its `spec.nodeName` field is blank. The Scheduler sees this "Unbound Pod" and wakes up.
- The Decision:
  - Phase 1: Filtering (Hard Constraints): It checks all nodes.
    - Check: Does Node A have enough CPU?
    - Check: Does Node B have the required label (`disk=ssd`)?
    - Result: Nodes that fail are removed from the list.
  - Phase 2: Scoring (Soft Constraints): It ranks the survivors.
    - Check: Node C already has the image cached (Score +10).
    - Check: Node D is empty and has a lower load (Score +5).
    - Result: Node C wins.
- The Action: The Scheduler sends a "Binding" object to the API Server, effectively writing "Node C" into the Pod's `spec.nodeName`.
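You can watch this loop from the outside with a few kubectl commands. A small sketch, reusing the nginx example above:

```bash
# Create the Pod; it starts as "Pending" because spec.nodeName is still empty.
kubectl run nginx --image=nginx

# Watch it move from Pending (no node) to Running (node assigned).
kubectl get pod nginx -o wide --watch

# Print exactly the field the Scheduler filled in via the Binding.
kubectl get pod nginx -o jsonpath='{.spec.nodeName}{"\n"}'

# The "Scheduled" event records the Scheduler's decision.
kubectl describe pod nginx
```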
Taints and Tolerations (The “Repellent”)
- This is a critical concept.
- Taint: applied to a Node (e.g., “This node is for GPU tasks only”).
- Toleration: applied to a Pod (e.g., “I am a GPU task, I can tolerate that taint”).
- Analogy: A Taint is like a “Bad Smell” on the node. Only pods that “Tolerate” the smell will land there. Everyone else stays away.
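Here is what both sides of that contract look like. A minimal sketch (the node name `gpu-node-1` and the `gpu=true` key are illustrative):

```bash
# 1. Taint the node: "stay away unless you tolerate gpu=true".
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule

# 2. Give the Pod a matching toleration so the Scheduler may place it there.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-task
spec:
  containers:
  - name: trainer
    image: nginx          # stand-in image for the example
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
EOF
```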
Affinity and Anti-Affinity (The “Magnet”)
- Node Affinity: “I want to run on a node that is in the ‘US-East’ zone.” (Attraction).
- Pod Affinity: “I want to run on the same node as the Database Pod.” (Togetherness).
- Pod Anti-Affinity: “I do not want to run on the same node as another Web Server.” (Separation – useful for High Availability so one server crash doesn’t kill both apps).
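A sketch of how these look in a Pod spec (the zone value and the `app: web` label are illustrative):

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx
  affinity:
    nodeAffinity:                 # attraction: run only in the preferred zone
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
    podAntiAffinity:              # separation: never share a node with another web Pod
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
EOF
```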
Read more:
kube-scheduler
4. Kube-Controller Manager (The Enforcer)
This is a single binary that runs a set of control loops to regulate the state of the system. Each controller is logically distinct, but they are compiled and run together as one process.
- Role: The State Manager.
- How it works: It constantly compares the Desired State (what you want) with the Current State (what is happening).
- Example: You asked for 3 replicas of Nginx. The Controller checks: "Are there 3?" If one crashes and there are only 2, the Controller notices the difference and orders the creation of a new one.
- Built-in controllers include:
  - Node Controller: Notices when nodes go down.
  - Job Controller: Watches for Job objects and creates Pods to run those tasks.
  - EndpointSlice Controller: Populates EndpointSlice objects (links Services to Pods).
  - ReplicaSet / Replication Controller: Keeps the requested number of Pod replicas running.
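You can watch the thermostat work with an ordinary Deployment. A quick sketch (the name `web` is illustrative):

```bash
# Desired State: 3 replicas.
kubectl create deployment web --image=nginx --replicas=3

# Current State: the controller has created 3 Pods.
kubectl get pods -l app=web

# Simulate a crash by deleting one Pod; the controller notices 2 != 3
# and immediately creates a replacement.
kubectl delete "$(kubectl get pods -l app=web -o name | head -n 1)"
kubectl get pods -l app=web --watch
```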
Read more:
Kube-Controller Manager
5. Cloud Controller Manager (CCM)
The CCM does not manage Pods or Deployments. It manages only three specific things. Mastering CCM means understanding exactly what these three loops do:
1. Node Controller (The “Inventory” Manager)
- Function: When a new Node joins the cluster, the CCM talks to the Cloud API to verify it.
2. Route Controller (The “Networking” Manager)
- Function: Configures the underlying cloud network (VPC) to route traffic between Pods on different nodes.
3. Service Controller (The “Load Balancer” Manager)
- Function: Watches for Services of `type: LoadBalancer` and asks the cloud provider to create (or delete) a real load balancer for them.
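The object that wakes the Service Controller up looks like this. A sketch (the selector and ports are illustrative):

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer      # the CCM asks the cloud provider for a real load balancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
EOF

# EXTERNAL-IP stays <pending> until the cloud provider finishes provisioning.
kubectl get service web-lb --watch
```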
Read more:
cloud-controller-manager
Asynchronous Communication (How the Control Plane Reaches the Nodes)
- While not a Control Plane component, the Control Plane relies on the container runtime. The API Server does not talk to the container runtime directly; the Kubelet on the worker node does. The Control Plane simply updates the spec in Etcd, and the Kubelet watches for that change. This asynchronous communication is vital to understand.
Key Characteristics
- Stateless (Mostly): The API Server, Scheduler, and Controller Manager are stateless. They don’t save data locally; they push it to Etcd.
- Modular: You can actually swap out the default Scheduler for a custom one if you have very specific needs.
- Secure: Communication between these components is encrypted via TLS (Transport Layer Security).
Use Case
- Central Management: Provides a single point of authority for the entire cluster.
- Self-Healing: Through the Controller Manager, it automatically replaces failed pods without human intervention.
- Scheduling Intelligence: Optimizes hardware usage by placing workloads on the most appropriate nodes (bin packing).
Benefits
- Abstraction: Developers don’t need to know which server their app is on; the Control Plane handles it.
- High Availability: Can be replicated across multiple zones (e.g., 3 Master Nodes) so the cluster survives hardware failures.
- Extensibility: Custom Resource Definitions (CRDs) allow the Control Plane to manage non-standard resources (like database backups or certificates).
Limitations
- Etcd Limits: Etcd has a hard limit on request size (default 1.5MB). Storing massive config files or binaries in Kubernetes Secrets/ConfigMaps can crash the Control Plane.
- Scalability Cap: While improved, a single Control Plane has limits on the number of nodes (approx. 5,000 nodes) and pods (approx. 150,000 pods) it can manage effectively before performance degrades.
The Data Plane (The Worker Nodes)
This is where your actual applications (Containers) run. A cluster can have 1 worker or 5,000 workers.
1. Kubelet (The Captain)
- Role: The Agent.
- How it works: Every Worker Node has a Kubelet. It is the “spy” for the Control Plane.
- Duty: It listens to the API Server. If the API Server says, “Start a pod for me,” the Kubelet talks to the Container Runtime to pull the image and start the container.
- Reporting: It constantly reports Node health (CPU, RAM, Disk) back to the Master.
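One way to see the Kubelet acting as the "Foreman" on its own is a Static Pod. A sketch, assuming a kubeadm-style node where the Kubelet watches /etc/kubernetes/manifests (the directory is set by the Kubelet's staticPodPath setting):

```bash
# On a kubeadm control plane node, the Control Plane itself runs as Static Pods:
ls /etc/kubernetes/manifests/
# typically: etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# Drop a manifest into that directory and the Kubelet starts it directly,
# without being told to by the API Server.
sudo tee /etc/kubernetes/manifests/static-web.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
EOF
```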
2. Kube-Proxy (The Networker)
- Role: The Traffic Cop.
- How it works: It maintains network rules on the node. It ensures that traffic destined for a Service gets routed to the correct Pod backend.
- Mechanism: It uses `iptables` or `IPVS` (as discussed in Chapter 1) to forward packets.
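If kube-proxy is running in `iptables` mode, you can inspect the rules it writes. A sketch (run on a worker node; the chain names are created by kube-proxy itself):

```bash
# Service-level rules live in the KUBE-SERVICES chain of the nat table.
sudo iptables -t nat -L KUBE-SERVICES -n | head -n 20

# Each Service gets KUBE-SVC-* chains that load-balance to KUBE-SEP-* (endpoint) chains.
sudo iptables -t nat -L -n | grep -E 'KUBE-(SVC|SEP)' | head -n 20
```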
3. Container Runtime (The Engine)
- Role: The Worker.
- How it works: This is the software that actually runs the containers.
- Examples: Containerd, CRI-O, Docker Engine.
The “Pause” Container (The Hidden Secret)
(The invisible glue holding a Pod together)
If you SSH into a worker node and run `docker ps` (or list the Pod sandboxes with `crictl pods`), you will see many containers you didn't create, all named pause.
What is a Pod really? We say a Pod is “one or more containers sharing a network.” But how do they share it? If Container A dies and restarts, it gets a new ID. How does it keep the same IP address?
The Solution: The Pause Container
- When you schedule a Pod, Kubernetes starts a tiny, empty container called the Pause Container first.
- This container reserves the Network Namespace (IP Address) and keeps it “open.”
- Your actual app (e.g., Nginx) joins this namespace.
- If Nginx crashes and restarts, the Pause container stays alive, holding onto the IP address.
- The new Nginx joins the same Pause container, keeping the same IP.
💡 Summary: The Pause container is the “parent” that holds the network resources so the “children” (your apps) can die and restart without losing their network identity.
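To see this yourself, SSH into a worker node. A sketch, assuming either the (legacy) Docker runtime or containerd/CRI-O with crictl installed:

```bash
# With the Docker runtime, the pause containers show up directly:
docker ps | grep -i pause

# With containerd or CRI-O, every Pod "sandbox" listed here is backed by a pause container:
sudo crictl pods

# The image behind them (typically registry.k8s.io/pause):
sudo crictl images | grep pause
```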