Kubernetes Architecture
If you don’t understand the architecture, you cannot debug a broken cluster.
Kubernetes follows a Client-Server Architecture. We divide the cluster into two main parts:
- The Control Plane (Master Node): The “Brain” that makes decisions.
- The Data Plane (Worker Nodes): The “Body” that does the actual work.
https://kubernetes.io/docs/concepts/overview/components
The Control Plane: Master Node
Kubernetes Control Plane Architecture: The “Brain” of the Cluster
Imagine a busy airport. You have planes (containers) carrying passengers (applications). You have runways and gates (Worker Nodes) where these planes operate. But who manages the schedule? Who decides which plane lands where? Who tracks flight statuses? That is the Air Traffic Control tower.
In Kubernetes, the Control Plane is that Air Traffic Control tower. It is the absolute “Brain” of the operation. It does not carry the luggage (run the apps) itself; instead, it constantly makes decisions to ensure the airport runs smoothly. If the Control Plane goes down, the planes might keep flying for a bit, but no new orders can be given, and chaos will eventually strike.
The “ATC Tower” Staff (Control Plane Components)
- kube-apiserver (The Tower Radio & Security): The central hub and gatekeeper. It authenticates, authorizes, and validates every instruction. Whether it’s a pilot (developer via kubectl) or internal crew (controllers), all communication flows through here. It is the stateless interface that proxies communication to the database.
- etcd (The Master Flight Plan): The single source of truth. A highly consistent, distributed key-value store that persists the cluster’s state (configuration, secrets, metadata). If the tower reboots, the airport recovers exactly as it was, but only if this log is intact.
- kube-scheduler (The Gate Agent): Assigns “parking spots” (Nodes) to incoming “aircraft” (Pods). It analyzes constraints like runway length (CPU/RAM), gate compatibility (Taints/Tolerations), and VIP grouping (Affinity) to select the optimal node. It does not execute the placement; it only assigns the destination.
- kube-controller-manager (The Ground Operations Crew): Ensures reality matches the schedule. A single binary running multiple control loops (e.g., Node Controller, ReplicaSet Controller) that continuously monitor the cluster. If a plane is missing (Pod crash) or a terminal goes dark (Node failure), this crew issues orders to restore the desired count.
- cloud-controller-manager (The External Liaison): The specialized bridge to the airport’s landlord (AWS, Azure, GCP). It isolates cloud-specific logic from the core operations. This component handles requests for external infrastructure, such as opening public access roads (Load Balancers), managing storage volumes, or verifying if a remote gate has been demolished by the provider (Node Lifecycle).
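The control loops inside the kube-controller-manager all follow the same reconcile pattern: compare desired state against observed state and issue whatever create/delete orders close the gap. Here is a minimal, hypothetical sketch of that pattern (the state and pod names are invented for illustration, not a real Kubernetes client call):

```python
# Minimal sketch of a Kubernetes-style control loop (hypothetical state,
# not a real client library API).

def reconcile(desired_replicas: int, actual_pods: list[str]) -> list[str]:
    """One pass of a ReplicaSet-style controller: converge actual toward desired."""
    pods = list(actual_pods)
    while len(pods) < desired_replicas:   # too few -> "create" a replacement
        pods.append(f"pod-{len(pods)}")
    while len(pods) > desired_replicas:   # too many -> "delete" the extras
        pods.pop()
    return pods

# A "pod crash" leaves 2 of 3 replicas running; the loop restores the count.
print(reconcile(3, ["pod-0", "pod-1"]))  # -> ['pod-0', 'pod-1', 'pod-2']
```

Real controllers run this loop continuously against the API server, which is why the cluster self-heals without anyone issuing manual commands.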
Use Cases: Control Plane
- Centralized Management: Managing thousands of containers from one entry point.
- Auto-Healing: Automatically detecting node failures and moving workloads.
- Scaling: Deciding when to add more pods (via HPA – Horizontal Pod Autoscaler).
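The HPA's scaling decision is a simple ratio documented in the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A small sketch of that arithmetic (the example numbers are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Core HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 80% CPU against a 50% target -> scale out to 7 pods.
print(hpa_desired_replicas(4, 80.0, 50.0))  # -> 7
```

Note that the real HPA also applies tolerances and min/max replica bounds on top of this formula.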
Benefits: Control Plane
- Abstraction: You don’t need to know which specific server your app is on.
- Resilience: The brain ensures the body keeps working even if parts are injured.
The Data Plane: The Worker Nodes
Worker nodes are the machines (VMs or physical servers) where your applications actually run. Every worker node runs a set of services that allow it to communicate with the Control Plane.
If the Control Plane is the “Headquarters” where the managers sit and make decisions, the Data Plane (Worker Nodes) is the actual Factory Floor where the machinery operates, and the products (your applications) are built and assembled.
The “Factory Floor” Equipment (Data Plane Components)
- The Worker Node: One specific building or workstation on that floor.
- The Kubelet (The Floor Supervisor): The primary agent responsible for that specific building. It takes orders from HQ (API Server) and ensures the containers specified in a PodSpec are running and healthy. It reports back if a machine breaks down or is running out of electricity (CPU/RAM).
- The Container Runtime (The Heavy Machinery): The actual software engine (like a conveyor belt or robotic arm) that pulls images and runs your containers. Kubernetes relies on software that speaks the “CRI” (Container Runtime Interface).
- Note: While Docker was the historical standard, modern Kubernetes (v1.24+) uses containerd or CRI-O.
- kube-proxy (The Logistics & Routing Officer): Manages network routing inside the building. It ensures that when raw materials (network traffic) arrive, they are translated (via DNAT (Destination Network Address Translation) / SNAT (Source Network Address Translation) using iptables or IPVS) and directed to the correct machine (Pod) so nothing gets lost.
- CoreDNS (The Phonebook): An essential cluster add-on. It translates service names (like my-database) into internal IP addresses so pods can talk to each other seamlessly.
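Conceptually, kube-proxy's DNAT job is a lookup table: a stable Service IP maps to a set of backing Pod IPs, and each connection is steered to one of them. A toy model of that translation (all IPs and the service name are invented for illustration; the real mechanism is iptables or IPVS rules, not Python):

```python
import random

# Invented example data: one Service IP fronting three Pod endpoints,
# the kind of mapping kube-proxy programs into iptables/IPVS.
SERVICE_ENDPOINTS = {
    "10.96.0.10": ["172.17.0.4", "172.17.0.5", "172.17.0.6"],  # "my-database"
}

def dnat(service_ip: str) -> str:
    """Translate a Service IP to one backend Pod IP (random pick,
    loosely mimicking iptables' probabilistic load balancing)."""
    return random.choice(SERVICE_ENDPOINTS[service_ip])

print(dnat("10.96.0.10"))  # one of the three Pod IPs
```

CoreDNS handles the step before this: resolving the name my-database to the Service IP, after which kube-proxy's rules take over.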
Without the Data Plane, the Control Plane is just a management team with no one to do the actual work.
A cluster’s capacity is defined by its Worker Nodes. You can scale from a single node (like Minikube) to thousands of nodes, creating a massive pool of compute resources (CPU and RAM) that the Control Plane can schedule work onto.
Quick Reference
- Kubelet is the Primary Agent: responsible for ensuring that the containers specified in a PodSpec are running and healthy on that specific node.
- Kube-Proxy handles Networking: It manages IP translation (DNAT/SNAT) so services can find pods. It typically uses iptables or IPVS.
- Runtime is Pluggable: Kubernetes doesn’t care if you use Docker, containerd, or CRI-O, as long as it speaks “CRI” (Container Runtime Interface).
- Worker Nodes are Disposable: In a cloud-native mindset, if a worker node dies, we usually just replace it rather than fixing it.
- The primary essential Kubernetes worker node components that run directly on the host Operating System (OS) are the Kubelet and, optionally, kube-proxy (which by default runs as a DaemonSet Pod instead).
- Scalability: You can add or remove worker nodes dynamically (Cluster Autoscaler).
- Self-Healing: If a component on the node crashes, it is restarted automatically (systemd restarts the kubelet; the kubelet restarts Pods such as the kube-proxy DaemonSet). If the Node dies, the Controller Manager moves work elsewhere.
- Heterogeneity: A cluster can have a mix of worker nodes (e.g., some with GPU for AI, some with high memory for databases, some Linux, some Windows).
Use Cases: Data Plane
- High Performance Computing: Using specific nodes with GPUs/TPUs managed by Kubelet.
- Stateful Apps: Nodes mounting physical disks via CSI for databases like PostgreSQL.
- Edge Computing: Running lightweight K3s worker nodes in retail stores or cell towers.
- https://kubernetes.io/docs/concepts/overview/components/#node-components
---
Master and Worker Node components
| Component | Scope | Role | Simple Analogy | Best Way to Remember |
| --- | --- | --- | --- | --- |
| Kube-API Server | Master | Validates and configures data. The “hub” of the cluster. | Receptionist | The only component that talks to the Etcd database. |
| Etcd | Master | Distributed Key-Value store. | The Source of Truth | If it isn’t in Etcd, it doesn’t exist in the cluster. |
| Kube-Scheduler | Master | Watches for newly created Pods with no assigned node. | The Matchmaker | Finds the best “home” for a Pod based on resources. |
| Kube-Controller Manager | Master | Watches the state and makes changes to reach the “Desired State.” | The Thermostat | Notices if the “room” is too cold (pod down) and turns on the heat. |
| Cloud Controller Manager | Master | Manages cloud-specific integrations (LB, Storage, Routes). | The Liaison | Translates K8s requests into AWS/GCP/Azure commands. |
| Kubelet | Master & Worker | Master: Manages Control Plane Pods (Static Pods). Worker: The primary “node agent.” Reports back to the API server. | The Foreman | Takes the “blueprint” from the API Server and ensures the containers run. |
| Container Runtime | Master & Worker | Master: Runs Control Plane containers. Worker: The software that pulls images and runs containers. | The Engine | The actual worker (containerd, CRI-O) that starts the process. |
| Kube-Proxy | Master & Worker | Master: Routes traffic from Master to Services. Worker: Handles host sub-netting and makes services available. | The Traffic Cop | Manages the iptables/IPVS rules so Pods can talk to each other. |
| CoreDNS | Cluster Add-on | Provides Service Discovery across the cluster. | The Phonebook | Maps human-readable service names to Pod IP addresses. |
Note: Master nodes are technically nodes as well, meaning they also run a Kubelet, Container Runtime, and Kube-proxy behind the scenes to host the Control Plane components (Static Pods).