Kubelet
The Kubelet is the Site Manager present on every single worker node (construction site).
- The Head Office sends a blueprint (PodSpec) to the Site Manager.
- The Site Manager (Kubelet) reads the plan, calls the workers (Container Runtime like Docker/containerd), and ensures the building (Pod) is constructed exactly as planned.
- If a brick falls or a wall cracks (container crashes), the Site Manager notices immediately and fixes it or reports back to the Head Office.
- The Head Office doesn’t talk to the workers directly; they only talk to the Site Manager (Kubelet).
No Kubelet = No Pods running on that node. It is the bridge between the “Brain” (Master) and the “Muscle” (Worker).
- Kubelet is the primary agent that runs on every node in the cluster.
- It does not manage containers created outside of Kubernetes (for example, ones started manually with docker run).
- It works on a pull-based model: it watches the API Server for “PodSpecs” assigned to its node.
- It talks to the container engine (like containerd) using the CRI (Container Runtime Interface).
- It reports the Node Status (Ready/NotReady) back to the Control Plane every few seconds.
- Static Pods are the only pods managed directly by Kubelet without the API Server (using files on the disk).
- Process Name: kubelet (usually runs as a systemd service).
- Port: Listens on port 10250 (secure Kubelet API) and 10255 (read-only, deprecated).
- Config File: Usually located at /var/lib/kubelet/config.yaml.
- Communication: Uses gRPC to talk to the Runtime (CRI).
| Feature | Description | Command/Location |
| Service Status | Check if Kubelet is running | systemctl status kubelet |
| Logs | Detailed logs for debugging | journalctl -u kubelet -f |
| Config Location | Where Kubelet gets its rules | /var/lib/kubelet/config.yaml |
| Manifest Path | Where Static Pods live | /etc/kubernetes/manifests |
| Port | Secure port for API server calls | 10250 |
| CRI Socket | Connection to Container Runtime | unix:///run/containerd/containerd.sock |
The Kubelet has one job: take a PodSpec and make it a reality. When you run kubectl apply -f pod.yaml, here is the flow:
- API Server writes the data to etcd.
- Scheduler assigns a Node to that Pod.
- Kubelet (on that specific node) sees, “Oh! A new task for me.”
- Kubelet talks to the Container Runtime (via CRI) to pull the image.
- Kubelet tells the Runtime to start the container.
- Kubelet continuously monitors the container. If the “Liveness Probe” fails, Kubelet kills and restarts the container.
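The last few steps of that flow boil down to comparing what the PodSpec asks for with what is actually running on the node. Here is a minimal Python sketch of that comparison. It is a toy model, not Kubelet code: ContainerSpec, reconcile, the action labels, and the image tags are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical, trimmed-down stand-in for one container in a PodSpec."""
    name: str
    image: str


def reconcile(desired, running):
    """Return the actions needed to make `running` match `desired`.

    desired: dict of container name -> ContainerSpec (what the PodSpec wants)
    running: dict of container name -> image the node is currently running
             (only containers that this agent created itself)
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in running:
            # Nothing running yet: pull the image and start the container.
            actions.append(("pull_and_start", name, spec.image))
        elif running[name] != spec.image:
            # Running, but with the wrong image: replace it.
            actions.append(("restart_with_image", name, spec.image))
    for name in sorted(running):
        if name not in desired:
            # Running, but no longer wanted by any PodSpec.
            actions.append(("stop", name))
    return actions


desired = {
    "web": ContainerSpec("web", "nginx:1.25"),
    "sidecar": ContainerSpec("sidecar", "envoy:v1.29"),
}
running = {"web": "nginx:1.24", "old-job": "busybox:1.36"}

for action in reconcile(desired, running):
    print(action)
# ('pull_and_start', 'sidecar', 'envoy:v1.29')
# ('restart_with_image', 'web', 'nginx:1.25')
# ('stop', 'old-job')
```

The real Kubelet does this per Pod and goes through the CRI for every action, but the idea is the same: compute the difference, then act on it.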
Role: The Primary Node Agent
Analogy: The “Captain” of the ship (the Node), taking orders from the Admiral (Control Plane).
The Kubelet is the most critical component running on every single Worker Node. It acts as the bridge between the centralized intelligence of the API Server and the local reality of the node. It does not run containers itself; instead, it delegates that work to the Container Runtime and orchestrates the process.
How it Works (The Workflow)
- Polling/Watching: The Kubelet constantly watches the API Server for new PodSpecs (instructions defining a Pod) that have been assigned to its specific node.
- CRI Interaction: When it receives an instruction to start a Pod, it communicates with the Container Runtime (via the Container Runtime Interface – CRI) using gRPC calls. It says, “I need a Pod with these containers, using these images and these resource limits.”
- Lifecycle Management: It doesn’t just start containers; it ensures they stay running. If a container crashes, the Kubelet restarts it (based on the restartPolicy).
- Probing: It performs health checks:
- Liveness Probes: “Is the application running?” (If no, restart it).
- Readiness Probes: “Is the application ready to accept traffic?” (If no, remove it from the Service endpoints).
- Startup Probes: “Has the slow application started yet?”
- Resource Reporting: Every 10 seconds (default), the Kubelet reports the node’s status (Node conditions like Ready, DiskPressure, MemoryPressure) back to the API Server.
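To make the probing step concrete, here is a small Python sketch of how consecutive probe results could turn into an action. It is a simplified model, not Kubelet code: ProbeTracker, liveness_action, and readiness_action are hypothetical names, and the tracker starts out healthy for simplicity. The thresholds mirror the documented probe defaults (failureThreshold 3, successThreshold 1).

```python
class ProbeTracker:
    """Tracks consecutive probe results for one container (toy model)."""

    def __init__(self, failure_threshold=3, success_threshold=1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = 0    # consecutive failures so far
        self.successes = 0   # consecutive successes so far
        self.healthy = True  # simplification: assume healthy at start

    def record(self, ok):
        """Record one probe result and return the current verdict."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def liveness_action(tracker, ok):
    # A failed liveness verdict means the container gets restarted.
    return "keep running" if tracker.record(ok) else "restart container"


def readiness_action(tracker, ok):
    # A failed readiness verdict only takes the Pod out of traffic.
    return "keep in endpoints" if tracker.record(ok) else "remove from endpoints"


live = ProbeTracker()
print([liveness_action(live, r) for r in [True, False, False, False]])
# ['keep running', 'keep running', 'keep running', 'restart container']

ready = ProbeTracker()
print([readiness_action(ready, r) for r in [False, False, False, True]])
# ['keep in endpoints', 'keep in endpoints', 'remove from endpoints', 'keep in endpoints']
```

Notice the difference in consequences: a failing liveness probe restarts the container, while a failing readiness probe leaves it running and only stops sending it traffic.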
As an Architect, you must understand the Internal Control Loop and Security.
1. The PLEG (Pod Lifecycle Event Generator): Kubelet cannot constantly poll the container runtime (that would be too heavy). Instead, it uses PLEG. PLEG relists containers from the runtime periodically (default 1s) and generates “Events”. If you see “PLEG is not healthy” in logs, it means your container runtime is too slow or unresponsive.
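The relist idea can be sketched in a few lines of Python: take two snapshots of container states and turn the differences into events. This is a toy model under stated assumptions. The relist and pleg_healthy functions and the state strings are hypothetical, the event names are loosely modeled on the Kubelet's own, and the 180-second health threshold is an assumption based on the Kubelet's three-minute relist threshold.

```python
def relist(previous, current):
    """Compare two runtime snapshots (container id -> state) and emit events."""
    events = []
    for cid in sorted(set(previous) | set(current)):
        old = previous.get(cid)
        new = current.get(cid)
        if old == new:
            continue  # nothing changed for this container
        if new == "running":
            events.append((cid, "ContainerStarted"))
        elif new == "exited":
            events.append((cid, "ContainerDied"))
        elif new is None:
            # Present in the old snapshot, gone from the new one.
            events.append((cid, "ContainerRemoved"))
    return events


def pleg_healthy(seconds_since_last_relist, threshold_seconds=180):
    """PLEG counts as healthy while relists keep completing in time."""
    return seconds_since_last_relist <= threshold_seconds


previous = {"a": "running", "b": "running", "c": "exited"}
current = {"a": "running", "b": "exited", "d": "running"}

print(relist(previous, current))
# [('b', 'ContainerDied'), ('c', 'ContainerRemoved'), ('d', 'ContainerStarted')]
print(pleg_healthy(5), pleg_healthy(200))
# True False
```

If the runtime takes too long to answer the "list containers" call, no new snapshot arrives, the time since the last relist grows past the threshold, and that is when “PLEG is not healthy” shows up.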
2. CRI, CNI, and CSI Interaction: Kubelet does not handle networking or storage directly. It acts as an orchestrator:
- CRI: Kubelet calls RunPodSandbox (via gRPC).
- CNI: The Runtime (not Kubelet directly anymore in modern versions) executes CNI plugins to get an IP address.
- CSI: Kubelet handles the NodeStageVolume and NodePublishVolume calls to mount storage into the pod.
3. Garbage Collection (GC): Kubelet is the janitor. It cleans up unused images and dead containers to save disk space.
- image-gc-high-threshold: If disk usage hits 85% (default), Kubelet starts deleting old images.
- Security Risk: If GC fails, the disk fills up, and the Node goes NotReady.
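A quick worked example of the image GC arithmetic, as a Python sketch. The 85% high threshold comes from the text above. The 80% low threshold (the level GC tries to get back down to) is an assumption based on the documented image-gc-low-threshold default, and bytes_to_free is a hypothetical helper name.

```python
GIB = 1024 ** 3


def bytes_to_free(capacity_bytes, used_bytes, high_pct=85, low_pct=80):
    """Return how many bytes of images to delete, or 0 if GC is not triggered."""
    usage_pct = used_bytes * 100 / capacity_bytes
    if usage_pct < high_pct:
        return 0  # below the high threshold: nothing to do
    # Free enough to bring usage back down to the low threshold.
    target_used = capacity_bytes * low_pct // 100
    return used_bytes - target_used


# 100 GiB disk, 90 GiB used -> 90% usage, above 85%: free 10 GiB (down to 80%).
print(bytes_to_free(100 * GIB, 90 * GIB) // GIB)  # 10

# 100 GiB disk, 84 GiB used -> 84% usage, below 85%: no GC.
print(bytes_to_free(100 * GIB, 84 * GIB) // GIB)  # 0
```

The gap between the two thresholds is what keeps GC from firing again right after it has finished.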
4. Security Hardening (DevSecOps Focus):
- Disable Anonymous Auth: Hackers love port 10250. Ensure --anonymous-auth=false.
- Webhook Authorization: Ensure --authorization-mode=Webhook. This ensures Kubelet asks the API server, “Is this person allowed to see my logs?”
- Rotate Certificates: Kubelet needs a certificate to talk to the API server. Enable --rotate-certificates so it automatically renews its own ID card.
- Kube-bench: Checks if your Kubelet is configured securely.
- Node Problem Detector: Helps Kubelet report deeper hardware issues.
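The three hardening settings above can also be checked programmatically, in the spirit of what a tool like kube-bench does. Below is a minimal Python sketch that audits a Kubelet config already loaded into a dict. The key names follow the KubeletConfiguration file format (authentication.anonymous.enabled, authorization.mode, rotateCertificates). The audit_kubelet_config function and its messages are hypothetical, and treating a missing key as insecure is a deliberately conservative choice of this sketch, not a statement about Kubelet defaults.

```python
def audit_kubelet_config(config):
    """Return a list of hardening findings for a Kubelet config dict."""
    findings = []

    # Equivalent of --anonymous-auth=false
    anonymous = config.get("authentication", {}).get("anonymous", {})
    if anonymous.get("enabled", True):
        findings.append("anonymous authentication is enabled")

    # Equivalent of --authorization-mode=Webhook
    if config.get("authorization", {}).get("mode") != "Webhook":
        findings.append("authorization mode is not Webhook")

    # Equivalent of --rotate-certificates
    if not config.get("rotateCertificates", False):
        findings.append("client certificate rotation is disabled")

    return findings


hardened = {
    "authentication": {"anonymous": {"enabled": False}},
    "authorization": {"mode": "Webhook"},
    "rotateCertificates": True,
}
risky = {
    "authentication": {"anonymous": {"enabled": True}},
    "authorization": {"mode": "AlwaysAllow"},
}

print(audit_kubelet_config(hardened))  # []
for finding in audit_kubelet_config(risky):
    print("FINDING:", finding)
```

In practice you would load /var/lib/kubelet/config.yaml with a YAML parser and pass the result in; the dict literals here just keep the example self-contained.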
Key Characteristics
- Self-Healing: Automatically restarts crashed containers.
- Node-Centric: It cares only about its node, not the whole cluster.
- Declarative: It constantly tries to match the “Current State” to the “Desired State.”
Use Case
- Running Applications: Every standard pod runs via Kubelet.
- Bootstrapping Clusters: Used to start the Control Plane components as Static Pods.
- Edge Computing: Kubelet runs on small devices (IoT) to manage containers remotely.
Benefits
- Reliability: Even if the Master Node goes down, the Kubelet keeps the existing pods running (it just can’t accept new changes).
- Resource Management: Enforces CPU and Memory limits so one app doesn’t crash the whole server.
Limitations
- Scalability: A single Kubelet can struggle if you cram too many pods (e.g., >110 pods) on one node, leading to PLEG issues.
- Dependency: It relies heavily on the Container Runtime. If Docker/containerd freezes, the Kubelet can no longer start, stop, or inspect containers on that node.
Common Issues, Problems, and Solutions
| Problem | Symptom | Solution |
| PLEG is not healthy | Node goes NotReady, pods stuck | Check Container Runtime logs. Increase PLEG timeout (not recommended) or reduce load. |
| Disk Pressure | Pods are being evicted | Check image-gc settings. Prune unused images manually or increase disk size. |
| Unauthorized (401) | Kubelet cannot talk to API Server | Check /etc/kubernetes/kubelet.conf and ensure client certificates are valid. |
| CrashLoopBackOff | Pod repeatedly starts and crashes | Check app logs (kubectl logs). It’s usually an app error; Kubelet is just reporting it. |
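The “BackOff” in CrashLoopBackOff refers to the growing wait between restarts. Here is a small Python sketch of that schedule, assuming the documented behavior of an exponential delay that starts at 10 seconds, doubles each time, and is capped at five minutes. The restart_delay function is a hypothetical helper, and the model ignores the reset that happens after a container has run cleanly for a while.

```python
def restart_delay(restart_count, base_seconds=10, cap_seconds=300):
    """Seconds to wait before the Nth restart (toy model of the back-off)."""
    if restart_count <= 0:
        return 0  # the first start has no delay
    return min(base_seconds * 2 ** (restart_count - 1), cap_seconds)


print([restart_delay(n) for n in range(1, 8)])
# [10, 20, 40, 80, 160, 300, 300]
```

So a Pod that has been crashing for a while will sit in CrashLoopBackOff for up to five minutes between attempts, which is why it can look “stuck” even though the Kubelet is still retrying.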
- Official Kubelet Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
- Kubelet Configuration (Config File): https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
- Debug Cluster (Auditing): https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/