Kube-Controller Manager
Imagine you have a very strict Smart Air Conditioner (AC).
- Think of it as a Smart AC: You set the remote to 24°C (this is your Desired State). The AC constantly checks the room temperature (the Current State). If the room gets hot (26°C), the AC turns on to cool it down. If it gets too cold, it stops. It works non-stop to ensure the room is exactly how you want it.
- In Kubernetes: You tell Kubernetes, “I want 3 copies of my application running.” The Kube-Controller Manager is the boss that constantly checks: “Do we have 3 copies?” If one crashes and only 2 are left, it notices the gap and immediately orders a new one to be created. It enforces the rules.
- “The Kube-Controller Manager is the brain that fixes the difference between what you want and what you have.”
- “It is a single binary (program) that runs many different logic loops inside it.”
- “If a Pod dies, the Controller Manager is the one that notices and requests a replacement.”
Key Characteristics to Remember
- State Watcher: Continuously monitors the cluster state via the API Server.
- Self-Healing: Automatically corrects failures (like restarting crashed pods).
- Single Binary: Even though it does many jobs (Node control, Job control), it runs as one process to keep things simple.
Cheat Sheet Table
| Feature | Description | Real-World Analogy |
| --- | --- | --- |
| Primary Role | Regulates the state of the cluster. | A Cruise Control system in a car. |
| Input | Desired State (YAML configuration). | Setting speed to 80 km/h. |
| Action | Compares Current State vs. Desired State. | Checking speedometer vs. setting. |
| Output | Corrective action (Create/Delete Pods). | Accelerating or braking. |
The Kubernetes Controller Manager is a daemon that embeds the core control loops shipped with Kubernetes. In robotics and automation, a “control loop” is a non-terminating loop that regulates the state of a system.
In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the API Server and makes changes attempting to move the current state towards the desired state.
For a DevSecOps Architect, the Kube-Controller Manager is critical for High Availability (HA) and Cluster Stability.
- Leader Election: In a production setup (like a multi-master cluster), you will have multiple instances of the Controller Manager running for redundancy. However, only one can be active at a time to avoid conflicts. They use a Leader Election mechanism (via the `leases` API) to decide who is the active “Enforcer.” If the leader dies, another instance takes over (a quick way to check the current leader is shown after this list).
- Performance Tuning: You can tune the `--concurrent-deployment-syncs` flag. By default, it processes items with low concurrency. Increasing this allows faster reconciling in large clusters but puts more load on the API Server.
- Security Context: The Controller Manager requires significant privileges (it can touch almost any object). It should run with its own Service Account and restricted RBAC (Role-Based Access Control) permissions.
- Metrics & Monitoring: It exposes a `/metrics` endpoint that tools like Prometheus can scrape. You should alert on metrics like `workqueue_depth` (is the controller falling behind?) and `rest_client_requests_total` (is it spamming the API server?).
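If you want to see which instance currently holds the lock, you can inspect the leader-election Lease object. A minimal sketch; the Lease name `kube-controller-manager` and the `kube-system` namespace are the usual defaults but may differ in your distribution:

```bash
# Show the Lease object used for kube-controller-manager leader election
kubectl -n kube-system get lease kube-controller-manager -o yaml

# The holderIdentity field names the instance that is currently active
kubectl -n kube-system get lease kube-controller-manager \
  -o jsonpath='{.spec.holderIdentity}'
```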
Key Controllers running inside this single binary:
- Node Controller: Responsible for noticing and responding when nodes go down. It checks if a node has stopped sending “heartbeats.”
- Replication Controller: Responsible for maintaining the correct number of pods for every replication controller object in the system.
- Endpoints Controller: Populates the Endpoints object (matches Services & Pods).
1. Kubernetes Node Controller: The Heartbeat Monitor of Your Cluster
Key Characteristics to Remember
- Location: Runs inside the `kube-controller-manager` on the Control Plane.
- Main Job: Manages the lifecycle of nodes (Registration, Monitoring, Deletion).
- Health Signal: Relies on Heartbeats sent by the Kubelet every 10 seconds (default).
- Reaction: If a node stops sending heartbeats, the controller waits for a grace period before evicting pods.
- Cloud Awareness: In cloud environments (AWS, Azure, GCP), it checks with the cloud provider to see if the VM still exists.
| Feature | Default Value | Description |
| --- | --- | --- |
| Monitor Period | 5 seconds | How often the controller checks the status of each node. |
| Node Monitor Grace Period | 40 seconds | How long the controller waits before marking a node as Unhealthy / Unknown. |
| Pod Eviction Timeout | 5 minutes | How long the controller waits after a node is “Unknown” before deleting its Pods. |
| Eviction Rate | 0.1/sec | Speed at which pods are deleted from a down node (prevents mass deletion panic). |
| Zone Awareness | Enabled | Handles failures differently if a whole Availability Zone goes down vs. just one node. |
At a basic level, the Node Controller is a loop that runs forever. It looks at the Node objects in the API Server.
- Registration: When a new Node (server) joins, the Kubelet creates a `Node` object. The Controller validates it.
- CIDR Assignment: It assigns a range of IP addresses (CIDR block) to the node so its Pods can have IPs.
- Tainting: If a node is having trouble (like running out of disk or memory), the Controller puts a “stamp” (Taint) on it. This tells the Scheduler, “Don’t send new Pods here!”
- Kubectl: You see the Node Controller’s work when you run `kubectl get nodes`. Official Kubectl Docs
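You can observe most of this work with plain kubectl; a quick sketch, where `worker-1` is a placeholder node name:

```bash
# List nodes and their Ready / NotReady status
kubectl get nodes

# Show any taints the Node Controller has placed on a node
kubectl describe node worker-1 | grep -A 3 "Taints"

# Show the Pod CIDR range assigned to that node
kubectl get node worker-1 -o jsonpath='{.spec.podCIDR}'
```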
DevSecOps Architect Level
For an architect, the Node Controller is about Failure Handling and State Consistency.
1. The “Unknown” State Logic: When Kubelet stops posting status:
- Step 1: The Controller marks the Node status condition `Ready` as `Unknown`.
- Step 2: It adds a taint `node.kubernetes.io/unreachable` with the `NoExecute` effect.
- Step 3: A countdown starts (default 5 minutes). If the node doesn’t return, the pods are evicted (deleted) so they can be rescheduled elsewhere.
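That countdown is normally implemented through the default toleration (`tolerationSeconds: 300`) that Kubernetes injects into Pods. If a workload should fail over faster, you can set its own toleration; a minimal sketch, with placeholder Pod name and image:

```yaml
# A Pod that only tolerates an unreachable node for 60 seconds
# before it is evicted and rescheduled elsewhere.
apiVersion: v1
kind: Pod
metadata:
  name: fast-failover-app    # placeholder name
spec:
  containers:
    - name: app
      image: nginx:1.25      # placeholder image
  tolerations:
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 60
```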
2. Cloud Provider Integration: If you are on AWS/GCP/Azure, the Node Controller runs a loop called cloudNodeController.
- If a node becomes `Unknown`, the controller asks the Cloud API: “Does this VM instance still exist?”
- If the Cloud API says “No, it’s deleted,” the Node Controller immediately deletes the Node object from Kubernetes, skipping the 5-minute wait.
- Cloud Controller Manager (CCM): In modern clusters, cloud-specific node logic is moved here. Official CCM Docs
Key Characteristics
- Asynchronous: It doesn’t talk directly to the node. It talks to the API Server, which talks to Etcd.
- Consistency: It ensures the “Desired State” (Active Nodes) matches the “Actual State” (Infrastructure).
- Self-Healing: It triggers the self-healing process for workloads by killing pods on dead nodes.
Use Case
- Auto-Scaling: When a Cluster Autoscaler adds a node, the Node Controller registers it.
- Disaster Recovery: When a zone fails, the controller detects the massive loss of heartbeats and manages the pod migration to healthy zones.
- Maintenance: When you drain a node, the controller respects the “SchedulingDisabled” status.
Benefits
- High Availability: Ensures workloads move away from broken hardware.
- Resource Management: Prevents scheduling on nodes that have lost network connectivity.
- Automated Cleanup: Removes stale node objects so you don’t see “Phantom Nodes” in your list.
Common Issues, Problems, and Solutions
Problem 1: Node Flapping (Ready <-> NotReady)
- Cause: Kubelet is under high load and cannot send heartbeats in time.
- Solution: Increase `--node-monitor-grace-period` in the controller manager or give more CPU to the Kubelet.
Problem 2: Pods Stuck in “Terminating” on Dead Node
- Cause: The Node is down, so the Kubelet cannot confirm the pod is deleted. The Controller waits.
- Solution: Force delete the pod: `kubectl delete pod <pod-name> --grace-period=0 --force`.
Problem 3: “Thundering Herd” on Recovery
- Cause: Massive eviction causes Scheduler to overload.
- Solution: Configure `--node-eviction-rate` carefully (default is 0.1 per second).
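These eviction-related flags live on the kube-controller-manager itself. A minimal sketch, assuming a kubeadm-style cluster where it runs as a static Pod defined in `/etc/kubernetes/manifests/kube-controller-manager.yaml`:

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
    - name: kube-controller-manager
      command:
        - kube-controller-manager
        - --node-monitor-period=5s
        - --node-monitor-grace-period=40s
        - --node-eviction-rate=0.1
        - --secondary-node-eviction-rate=0.01
        # ...remaining flags omitted
```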
- Kubernetes Node Architecture: https://kubernetes.io/docs/concepts/architecture/nodes/
- Controller Manager: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
- Taints and Tolerations: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
2. Kubernetes Replication Controller (RC)
A ReplicationController ensures a specific number of “Pods” (application instances) are running at all times.
- Case A: If a Pod crashes (room gets hot), the RC starts a new one immediately.
- Case B: If someone accidentally starts too many Pods (room gets cold), the RC kills the extra ones.
- The Goal: It guarantees that the Desired State (e.g., 3 Pods) always matches the Actual State.
Here are some quick lines to stick in your memory:
- “RC (Replication Controller) guarantees that a specified number of pod replicas are running at any one time.”
- “It replaces pods that are deleted, failed, or terminated.”
- “It is the original/legacy way of replication; the modern way is ReplicaSet (used by Deployments).”
- “RC only supports Equality-Based selectors (e.g., `env = prod`), unlike ReplicaSet which supports Set-Based selectors (e.g., `env in (prod, stage)`).”
- Self-Healing: Automatically restarts failed containers.
- Scalability: Can easily scale pods up or down manually or automatically.
- Load Balancing: Helps distribute traffic across multiple pods (when used with Services).
- Legacy Status: It is part of the core `apiVersion: v1` API.
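For reference, a minimal ReplicationController manifest might look like this; the name, labels, and image are placeholders:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-rc             # placeholder name
spec:
  replicas: 3                # the Desired State
  selector:
    app: nginx               # equality-based selector only
  template:                  # pod template used to create missing replicas
    metadata:
      labels:
        app: nginx           # must match the selector
    spec:
      containers:
        - name: nginx
          image: nginx:1.25  # placeholder image
          ports:
            - containerPort: 80
```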
Now, let us go deeper. As an architect, you must know how it works internally.
- The Reconciliation Loop: The Controller Manager runs a loop called the Informer. It watches for changes in `Pod` objects and `ReplicationController` objects.
- Label Selector Logic: The RC does not “own” the pods by name. It owns them by Labels. If you manually create a standalone Pod with the label `app: nginx`, the RC will think, “Oh! I already have a pod!” and might even delete it if the count exceeds the limit.
- Shared State: The RC reads the state from `etcd` (via the API Server). It calculates `CurrentReplicas - DesiredReplicas`.
  - If Result < 0: It creates the missing pods using the `template`.
  - If Result > 0: It sorts the pods (usually killing the youngest or unready ones first) and deletes them.
- Orphan Adoption: If you delete an RC with `--cascade=orphan`, the pods keep running. A new RC with the same selector will “adopt” these existing pods immediately (see the commands below).
Kubernetes ReplicationController Docs
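A few commands make this reconciliation and adoption behaviour visible; `nginx-rc` and the `app=nginx` label are placeholders:

```bash
# Watch the RC reconcile current vs. desired replicas
kubectl get rc nginx-rc

# Scale the desired state up; the RC creates the missing pods
kubectl scale rc nginx-rc --replicas=5

# Delete the RC but leave its pods running (orphan them)
kubectl delete rc nginx-rc --cascade=orphan

# The orphaned pods are still there, ready to be adopted by a new RC
kubectl get pods -l app=nginx
```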
Key Characteristics
- Resiliency: It survives Node failures. If a Node dies, the RC notices the pods are gone and starts new ones on a healthy Node.
- Efficiency: It uses very low resources as it is part of the compiled Go binary of the Controller Manager.
Use Case
- Stateless Applications: Perfect for web servers (Nginx, Apache) where one pod is exactly the same as another.
- Queue Consumers: Workers listening to a message queue (RabbitMQ/Kafka) where you just need “5 workers” regardless of their identity.
Benefits
- High Availability: Your app theoretically never goes down completely.
- Automation: No need for a human to wake up at 3 AM to restart a server.
- Resource Optimization: Keeps exactly the resources you asked for, no waste.
Common Issues, Problems, and Solutions
| Problem | Symptom | Solution |
| --- | --- | --- |
| Pod Flapping | Pods start and terminate repeatedly. | Check if two controllers (e.g., an RC and a Deployment) are selecting the same label. They will fight for control. |
| Pending Pods | `replicas: 3` but only 0 running. | Check your Cluster Resources (CPU/Memory) or if Nodes are Ready. The RC creates the pod request, but the Scheduler cannot find a place for it. |
| Image Errors | Pod created but status is `ErrImagePull` or `ImagePullBackOff`. | The RC did its job (created the pod), but your Docker image name might be wrong. Check `kubectl describe pod`. |
3. Endpoint Controller
The Endpoint Controller is the receptionist who has a list of all employees.
- It checks who belongs to the “Billing Department” (Label Selector).
- It checks if they are actually at their desk and ready to work (Readiness Probe).
- It updates the internal phone directory (The Endpoints Object) so that when you call the main number, the call is forwarded to an actual available person.
If this receptionist goes on leave, the phone directory stops updating. New employees won’t get calls, and employees who left will still get calls (which will fail).
- The Rule: If a Service has a `selector`, the Endpoint Controller creates an `Endpoints` object with the same name.
- The Job: It bridges the gap between a Service (stable IP) and Pods (ephemeral IPs).
- The Condition: Only Ready pods get into the Endpoints list (unless `publishNotReadyAddresses` is true).
- The Modern Way: In newer Kubernetes, this controller works alongside the EndpointSlice controller for better performance.
- It runs inside the `kube-controller-manager` binary.
- It watches Services and Pods.
- It does not route traffic; it only maintains the list of IPs that `kube-proxy` uses to route traffic.
| Feature | Details |
| --- | --- |
| Binary | `kube-controller-manager` |
| Input | Service (with Selector) & Pods |
| Output | Endpoints (and EndpointSlices) |
| Trigger | Pod creation/deletion, Label changes, Readiness state change |
| Default Sync | 5 concurrent syncs (configurable) |
| Scalability Limit | Traditional Endpoints object hits limits at ~1000 pods (use EndpointSlices for more) |
The Endpoint Controller is a loop that runs forever. Here is the simple logic flow:
- Watch: It watches for changes in Services.
- Check: If a Service has a `selector` (like `app: nginx`), the controller wakes up.
- Search: It searches the cluster for all Pods that have that exact label.
- Verify: It looks at the `status` of those Pods. Are they `Running`? Is the `ReadinessProbe` passing?
- Write: It collects the IPs of the “Ready” pods and writes them into a Kubernetes object called `Endpoints` (a concrete example follows below).
- Official Link: Kubernetes Endpoints API
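To make the relationship concrete, here is a minimal sketch of a Service and the Endpoints object the controller maintains for it; the name, labels, ports, and IPs are placeholders:

```yaml
# The Service you create:
apiVersion: v1
kind: Service
metadata:
  name: web              # placeholder name
spec:
  selector:
    app: nginx           # the controller searches for Ready pods with this label
  ports:
    - port: 80
      targetPort: 8080
---
# The Endpoints object the controller writes for you (same name as the Service):
apiVersion: v1
kind: Endpoints
metadata:
  name: web
subsets:
  - addresses:
      - ip: 10.244.1.5   # example pod IPs, filled in automatically
      - ip: 10.244.2.8
    ports:
      - port: 8080
```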
At an architect level, you must understand the performance implications and the shift to EndpointSlices.
1. The “Thundering Herd” Problem: In the old days, the Endpoints object contained all IPs for a service in one single API object.
- If you had 5,000 pods backing a service, the `Endpoints` object would be huge (e.g., 1.5 MB).
- If one pod changed status, the entire 1.5 MB object had to be resent to every single Node (kube-proxy) in the cluster.
- This caused massive network congestion.
2. The Solution: EndpointSlices: Kubernetes introduced the EndpointSlice Controller (which runs parallel to the Endpoint Controller). It splits that massive list into smaller “slices” (chunks of 100 IPs).
- Architect Note: Ensure your cluster uses `EndpointSlices` if you scale beyond 100 pods per service.
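You can check whether slices are actually being produced for a Service; `web` is a placeholder Service name:

```bash
# EndpointSlices are labelled with the Service that owns them
kubectl get endpointslices -l kubernetes.io/service-name=web

# Compare with the traditional single Endpoints object
kubectl get endpoints web
```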
3. Controller Configuration Flags: You can tune the kube-controller-manager to handle high churn (lots of pods dying/starting).
- `--concurrent-endpoint-syncs`: Default is 5. If you have a massive cluster, increase this to process endpoint updates faster.
- `--concurrent-service-endpoint-syncs`: Controls the newer EndpointSlice syncing concurrency.
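A quick way to see what your own cluster is running with; this sketch assumes a kubeadm-style cluster where the controller manager is a static Pod:

```bash
# Look for any endpoint-related concurrency flags in the static Pod manifest
grep -E 'concurrent-(service-)?endpoint-syncs' \
  /etc/kubernetes/manifests/kube-controller-manager.yaml

# Or inspect the live process arguments on a control-plane node
ps -ef | grep kube-controller-manager | grep -o -- '--concurrent[^ ]*'
```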
Key Characteristics
- Reactive: It reacts to Pod state changes instantly.
- Stateless: It doesn’t store state locally; it rebuilds the state from the API server.
- Selector-Driven: It relies entirely on Label Selectors.
Use Case
- Load Balancing: Ensuring traffic is distributed only to healthy containers.
- Zero-Downtime Deployments: When a Deployment does a rolling update, the Endpoint Controller removes the old pod IP and adds the new one only when it is ready.
Common Issues & Solutions
Problem: Service is up, but connection refused (Endpoints are empty).
- Reason: The Service Selector does not match the Pod Labels.
- Solution: Check for typos in YAML. `kubectl get ep <service-name>` should show IPs. If it says `<none>`, labels don’t match.
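A short debugging sequence for this case; `web` is a placeholder Service name:

```bash
# 1. Are there any endpoints at all?
kubectl get endpoints web

# 2. What selector does the Service use?
kubectl get svc web -o jsonpath='{.spec.selector}'

# 3. Do any pods actually carry those labels?
kubectl get pods --show-labels
```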
Problem: Traffic going to dead pods.
- Reason: The Endpoint Controller is stuck or the Kubelet hasn’t reported the pod as “NotReady” yet.
- Solution: Check `kube-controller-manager` logs for crashes. Use Readiness Probes to force the status update.