Kube-Scheduler
Kube-Scheduler: The “Decision Maker” & Cluster Planner
Imagine you are checking into a massive hotel (the Cluster) with your family (the Pod). You go to the front desk. The receptionist (Kube-Scheduler) doesn’t carry your bags or make your bed (that’s the Kubelet’s job). Instead, they look at your requirements: “We need a room with two beds, a sea view, and it must be non-smoking.”
The receptionist checks the computer system (etcd), filters out the rooms that are occupied or too small, scores the remaining rooms based on the best view, and finally hands you the key for Room 304 (assigns the Node). If no room matches your needs, you stay in the lobby (Pending state) until one opens up.
In technical terms, the Scheduler watches for new Pods that have no assigned node and selects the best node for them to run on.
- The Matchmaker: It marries “homeless” Pods to the most suitable Node.
- The Observer: It strictly watches for Pods where spec.nodeName is empty (see the command below).
- Two-Step Logic: Always remember: Filter first (Can it fit?), Score second (Is it the best fit?).
- Hands-Off Leader: It assigns the node (updates the database) but never touches the container itself.
- The Brain, Not the Muscle: It makes decisions; the Kubelet executes them.
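Because the Scheduler keys off an empty spec.nodeName, you can list the Pods it still has to place. A minimal sketch using standard kubectl field selectors (adjust the namespace flag to your setup):

```bash
# Pods with no assigned node that are still Pending --
# i.e., the Pods sitting in the scheduling queue.
kubectl get pods --all-namespaces \
  --field-selector=spec.nodeName=,status.phase=Pending
```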
| Feature | Description | Simple Analogy |
| --- | --- | --- |
| Filtering (Predicates) | Eliminating unsuitable nodes immediately. | “This shirt is size Small, I need Large. Discard it.” |
| Scoring (Priorities) | Ranking the remaining “survivor” nodes. | “These 3 shirts fit, but the red one looks the best. Pick the red one.” |
| Taints & Tolerations | Repelling pods from specific nodes. | “This seat is ‘Reserved’ for VIPs. You can’t sit here unless you have a VIP ticket.” |
| Node Affinity | Attracting pods to specific nodes. | “I prefer to sit near the window (US-East zone).” |
| Pod Affinity | Grouping pods together. | “I want to sit next to my friend (Database Pod).” |
| Pod Anti-Affinity | Keeping pods apart (for safety). | “I don’t want to sit next to my ex (Same App Instance).” |
The Kube-Scheduler is a control plane component that runs on the control plane (master) node. Its entire life purpose is to watch for unbound Pods (Pods with no assigned node).
The Scheduling Loop (The Lifecycle):
- Queueing: When you run kubectl run nginx, the API server saves the Pod to etcd. The Scheduler notices this Pod is sitting in the “Scheduling Queue.”
- Filtering (Hard Constraints): The scheduler looks at all available nodes (e.g., 100 nodes). It runs checks like:
  - NodeResourcesFit: Does the node have enough free CPU/Memory? (See the example manifest after this list.)
  - NodeUnschedulable: Is the node cordoned (closed for maintenance)?
  - TaintToleration: Does the pod tolerate the node’s taints?
  - Result: Maybe only 20 nodes pass this filter. The rest are dropped for this specific pod.
- Scoring (Soft Constraints): Now it looks at those 20 nodes to find the “best” one. It calculates a score (0-100) for each.
  - ImageLocality: Does the node already have the nginx image downloaded? (Saves time = Higher score.)
  - LeastRequested: Which node is emptiest? (Spreading the load = Higher score.)
- Binding: The node with the highest score wins. The Scheduler sends a Binding object to the API Server saying, “Assign Pod X to Node Y.”
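To make the NodeResourcesFit step concrete, here is a minimal sketch of the resource requests the filter compares against free node capacity (the Pod name and numbers are illustrative, not from the article):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-requests   # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"      # NodeResourcesFit drops any node with less than this free
        memory: "256Mi"
```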
DevSecOps Architect
At an architect level, you must understand that the Scheduler is not just a binary “fit/no-fit” machine; it is a Pluggable Scheduling Framework.
The Scheduling Framework (Extension Points): Modern Kubernetes (v1.19+) uses a framework approach where you can inject custom logic at different points (Plugins). You don’t just replace the scheduler; you extend it.
- QueueSort: Decides which pending pod goes first (PriorityClasses).
- PreFilter / Filter: Checks constraints (like GPU availability).
- PreScore / Score: Runs ranking algorithms.
- Reserve: Reserves resources internally in the scheduler cache (to prevent race conditions).
- Permit: Lets custom logic “wait” or “allow” a pod (used in gang scheduling for AI/ML workloads).
- Bind: The actual assignment logic (a configuration sketch follows below).
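As a rough sketch of how this looks in practice, plugins are enabled, disabled, and weighted per profile in a KubeSchedulerConfiguration file. The specific plugin choices and weight below are illustrative assumptions, not recommendations:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      disabled:
      - name: ImageLocality                    # switch off a built-in scoring plugin
      enabled:
      - name: NodeResourcesBalancedAllocation
        weight: 2                              # give this plugin more say in the final score
```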
Multi-Scheduler Setup: Did you know you can run multiple schedulers in one cluster?
- You can write a custom scheduler for specific batch jobs (like Big Data) and leave the default scheduler for web apps.
- In the Pod spec, you simply define schedulerName: my-custom-scheduler (see the snippet below).
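A minimal sketch (the Pod name and image are placeholders; if no scheduler called my-custom-scheduler is actually running, the Pod simply stays Pending):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job-pod                  # illustrative name
spec:
  schedulerName: my-custom-scheduler   # only this scheduler will bind the Pod
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "3600"]
```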
Descheduler (The “Corrector”): The Kube-Scheduler only makes its placement decision once, when the pod is created. If the cluster becomes unbalanced later (e.g., a big node is added but stays empty), the Scheduler won’t move old pods.
- Solution: Use the Descheduler. It evicts pods based on policies, forcing them to go back to the Kube-Scheduler to find a better home.
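For reference, a sketch of what a Descheduler policy can look like, based on the descheduler project’s v1alpha1 policy format (the threshold values are illustrative; check the project’s documentation for the exact schema of your version):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:         # nodes below these values count as underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:   # evict from nodes above these values
          cpu: 50
          memory: 50
          pods: 50
```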
Taints and Tolerations (The “Repellent”)
- This is a critical concept.
- Taint: applied to a Node (e.g., “This node is for GPU tasks only”).
- Toleration: applied to a Pod (e.g., “I am a GPU task, I can tolerate that taint”).
- Analogy: A Taint is like a “Bad Smell” on the node. Only pods that “Tolerate” the smell will land there. Everyone else stays away.
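A minimal sketch of the pair in practice, assuming a node named gpu-node-1 and a gpu=true taint key (both names are illustrative):

```bash
# Repel everything that does not tolerate gpu=true from this node.
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
```

```yaml
# Pod spec snippet: this Pod "tolerates the smell" and may land on gpu-node-1.
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

Note that a toleration only allows the Pod onto the tainted node; pairing it with Node Affinity (next section) is what actually pulls it there.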
Affinity and Anti-Affinity (The “Magnet”)
- Node Affinity: “I want to run on a node that is in the ‘US-East’ zone.” (Attraction).
- Pod Affinity: “I want to run on the same node as the Database Pod.” (Togetherness).
- Pod Anti-Affinity: “I do not want to run on the same node as another Web Server.” (Separation – useful for High Availability so one server crash doesn’t kill both apps).
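A minimal Pod spec snippet combining the two ideas (the zone value and the app label are illustrative assumptions):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]      # "sit near the window"
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web                    # keep replicas of the same app apart
        topologyKey: kubernetes.io/hostname
```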
Use Case
- Dedicated Infrastructure: Ensuring heavy AI workloads only land on expensive GPU nodes using Taints.
- Cost Optimization: Packing non-critical dev pods onto cheaper “Spot Instances” using Affinity.
- High Availability: Using Anti-Affinity to ensure that if one node crashes, your entire application doesn’t go down because the replicas were spread out.
Benefits
- Resource Efficiency: Ensures hardware is utilized optimally (Bin-packing).
- Stability: Prevents “Noisy Neighbors” (one app eating all CPU) by respecting resource limits during filtering.
- Automation: Eliminates the need for manual placement of containers.
Limitations
- Static Decision: Once a pod is scheduled, the scheduler forgets about it. It does not re-balance purely based on runtime metrics (unless you use Descheduler).
- Complex Rules Conflict: It is easy to write rules that contradict each other (e.g., “Must be on Node A” vs “Tainted against Node A”), causing the pod to be stuck in Pending forever.
Common Issues, Problems, and Solutions
| Issue | Problem Analysis | Solution |
| --- | --- | --- |
| Pod Stuck in “Pending” | Scheduler cannot find a node that satisfies all Filters (CPU, Taints, Affinity). | Check events: kubectl describe pod <pod-name>. Look for “FailedScheduling”. |
| “Insufficient CPU” | The sum of Pod requests is higher than available Node capacity. | Reduce Pod requests or add more Nodes (Cluster Autoscaler). |
| Uneven Distribution | All pods landed on one node because it was empty at that specific second. | Use PodTopologySpreadConstraints to force even spreading. |
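For the last row in the table, here is a minimal sketch of topologySpreadConstraints in a Pod spec (the app label is an illustrative assumption):

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1                           # allow at most 1 pod of imbalance between nodes
    topologyKey: kubernetes.io/hostname  # spread across individual nodes
    whenUnsatisfiable: DoNotSchedule     # treat this as a hard constraint
    labelSelector:
      matchLabels:
        app: web
```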
- Main Docs: Kubernetes Scheduler
- Taints & Tolerations: Taints and Tolerations Docs
- Assigning Pods: Assigning Pods to Nodes
Labs
Scenario: Force a Pod to run on a specific node using nodeSelector.
Step 1: Label a Node
First, we give a “sticker” (label) to one of our worker nodes.

```bash
kubectl label nodes worker-node-1 disktype=ssd
```

Step 2: Create the Pod Manifest
Create a file named pod-ssd.yaml.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd # This forces the Scheduler to look for the label
```

Step 3: Apply and Verify
```bash
kubectl apply -f pod-ssd.yaml
kubectl get pod nginx-ssd -o wide
# You should see it running specifically on worker-node-1
```
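Optionally, double-check the label and clean up afterwards. This is a small assumed addition to the lab, using standard kubectl commands:

```bash
# Confirm which nodes carry the label the Scheduler matched against.
kubectl get nodes -l disktype=ssd

# Clean up: delete the Pod and remove the label (a trailing '-' removes a label).
kubectl delete pod nginx-ssd
kubectl label nodes worker-node-1 disktype-
```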