
Kubernetes Pod Requests and Limits

Imagine you are booking a hotel room for a family trip. You tell the hotel manager, “I definitely need 2 beds” (this is your Request). But you also say, “If there’s extra space, I might use up to 4 beds, but no more” (this is your Limit).

In Kubernetes, Requests are what a container is guaranteed to get. The cluster makes sure this amount is reserved just for that container. Limits are the maximum a container is allowed to use. If it tries to use more than this, Kubernetes stops it (for memory) or slows it down (for CPU). This system ensures that one “greedy” application doesn’t eat up all the resources and crash the whole server.
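
For illustration, here is a minimal Pod manifest with both values set; the names resource-demo and app, the nginx image, and the numbers are placeholders, not recommendations:

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo        # placeholder name
spec:
  containers:
  - name: app                # placeholder container name
    image: nginx             # placeholder image
    resources:
      requests:
        cpu: "250m"          # guaranteed: a quarter of a CPU core
        memory: "128Mi"      # guaranteed: 128 Mebibytes
      limits:
        cpu: "500m"          # ceiling: throttled above half a core
        memory: "256Mi"      # ceiling: OOMKilled above 256Mi

With this spec, the scheduler will only place the Pod on a Node that still has 250m CPU and 128Mi of memory unreserved, and the container can burst up to 500m / 256Mi before being throttled or killed.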

Easy-to-Remember Cheat Sheet
  • Requests are for the Scheduler (Deciding where to put the Pod).
  • Limits are for the Kernel (Stopping the Pod from going crazy).
  • CPU is a “compressible” resource (usage gets squeezed/throttled when the limit is hit).
  • Memory is an “incompressible” resource (the container is killed if it exceeds its limit).
  • Always set Requests to what your app needs at idle/normal load.
  • Always set Limits to what your app needs at peak load.
  • Unit for CPU: Measured in cores or millicores (m). 1000m = 1 Core.
  • Unit for Memory: Measured in bytes (Mi, Gi). 128Mi = 128 Mebibytes.
  • Throttling: Occurs when the CPU limit is hit. The app keeps running but slows down.
  • OOMKill: Occurs when the Memory limit is hit. The container crashes and restarts (see the demo manifest after this list).
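
To see the OOMKill entry from this cheat sheet in action, you can run a container that deliberately allocates more memory than its limit. A sketch assuming the polinux/stress image (commonly used for this kind of demonstration); any memory-hungry workload would behave the same way:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo             # placeholder name
spec:
  containers:
  - name: stress
    image: polinux/stress    # assumed image for the demo
    resources:
      requests:
        memory: "100Mi"
      limits:
        memory: "150Mi"      # the kernel kills the container above this
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]  # tries to allocate ~250M, well over the 150Mi limit

After applying it, kubectl describe pod oom-demo should show the container's last state as Terminated with reason OOMKilled and exit code 137.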

The “Noisy Neighbor” Problem: Without limits, one bad Pod with a memory leak could consume 100% of the RAM on a Node, causing system processes (like Kubelet) to crash. Requests and Limits are the primary defense against this.

At the architect level, you stop thinking about individual Pods and start thinking about Quality of Service (QoS) and Cluster Governance.

1. QoS Classes (The Hidden Feature): Kubernetes automatically assigns a “Class” to your Pod based on how you set resources. This decides who gets killed first when the Node is full (see the example manifest after the list below).

  • Guaranteed: requests == limits (for everything). Safest. Last to be killed.
  • Burstable: requests < limits. The most common class. Killed if it is using more than its request when the Node is under pressure.
  • BestEffort: No requests or limits set. Dangerous. First to be killed.
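
For instance, a Pod where every container's requests exactly equal its limits lands in the Guaranteed class. A minimal sketch (the name guaranteed-demo and the nginx image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo      # placeholder name
spec:
  containers:
  - name: app                # placeholder container name
    image: nginx             # placeholder image
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
      limits:
        cpu: "1"             # equal to the request
        memory: "1Gi"        # equal to the request

You can confirm the assigned class with kubectl get pod guaranteed-demo -o jsonpath='{.status.qosClass}', which should print Guaranteed.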

2. LimitRange (Namespace Governance): Prevents developers from creating Pods without limits. It sets default requests/limits if the user forgets them.
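
A minimal sketch of such a LimitRange; the namespace dev and all values are illustrative, not recommendations:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits       # placeholder name
  namespace: dev             # placeholder namespace
spec:
  limits:
  - type: Container
    defaultRequest:          # applied when a container omits requests
      cpu: "100m"
      memory: "128Mi"
    default:                 # applied when a container omits limits
      cpu: "500m"
      memory: "512Mi"
    max:                     # hard ceiling per container
      cpu: "2"
      memory: "2Gi"

Any container created in the dev namespace without explicit values inherits the defaults, and anything asking for more than max is rejected at admission time.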

3. ResourceQuota (Hard Budget): Sets a hard cap on the total resources a Namespace can use (e.g., “The Dev team gets 100GB RAM total”).
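
A sketch of a quota implementing that kind of budget; the namespace dev and the numbers are placeholders:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-budget      # placeholder name
  namespace: dev             # placeholder namespace
spec:
  hard:
    requests.cpu: "40"       # total CPU that can be requested in the namespace
    requests.memory: "100Gi" # the "100GB RAM total" budget from the example
    limits.memory: "120Gi"   # total memory limit across all Pods
    pods: "50"               # optional cap on Pod count

Once the quota is active, every Pod in the namespace must declare requests/limits for the quota-tracked resources, or it is rejected.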

Key Characteristics
  • Declarative: You state what you want in YAML, and Kubernetes ensures it happens.
  • Granular: Can be set per Container, not just per Pod.
Use Cases
  • High Traffic Web Servers: High limits to handle sudden viral traffic spikes.
  • Batch Jobs/AI Training: High requests to guarantee the calculation finishes without interruption.
  • Databases: Guaranteed QoS (Requests=Limits) to ensure the DB never gets killed randomly.
Benefits
  • Cost Savings: “Bin packing” allows you to fit more apps tightly onto fewer servers.
  • Stability: Prevents one bug from taking down the whole cluster.
  • Predictability: You know exactly what capacity you need to buy from AWS/Azure/Google.
Common Issues, Problems, and Solutions
  • Problem: OOMKilled. Symptom: Pod restarts with Exit Code 137. Solution: Increase the Memory Limit or fix the memory leak in the code.
  • Problem: CPU Throttling. Symptom: App is slow with high latency, but no restarts. Solution: Increase the CPU Limit or remove it entirely (relying on Requests).
  • Problem: Pending Pods. Symptom: Pod status stays “Pending”. Solution: Requests are too high; no Node has enough free capacity. Add Nodes or lower the Requests.
  • Problem: Wasted Cost. Symptom: Cluster utilization is low (around 10%) but bills are high. Solution: Requests are set too high (over-provisioning). Use VPA/Goldilocks to right-size them (see the sketch below).
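
For the “Wasted Cost” case, a hedged sketch using the Vertical Pod Autoscaler (a separate add-on that must be installed in the cluster; the Deployment name web is a placeholder):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # placeholder: the workload to analyse
  updatePolicy:
    updateMode: "Off"        # recommend only; "Auto" would apply resizes

Running it in “Off” mode only produces recommendations, which should appear in the object's status (for example via kubectl describe vpa web-vpa), so you can review them before lowering the Requests.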

kubernetes.io/docs/concepts/configuration/manage-resources-containers/

kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/

kubernetes.io/docs/concepts/policy/limit-range/
