
Kubernetes ResourceQuota

Kubernetes Namespace Resource Quotas: Controlling the “Noisy Neighbor”

In Kubernetes, a cluster is like a big shared computer where multiple teams run their applications. If one team deploys a very heavy application, it can consume all the CPU and RAM, causing other teams’ applications to slow down or crash.

Namespace Resource Quotas are simply the “limits” or “budgets” you set for a specific team (which operates in a Namespace). It clearly tells Kubernetes: “This team is allowed to use only this much CPU and Memory, and not a single byte more.” This guarantees fairness across the cluster and prevents one misbehaving application from bringing down the entire system.

The Office Building Analogy

Imagine a massive corporate office building (the Kubernetes Cluster). Various departments like HR, Finance, and IT (the Namespaces) share this building.

If the IT department plugs in thousands of heavy servers, they might trip the main circuit breaker, causing a blackout for the entire building. To prevent this, the building manager installs sub-meters and power limits for each floor. The IT floor gets a maximum budget of 100kW (the Resource Quota). If they try to draw 101kW, their specific floor’s power is capped or blocked, but the rest of the building continues to work smoothly.

A Resource Quota is the sub-meter and circuit breaker for your Kubernetes Namespaces.

Quick Reference
  • “Quota is the Ceiling”: It sets the maximum limit for the whole room (Namespace), not just one person (Pod).
  • “Requests allow entry, Limits stop abuse”: Quotas calculate the total of all Pods’ requested resources to see if they fit the budget.
  • “No Ticket, No Entry”: If a Compute Quota is active in a namespace, every single Pod must have resource requests and limits defined, or the API server will reject it.
  • Aggregate Level: Quotas apply to the mathematical sum of all resources in that specific namespace.
  • Hard Limits: Kubernetes enforces a strict hard stop. You cannot exceed the allocated quota.
  • Scope: Strictly bound to a specific Namespace.
Feature       | Description                                                      | Real-World Check
Compute Quota | Limits total CPU & RAM usage across all pods.                    | “You have 16GB RAM total for this project.”
Object Quota  | Limits the count of Kubernetes resources (Pods, Services, PVCs). | “You can only create 20 servers maximum.”
ScopeSelector | Applies quotas conditionally (e.g., based on priority).          | “Only Gold-tier applications get these limits.”
Enforcement   | Immediate rejection of new Pods if they push usage over quota.   | “Transaction Declined: Insufficient Funds.”
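
In YAML terms, the simplest form of that “budget” is a compute quota like the following sketch (the namespace name dev-team is hypothetical):
YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-budget
  namespace: dev-team      # hypothetical namespace
spec:
  hard:
    requests.cpu: "2"      # cap on the sum of all Pods' CPU requests
    requests.memory: "4Gi"
    limits.cpu: "4"        # cap on the sum of all Pods' CPU limits
    limits.memory: "8Gi"
    pods: "10"             # object count: at most 10 Pods

Apply it with kubectl apply and the Admission Controller starts enforcing it immediately for every new Pod in that namespace.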

How Quotas Work Under the Hood

Kubernetes Resource Quotas are critical governance objects defined in the core v1 API group. They provide constraints that limit the aggregate resource consumption per Namespace.

When a ResourceQuota is applied, the Kubernetes API server intercepts every pod creation request. If the new pod’s resource requirement forces the namespace usage over the set Quota, the Admission Controller immediately rejects it, and the API server returns a 403 Forbidden error.

This mechanism is vital for multi-tenant environments where Development, Staging, and Production workloads run on the same physical hardware. It is the primary defense against the “Noisy Neighbor” problem. The most important concept is limiting Compute Resources:

  • requests.cpu: Caps the sum of CPU requests across all Pods in the namespace.
  • limits.cpu: Caps the sum of CPU limits, the ceiling the namespace can burst to.
  • requests.memory & limits.memory: The same logic applied to RAM.

If you create a Resource Quota for CPU and Memory, you force a strict rule: Any new pod entering this namespace must declare its own CPU and Memory requests/limits. If a developer forgets to add these to their deployment YAML, Kubernetes will instantly block the pod. To fix this, administrators use a companion tool called LimitRange.
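
A minimal LimitRange sketch that injects defaults into such “naked” pods (the values and namespace name are illustrative):
YAML
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: dev-team     # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:     # injected as `requests` when a container omits them
        cpu: "250m"
        memory: "128Mi"
      default:            # injected as `limits` when a container omits them
        cpu: "500m"
        memory: "256Mi"

With this in place, a Deployment that forgot its resource fields still satisfies the quota’s mandatory-request rule instead of being rejected.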

The DevSecOps Architect Level

At an advanced architectural level, managing quotas goes beyond basic CPU and RAM. It involves managing the control plane health and isolating workloads based on QoS (Quality of Service).

  • Control Plane Protection (Object Counts): Object Count Quotas are your primary defense against “resource exhaustion attacks.” A runaway CI/CD loop that accidentally creates 10,000 configmaps or tiny pods can crash the etcd database, bringing down the entire cluster even if CPU usage is low. You must strictly limit the count of pods, services, secrets, and configmaps per namespace.
  • LimitRange Integration: This is a crucial architectural distinction. ResourceQuota limits the entire Namespace (the whole team), while LimitRange limits a single Pod or Container (the individual). Because a Quota strictly requires every Pod to have a resource request, an architect will deploy a LimitRange alongside a ResourceQuota to automatically inject default limits into “naked” pods. This ensures valid deployments aren’t rejected purely due to missing YAML fields.
  • Cloud Cost Context: Quotas directly impact your billing on AWS, GCP, and Azure. For example, in AWS EKS, every LoadBalancer service spins up a real, external AWS Classic or Network Load Balancer, incurring hourly charges. Setting a hard cap on services.loadbalancers prevents developers from accidentally racking up massive cloud bills.
  • Advanced Scoping & QoS: You can apply quotas conditionally using ScopeSelectors and PriorityClasses:
    • Priority and QoS: Enforce strict limits for “low-priority” batch jobs (or BestEffort QoS pods) while leaving critical system daemons unrestricted within the same namespace.
    • Execution Time: Enforce quotas based on whether a pod has activeDeadlineSeconds defined. The Terminating scope applies to temporary jobs, while NotTerminating applies to long-running microservices.
  • StorageClass & Hardware Profiling:
    • Disk Tiering: You can limit storage based on the underlying disk type. For instance, you can limit expensive SSD storage (premium-storage.storageclass.storage.k8s.io/requests.storage) while allowing generous limits for cheaper HDD storage.
    • Extended & Ephemeral Resources: Limit GPU consumption for AI/ML workloads using extended resources (e.g., requests.nvidia.com/gpu: 2). Additionally, always include requests.ephemeral-storage to prevent pods from filling up the worker node’s local filesystem.
  • Overcommit Ratios: Architects design clusters with intentional overcommit ratios (e.g., total limits equal 150% of physical capacity, while total requests equal 80%). ResourceQuotas act as the strict mathematical boundaries that make this overcommitting strategy safe and predictable.
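
The scoping rules above can be sketched as a quota that only counts pods of a given priority (the namespace and PriorityClass names are hypothetical):
YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: low-priority-quota
  namespace: project-alpha        # hypothetical namespace
spec:
  hard:
    pods: "10"
    requests.cpu: "2"
    requests.memory: "4Gi"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass  # matches on the pod's priorityClassName
        operator: In
        values: ["low-priority"]  # hypothetical PriorityClass name

Pods with any other priority are simply not counted against this quota. For the BestEffort QoS scope, use scopeName: BestEffort with operator: Exists (no values); note that the BestEffort scope can only constrain the pods count.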

Quota Categories in Detail:

A ResourceQuota can constrain far more than CPU and memory. The main categories are:

  1. Compute & Ephemeral Resources: Beyond just CPU and Memory, you can control the temporary disk space that pods use. This prevents a pod from downloading massive files and filling up the physical node’s disk.
    • requests.ephemeral-storage: Caps the total local disk space requested across the namespace.
    • limits.ephemeral-storage: Caps the total local disk space the namespace’s pods may consume; a pod exceeding its own limit gets evicted.
    • requests.hugepages-<size> / limits.hugepages-<size>: Used for high-performance applications that require memory hugepages (e.g., hugepages-2Mi).
  2. Storage Resources (Persistent Volumes): You can tightly control how much permanent storage a team can claim from your storage arrays or cloud providers.
    • requests.storage: The total sum of storage across all Persistent Volume Claims (PVCs) in the namespace (e.g., 500Gi).
    • persistentvolumeclaims: The total number of PVCs the team is allowed to create.
    • StorageClass Specific Quotas: You can even restrict storage based on the type of disk, e.g., allow generous cheap HDD storage while capping expensive SSD storage via <storage-class-name>.storageclass.storage.k8s.io/requests.storage.
  3. Core Object Counts (Control Plane Protection): This is where DevSecOps architects prevent teams from accidentally spamming the Kubernetes etcd database or exhausting network IPs.
    • pods: Maximum number of pods allowed (regardless of their CPU/RAM).
    • services: Total number of Services.
    • services.loadbalancers: (Crucial for Cloud Costs!) Maximum number of external LoadBalancers (AWS ALB/NLB, Azure LB, etc.).
    • services.nodeports: Maximum number of NodePort services. Very useful for security to minimize open ports on your nodes.
    • secrets: Maximum number of Kubernetes Secrets.
    • configmaps: Maximum number of ConfigMaps.
  4. Extended API Object Counts (The count/ Syntax): Kubernetes introduced a generic way to limit almost any standard API resource using the count/<resource>.<group> syntax. This is highly useful for preventing a team from running too many automated jobs.
    • count/deployments.apps: Limit the number of Deployments.
    • count/replicasets.apps: Limit the number of ReplicaSets.
    • count/statefulsets.apps: Limit the number of StatefulSets.
    • count/jobs.batch: Limit the number of Batch Jobs.
    • count/cronjobs.batch: Limit the number of CronJobs.
  5. Hardware Accelerators & Custom Resources: If you are running AI/ML workloads or using specialized hardware, Quotas can manage those too.
    • requests.nvidia.com/gpu: Limit the number of physical GPUs a namespace can request.
    • requests.amd.com/gpu: Same logic for AMD GPUs.
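
As a sketch, categories 2–5 can be combined in a single quota. The StorageClass names premium-ssd and standard-hdd (and the namespace) are hypothetical; the key syntax is the standard one:
YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-and-object-quota
  namespace: project-alpha   # hypothetical namespace
spec:
  hard:
    # StorageClass-specific tiers: cap expensive SSD, be generous with HDD
    premium-ssd.storageclass.storage.k8s.io/requests.storage: "50Gi"
    standard-hdd.storageclass.storage.k8s.io/requests.storage: "2Ti"
    # Generic count/ syntax for workload objects
    count/deployments.apps: "10"
    count/cronjobs.batch: "5"
    # Extended resources (hardware accelerators)
    requests.nvidia.com/gpu: "2"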

Production-Grade Tooling Integration:

  • OPA Gatekeeper: Use Gatekeeper mutating webhooks to dynamically enforce organizational policies alongside built-in quotas.
  • Kyverno: An excellent Kubernetes-native policy engine to automatically generate ResourceQuotas and LimitRanges every time a new Namespace is created.
  • Kubecost: Integrates with your namespace quotas to provide real-time cost tracking and alerting when a team nears their budget.
  • Prometheus: Always monitor the kube_resourcequota metrics. Trigger alerts when a namespace reaches 80% of its quota.
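
A sketch of that 80% alert as a Prometheus rule file, assuming kube-state-metrics is exporting kube_resourcequota (the rule and severity names are illustrative):
YAML
groups:
  - name: resourcequota.rules
    rules:
      - alert: NamespaceQuotaNearlyFull
        # The "used" and "hard" series differ only in the `type` label,
        # so dividing while ignoring it yields used/hard per resource.
        expr: |
          kube_resourcequota{type="used"}
            / ignoring(type)
          kube_resourcequota{type="hard"}
            > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Namespace {{ $labels.namespace }} is above 80% of its {{ $labels.resource }} quota."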

Additional Details
  1. Key Components
    • ResourceQuota Object: The Kubernetes API object that defines the constraints.
    • Admission Controller: The control plane component that intercepts API requests to enforce the quota.
    • Namespace: The logical boundary where the quota is applied.
    • LimitRange (Companion): Often used alongside Quotas to provide default resource values.
  2. Key Characteristics
    • Immutability in Enforcement: Once applied, the limits are hard. No exceptions are made by the API server.
    • Aggregate Calculation: Always looks at the sum total of resources, not individual spikes.
    • Real-time evaluation: Every kubectl apply is evaluated instantly against the remaining quota balance.
  3. Use Case
    • Multi-tenant Clusters: Safely hosting multiple development teams on a single large cluster.
    • Cost Control: Preventing accidental deployment of expensive cloud resources (LoadBalancers, massive PVCs).
    • Cluster Stability: Protecting the etcd database from object-spamming CI/CD pipelines.
  4. Benefits
    • Eradicates the “Noisy Neighbor” problem.
    • Provides predictable capacity planning.
    • Forces developers to think about resource optimization and explicitly declare their application requirements.
    • Maintains control plane health and security.
  5. Best Practices
    • Always pair a ResourceQuota with a LimitRange to prevent developer frustration over missing YAML fields.
    • Set alerts at 80% quota utilization using Prometheus so teams can request more capacity before their deployments fail.
    • Use services.nodeports: "0" in quotas for developer namespaces to minimize security risks from open ports on your nodes.
    • Separate quotas by environment (e.g., strict limits for dev, higher limits for prod).
  6. Technical Challenges
    • Determining the “right” numbers for a quota can be difficult and requires historical profiling of application behavior.
    • Developers may initially struggle with 403 Forbidden errors if they are not used to defining resource requests and limits.
  7. Limitations
    • No “Borrowing”: Namespace A cannot borrow unused quota from Namespace B, even if the cluster is empty. It is a hard wall.
    • CPU Throttling: If you hit CPU limits, the app slows down (throttles). If you hit Memory limits, the app crashes (OOMKilled).
  8. Common Issues, Problems and Solutions
Problem                             | Root Cause                                                          | Solution
“Forbidden: exceeded quota”         | The new pod requests more resources than available in the remaining quota. | Increase the Quota or optimize the Pod’s resource requests.
Deployment stuck at 0 replicas      | Quota is full, so the ReplicaSet cannot create the Pod.             | Check kubectl describe quota to see which resource is exhausted.
Pods failing without explicit error | Often caused by Ephemeral Storage limits being hit.                 | Add requests.ephemeral-storage to your quota monitoring.

YAML file to create a Namespace and its Quota
YAML
# Create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: project-alpha   # <--- We create this specific name
  labels:
    team: backend-devs
    environment: production
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: hardened-team-quota
  namespace: project-alpha  # The namespace where this applies
  labels:
    environment: production
    tier: gold
spec:
  hard:
    # --- COMPUTE RESOURCES (The "Engine" Limits) ---
    # requests.cpu: "4" caps the SUM of all Pods' CPU requests
    # at 4 vCPUs (4 cores). Equivalent to writing "4000m".
    requests.cpu: "4"

    # limits.cpu: "8" caps the sum of all Pods' CPU limits;
    # the namespace can burst up to 8 vCPUs in total, never more.
    limits.cpu: "8"

    # Total of all Memory requests across the namespace (8 GiB)
    requests.memory: "8Gi"
    # Total of all Memory limits (hard stop at 16 GiB)
    limits.memory: "16Gi"

    # --- OBJECT COUNTS (The "Clutter" Limits) ---
    # Maximum number of pods allowed (prevents IP address exhaustion)
    pods: "20"                
    # Limit Services to prevent network clutter
    services: "10"            
    # Limit Secrets/ConfigMaps to protect etcd database size
    secrets: "30"             
    configmaps: "30"          
    
    # --- STORAGE RESOURCES (The "Warehouse" Limits) ---
    # Total storage space requested across all PVCs
    requests.storage: "100Gi" 
    # Max number of PVCs (disks) allowed
    persistentvolumeclaims: "10" 

    # --- EXPENSIVE CLOUD RESOURCES (The "Wallet" Savers) ---
    # Critical for cost control on AWS/GCP/Azure!
    # Prevents creating too many expensive external Load Balancers
    services.loadbalancers: "2" 
    # Specific limits for NodePorts (security risk if too many open)
    services.nodeports: "0"
