
EKS Auto-Scaling: HPA, VPA, and Karpenter

In Kubernetes, scaling happens at two levels: the Pod level (making your app bigger or more numerous) and the Node level (adding more physical servers to the cluster).

1. HPA: Beyond Just CPU and Memory

While CPU and Memory are the default triggers, modern HPA strategies rarely rely on them alone. CPU usage can be a lagging indicator (by the time CPU hits 80%, your users might already be experiencing latency).

  • The Pro Move: Teams use KEDA (Kubernetes Event-Driven Autoscaler) alongside HPA. KEDA allows you to scale based on external queues. Think of it as opening more checkout lanes because you see cars pulling into the parking lot, rather than waiting for the line inside the store to get long.
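As a sketch of how that looks in practice, a KEDA `ScaledObject` can scale a Deployment on the depth of an external queue before CPU ever moves. The Deployment name, queue URL, and thresholds below are illustrative, not from the original article:

```yaml
# Hypothetical KEDA ScaledObject: scale a worker Deployment on SQS queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkout-worker-scaler
spec:
  scaleTargetRef:
    name: checkout-worker          # Deployment to scale (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/checkout-queue
        queueLength: "10"          # target messages per replica
        awsRegion: us-east-1
```

Under the hood, KEDA creates and manages an HPA for you, feeding it the queue metric, so you don't hand-write the HPA yourself.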

2. VPA: The “Safe” Way to Use It

The main caveat with VPA is that applying a new resource recommendation usually requires a Pod restart, which makes teams hesitant to use it in production for user-facing apps.

  • The Pro Move: Most teams run VPA in “Recommendation Mode” (Off Mode). In this mode, VPA acts purely as an advisor. It watches your pods and tells you, “Hey, you requested 2 CPUs, but this app never uses more than 0.5.” Engineers then take these recommendations and manually update their deployment manifests. This saves money without risking unexpected mid-day pod restarts.
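A minimal sketch of Recommendation Mode, assuming a Deployment named `checkout-worker`:

```yaml
# Hypothetical VPA in "Off" (recommendation-only) mode: it observes and
# suggests, but never evicts or resizes pods on its own.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-worker          # assumed Deployment name
  updatePolicy:
    updateMode: "Off"              # recommend only; no mid-day restarts
```

You can then read the suggested requests from the object's status with `kubectl describe vpa checkout-worker-vpa` and copy them into your manifests by hand.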

3. The Golden Rule: Don’t Cross the Streams

A classic mistake engineers make is enabling HPA and VPA on the exact same metric (like CPU) for the same Deployment.

  • The Conflict: If CPU spikes, HPA tries to add more pods to lower the average load. Simultaneously, VPA tries to evict the existing pods so they restart with larger CPU requests. The two controllers fight each other, leading to scaling loops and broken apps. If you use both, HPA must scale on a custom metric (like web requests per second), while VPA manages the CPU/Memory sizes.
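The safe split looks roughly like this: an `autoscaling/v2` HPA driven by a Pods-type custom metric, leaving CPU and memory entirely to VPA. The metric name and targets are assumptions; exposing them requires a metrics adapter (e.g., prometheus-adapter):

```yaml
# Hypothetical HPA scaling on requests-per-second instead of CPU,
# so it never competes with VPA over the same signal.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-worker          # assumed Deployment name
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed custom metric name
        target:
          type: AverageValue
          averageValue: "100"      # target RPS per pod (illustrative)
```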

4. Karpenter: Why it’s the 2026 Standard

The old Cluster Autoscaler relied on rigid Auto Scaling Groups (ASGs): you had to pre-define a group of, say, exactly m5.large instances, and the autoscaler could only add or remove nodes of that one shape.

  • The Pro Move: Karpenter is completely “group-less” and works by bin-packing. When HPA demands 50 new pods, Karpenter reads the exact CPU and memory requests of those pending pods, talks directly to the cloud provider’s API, and provisions the cheapest, right-sized instances (maybe one large instance, or three smaller Spot instances) to fit those pods like Tetris blocks.
  • Consolidation: Karpenter also works in reverse. If it notices your pods are scattered across several half-empty nodes, it will proactively evict those pods, pack them tightly onto fewer nodes, and delete the empty servers to instantly cut your cloud bill.
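Both behaviors are configured on a Karpenter `NodePool`. This is a sketch against the `karpenter.sh/v1` API on AWS; the requirement values, `EC2NodeClass` name, and consolidation timing are illustrative:

```yaml
# Hypothetical Karpenter NodePool: group-less provisioning with
# consolidation enabled to pack pods onto fewer, cheaper nodes.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow Spot for cheaper capacity
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumed EC2NodeClass name
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                  # wait before repacking nodes
```

Note there is no instance-type list here: Karpenter picks sizes per scheduling round, and the `disruption` block is what drives the reverse direction, evicting pods off half-empty nodes and deleting them.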

Kubernetes Scaling Summary Matrix

| Autoscaler | Target | Analogy | Primary Use Case |
|---|---|---|---|
| HPA | Pod quantity | Opening more checkout lanes | Handling spikes in web traffic or queue processing |
| VPA | Pod size (CPU/RAM) | Giving workers a bigger shovel | Rightsizing databases or legacy single-threaded apps |
| Karpenter | Node infrastructure | Expanding the store building | Just-in-time, cost-optimized server provisioning |
