EKS Auto-Scaling: HPA, VPA, and Karpenter
In Kubernetes, scaling happens at two levels: the Pod level (making your app bigger or more numerous) and the Node level (adding more worker nodes, i.e., servers, to the cluster).
1. HPA: Beyond Just CPU and Memory
While CPU and Memory are the default triggers, modern HPA strategies rarely rely on them alone. CPU usage can be a lagging indicator (by the time CPU hits 80%, your users might already be experiencing latency).
- The Pro Move: Teams use KEDA (Kubernetes Event-Driven Autoscaler) alongside HPA. KEDA lets you scale on external event sources, such as SQS queue depth or Kafka consumer lag. Think of it as opening more checkout lanes because you see cars pulling into the parking lot, rather than waiting for the line inside the store to get long.
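As a minimal sketch, a KEDA `ScaledObject` scaling on SQS queue depth might look like this (the Deployment name `checkout` and the queue URL are hypothetical placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkout-scaler
spec:
  scaleTargetRef:
    name: checkout                # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        # hypothetical queue; KEDA adds a replica per ~10 backlogged messages
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/orders
        queueLength: "10"
        awsRegion: us-east-1
```

Under the hood, KEDA creates and manages an HPA for you, feeding it the queue metric, so you still get standard HPA behavior (stabilization windows, min/max replicas).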
2. VPA: The “Safe” Way to Use It
You correctly pointed out that VPA usually requires a Pod restart, which makes teams hesitant to use it in production for user-facing apps.
- The Pro Move: Most teams run VPA in “Recommendation Mode” (Off Mode). In this mode, VPA acts purely as an advisor. It watches your pods and tells you, “Hey, you requested 2 CPUs, but this app never uses more than 0.5.” Engineers then take these recommendations and manually update their deployment manifests. This saves money without risking unexpected mid-day pod restarts.
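Recommendation Mode is a one-line setting on the VPA object. A minimal sketch, assuming a hypothetical Deployment named `checkout`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout              # hypothetical workload to observe
  updatePolicy:
    updateMode: "Off"           # advise only; never evict or resize pods
```

With `updateMode: "Off"`, recommendations show up in the VPA object's status (e.g., via `kubectl describe vpa checkout-vpa`), and engineers apply them to the manifest on their own schedule.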
3. The Golden Rule: Don’t Cross the Streams
A classic mistake engineers make is enabling HPA and VPA on the exact same metric (like CPU) for the same Deployment.
- The Conflict: If CPU spikes, HPA tries to add more pods to lower the average load. Simultaneously, VPA tries to evict the existing pods so they restart with larger CPU requests. They end up fighting each other, leading to scaling loops and broken apps. If you use both, HPA must scale on a custom metric (like web requests), while VPA manages the CPU/Memory sizes.
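The safe split can be sketched with an `autoscaling/v2` HPA that targets a custom per-pod metric instead of CPU (the metric name `http_requests_per_second` assumes a metrics adapter, such as the Prometheus adapter, is exposing it; the Deployment name is a placeholder):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout                       # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods                         # custom metric, not CPU/Memory
      pods:
        metric:
          name: http_requests_per_second # assumed custom metric name
        target:
          type: AverageValue
          averageValue: "100"            # scale out above 100 req/s per pod
```

Because this HPA never looks at CPU or Memory, a VPA on the same Deployment can own those resource sizes without the two controllers tugging in opposite directions.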
4. Karpenter: Why it’s the 2026 Standard
Your assessment of Karpenter is perfect. The old “Cluster Autoscaler” relied on rigid “Auto Scaling Groups” (ASGs). You had to pre-define a group of, say, exactly m5.large instances.
- The Pro Move: Karpenter is completely "group-less." It uses bin-packing. When HPA demands 50 new pods, Karpenter looks at the exact CPU and Memory requirements of those specific pods. It then talks directly to the cloud provider's API and provisions the cheapest, best-sized instances (maybe one massive instance, or three smaller Spot instances) to fit those pods like Tetris blocks.
- Consolidation: Karpenter also works in reverse. If it notices your pods are scattered across several half-empty nodes, it will proactively evict those pods, pack them tightly onto fewer nodes, and delete the empty servers to instantly cut your cloud bill.
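Both behaviors — flexible, group-less provisioning and consolidation — live in a Karpenter `NodePool`. A minimal sketch (the `EC2NodeClass` named `default` is assumed to exist separately):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # No fixed instance type: Karpenter picks whatever fits cheapest
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer Spot when available
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumed to be defined elsewhere
  disruption:
    # Repack pods off half-empty nodes and delete the leftovers
    consolidationPolicy: WhenEmptyOrUnderutilized
```

The loose `requirements` are the point: instead of pre-declaring an ASG of m5.large, you describe constraints and let Karpenter choose the instance shape per scheduling decision.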
Kubernetes Scaling Summary Matrix
| Autoscaler | Target | Your Analogy | Primary Use Case |
|---|---|---|---|
| HPA | Pod Quantity | Opening more checkout lanes | Handling spikes in web traffic or queue processing. |
| VPA | Pod Size (CPU/RAM) | Giving workers a bigger shovel | Rightsizing databases or legacy single-threaded apps. |
| Karpenter | Node Infrastructure | Expanding the store building | Just-in-time, cost-optimized node provisioning. |