
EKS Auto-Scaling: HPA, VPA, and Karpenter

In Kubernetes, scaling happens at two levels: the Pod level (making your app bigger or more numerous) and the Node level (adding more physical servers to the cluster).

1. HPA: Beyond Just CPU and Memory

While CPU and Memory are the default triggers, modern HPA strategies rarely rely on them alone. CPU usage can be a lagging indicator (by the time CPU hits 80%, your users might already be experiencing latency).

  • The Pro Move: Teams use KEDA (Kubernetes Event-Driven Autoscaler) alongside HPA. KEDA allows you to scale based on external queues. Think of it as opening more checkout lanes because you see cars pulling into the parking lot, rather than waiting for the line inside the store to get long.
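As a sketch of how that looks in practice, a KEDA `ScaledObject` can scale a Deployment on the depth of an external queue before CPU ever moves. The Deployment name, queue URL, and thresholds below are illustrative, not from the original article:

```yaml
# Hypothetical KEDA ScaledObject: scale a worker Deployment on SQS queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkout-worker-scaler
spec:
  scaleTargetRef:
    name: checkout-worker          # Deployment to scale (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/checkout-queue
        queueLength: "10"          # target messages per replica
        awsRegion: us-east-1
```

Under the hood, KEDA creates and manages an HPA for you, feeding it the queue metric, so you don't hand-write the HPA yourself.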

2. VPA: The “Safe” Way to Use It

The main caveat with VPA is that applying a new resource recommendation usually requires a Pod restart, which makes teams hesitant to use it in production for user-facing apps.

  • The Pro Move: Most teams run VPA in “Recommendation Mode” (Off Mode). In this mode, VPA acts purely as an advisor. It watches your pods and tells you, “Hey, you requested 2 CPUs, but this app never uses more than 0.5.” Engineers then take these recommendations and manually update their deployment manifests. This saves money without risking unexpected mid-day pod restarts.
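A minimal sketch of Recommendation Mode, assuming a Deployment named `checkout-worker`:

```yaml
# Hypothetical VPA in "Off" (recommendation-only) mode: it observes and
# suggests, but never evicts or resizes pods on its own.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-worker          # assumed Deployment name
  updatePolicy:
    updateMode: "Off"              # recommend only; no mid-day restarts
```

You can then read the suggested requests from the object's status with `kubectl describe vpa checkout-worker-vpa` and copy them into your manifests by hand.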

3. The Golden Rule: Don’t Cross the Streams

A classic mistake engineers make is enabling HPA and VPA on the exact same metric (like CPU) for the same Deployment.

  • The Conflict: If CPU spikes, HPA tries to add more pods to lower the average load. Simultaneously, VPA tries to evict the existing pods so they restart with larger CPU requests. The two controllers fight each other, leading to scaling loops and broken apps. If you use both, HPA must scale on a custom metric (like web requests per second), while VPA manages the CPU/Memory sizes.
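The safe split looks roughly like this: an `autoscaling/v2` HPA driven by a Pods-type custom metric, leaving CPU and memory entirely to VPA. The metric name and targets are assumptions; exposing them requires a metrics adapter (e.g., prometheus-adapter):

```yaml
# Hypothetical HPA scaling on requests-per-second instead of CPU,
# so it never competes with VPA over the same signal.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-worker          # assumed Deployment name
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed custom metric name
        target:
          type: AverageValue
          averageValue: "100"      # target RPS per pod (illustrative)
```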

4. Karpenter: Why it’s the 2026 Standard

The old Cluster Autoscaler relied on rigid Auto Scaling Groups (ASGs): you had to pre-define a group of, say, exactly m5.large instances, and the autoscaler could only add or remove nodes of that one shape.

  • The Pro Move: Karpenter is completely “group-less” and works by bin-packing. When HPA demands 50 new pods, Karpenter reads the exact CPU and memory requests of those pending pods, talks directly to the cloud provider’s API, and provisions the cheapest, right-sized instances (maybe one large instance, or three smaller Spot instances) to fit those pods like Tetris blocks.
  • Consolidation: Karpenter also works in reverse. If it notices your pods are scattered across several half-empty nodes, it will proactively evict those pods, pack them tightly onto fewer nodes, and delete the empty servers to instantly cut your cloud bill.
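Both behaviors are configured on a Karpenter `NodePool`. This is a sketch against the `karpenter.sh/v1` API on AWS; the requirement values, `EC2NodeClass` name, and consolidation timing are illustrative:

```yaml
# Hypothetical Karpenter NodePool: group-less provisioning with
# consolidation enabled to pack pods onto fewer, cheaper nodes.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow Spot for cheaper capacity
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumed EC2NodeClass name
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                  # wait before repacking nodes
```

Note there is no instance-type list here: Karpenter picks sizes per scheduling round, and the `disruption` block is what drives the reverse direction, evicting pods off half-empty nodes and deleting them.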

Kubernetes Scaling Summary Matrix

| Autoscaler | Target | Analogy | Primary Use Case |
|---|---|---|---|
| HPA | Pod quantity | Opening more checkout lanes | Handling spikes in web traffic or queue processing |
| VPA | Pod size (CPU/RAM) | Giving workers a bigger shovel | Rightsizing databases or legacy single-threaded apps |
| Karpenter | Node infrastructure | Expanding the store building | Just-in-time, cost-optimized server provisioning |
