
EKS Data Plane / Worker Nodes

EKS Worker Node

A worker node is a virtual machine (an Amazon EC2 instance) or a serverless compute environment (AWS Fargate) that provides the CPU, memory, and storage your Kubernetes Pods need to run. Each node registers itself with the EKS control plane and communicates with it continuously to receive Pod scheduling instructions.

Core Components of a Worker Node

Every EC2-based EKS worker node runs three critical background processes:

  • Kubelet: The primary “node agent.” It communicates with the EKS Control Plane, ensuring that the containers described in your PodSpecs are running and healthy.
  • Kube-proxy: Maintains network rules on the node, allowing network communication to your Pods from network sessions inside or outside of your cluster.
  • Container Runtime: The software responsible for running containers. EKS standardizes on containerd; Docker is no longer supported as the underlying runtime because dockershim was removed in Kubernetes 1.24.

Types of Compute Options in EKS

AWS offers three primary ways to provision and manage worker nodes. Choosing the right one depends on your operational overhead preference and specific use cases.

| Feature | Managed Node Groups (MNG) | Self-Managed Nodes | AWS Fargate |
|---|---|---|---|
| Management | AWS handles provisioning, lifecycle, and updates. | You manage EC2 Auto Scaling Groups (ASG) and updates manually. | Serverless; no underlying EC2 instances to manage. |
| OS Customization | Supports standard EKS AMIs and custom AMIs. | Full control over the OS and bootstrapping process. | Managed by AWS (Amazon Linux only). |
| Access | SSH access allowed (if configured). | Full SSH and root access. | No SSH access allowed. |
| Pricing | Standard EC2 pricing (Spot or On-Demand). | Standard EC2 pricing (Spot or On-Demand). | Pay per vCPU and memory allocated per Pod. |
| Best For | Most standard workloads. Balances control with ease of use. | Legacy apps, deep OS-level tweaking, or extreme custom compliance. | Serverless workloads, bursty traffic, zero-maintenance requirements. |

Choosing the Right Operating System (AMI)

When using EC2 instances for your worker nodes, you must choose an Amazon Machine Image (AMI). AWS provides several EKS-optimized options out of the box.

A. Amazon EKS Optimized Amazon Linux (AL2023 / AL2)

This is the default and most common choice. It comes preconfigured with the AWS CLI, containerd, kubelet, and the AWS IAM Authenticator. AL2023 is the current standard and offers better performance and security than the legacy AL2, so new node groups should start on AL2023.

B. Bottlerocket

Bottlerocket is a Linux-based open-source OS purpose-built by AWS specifically for running containers.

  • Why use it? It has a drastically reduced attack surface (no shell and no SSH by default), boots faster than a general-purpose distribution, and applies each update as a single atomic image swap that can be rolled back. It is highly recommended for security-sensitive production clusters.

C. Other OS Options

  • Ubuntu: Canonical provides EKS-optimized Ubuntu AMIs for teams familiar with Debian-based systems.
  • Windows Server: Supported for teams needing to run Windows-based containers alongside Linux nodes.
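AWS publishes the current ID of each EKS-optimized AMI as a public SSM parameter, so you do not have to hard-code AMI IDs. The sketch below only builds the parameter names. The path patterns reflect the naming AWS documents for AL2023 and Bottlerocket at the time of writing and should be verified against the current AWS documentation; the helper name `eks_ami_parameter` and the version "1.30" are illustrative assumptions.

```python
def eks_ami_parameter(k8s_version: str, os_family: str, arch: str = "x86_64") -> str:
    """Return the SSM parameter name that holds the recommended AMI ID.

    os_family is a label used only by this sketch: "al2023" or "bottlerocket".
    arch is "x86_64" or "arm64".
    """
    if os_family == "al2023":
        return (
            f"/aws/service/eks/optimized-ami/{k8s_version}"
            f"/amazon-linux-2023/{arch}/standard/recommended/image_id"
        )
    if os_family == "bottlerocket":
        return f"/aws/service/bottlerocket/aws-k8s-{k8s_version}/{arch}/latest/image_id"
    raise ValueError(f"unsupported os_family in this sketch: {os_family!r}")


if __name__ == "__main__":
    # Print the names so they can be passed to `aws ssm get-parameter --name ...`.
    print(eks_ami_parameter("1.30", "al2023"))
    print(eks_ami_parameter("1.30", "bottlerocket", arch="arm64"))
```

The resulting name can be passed to `aws ssm get-parameter --name <name>` to look up the AMI ID for your Region.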

Scaling Worker Nodes

Scaling your nodes efficiently is critical to ensure performance during traffic spikes and cost savings during low usage.

There are two primary tools used for node scaling in EKS:

Cluster Autoscaler (The Traditional Method)

  • How it works: It watches for pods that fail to schedule due to insufficient resources and increases the size of your EC2 Auto Scaling Group (ASG).
  • Limitations: It relies heavily on ASGs, meaning it is tied to specific instance types and availability zones. Scaling can be slow because it must wait for the ASG to react.
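The scale-up decision described above can be sketched as a small packing problem. This is a simplified illustration, not the Cluster Autoscaler's actual algorithm (the real tool simulates the Kubernetes scheduler, including taints and affinity). The names `Resources`, `nodes_needed`, and `new_desired_capacity` are hypothetical, and the sketch assumes every node in the group has the same capacity.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def nodes_needed(pending: Sequence[Resources], node: Resources) -> int:
    """First-fit-decreasing packing of pending pods onto identical new nodes."""
    free: List[List[int]] = []  # remaining [cpu, memory] on each simulated node
    ordered = sorted(pending, key=lambda p: (p.cpu_millicores, p.memory_mib), reverse=True)
    for pod in ordered:
        if pod.cpu_millicores > node.cpu_millicores or pod.memory_mib > node.memory_mib:
            raise ValueError("pod cannot fit on this node type at all")
        for slot in free:
            if slot[0] >= pod.cpu_millicores and slot[1] >= pod.memory_mib:
                slot[0] -= pod.cpu_millicores
                slot[1] -= pod.memory_mib
                break
        else:
            # No simulated node has room, so one more node is required.
            free.append([node.cpu_millicores - pod.cpu_millicores,
                         node.memory_mib - pod.memory_mib])
    return len(free)


def new_desired_capacity(current: int, max_size: int,
                         pending: Sequence[Resources], node: Resources) -> int:
    """Desired ASG size after scale-up, never exceeding the group's maximum."""
    return min(max_size, current + nodes_needed(pending, node))
```

Because every node in the group is assumed identical, a pod that needs a different instance shape can only be served by a different node group, which is the limitation noted above.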

Karpenter (The Modern AWS Standard)

  • How it works: Karpenter is a highly flexible, high-performance Kubernetes cluster autoscaler built by AWS. It bypasses ASGs entirely and provisions EC2 instances directly based on the exact requirements of pending pods.
  • Why use Karpenter?
    • JIT (Just-in-Time) Provisioning: It launches instances within seconds of pods becoming unschedulable, without waiting for an ASG scaling cycle.
    • Instance Flexibility: It can mix On-Demand and Spot capacity and chooses the instance type, size, and architecture (e.g., Graviton/arm64 vs. x86) that fits the pending pods, within the constraints you configure.
    • Node Consolidation: It continuously evaluates cluster utilization and removes or replaces underused nodes, so pods are rescheduled onto fewer or cheaper nodes to save money.
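To make the instance-selection and consolidation ideas concrete, here is a minimal sketch. It is not Karpenter's implementation: the catalog, instance names, and prices are made-up placeholders, and `cheapest_fit` and `consolidation_target` are hypothetical helpers that only consider CPU, memory, architecture, and price.

```python
from dataclasses import dataclass
from typing import Collection, Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    arch: str            # "arm64" (Graviton) or "amd64" (x86)
    cpu_millicores: int
    memory_mib: int
    hourly_price: float  # illustrative number, NOT a real AWS price


# Hypothetical catalog used only for this example.
CATALOG = [
    InstanceType("small-arm", "arm64", 2000, 4096, 0.03),
    InstanceType("small-x86", "amd64", 2000, 4096, 0.04),
    InstanceType("large-arm", "arm64", 8000, 16384, 0.12),
    InstanceType("large-x86", "amd64", 8000, 16384, 0.16),
]


def cheapest_fit(cpu_millicores: int, memory_mib: int, allowed_archs: Collection[str],
                 catalog: Sequence[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Pick the cheapest instance type that satisfies the summed pod requests."""
    candidates = [
        it for it in catalog
        if it.arch in allowed_archs
        and it.cpu_millicores >= cpu_millicores
        and it.memory_mib >= memory_mib
    ]
    return min(candidates, key=lambda it: it.hourly_price, default=None)


def consolidation_target(current: InstanceType, used_cpu: int, used_memory: int,
                         allowed_archs: Collection[str],
                         catalog: Sequence[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Return a strictly cheaper type that still fits the node's pods, if any."""
    replacement = cheapest_fit(used_cpu, used_memory, allowed_archs, catalog)
    if replacement is not None and replacement.hourly_price < current.hourly_price:
        return replacement
    return None
```

Restricting `allowed_archs` plays the role of a pod or node-pool constraint: a workload built only for x86 never lands on the cheaper arm64 option.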

Networking and Security

Amazon VPC CNI

By default, EKS uses the Amazon VPC CNI plugin. This means every pod running on your worker node gets a real IP address from your AWS VPC subnet.

  • Note: This can lead to IP exhaustion in small subnets. If you run many pods per node, you may need to enable “Prefix Delegation,” which assigns /28 IPv4 prefixes (16 addresses each) to ENIs instead of individual IPs. Prefix delegation requires Nitro-based instance types.
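Because each pod consumes a VPC IP address, the number of pods a node can host is bounded by its ENI limits. The sketch below applies the pod-density formula AWS documents for the VPC CNI. The function name `max_pods` and the optional `cap` parameter are assumptions of this example, and the ENI figures for a given instance type should be looked up in the EC2 documentation.

```python
from typing import Optional


def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False,
             cap: Optional[int] = None) -> int:
    """Estimate the pod limit of a node from its ENI limits.

    enis:         maximum network interfaces for the instance type
    ips_per_eni:  IPv4 addresses per interface for the instance type
    cap:          optional upper bound (kubelet max-pods setting)
    """
    slots = ips_per_eni - 1      # the primary IP of each ENI is not given to pods
    if prefix_delegation:
        slots *= 16              # each slot holds a /28 prefix = 16 addresses
    total = enis * slots + 2     # +2 for pods that use the host network
    return min(total, cap) if cap is not None else total


if __name__ == "__main__":
    # Example: an instance type with 3 ENIs and 10 IPv4 addresses per ENI.
    print(max_pods(3, 10))                                    # 29
    print(max_pods(3, 10, prefix_delegation=True))            # 434
    print(max_pods(3, 10, prefix_delegation=True, cap=110))   # 110
```

With prefix delegation the IP limit is rarely the bottleneck, which is why a cap on the kubelet's maximum pod count is usually applied so that CPU and memory, not addresses, bound pod density.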

Node Security Groups

  • Cluster Security Group: Created by EKS to allow communication between the control plane and managed nodes.
  • Node Security Group: Controls inbound and outbound traffic for the EC2 instances. The control plane must be able to reach the kubelet on TCP port 10250, and the nodes must be able to reach the cluster API server on TCP port 443.
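A quick way to reason about the kubelet rule is to model the node security group's ingress rules and test them. This is a local sketch that does not call AWS; `IngressRule`, `allows`, and the security group IDs are hypothetical, and only security-group sources (not CIDR ranges) are modeled.

```python
from dataclasses import dataclass
from typing import Iterable

KUBELET_PORT = 10250


@dataclass(frozen=True)
class IngressRule:
    protocol: str    # "tcp", "udp", or "-1" meaning all protocols and ports
    from_port: int
    to_port: int
    source_sg: str   # ID of the security group allowed as the source


def allows(rules: Iterable[IngressRule], protocol: str, port: int, source_sg: str) -> bool:
    """True if any rule admits the given protocol and port from source_sg."""
    for rule in rules:
        if rule.source_sg != source_sg:
            continue
        if rule.protocol == "-1":
            return True  # an all-traffic rule covers every port
        if rule.protocol == protocol and rule.from_port <= port <= rule.to_port:
            return True
    return False


def kubelet_reachable(node_rules: Iterable[IngressRule], control_plane_sg: str) -> bool:
    """Check that the control plane can reach the kubelet on TCP 10250."""
    return allows(node_rules, "tcp", KUBELET_PORT, control_plane_sg)
```

For example, a node group whose only ingress rule is TCP 443 from the control plane security group fails this check, which would surface as errors from commands such as `kubectl logs` and `kubectl exec` that rely on the kubelet API.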

IAM and Permissions

  • Node IAM Role: Each EC2 instance must have an IAM role attached through an instance profile. The role carries managed policies such as AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly so the kubelet can pull container images from ECR and call the AWS APIs it needs.
  • Best Practice (IRSA): Do not give your worker node IAM role permissions to interact with your AWS resources (like S3 or DynamoDB). Instead, use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity to grant those permissions directly to the specific pods that need them.
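With IRSA, the pod-level role trusts the cluster's OIDC identity provider and is scoped to one service account. The sketch below builds such a trust policy as a Python dictionary. The helper name `irsa_trust_policy` and all argument values (account ID, OIDC provider path, namespace, service account) are placeholders to replace with your own.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy limited to one Kubernetes service account.

    oidc_provider is the cluster's OIDC issuer without the "https://" prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this service account may assume the role.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        "111122223333",                                    # placeholder account ID
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",     # placeholder issuer
        "payments",                                        # placeholder namespace
        "s3-reader",                                       # placeholder service account
    )
    print(json.dumps(policy, indent=2))
```

The `sub` condition is what keeps the permission pod-specific: other pods on the same node, and the node role itself, cannot assume this role.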

Best Practices for Production Worker Nodes

  1. Use Multi-AZ Deployments: Spread your node groups across at least three Availability Zones to survive data center outages.
  2. Leverage Spot Instances Cautiously: Use Spot instances for stateless, fault-tolerant workloads (like background workers or web APIs) to save up to 90% on compute costs. Keep stateful apps or ingress controllers on On-Demand nodes.
  3. Implement Pod Disruption Budgets (PDBs): Prevent node upgrades or Karpenter consolidation from taking down too many replicas of your application at once.
  4. Monitor Node Health: Integrate Amazon CloudWatch Container Insights or Prometheus/Grafana to track node CPU, memory, and disk utilization. Alert when disk usage exceeds 85% so you can act before the kubelet starts evicting pods under disk pressure.
  5. Use Managed Node Groups or Karpenter for Upgrades: EKS releases new Kubernetes versions several times a year. Managed Node Groups automate the “drain and replace” cycle safely, minimizing downtime, and Karpenter can replace nodes that have drifted from the desired AMI.
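To see how a Pod Disruption Budget limits a drain, the sketch below computes how many pods may be evicted at once. It mirrors the documented PDB semantics (percentages round up) but is only an illustration; `allowed_disruptions` is a hypothetical helper, not a Kubernetes API.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def _resolve(value: IntOrPercent, desired: int) -> int:
    """Turn 2 or "50%" into a pod count; percentages round up."""
    if isinstance(value, str):
        percent = int(value.rstrip("%"))
        return -(-desired * percent // 100)  # ceiling division
    return value


def allowed_disruptions(healthy: int, desired: int,
                        min_available: Optional[IntOrPercent] = None,
                        max_unavailable: Optional[IntOrPercent] = None) -> int:
    """Pods that can be evicted right now without violating the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        must_stay_healthy = _resolve(min_available, desired)
    else:
        must_stay_healthy = desired - _resolve(max_unavailable, desired)
    return max(0, healthy - must_stay_healthy)
```

With three replicas and `max_unavailable=1`, a node drain can evict one pod at a time; if one replica is already unhealthy, the result is zero and the drain waits.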