
AI/ML Workloads on EKS

1. GPU Provisioning: The “Heavy Lifters”

Standard EC2 instances use CPUs, which are like smart professors: they can do anything, but only one thing at a time. GPUs (Graphics Processing Units) are like an army of 5,000 students: each one is simpler, but they can all do math at the same time.

  • GPU-Optimized AMIs: A standard Linux image won’t work for AI workloads out of the box because it lacks GPU drivers. Use the EKS-Optimized Accelerated AMI, which comes with the NVIDIA drivers and the NVIDIA container toolkit pre-installed.
  • NVIDIA Device Plugin: Kubernetes doesn’t natively “see” GPUs. You must install this plugin (usually via a DaemonSet) so that the kubelet can advertise nvidia.com/gpu to the scheduler as a trackable resource, just like CPU and memory.
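Once the device plugin is running, a pod claims GPUs through its resource limits. A minimal sketch (pod name and image tag are illustrative assumptions):

```yaml
# Sketch: a pod that requests one GPU. The NVIDIA device plugin DaemonSet
# must already be running on the node; image tag and names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]   # prints driver/GPU info if scheduling worked
      resources:
        limits:
          nvidia.com/gpu: 1     # whole GPUs only; no fractional requests
```

Note that nvidia.com/gpu is requested under limits, and GPUs cannot be split across pods the way millicores can.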

2. Hosting LLMs: vLLM vs. Ollama

In 2026, we don’t just “run” a model; we use inference engines to make it fast.

  • vLLM (The Production Giant): This is the high-performance choice. It uses a technique called PagedAttention to serve thousands of users simultaneously without running out of GPU memory. It exposes an OpenAI-compatible API, so you can drop it into existing apps easily.
  • Ollama (The Developer Friend): Great for local testing or internal tools. It packages models into a “Docker-like” format, making it incredibly easy to pull and run a model (e.g., ollama run llama3.1) with one command.
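For the production path, vLLM ships an official container image that serves its OpenAI-compatible API. A minimal Deployment sketch (the model name, image tag, and resource sizing are assumptions you would adjust for your GPU):

```yaml
# Sketch: vLLM serving an OpenAI-compatible API on port 8000.
# Model name and image tag are illustrative; size resources to your GPU.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
          ports:
            - containerPort: 8000   # vLLM's default API port
          resources:
            limits:
              nvidia.com/gpu: 1
```

Because the endpoint speaks the OpenAI wire format, existing OpenAI client code can point at this Service with only a base-URL change.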

3. Data on EKS: The Throughput Problem

Training an AI model is like trying to drink water from a firehose. If your storage is slow (like standard EBS), your expensive $30,000 GPU will sit idle waiting for data.

  • Amazon FSx for Lustre: This is the 2026 standard for AI storage. It is a parallel file system that can deliver hundreds of gigabytes per second of aggregate throughput. It links directly to an S3 bucket, acting as a high-speed cache for your massive datasets.
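With the AWS FSx for Lustre CSI driver installed, an existing filesystem can be exposed to pods through a statically provisioned volume. A sketch, assuming the driver is present; the filesystem ID, DNS name, and mount name are placeholders for your own filesystem's values:

```yaml
# Sketch: statically provisioned FSx for Lustre volume via the AWS FSx CSI
# driver (fsx.csi.aws.com). All fs-* identifiers below are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fsx-pv
spec:
  capacity:
    storage: 1200Gi          # FSx for Lustre capacity scales in 1.2 TiB steps
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: fsx.csi.aws.com
    volumeHandle: fs-0123456789abcdef0
    volumeAttributes:
      dnsname: fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com
      mountname: abcdefgh
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind to the static PV above, not a StorageClass
  volumeName: fsx-pv
  resources:
    requests:
      storage: 1200Gi
```

Training pods then mount the PVC like any other volume, and every pod on every node reads the same S3-backed namespace at Lustre speed.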