AI/ML Workloads on EKS
Author: Rajkumar Aute
1. GPU Provisioning: The “Heavy Lifters”
Standard EC2 instances use CPUs, which are like smart professors: they can do anything, but only a few things at a time. GPUs (Graphics Processing Units) are like an army of 5,000 students: each one is simpler, but they can all do math at the same time.
- GPU-Optimized AMIs: You can’t use a standard Linux image for AI. You must use the EKS-Optimized Accelerated AMI, which comes pre-installed with NVIDIA drivers.
- NVIDIA Device Plugin: Kubernetes doesn’t natively “see” GPUs. You must install this plugin (usually via a DaemonSet) so that the API server can track nvidia.com/gpu as a resource, just like CPU and memory.
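A minimal sketch of the two steps above. The plugin version in the URL and the CUDA image tag are assumptions for illustration; check the NVIDIA/k8s-device-plugin releases for the current manifest.

```shell
# Step 1: install the NVIDIA device plugin DaemonSet
# (release tag below is an assumption; use the latest from the project's releases)
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/deployments/static/nvidia-device-plugin.yml

# Step 2: once the plugin advertises nvidia.com/gpu, pods can request GPUs
# like any other resource, and the scheduler will only place them on GPU nodes
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # image tag is an assumption
    command: ["nvidia-smi"]                       # prints GPU info if wiring works
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```

If the pod stays Pending, it usually means no node in the cluster is advertising the nvidia.com/gpu resource, i.e., the plugin or the accelerated AMI is missing.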
2. Hosting LLMs: vLLM vs. Ollama
In 2026, we don’t just “run” a model; we use inference engines to make them fast.
- vLLM (The Production Giant): This is the high-performance choice. It uses a technology called PagedAttention to handle thousands of users simultaneously without running out of memory. It’s OpenAI-compatible, meaning you can drop it into existing apps easily.
- Ollama (The Developer Friend): Great for local testing or internal tools. It packages models into a “Docker-like” format, making it incredibly easy to pull and run a model (e.g., ollama run llama3.1) with one command.
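A rough sketch of the difference in practice. The vLLM service hostname, port, and model name below are placeholders, not values from this article:

```shell
# Ollama: one command pulls the model (if needed) and opens an interactive chat
ollama run llama3.1

# vLLM: serves an OpenAI-compatible HTTP API, so existing OpenAI client code
# only needs its base URL changed. Hostname/port/model are hypothetical here.
curl http://vllm-service:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

That OpenAI-compatible endpoint is what makes vLLM a drop-in backend: you point an existing app at your cluster instead of api.openai.com.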
3. Data on EKS: The Throughput Problem
Training an AI model is like trying to drink water from a firehose. If your storage is slow (like standard EBS), your expensive $30,000 GPU will sit idle waiting for data.
- Amazon FSx for Lustre: This is the 2026 standard for AI storage. It is a “Parallel File System” that can deliver hundreds of gigabytes per second. It integrates directly with S3, acting as a high-speed cache for your massive datasets.
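A sketch of statically provisioning an existing FSx for Lustre filesystem to pods, assuming the AWS FSx CSI driver is installed in the cluster. The filesystem ID, DNS name, mount name, and size are placeholders:

```shell
# Expose an existing FSx for Lustre filesystem as a PV, then claim it.
# All identifiers below (fs-..., dnsname, mountname) are hypothetical.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fsx-training-data
spec:
  capacity:
    storage: 1200Gi
  accessModes: ["ReadWriteMany"]     # many GPU pods can read the same dataset
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: fsx.csi.aws.com
    volumeHandle: fs-0123456789abcdef0            # placeholder filesystem ID
    volumeAttributes:
      dnsname: fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com
      mountname: fsxmount                         # placeholder mount name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-training-claim
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""               # empty = bind to the static PV above
  volumeName: fsx-training-data
  resources:
    requests:
      storage: 1200Gi
EOF
```

Training pods then mount the claim like any other volume, and because FSx for Lustre can be linked to an S3 bucket, the dataset appears as files without a separate copy step.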