AWS Native Storage in EKS: EBS & EFS
To understand storage in Kubernetes, you need to understand three key concepts: the StorageClass, the PersistentVolume (PV), and the PersistentVolumeClaim (PVC).
1. The Storage Workflow (The Restaurant Analogy)
- StorageClass (The Menu): This defines what kind of storage is available (e.g., “Fast SSD,” “Cheap HDD,” or “Shared Network Drive”).
- PersistentVolume (PV) (The Food in the Kitchen): This is the actual physical storage piece (like an AWS EBS volume).
- PersistentVolumeClaim (PVC) (The Order): This is the request made by the user. You don’t ask for a specific hard drive; you say, “I want 10GB of Fast SSD storage.” Kubernetes then goes to the “Menu” (StorageClass), creates the “Food” (PV), and gives it to you.
2. AWS Integration: EBS vs. EFS
On AWS EKS, you primarily use two types of storage, which differ greatly in how they connect to your nodes:
Amazon EBS (Elastic Block Store)
- Performance: Block storage with low, consistent latency.
- Access Mode: ReadWriteOnce (RWO). The volume can be mounted read-write by only one node at a time (several pods on that same node can still share it).
- Limitation: It is bound to a specific Availability Zone (AZ). If your pod dies and reschedules to a node in a different AZ, it cannot attach to the original EBS volume.
- Use Case: High-performance, single-writer workloads like databases (MySQL, PostgreSQL, MongoDB).
Amazon EFS (Elastic File System)
- Performance: Slightly higher latency than EBS, but highly flexible and scalable.
- Access Mode: ReadWriteMany (RWX). Hundreds of pods across different nodes—and even across different Availability Zones—can read and write to the same file system simultaneously.
- Use Case: Shared workloads like media uploads, content management systems (WordPress), or distributed log processing.
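To make the ReadWriteMany mode concrete, here is a minimal sketch of an EFS-backed StorageClass and claim. It assumes the EFS CSI driver is installed and that an EFS file system already exists. The file system ID and the names efs-sc and shared-uploads are placeholders chosen for illustration, not values from this article.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc                          # hypothetical name
provisioner: efs.csi.aws.com            # the AWS EFS CSI driver
parameters:
  provisioningMode: efs-ap              # create one EFS access point per PVC
  fileSystemId: fs-0123456789abcdef0    # placeholder: ID of an existing EFS file system
  directoryPerms: "700"                 # permissions on the directory created for each PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads                  # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                     # many pods on many nodes can mount it at once
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi                      # required by the API; EFS is elastic and does not enforce this size
```

Any number of pods can then reference shared-uploads in their volumes section, regardless of which node or Availability Zone they run in.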
3. The Translator: CSI Drivers
Kubernetes doesn’t know how to talk to AWS directly out of the box. It needs a translator.
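The Container Storage Interface (CSI) Driver: This is a plugin you install on your EKS cluster that bridges the gap. AWS provides one driver per service: the EBS CSI driver (ebs.csi.aws.com) and the EFS CSI driver (efs.csi.aws.com). The EBS driver lets Kubernetes call the AWS API to automatically create, attach, detach, and delete EBS volumes as your PVCs dictate. The EFS driver mounts an existing file system and, with dynamic provisioning, creates an access point inside it for each PVC rather than a new file system.
Seeing the actual code makes these abstract concepts tangible, and StatefulSets are exactly where all this storage theory comes together for real-world applications like databases.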
1. The YAML in Action (EBS Example)
Here is how the “Restaurant Analogy” looks in actual Kubernetes YAML manifests.
Step 1: The Menu (StorageClass)
This is usually set up by the cluster administrator. It tells Kubernetes to use the AWS EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com # The CSI Driver (The Translator)
volumeBindingMode: WaitForFirstConsumer # Waits for a pod to be created before making the EBS volume
parameters:
  type: gp3 # General Purpose SSD on AWS
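Administrators often set a few more fields on a StorageClass. The variation below is a sketch rather than a recommendation; the name ebs-sc-retain and the chosen values are assumptions made for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-retain                     # hypothetical name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # create the volume in the AZ where the pod is scheduled
reclaimPolicy: Retain                     # keep the EBS volume when the PVC is deleted (the default is Delete)
allowVolumeExpansion: true                # allow growing a PVC by raising its storage request
parameters:
  type: gp3
  encrypted: "true"                       # encrypt the volume at rest
```

With Retain, deleting a PVC leaves the underlying EBS volume in your AWS account, which protects data but means cleanup becomes a manual step.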
Step 2: The Order (PersistentVolumeClaim)
This is what you, the developer, create. Notice how it references the ebs-sc menu.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-ebs-claim
spec:
  accessModes:
    - ReadWriteOnce # Standard for EBS (one node at a time)
  storageClassName: ebs-sc # Pointing to the StorageClass above
  resources:
    requests:
      storage: 10Gi # Asking for 10GB of storage
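Step 3: The Consumer (The Pod)
Finally, you create a pod that “eats” the order by mounting the PVC into its file system.
apiVersion: v1
kind: Pod
metadata:
  name: my-database-pod
spec:
  containers:
    - name: my-db-container
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD # The mysql image exits at startup unless a root password option is set
          value: "change-me" # Placeholder only; load this from a Secret in real deployments
      volumeMounts:
        - name: data-volume
          mountPath: /var/lib/mysql # Where the DB saves its files
          subPath: mysql # Use a subdirectory so the volume's lost+found folder doesn't block MySQL initialization
  volumes:
    - name: data-volume
      persistentVolumeClaim:
        claimName: my-ebs-claim # Linking the Pod to the PVC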
2. Enter the StatefulSet: Why Deployments Aren’t Enough
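If you use a standard Kubernetes Deployment to spin up three replicas of a database, all three Pods will try to use the exact same PVC. Because EBS is ReadWriteOnce, the volume attaches to only one node: replicas scheduled onto other nodes stay stuck in ContainerCreating with a Multi-Attach error, and replicas that land on the same node would all write to one data directory, which a database cannot tolerate.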
To solve this, Kubernetes provides the StatefulSet.
StatefulSets are designed specifically for applications that need persistent, unique identities and storage (like databases). Here is how they handle storage differently:
- Stable Network Identity: Instead of random hash names (like db-pod-7bx92), pods get predictable names (db-pod-0, db-pod-1).
- VolumeClaimTemplates: This is the magic feature. Instead of pointing all pods to a single PVC, a StatefulSet includes a template. Every time it spins up a new Pod replica, it automatically generates a brand-new, unique PVC (and therefore a unique EBS volume) just for that specific Pod.
- Sticky Storage: If db-pod-0 crashes, Kubernetes spins up a new db-pod-0 and reconnects it to its original PVC and EBS volume, so the data outlives the Pod.
Seeing the volumeClaimTemplate makes the StatefulSet concept click, and understanding backups is crucial before running any database in production.
1. The StatefulSet Magic: VolumeClaimTemplates
When you write a StatefulSet, you don’t create a separate PVC manifest. Instead, you embed a “blueprint” for the PVC directly inside the StatefulSet manifest.
Here is what that looks like:
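apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-db
spec:
  serviceName: "mysql" # Name of a headless Service that must exist for stable pod DNS names
  replicas: 3 # We want 3 database pods
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD # The mysql image exits at startup unless a root password option is set
              value: "change-me" # Placeholder only; load this from a Secret in real deployments
          volumeMounts:
            - name: data # This must match the name in the template below
              mountPath: /var/lib/mysql
              subPath: mysql # Use a subdirectory so the volume's lost+found folder doesn't block MySQL initialization
  # The Magic Happens Here:
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "ebs-sc" # Points to our AWS EBS menu
        resources:
          requests:
            storage: 10Gi
What happens when you apply this? Kubernetes reads the replicas: 3 and spins up three pods in order (mysql-db-0, mysql-db-1, mysql-db-2). Because of the volumeClaimTemplates, it automatically generates three separate 10Gi PVCs (data-mysql-db-0, data-mysql-db-1, data-mysql-db-2), provisions three separate AWS EBS volumes, and attaches one to each pod. Note that each pod is an independent MySQL server with its own data; replication between them has to be configured separately.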
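The StatefulSet above sets serviceName to "mysql", which refers to a headless Service that the manifest itself does not create. A minimal sketch of that Service, assuming the default MySQL port 3306, could look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql            # must match serviceName in the StatefulSet
spec:
  clusterIP: None        # "headless": no load-balanced virtual IP, just per-pod DNS records
  selector:
    app: mysql           # matches the pod labels in the StatefulSet template
  ports:
    - name: mysql
      port: 3306         # assumption: MySQL's default port
```

With this in place, each pod gets a stable DNS name of the form mysql-db-0.mysql inside its namespace.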
2. Protecting Your Data: VolumeSnapshots
Now that you have databases running on EBS volumes, how do you back them up? You don’t want to log into the AWS console to do it manually. Kubernetes handles this through VolumeSnapshots, which the CSI driver carries out for you. The snapshot resources are custom resources, so the snapshot CRDs and the snapshot controller must be installed in the cluster first.
Snapshots use the same three-part architecture as standard storage:
- VolumeSnapshotClass: The “Menu” for snapshots (e.g., “Use the AWS EBS CSI driver to take a snapshot”).
- VolumeSnapshot: The “Order” (e.g., “Take a snapshot of mysql-db-0’s PVC right now”).
- VolumeSnapshotContent: The “Food”. This object represents the actual snapshot residing in AWS (an EBS snapshot, which AWS stores in S3 behind the scenes).
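As a sketch of the first piece, a VolumeSnapshotClass for the EBS CSI driver might look like the following. The name ebs-snapshot-class is chosen to match the snapshot request in this section, and the deletionPolicy value is an assumption you may want to change.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshot-class
driver: ebs.csi.aws.com      # the same CSI driver that provisions the volumes
deletionPolicy: Delete       # remove the EBS snapshot when the VolumeSnapshot object is deleted; use Retain to keep it
```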
Here is how simple it is to request a backup of your specific PVC:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-db-0-snapshot
spec:
  volumeSnapshotClassName: ebs-snapshot-class # Points to the snapshot menu
  source:
    persistentVolumeClaimName: data-mysql-db-0 # The exact PVC we want to back up
Once you apply this, the CSI driver tells AWS, “Take a point-in-time snapshot of the EBS volume attached to this PVC.” If your database ever gets corrupted, you can create a brand-new PVC that restores its data directly from this mysql-db-0-snapshot!
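A restore is just another PVC with a dataSource pointing at the snapshot. The sketch below assumes the ebs-sc StorageClass from earlier; the claim name mysql-restore-claim is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-restore-claim          # hypothetical name
spec:
  storageClassName: ebs-sc
  dataSource:
    name: mysql-db-0-snapshot        # the VolumeSnapshot to restore from
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                  # must be at least the size of the snapshotted volume
```

The snapshot and the new PVC must live in the same namespace. Because the snapshot is taken while MySQL may be mid-write, it is worth testing a restore before relying on it.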