Kubernetes StatefulSet
The Guardian of Data & Identity
If you are just starting with Kubernetes, you might feel that managing data (state) is a bit tricky. Simply put, a StatefulSet is a special controller in Kubernetes used for applications that need to “remember” things.
Unlike standard (stateless) applications, where any pod can be replaced by another without issue, stateful applications (like databases) care very much about identity. Each pod in a StatefulSet has a unique name and a fixed “seat” in the cluster. If it crashes, it comes back with the exact same name and the exact same storage. It is like having a reserved parking spot: no matter how many times you leave and come back, that spot is yours.
Think of a Deployment (stateless) like a General Compartment in an Indian Train.
- Passengers (Pods) can sit anywhere.
- If one passenger leaves and another enters, it doesn’t matter who sits where.
- They are interchangeable. You don’t need to know their specific names; you just need to know the train is full.
Now, think of a StatefulSet like a Reserved AC Coach.
- Each passenger has a specific seat number (Ordinal Index: 0, 1, 2…).
- Passenger A must always sit in Seat 1. If Passenger A leaves for a moment, no one else can take Seat 1. When they come back, they return to Seat 1.
- Order matters. You typically board in order (Seat 1, then Seat 2) and deboard in reverse.
Key Characteristics
Here are the key things you must remember. Keep this handy for your interviews.
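- Sticky Identity: Pods have a fixed name (e.g., web-0, web-1).
- Ordered Creation: Pods start one by one (0 → 1 → 2).
- Ordered Deletion: When you scale down, Pods terminate in reverse order (2 → 1 → 0). Deleting the StatefulSet object itself gives no ordering guarantee, so scale it to 0 first if order matters.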
- Stable Storage: If a Pod dies, its Hard Disk (PVC) stays there. When the Pod comes back, it re-attaches to the same disk.
| Feature | Deployment (Stateless) | StatefulSet (Stateful) |
|---|---|---|
| Pod Name | Random (e.g., web-789xyz) | Predictable (e.g., web-0) |
| Storage | Ephemeral or Shared | Unique per Pod (volumeClaimTemplates) |
| Network ID | Changing IP | Stable DNS Name |
| Use Case | Web Servers, APIs | Databases and message brokers (MySQL, Cassandra, Kafka) |
In the Kubernetes ecosystem, the StatefulSet is the workload API object specifically designed to manage stateful applications. While Deployments are excellent for stateless apps, they fall short when an application requires a unique network identifier, stable persistent storage, or ordered deployment and scaling.
The StatefulSet controller is the brain behind this. It runs a reconciliation loop that watches the desired state (what you wrote in YAML) and the actual state (what is running). It ensures that if you ask for 3 replicas of a StatefulSet named web, you get web-0, web-1, and web-2. Crucially, it creates a PersistentVolumeClaim (PVC) for each pod automatically using a template. Each claim is named <template-name>-<pod-name>, so web-0 always gets the same claim (for example, www-web-0 for a template named www). Even if web-0 is rescheduled to a completely different node in the cluster, the replacement pod is bound to that same claim. This “stickiness” is the magic that allows us to run complex databases on Kubernetes.
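To make the pod-to-claim mapping concrete, here is a small shell sketch that only prints names and does not talk to a cluster. The function name pod_pvc_pairs is made up for this illustration; the naming pattern is the one used in the lab later in this article (www-web-0, www-web-1, ...).

```shell
# Print "<pod> -> <pvc>" for every replica of a StatefulSet.
#   $1 = StatefulSet name
#   $2 = number of replicas
#   $3 = name of the volumeClaimTemplate
pod_pvc_pairs() {
  i=0
  while [ "$i" -lt "$2" ]; do
    # Pod name: <statefulset>-<ordinal>
    # PVC name: <template>-<statefulset>-<ordinal>
    printf '%s-%d -> %s-%s-%d\n' "$1" "$i" "$3" "$1" "$i"
    i=$((i + 1))
  done
}

pod_pvc_pairs web 3 www
# web-0 -> www-web-0
# web-1 -> www-web-1
# web-2 -> www-web-2
```

Because the claim name is derived from the pod name, a recreated web-1 can only ever end up with www-web-1.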
To get started, you only need to understand three moving parts:
- Headless Service: This is a Service without a ClusterIP. It allows us to reach specific pods directly (like talking to web-0 directly instead of a random pod).
- VolumeClaimTemplates: Instead of manually making a PVC, we give a template. The StatefulSet uses this to “stamp out” a new disk for every new pod.
- Selector: Just like Deployments, we use labels to tell the controller which pods belong to it.
DevSecOps Engineer Level
Now let us go a bit deeper.
- Stable Network Identity: Every pod gets a DNS name in this format: <pod-name>.<service-name>.<namespace>.svc.cluster.local
  - Example: mysql-0.mysql-svc.default.svc.cluster.local
  - This is critical for “Service Discovery” in clusters like Cassandra or ZooKeeper, where nodes need to find each other by name.
- Pod Management Policies:
  - OrderedReady (Default): Strict ordering. Pod 1 won’t start until Pod 0 is fully ready.
  - Parallel: Good for things where state matters but order doesn’t (like a sharded database where shards are independent). All pods start at once.
- Identity assignment: The controller assigns each pod an integer ordinal (0 to N-1). This is not just a counter; it is the pod’s identity, and it is embedded in the Pod’s name and hostname.
- ControllerRevision: Kubernetes uses ControllerRevision objects to track the history of the StatefulSet (similar to ReplicaSets for Deployments). This allows for rolling updates and rollbacks. When you update a StatefulSet, it creates a new revision hash.
- Update Strategy (RollingUpdate):
  - Partitioning: You can perform a “Canary” rollout by setting a partition number. If you have 10 pods and set partition to 8, only pods 8 and 9 will update. Pods 0-7 will stay on the old version. This is a lifesaver for checking if a database upgrade will break things before upgrading the whole cluster.
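- Storage Isolation: By default, the controller creates PVCs but never deletes them. This is a safety feature. If you accidentally delete the StatefulSet mysql, its PVCs (named <template-name>-mysql-0, <template-name>-mysql-1, and so on) remain. You must manually delete the PVCs if you want to wipe the data. The same rule prevents accidental data loss when you simply scale down. Newer Kubernetes versions let you change this behavior with the persistentVolumeClaimRetentionPolicy field.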
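The partition-based canary described in the list above can be sketched in shell as follows. The helper names (pods_to_update, set_partition) and the KUBECTL variable are conveniences invented for this example; the patch body mirrors the updateStrategy fields of the StatefulSet spec. Setting KUBECTL=echo prints the command instead of running it, which is handy for a dry run.

```shell
# Use the real kubectl by default; set KUBECTL=echo to only print commands.
KUBECTL="${KUBECTL:-kubectl}"

# List the pods a rolling update will touch for a given partition:
# every ordinal >= partition, highest ordinal first.
#   $1 = StatefulSet name, $2 = replicas, $3 = partition
pods_to_update() {
  i=$(( $2 - 1 ))
  while [ "$i" -ge "$3" ]; do
    printf '%s-%d\n' "$1" "$i"
    i=$((i - 1))
  done
}

# Set the partition on an existing StatefulSet.
#   $1 = StatefulSet name, $2 = partition
set_partition() {
  $KUBECTL patch statefulset "$1" -p \
    "{\"spec\":{\"updateStrategy\":{\"type\":\"RollingUpdate\",\"rollingUpdate\":{\"partition\":$2}}}}"
}

pods_to_update mysql 10 8   # prints mysql-9, then mysql-8
```

A typical flow would be: run set_partition mysql 8, change the image in the pod template, check the two updated pods, and then run set_partition mysql 0 to let the remaining pods follow. If the canary looks bad, kubectl rollout undo statefulset/mysql returns the template to the previous revision.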
In a Deployment, if a Node dies, K8s simply starts a replacement Pod elsewhere once the old one is evicted. In a StatefulSet, K8s is very careful. It will NOT start pod-0 on a new node until it is 100% sure the old pod-0 is dead. Why? Because of the “Split-Brain” problem.
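- If two pod-0 instances run at the same time and write to the same file, your data gets corrupted.
- Solution: If a node is permanently dead (e.g., hardware failure), the StatefulSet pod will stay in “Terminating” or “Unknown” state until the node recovers or its Node object is deleted. If neither is going to happen, you, the admin, must manually “Force Delete” the pod (kubectl delete pod <pod-name> --grace-period=0 --force) to tell K8s that it is safe to recreate it. Do this only after confirming that the old pod can no longer be running.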
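The manual recovery described above can be wrapped in two small helpers. This is a sketch, not an official procedure: the function names and the KUBECTL variable are invented here, and it assumes you have already verified out of band that the machine is really gone.

```shell
# Use the real kubectl by default; set KUBECTL=echo to only print commands.
KUBECTL="${KUBECTL:-kubectl}"

# Print the name of the node a pod is (or was) scheduled on.
#   $1 = pod name
node_of_pod() {
  $KUBECTL get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# Remove the pod object immediately, without waiting for the kubelet
# on the node to confirm that the containers have stopped.
#   $1 = pod name
force_delete_pod() {
  $KUBECTL delete pod "$1" --grace-period=0 --force
}
```

One way to use them: look up the node with node_of_pod web-0, check with kubectl get node that it is NotReady and will not return, and only then call force_delete_pod web-0. The controller then creates a fresh web-0 that binds to the existing claim.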
Key Components:
- Headless Service: Controls the network domain.
- StatefulSet Object: The definition file (YAML).
- VolumeClaimTemplate: The blueprint for storage.
Use Cases:
- Databases: MySQL, PostgreSQL, MongoDB (Primary/Replica setups).
- Big Data: Kafka, Elasticsearch, Hadoop.
- Coordination: ZooKeeper, etcd.
Benefits:
- Automatic storage provisioning.
- Graceful, ordered startup/shutdown (prevents database corruption).
- Consistent DNS names for internal communication.
Common Issues:
- Stuck in “Terminating”: As mentioned, if a Node fails, the Pod won’t move automatically.
  - Solution: Force delete the pod or fix the node.
- Volume Resizing: Expanding the storage of a StatefulSet is painful. You cannot simply edit the template, because volumeClaimTemplates cannot be changed after the StatefulSet is created.
  - Solution: You often have to update the PVCs manually or use advanced operators.
- Slow Rollouts: Since it updates one by one, a large cluster (e.g., 50 Kafka brokers) can take hours to update.
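For the Volume Resizing issue above, the manual route is often described as: grow each PVC, remove the StatefulSet object while leaving its pods running, then re-create it with the larger size in the template. The sketch below assumes the StorageClass has allowVolumeExpansion enabled; the helper names and the KUBECTL variable are invented for this example, so treat it as a starting point rather than a tested runbook.

```shell
# Use the real kubectl by default; set KUBECTL=echo to only print commands.
KUBECTL="${KUBECTL:-kubectl}"

# Request a larger size on every PVC that belongs to a StatefulSet.
#   $1 = StatefulSet name, $2 = replicas,
#   $3 = volumeClaimTemplate name, $4 = new size (e.g. 2Gi)
expand_pvcs() {
  i=0
  while [ "$i" -lt "$2" ]; do
    $KUBECTL patch pvc "$3-$1-$i" -p \
      "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$4\"}}}}"
    i=$((i + 1))
  done
}

# Delete only the StatefulSet object; its pods and PVCs are left in place.
#   $1 = StatefulSet name
orphan_statefulset() {
  $KUBECTL delete statefulset "$1" --cascade=orphan
}
```

For the lab StatefulSet later in this article, that would be expand_pvcs web 3 www 2Gi, then orphan_statefulset web, then changing storage to 2Gi in statefulset.yaml and applying it again so that new replicas also get the larger disk.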
Limitations:
- Requires a storage provisioner (usually a CSI driver) that supports dynamic provisioning, or PersistentVolumes that an admin has provisioned in advance.
- More complex to debug than stateless apps.
- https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
- https://kubernetes.io/docs/tasks/debug/debug-application/debug-statefulset/
Practice: Creating a StatefulSet
1. Headless Service (service.yaml)
This service is “Headless” because it does not have a single IP address (ClusterIP) for load balancing. Instead, a DNS lookup of the Service returns the IP of every single Pod directly. A StatefulSet needs this so that we can find web-0, web-1, etc., individually.
# API version for Service objects
apiVersion: v1
kind: Service
metadata:
  name: nginx # Name of the service. This forms part of the DNS name (e.g., web-0.nginx)
  labels:
    app: nginx
spec:
  ports:
    - port: 80
      name: web # Name of the port, good for reference in other resources
  # CRITICAL: Setting clusterIP to None makes this a "Headless Service".
  # Instead of giving one single VIP (Virtual IP) for all pods,
  # DNS will return the specific IP of *each* pod.
  # This allows direct communication with specific pods (like connecting only to the Primary DB).
  clusterIP: None
  selector:
    app: nginx # This must match the labels in the StatefulSet Pod Template below
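To see which DNS names this headless Service gives the pods, here is a small shell sketch that only builds the names. The function name pod_fqdn is invented for this example, and it assumes the cluster uses the default cluster.local domain.

```shell
# Build the fully qualified DNS name of one StatefulSet pod.
#   $1 = StatefulSet name
#   $2 = pod ordinal
#   $3 = headless Service name
#   $4 = namespace (optional, defaults to "default")
pod_fqdn() {
  printf '%s-%d.%s.%s.svc.cluster.local\n' "$1" "$2" "$3" "${4:-default}"
}

pod_fqdn web 0 nginx            # web-0.nginx.default.svc.cluster.local
pod_fqdn mysql 0 mysql-svc db   # mysql-0.mysql-svc.db.svc.cluster.local
```

From another pod in the same namespace, the short form web-0.nginx is normally enough, for example in an nslookup web-0.nginx check once the StatefulSet below is running.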
2. StatefulSet Definition (statefulset.yaml)
This file defines the workload. It tells Kubernetes to create pods in order (0 → 1 → 2) and attach a unique storage disk to each one.
# API version for StatefulSet (apps/v1 is stable)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web # Name of the StatefulSet. Pods will be named web-0, web-1, web-2.
spec:
  # This links the StatefulSet to the Headless Service we created above.
  # It is used to generate the DNS names for the pods: <pod-name>.<service-name>
  serviceName: "nginx"
  # How many copies do you want?
  # They will be created strictly in order: web-0 first, then web-1, then web-2.
  replicas: 3
  # This selector tells the controller which pods belong to this StatefulSet.
  # It must match the labels in the 'template' section below.
  selector:
    matchLabels:
      app: nginx
  # --- POD TEMPLATE STARTS HERE ---
  # This defines what runs inside each Pod.
  template:
    metadata:
      labels:
        app: nginx # Matches the selector above
    spec:
      containers:
        - name: nginx
          # We are using a small Nginx image for this demo
          image: registry.k8s.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            # This tells the container: "Take the volume named 'www' and mount it at this path."
            # If the pod dies and restarts, this mount re-connects automatically.
            - name: www
              mountPath: /usr/share/nginx/html
  # --- STORAGE MAGIC STARTS HERE ---
  # Unlike with Deployments, we do NOT manually create PVCs.
  # We provide a "Template". The StatefulSet Controller uses this to
  # automatically create a unique PVC for EACH pod (e.g., www-web-0, www-web-1).
  volumeClaimTemplates:
    - metadata:
        name: www # This name must match the 'volumeMounts' name above
      spec:
        accessModes: [ "ReadWriteOnce" ] # Means only one node can mount this disk (typical for DBs)
        # StorageClass is omitted here, so it uses the cluster's "Default" storage class.
        # On cloud (AWS/GCP/Azure), this will provision a real block storage disk (EBS/PD/Managed Disk).
        resources:
          requests:
            storage: 1Gi # Each pod gets its OWN 1 GiB disk. Total = 3 GiB for 3 replicas.
3. How to Verify (Commands for the Lab)
1: Apply the files:
kubectl apply -f service.yaml
kubectl apply -f statefulset.yaml
2: Watch the “Ordered Creation” in real-time:
kubectl get pods -w
You will see web-0 change to Running before web-1 even appears. This proves the strict ordering!
3: Check the Persistent Volumes:
kubectl get pvc
You will see distinct claims: www-web-0, www-web-1, www-web-2. This proves every pod has its own dedicated data.
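As an optional fourth check, you can confirm that the data really survives a pod restart. The sketch below writes each pod's hostname into the file that nginx serves from its volume and reads it back afterwards. The helper names and the KUBECTL variable are invented for this example, and it assumes the image ships sh, hostname and cat.

```shell
# Use the real kubectl by default; set KUBECTL=echo to only print commands.
KUBECTL="${KUBECTL:-kubectl}"

# Write each pod's own hostname into index.html on its volume.
#   $1 = StatefulSet name, $2 = replicas
write_hostnames() {
  i=0
  while [ "$i" -lt "$2" ]; do
    $KUBECTL exec "$1-$i" -- sh -c 'echo "$(hostname)" > /usr/share/nginx/html/index.html'
    i=$((i + 1))
  done
}

# Read index.html back from each pod's volume.
#   $1 = StatefulSet name, $2 = replicas
read_hostnames() {
  i=0
  while [ "$i" -lt "$2" ]; do
    $KUBECTL exec "$1-$i" -- cat /usr/share/nginx/html/index.html
    i=$((i + 1))
  done
}
```

Run write_hostnames web 3, delete the pods with kubectl delete pod -l app=nginx, wait until they are Running again, and then run read_hostnames web 3. If the volumes were re-attached as described, the output should again be web-0, web-1 and web-2. When you are done with the lab, remember that deleting the StatefulSet leaves the three PVCs behind, so delete them separately if you want the disks gone.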