Kubernetes Volumes
Imagine you are writing a document on a computer that wipes itself clean every time it restarts: your document is gone forever. This is how Pods work in Kubernetes: they are temporary (ephemeral). If a Pod crashes or restarts, any data saved inside it is lost.
Kubernetes Volumes solve this problem. They are like attaching an external hard drive or a USB stick to that computer. Even if the computer (Pod) restarts, your data stays safe on the external drive (Volume). Volumes allow your applications to store data permanently (persistence) or share data between different containers.
Let’s use a “Laptop & Storage” analogy to understand the components:
- The Pod (Your Laptop): It does the processing. If it breaks, you get a new one, but the internal hard drive is wiped.
- Volume (USB Stick): Good for quick file transfers. If you plug it into the laptop, you can read/write. If the laptop dies, the USB stick might still be there, or it might get wiped depending on the type.
- PersistentVolume – PV (External Hard Drive Locker): This is a physical hard drive sitting in a server room (or cloud). It exists independently of your laptop. It’s the actual storage hardware.
- PersistentVolumeClaim – PVC (The Ticket): This is a “claim ticket” you give to the IT admin. It says, “I need 10GB of storage.” The admin takes your ticket, finds a matching hard drive (PV), and plugs it into your laptop.
- StorageClass (The Service Plan): Instead of asking an admin manually, you have a “Gold,” “Silver,” or “Bronze” plan.
- Gold: Super fast SSD (automatically provisions a high-speed PV).
- Bronze: Cheap HDD (automatically provisions a slower PV).
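To make the "service plan" idea concrete, here is a minimal sketch of two StorageClasses, assuming the AWS EBS CSI driver (`ebs.csi.aws.com`) is installed; the class names and parameters are illustrative and differ per storage provider.

```yaml
# "Gold" plan: fast general-purpose SSD volumes (assumes the AWS EBS CSI driver)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                      # SSD volume type
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# "Bronze" plan: cheap, slower cold-HDD volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: bronze
provisioner: ebs.csi.aws.com
parameters:
  type: sc1                      # cold HDD volume type
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```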
Key Characteristics to Remember
- “Pods are ephemeral; Data should be persistent.” Always remember this rule.
- “PVC requests; PV provides.” The Pod talks to the PVC, not the PV directly.
- “Access Modes matter.” RWO (ReadWriteOnce) = 1 Node only. RWX (ReadWriteMany) = Many Nodes (like NFS).
- “Reclaim Policy determines the end.” Delete = Data gone when PVC is deleted. Retain = Data stays for manual recovery.
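As a rough illustration of access modes and the reclaim policy, here is a minimal, manually created PV; the NFS server and export path are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-reports-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                      # RWX: many nodes can mount it (typical for NFS)
  persistentVolumeReclaimPolicy: Retain  # keep the data for manual recovery when the PVC is deleted
  nfs:
    server: nfs.example.internal         # placeholder NFS server
    path: /exports/reports               # placeholder export path
```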
| Component | Simple Explanation | Lifetime |
| --- | --- | --- |
| Volume | Directory with data accessible to containers in a Pod. | Dies with the Pod (mostly). |
| PersistentVolume (PV) | The actual piece of storage (disk/SSD). | Indefinite (exists until deleted). |
| PersistentVolumeClaim (PVC) | Request for storage (like a voucher). | Lives until the user deletes it. |
| StorageClass (SC) | Template to create PVs automatically. | Permanent config. |
| CSI | The standard interface for storage vendors. | Permanent. |
In Kubernetes, storage is decoupled from compute (the Pod). When you deploy a stateless application (like a simple web server), it doesn’t need to remember anything between restarts. But for a database (like MySQL), you need the data to survive.
The Workflow:
- Administrator sets up a StorageClass (defines the type of storage, e.g., AWS EBS or Google Persistent Disk).
- Developer creates a PersistentVolumeClaim (PVC) asking for “5Gi” of space.
- Kubernetes looks at the PVC and the StorageClass. It talks to the cloud provider (via CSI Driver) and dynamically creates a PersistentVolume (PV).
- The PV is “Bound” to the PVC.
- The Pod is created. In its configuration, it references the PVC.
- The Kubelet on the node mounts the actual disk (PV) into the Pod container.
If the Pod dies, the PVC and PV remain. A new Pod starts, grabs the existing PVC, and the data is back!
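A minimal sketch of steps 2 and 5 of this workflow: a PVC asking for 5Gi and a Pod that references it. The `gold` StorageClass name is carried over from the earlier sketch, and the MySQL Pod is purely illustrative.

```yaml
# Step 2: the developer asks for storage via a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gold               # assumed StorageClass from the earlier sketch
  resources:
    requests:
      storage: 5Gi
---
# Step 5: the Pod references the PVC, never the PV directly
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "changeme"            # demo only; use a Secret in a real cluster
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql    # MySQL's data directory lands on the PV
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-data
```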
emptyDir Details:
Think of emptyDir as a temporary workspace created just for your Pod. It’s like a blank whiteboard in a meeting room. When the meeting (Pod) is over, the whiteboard is wiped clean. It is perfect for temporary data that doesn’t need to be saved forever.
- Creation: It is created the moment a Pod lands on a Node.
- Deletion: It is deleted permanently when the Pod is removed from the Node.
- RAM Disk: You can tell Kubernetes to store `emptyDir` data in RAM (memory) instead of on the hard disk for super-fast speed.
Advanced emptyDir Configuration: You can control where the `emptyDir` stores data using the `medium` field (see the sketch below).
- `medium: ""` (default): Uses the node’s backing storage (disk/SSD). Slower, but larger capacity.
- `medium: "Memory"`: Mounts a tmpfs (RAM disk).
  - Pros: Extremely fast I/O.
  - Cons: Data is lost on reboot, and the size counts against the container’s memory limit. If you fill up the RAM disk, Kubernetes might kill your Pod for using too much memory!
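A minimal sketch of a RAM-backed scratch volume with a size limit; the image, sizes, and command are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "dd if=/dev/zero of=/scratch/blob bs=1M count=10 && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
      resources:
        limits:
          memory: 256Mi                # tmpfs usage counts against this limit
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory                 # back the volume with RAM (tmpfs)
        sizeLimit: 128Mi               # exceeding this gets the Pod evicted
```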
Security Context & emptyDir:
- fsGroup: When using `emptyDir`, multiple containers might run as different users (UIDs). To ensure they can all read and write the shared volume, set `securityContext.fsGroup` at the Pod level; Kubernetes changes the group ownership of the volume to match this ID.
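A minimal sketch of the fsGroup pattern: two containers with different UIDs sharing one `emptyDir`; the UIDs, GID, and images are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-cache
spec:
  securityContext:
    fsGroup: 2000                      # the volume's group ownership is set to GID 2000
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "touch /cache/ready && sleep 3600"]
      securityContext:
        runAsUser: 1000                # non-root UID; can still write thanks to fsGroup
      volumeMounts:
        - name: cache
          mountPath: /cache
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "ls -l /cache && sleep 3600"]
      securityContext:
        runAsUser: 1001                # different UID, same group-owned volume
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      emptyDir: {}
```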
hostPath Details:
Think of hostPath as a secret door. It allows your Pod to open a door into the server (Node) it is running on and see the files stored there. This is powerful but dangerous because if you change something important on the server through that door, you could crash the whole machine!
- Direct Access: It bypasses the container runtime isolation.
- Persistence: If you delete the Pod, the file on the Node stays there. If you create a new Pod on the same Node, it can see that file again.
Advanced hostPath Configuration: The type field is crucial for safety. Don’t just leave it empty!
- `DirectoryOrCreate`: If the path doesn’t exist on the Node, K8s creates it (with 0755 permissions).
- `Directory`: The path must exist; otherwise, the Pod fails to start.
- `File`: Mounts a specific file.
- `Socket`: Useful for mounting the Docker socket (`/var/run/docker.sock`) to run Docker-in-Docker (very dangerous!).
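A minimal sketch combining a safe `type` with a read-only mount; the path and image are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "ls /host-logs && sleep 3600"]
      volumeMounts:
        - name: node-logs
          mountPath: /host-logs
          readOnly: true               # never write to the node unless strictly necessary
  volumes:
    - name: node-logs
      hostPath:
        path: /var/log                 # directory on the node itself
        type: Directory                # fail fast if the path does not already exist
```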
Security Context & hostPath (CRITICAL):
- Privileged Escalation: A hacker who compromises a container with `hostPath` access to `/` (root) effectively owns the entire server.
- Mitigation:
  - Use Pod Security Standards (PSS) or OPA Gatekeeper to block `hostPath` usage in standard namespaces (see the namespace sketch below).
  - Mount as ReadOnly: Always set `readOnly: true` in `volumeMounts` unless writing is strictly necessary.
  - SELinux: If SELinux is enabled on the node, `hostPath` mounts might be blocked unless labeled correctly (similar to the `z`/`Z` options in Docker, though K8s handles this differently via the SecurityContext).
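One way to apply the PSS mitigation is a namespace-level label; the `baseline` Pod Security Standard rejects Pods that use `hostPath` volumes. A minimal sketch (the namespace name is hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-apps                      # hypothetical application namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline     # baseline forbids hostPath volumes
    pod-security.kubernetes.io/enforce-version: latest
```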
Use Cases
- emptyDir:
  - Checkpointing: A long-running calculation saves its progress here. If the app crashes, the container restarts and reads the progress from `emptyDir` (since the Pod didn’t die, just the container).
  - Git Repo Sync: One container (an InitContainer) pulls code from Git into `emptyDir`; the main container runs the code (see the sketch after this list).
- hostPath:
  - Fluentd/Logstash: Needs access to `/var/log/containers` to ship logs to Elasticsearch.
  - Container Network Interface (CNI): Plugins often need access to `/etc/cni/net.d`.
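A minimal sketch of the Git Repo Sync pattern above: an init container clones a repository into `emptyDir`, and the main container serves it. The repository URL and image tags are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: git-sync-demo
spec:
  initContainers:
    - name: clone
      image: alpine/git:2.43.0         # image tag is an assumption
      args: ["clone", "--depth=1", "https://github.com/example/app.git", "/workdir"]  # hypothetical repo
      volumeMounts:
        - name: source
          mountPath: /workdir
  containers:
    - name: web
      image: nginx:1.25
      volumeMounts:
        - name: source
          mountPath: /usr/share/nginx/html   # serve the cloned files
  volumes:
    - name: source
      emptyDir: {}
```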
Best Practices
- For emptyDir: Always set a `sizeLimit` to prevent a runaway process from filling the Node’s disk.
- For hostPath: Limit usage to DaemonSets (system agents) only. Never use it for standard Deployments.
- Avoid hostPath: Prefer local PersistentVolumes over `hostPath` if you need persistent local storage; it is safer and the scheduler understands it better (see the sketch below).
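A minimal sketch of a local PersistentVolume; unlike `hostPath`, it carries node affinity, so the scheduler knows which node holds the data. The StorageClass name, disk path, and node name are assumptions.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fast-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage      # a StorageClass with provisioner "kubernetes.io/no-provisioner"
  local:
    path: /mnt/disks/ssd1              # pre-prepared disk on the node (assumed path)
  nodeAffinity:                        # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1             # assumed node name
```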
- CSI (Container Storage Interface): This is the industry standard for exposing arbitrary block and file storage systems to containerized workloads. It allows vendors (NetApp, Pure, AWS, Azure) to update their storage plugins without waiting for a Kubernetes release.
- Architecture: Includes a CSI Controller (runs as a Deployment, talks to cloud API) and a CSI Node (runs as DaemonSet, mounts drives on nodes).
- Volume Cloning: Creating a new PVC with `dataSource` pointing to an existing PVC. This creates an exact clone of the data (see the clone sketch after this list).
- Security Context (fsGroup): When a volume is mounted, ownership permissions can be tricky. Using `securityContext` in the Pod definition ensures the group ID (`fsGroup`) owns the files on the volume so the container can write to them.
- Rook/Ceph: Open-source cloud-native storage orchestrator. Rook.io
- Velero: Essential for backup and restore of PVs and Cluster resources. Velero.io
- Longhorn: Lightweight distributed block storage for K8s. Longhorn.io
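Coming back to volume cloning from the CSI list above: a minimal sketch, assuming the underlying CSI driver supports cloning; the PVC names and the `gold` StorageClass are carried over from the earlier examples.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data-clone
spec:
  storageClassName: gold               # must match the source PVC's StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                     # must be at least as large as the source
  dataSource:
    kind: PersistentVolumeClaim
    name: mysql-data                   # the existing PVC to clone
```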
Use Cases
- StatefulSets: Use PVs for databases (MongoDB, PostgreSQL, Cassandra) where each instance needs its own dedicated identity and storage.
- CI/CD: Shared `RWX` volumes for Jenkins agents to share build artifacts.
- Logging: Centralized log collection writing to a persistent volume.
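A minimal sketch of the StatefulSet-with-dedicated-storage pattern from the list above; `volumeClaimTemplates` gives each replica its own PVC. The image, sizes, and `gold` StorageClass are assumptions.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres                # headless Service, created separately
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              value: "changeme"        # demo only; use a Secret in a real cluster
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:                # one PVC per replica: data-postgres-0, data-postgres-1, ...
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: gold         # assumed StorageClass
        resources:
          requests:
            storage: 20Gi
```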
Best Practices
- Always use Dynamic Provisioning: Don’t create PVs manually unless strictly necessary.
- Use StatefulSets for DBs: Don’t use Deployments for databases; a `StatefulSet` handles storage identity and stability better.
- Monitor Disk Usage: K8s doesn’t automatically stop you from filling up a disk. Use Prometheus to alert on PVC usage (see the sketch after this list).
- Label your PVCs: Labels help with billing and organization.
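A minimal sketch of a PVC usage alert, assuming the Prometheus Operator (kube-prometheus-stack) is installed and the kubelet’s `kubelet_volume_stats_*` metrics are being scraped; the threshold and names are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-usage
spec:
  groups:
    - name: storage
      rules:
        - alert: PersistentVolumeFillingUp
          expr: |
            kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.85
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is more than 85% full"
```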
Further reading:
- https://kubernetes.io/docs/concepts/storage/persistent-volumes
- https://kubernetes.io/docs/concepts/storage/storage-classes
- https://kubernetes.io/docs/concepts/storage/volume-snapshots