Kubernetes Volume Snapshots
Imagine you are working on an important document. Before you make big changes, you click “Save As” and keep a copy called final-v1.doc. If your edits in final-v2.doc go wrong, you can always go back to v1.
Volume Snapshots are exactly that for your Kubernetes storage. A Volume Snapshot is a point-in-time copy of your Persistent Volume (PV). It allows you to:
- Backup: Save the current state of your database or files.
- Restore: If data gets corrupted or deleted, you create a new volume from that snapshot.
- Clone: Create a copy of your production database for testing (Dev/Test environments).
Key Characteristics to Remember
- “Snapshots are for safety; PVCs are for usage.”
- “You cannot mount a Snapshot directly. You must restore it to a PVC first.”
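- “Snapshots are usually incremental (cheap), but restores are always full (the new volume holds all the data).” Restores are not necessarily fast: on some providers (e.g., AWS EBS), a restored volume loads its data lazily, so first reads can be slow.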
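| Feature | Description |
| --- | --- |
| What is it? | A copy of the storage at a specific point in time. |
| Who handles it? | The CSI (Container Storage Interface) driver for your storage (e.g., AWS EBS, Google PD), together with the Kubernetes snapshot controller. |
| Incremental? | Usually, depending on the provider. On AWS EBS and Google PD, the first snapshot copies all written blocks; each later snapshot stores only the blocks changed since the previous one. This saves money and time. |
| Location | For cloud disks such as AWS EBS and Google PD, snapshots are kept in the provider's managed object storage (AWS S3, Google Cloud Storage), not on the disk itself and not in buckets you can browse. Other drivers (e.g., Ceph RBD) may keep snapshots on the same storage system. |
| Scope | VolumeSnapshot objects are namespaced (just like PVCs). VolumeSnapshotClass and VolumeSnapshotContent are cluster-scoped (just like StorageClass and PV). |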
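To make the incremental-versus-full distinction concrete, here is a toy Python model. It is not how any provider actually stores snapshots: it assumes a volume is a dict of block index to block contents, each snapshot stores only blocks that differ from the previous snapshot, and a restore materializes every block. Deleted blocks are ignored for simplicity.

```python
# Toy model of incremental block snapshots (assumption: a volume is a dict of
# block index -> block contents; real providers use their own internal formats).
from typing import Dict, List

Volume = Dict[int, str]


def take_snapshots(states: List[Volume]) -> List[int]:
    """Return how many blocks each successive snapshot must store.

    The first snapshot stores every written block; each later snapshot stores
    only blocks that are new or changed since the previous snapshot.
    Deleted blocks are ignored in this simplified model.
    """
    stored: List[int] = []
    previous: Volume = {}
    for state in states:
        changed = [i for i, data in state.items() if previous.get(i) != data]
        stored.append(len(changed))
        previous = dict(state)
    return stored


def restore_size(state: Volume) -> int:
    """A restore always materializes every block of the chosen point in time."""
    return len(state)


if __name__ == "__main__":
    day1 = {0: "a", 1: "b", 2: "c", 3: "d"}
    day2 = {**day1, 2: "C"}  # one block modified
    day3 = {**day2, 4: "e"}  # one block appended
    print(take_snapshots([day1, day2, day3]))  # [4, 1, 1]
    print(restore_size(day3))  # 5
```

Three snapshots cost 4 + 1 + 1 = 6 stored blocks instead of 4 + 4 + 5 = 13, yet restoring day3 still produces all 5 blocks.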
Kubernetes did not always have snapshots. In the early days, you had to use cloud-specific scripts. Now Kubernetes has a standard API for this (snapshot.storage.k8s.io/v1, generally available since Kubernetes 1.20).
The Workflow:
- VolumeSnapshotClass: The admin sets up the “camera” settings. This tells Kubernetes which CSI driver to use (e.g., ebs.csi.aws.com) and what to do with the underlying snapshot when it is deleted (deletionPolicy: Delete or Retain).
- VolumeSnapshot: The developer asks to “take a photo” of a specific PVC. The CSI driver asks the storage backend to capture the disk's state at that moment, and the data is copied in the background. Kubernetes does not pause the application, so the result is crash-consistent.
- Restore: The developer creates a new PVC with a special field, dataSource. This tells Kubernetes, “Don't give me an empty disk; fill it with data from this snapshot.”
- Prerequisites: Your cluster must have a CSI driver that supports snapshots (like AWS EBS CSI or Azure Disk CSI), plus the snapshot CRDs and the snapshot controller from the kubernetes-csi external-snapshotter project (some managed Kubernetes services install these for you). Old “in-tree” drivers do not support snapshots.
- The VolumeSnapshotContent object: Just like a PVC binds to a PV, a VolumeSnapshot (the user's request) binds to a VolumeSnapshotContent (a cluster-scoped object pointing to the actual snapshot in the storage backend). Kubernetes creates and binds it automatically.
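The three workflow steps map onto three Kubernetes objects. The sketch below builds them as plain Python dicts (kubectl accepts JSON manifests as well as YAML). The names ebs-snapclass, mysql-data, mysql-pre-upgrade, the namespace prod, and the StorageClass gp3 are hypothetical placeholders, and the access mode and size are assumptions to adjust for your cluster.

```python
import json
from typing import Any, Dict

Manifest = Dict[str, Any]
SNAPSHOT_API = "snapshot.storage.k8s.io/v1"


def snapshot_class(name: str, driver: str, deletion_policy: str = "Delete") -> Manifest:
    """Admin step: the 'camera settings' (which CSI driver, what happens on delete)."""
    if deletion_policy not in ("Delete", "Retain"):
        raise ValueError("deletionPolicy must be Delete or Retain")
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},  # cluster-scoped: no namespace
        "driver": driver,
        "deletionPolicy": deletion_policy,
    }


def volume_snapshot(name: str, namespace: str, pvc: str, class_name: str) -> Manifest:
    """Developer step: 'take a photo' of an existing PVC in the same namespace."""
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": class_name,
            "source": {"persistentVolumeClaimName": pvc},
        },
    }


def restored_pvc(name: str, namespace: str, snapshot: str,
                 storage_class: str, size: str) -> Manifest:
    """Restore step: a new PVC whose dataSource points at the VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],  # assumption: single-node block disk
            "resources": {"requests": {"storage": size}},
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot,
            },
        },
    }


if __name__ == "__main__":
    docs = [
        snapshot_class("ebs-snapclass", "ebs.csi.aws.com", "Retain"),
        volume_snapshot("mysql-pre-upgrade", "prod", "mysql-data", "ebs-snapclass"),
        restored_pvc("mysql-data-restored", "prod", "mysql-pre-upgrade", "gp3", "100Gi"),
    ]
    # Write this to a file and apply it, e.g. `kubectl apply -f snapshot.json`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": docs}, indent=2))
```

The VolumeSnapshot and the restored PVC must live in the same namespace, so both builders take the namespace explicitly; the VolumeSnapshotClass is cluster-scoped and has none.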
Use Cases
- Database Upgrades: Snapshot before upgrading MySQL 5.7 to 8.0.
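- Cloning Environments: Take a snapshot of “Production Data” and restore it into a “Dev” namespace, so developers can test with real data without breaking production. Because a PVC can only restore from a VolumeSnapshot in its own namespace, this takes an extra step: pre-provision a VolumeSnapshotContent and a VolumeSnapshot in the dev namespace that point to the same underlying snapshot (the alpha CrossNamespaceVolumeDataSource feature is an alternative).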
- Forensics: If an attacker compromises a pod, snapshot its disk immediately to preserve evidence for later analysis, then kill the pod.
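A PVC's dataSource can only reference a VolumeSnapshot in its own namespace, so one way to clone production data into a dev namespace is static pre-provisioning: create a VolumeSnapshotContent that points at the existing cloud snapshot and is bound to a new VolumeSnapshot in the dev namespace. A minimal sketch, assuming the source volume used Filesystem mode; the helper name, the namespace dev, the snapshot name, and the EBS-style handle are hypothetical.

```python
import json
from typing import Any, Dict, Tuple

Manifest = Dict[str, Any]
SNAPSHOT_API = "snapshot.storage.k8s.io/v1"


def clone_snapshot_into(namespace: str, name: str, driver: str,
                        snapshot_handle: str) -> Tuple[Manifest, Manifest]:
    """Hypothetical helper: build a pre-provisioned VolumeSnapshotContent and a
    VolumeSnapshot in `namespace` that point at an existing backend snapshot."""
    content_name = f"{namespace}-{name}-content"
    content = {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshotContent",
        "metadata": {"name": content_name},  # cluster-scoped
        "spec": {
            # Retain: deleting the dev copy must never delete the shared backend snapshot.
            "deletionPolicy": "Retain",
            "driver": driver,
            "source": {"snapshotHandle": snapshot_handle},
            "sourceVolumeMode": "Filesystem",  # assumption about the source volume
            # Binds this content to exactly one VolumeSnapshot (namespace + name).
            "volumeSnapshotRef": {"name": name, "namespace": namespace},
        },
    }
    snapshot = {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"source": {"volumeSnapshotContentName": content_name}},
    }
    return content, snapshot


if __name__ == "__main__":
    content, snap = clone_snapshot_into(
        "dev", "prod-data-copy", "ebs.csi.aws.com",
        "snap-0123456789abcdef0",  # hypothetical handle
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [content, snap]}, indent=2))
```

The real handle can be read from the production snapshot's content object, e.g. `kubectl get volumesnapshotcontent <name> -o jsonpath='{.status.snapshotHandle}'`. A PVC in the dev namespace whose dataSource names prod-data-copy then restores from it.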
Technical Challenges, Limitations, Common Issues
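| Issue | Cause | Solution |
| --- | --- | --- |
| Snapshot stuck with readyToUse: false | The snapshot CRDs or snapshot controller are missing, the CSI driver does not support snapshots, or the cloud API is failing or timing out. | Run kubectl describe volumesnapshot to see events, then check the logs of the snapshot controller and of the snapshotter sidecar in the CSI controller pod (often named csi-snapshotter). Ensure the driver's IAM permissions allow snapshot creation (for AWS EBS, ec2:CreateSnapshot). |
| Restore failed (size mismatch) | Trying to restore a 100 GiB snapshot into a 50 GiB PVC. | The new PVC's storage request must be at least the snapshot's restoreSize (normally the size of the source volume). You cannot shrink data. |
| “Snapshot content not found” | The VolumeSnapshotContent was deleted manually, or removed together with its VolumeSnapshot because deletionPolicy was Delete. | Set deletionPolicy: Retain on the VolumeSnapshotClass (or patch existing VolumeSnapshotContent objects) to prevent accidental loss. |
| Data corruption on restore | The snapshot is only crash-consistent: it was taken while the database was writing heavily, so data still in memory or mid-write was not captured. | Quiesce the application before snapshotting (e.g., a database flush/lock or fsfreeze), for example with Velero pre/post backup hooks. |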
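The first two rows of this table can be checked before creating a restore PVC. This hypothetical pre-flight helper takes a VolumeSnapshot as parsed from `kubectl get volumesnapshot NAME -n NAMESPACE -o json`, flags a snapshot that is not yet readyToUse, and compares the PVC request with status.restoreSize. Its quantity parser is a simplified assumption: it covers only plain numbers with binary (Ki, Mi, Gi, ...) or decimal (k, M, G, ...) suffixes, not the full Kubernetes quantity grammar.

```python
import re
from decimal import Decimal
from typing import Any, Dict, List

# Multipliers for the subset of Kubernetes quantity suffixes this sketch supports.
_SUFFIX = {
    "": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}


def to_bytes(quantity: str) -> int:
    """Parse a quantity such as '100Gi' or '50G' into bytes (simplified grammar)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _SUFFIX:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(Decimal(match.group(1)) * _SUFFIX[match.group(2)])


def restore_problems(snapshot: Dict[str, Any], pvc_request: str) -> List[str]:
    """Return reasons why restoring `snapshot` into a PVC of `pvc_request` may fail."""
    status = snapshot.get("status") or {}
    problems: List[str] = []
    if not status.get("readyToUse", False):
        err = (status.get("error") or {}).get("message", "no error reported yet")
        problems.append(f"snapshot not ready (readyToUse is false): {err}")
    restore_size = status.get("restoreSize")
    if restore_size and to_bytes(pvc_request) < to_bytes(restore_size):
        problems.append(f"PVC request {pvc_request} is smaller than restoreSize {restore_size}")
    return problems


if __name__ == "__main__":
    snap = {"status": {"readyToUse": True, "restoreSize": "100Gi"}}
    print(restore_problems(snap, "50Gi"))   # ['PVC request 50Gi is smaller than restoreSize 100Gi']
    print(restore_problems(snap, "100Gi"))  # []
```

One easy trap this catches: 100G (decimal) is about 7% smaller than 100Gi (binary), so a PVC requesting 100G cannot hold a 100Gi snapshot.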