Kubernetes DaemonSet
In simple words, imagine you have a cluster with 10 worker nodes. You want to run a specific software (like a log collector or a monitoring agent) on exactly every single node automatically. You don’t want to manually count nodes and scale up replicas. You want Kubernetes to say, “Oh, a new node joined? Let me automatically start this specific Pod on it.”
That is exactly what a DaemonSet does. It ensures that a copy of a Pod is running across all (or a specific subset of) nodes. If you add a node, the DaemonSet adds the Pod. If you remove a node, the DaemonSet cleans up that Pod.
In short, a DaemonSet guarantees one Pod per node.
| Feature | Description |
| --- | --- |
| Primary Goal | Ensure a copy of a Pod runs on every single node (or selected nodes). |
| Replica Management | You do not specify replicas: 3. K8s calculates the count automatically from the number of matching nodes. |
| Node Scaling | Automatically creates a Pod when a new node joins the cluster. |
| Scheduling | Historically handled by the DaemonSet controller, but now handled by the default K8s scheduler using node affinity. |
| Typical Use Cases | Logging agents (Fluentd), monitoring (Prometheus Node Exporter), networking (CNI plugins). |
| Update Strategy | Supports RollingUpdate (default) and OnDelete. |
A DaemonSet is a Kubernetes workload object used primarily for system-level operations rather than user-facing applications. Unlike a Deployment, which focuses on maintaining a desired number of replicas regardless of where they run, a DaemonSet focuses on where the Pods run (specifically, on every node). It is essential for cluster bootstrapping services such as networking (Calico, Flannel), storage drivers (CSI), and observability tools. Because these Pods are system-critical, they are typically given a high priority (see the PriorityClass notes later) so the scheduler can still place them on a full node by preempting ordinary Pods; the goal is that every node always has the critical components it needs to function.
Key Components:
- Controller: Watches the node list and creates or deletes Pods as nodes join or leave.
- Pod Template: Defines the Pod (containers, volumes) that runs on each node.
- Selector: Matches the Pods owned by this DaemonSet (see the minimal sketch below).
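To make these parts concrete, here is a minimal, hypothetical DaemonSet manifest. The name, labels, and image tag are placeholders chosen for illustration, not values from this article:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent                # hypothetical name
spec:
  selector:                      # Selector: must match the Pod template labels below
    matchLabels:
      app: log-agent
  template:                      # Pod Template: what runs on every node
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: fluent/fluentd:v1.16-1   # assumed image tag; use whatever agent you actually run
```

Notice there is no replicas field: the controller creates exactly one Pod per matching node.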
Use Cases:
- Cluster Storage Daemon: Running glusterd or ceph on each node.
- Logs Collection: Running fluentd or logstash to grab /var/log from every node.
- Node Monitoring: Running collectd or node-exporter to check CPU/RAM of the node itself.
Benefits:
- Zero-touch operations: Add a node, and the software is installed automatically.
- Maintenance: Easy to update the software across 1000 servers with one command.
Technical Challenges, Limitations & Solutions
| Challenge | Problem | Solution |
| --- | --- | --- |
| Resource Consumption | Since it runs on every node, a heavy DaemonSet eats up a huge amount of total cluster CPU. | Limit requests: always set tight resources.requests and limits (sketch below). Use a VerticalPodAutoscaler if needed. |
| Failed Updates | If you push a bad image, all nodes might start crashing (CrashLoopBackOff). | Health checks: use minReadySeconds and readiness probes to slow down the rollout so you catch errors early (sketch below). |
| Node Upgrades | When you upgrade Kubernetes versions on nodes, DaemonSets might restart. | PodDisruptionBudgets (PDB): set a PDB to ensure critical DaemonSets don't go down entirely during maintenance. |
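A hedged sketch of the first two mitigations in DaemonSet spec form; the image, probe path, port, and numbers are illustrative placeholders, not values mandated by Kubernetes:

```yaml
spec:
  minReadySeconds: 30                  # a replaced Pod must stay Ready for 30s before the rollout moves on
  template:
    spec:
      containers:
        - name: agent
          image: example.com/agent:1.0     # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
          readinessProbe:                  # a bad image that never becomes Ready stalls the rollout early
            httpGet:
              path: /metrics               # assumed endpoint; adjust to your agent
              port: 9100
            initialDelaySeconds: 5
            periodSeconds: 10
```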
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset
prometheus-node-exporter.yaml
https://github.com/prometheus/node_exporter
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring   # Best practice: keep monitoring tools in their own namespace
  labels:
    app: node-exporter
spec:
  # 1. SELECTOR: Matches the Pods created by this DaemonSet
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # 2. HOST NETWORK: Critical for monitoring!
      # We set this to 'true' so the Pod uses the Node's IP address directly.
      # This allows it to report the TRUE network statistics of the server.
      hostNetwork: true
      # 3. HOST PID: (Optional but recommended)
      # Allows the Pod to see all processes running on the server, not just inside the container.
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.7.0
          # 4. ARGS: Telling Node Exporter where to look
          # Since we mount the host's folders to /host/..., we must tell the app to look there.
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
            - "--path.rootfs=/host/root"
          ports:
            - containerPort: 9100
              name: metrics
              hostPort: 9100   # Exposes port 9100 directly on the Node IP
          # 5. RESOURCES: Always limit your monitoring agents!
          # You don't want the monitoring tool to crash the server it is monitoring.
          resources:
            limits:
              cpu: 250m
              memory: 180Mi
            requests:
              cpu: 102m
              memory: 180Mi
          # 6. VOLUME MOUNTS: The "eyes" of the system
          # We mount the host's internal folders so the container can read them.
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      # 7. TOLERATIONS: Monitor the control-plane (master) nodes too!
      # Without this, you will have no visibility into your Control Plane health.
      tolerations:
        - operator: Exists
          effect: NoSchedule
      # 8. VOLUMES: Mapping actual server paths
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
```

Note: this is just the DaemonSet YAML. To have Prometheus scrape the exporter automatically, you also need a ServiceMonitor, which is a Custom Resource Definition (CRD) provided by the Prometheus Operator (a sketch follows below).
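A rough ServiceMonitor sketch, assuming the Prometheus Operator CRDs are installed and that a Service named node-exporter (labelled app: node-exporter) exposes the metrics port; those Service details are assumptions, not part of the manifest above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter   # matches the labels on the Service (not the Pods directly)
  endpoints:
    - port: metrics        # the named port exposed by the Service
      interval: 30s
```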
1. Running on Specific Nodes Only (Node Selectors)
Sometimes you don't want the Pod on every node. Maybe you only want it on nodes that have a GPU.
- How to do it: Use nodeSelector or nodeAffinity in the Pod template.
- Example: nodeSelector: type: gpu-node (sketch below). The DaemonSet will ignore all nodes without that label.
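A minimal sketch of that idea, assuming your GPU nodes carry a hypothetical type: gpu-node label:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        type: gpu-node   # hypothetical label; Pods are only created on nodes carrying it
```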
2. Rolling Updates
How do you update a DaemonSet without killing the whole cluster's monitoring?
- Strategy: RollingUpdate.
- Key setting: maxUnavailable. This controls how many nodes can be "down" during the update (sketch below).
- Default: 1 (updates one node at a time).
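The relevant spec fields, shown with their default values:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate      # the default strategy
    rollingUpdate:
      maxUnavailable: 1      # replace the Pod on at most one node at a time (the default)
```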
Tools to explore:
- Fluentd: For logging (runs as a DaemonSet).
- Prometheus Node Exporter: For metrics (repository linked above).
5.3 Architect Level Notes (Expert “Guru” Level)
This is where things get interesting for architects.
1. Taints and Tolerations (The "Master Node" Challenge)
By default, DaemonSets will not run on the control-plane (master) nodes because those nodes are "tainted" (marked as restricted).
- Architect’s Solution: If you need logs from the Master node too, you must add a Toleration to your DaemonSet YAML.
- Code snippet (YAML):

```yaml
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```
2. Critical Pod Priority
If your nodes are already full (100% of CPU requested), the DaemonSet Pod must still run on every node immediately. If it's a networking plugin (like Calico), the node won't work without it.
- Architect's Solution: Use priorityClassName: system-node-critical (sketch below). This tells K8s, "If there is no space, evict lower-priority Pods to make space for this DaemonSet."
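A minimal sketch of where that field sits in the Pod template:

```yaml
spec:
  template:
    spec:
      priorityClassName: system-node-critical   # built-in PriorityClass; allows preempting ordinary Pods
```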
3. Update Strategy for Large Clusters
If you have 1,000 nodes, updating them one by one (the default) will take forever.
- Optimization: Set maxUnavailable to a percentage, e.g. 10%. This updates up to 100 nodes at a time (sketch below).
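The same updateStrategy block as before, now using a percentage:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%    # with 1,000 nodes, up to 100 Pods can be replaced in parallel
```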
4. DaemonSets respect Node Taints. If you taint a node as NoSchedule, the DaemonSet will NOT run there unless you tolerate it.
5. Static Pods vs. DaemonSets:
- Static Pods are managed by the Kubelet on a specific node (no API server control).
- DaemonSets are managed by the API server/Controller Manager. Always prefer DaemonSets for easier management.
Practice lab will be added soon.