
Kubernetes Linux & Networking Prerequisites

Kernel Namespaces & Cgroups

Welcome to the very foundation of containerization! Before jumping into Docker or Kubernetes, it is crucial to understand what is happening “under the hood” of the Linux Operating System. Many people think containers are magic, but they are just standard Linux features put together in a smart way.

Think of it this way:

  • Kernel Namespaces: Think of these as the “Walls” of a room in an apartment. They provide isolation. Even though you are in the same building (Server), you cannot see what is happening in the neighbor’s room (Container) because of the walls.
  • Cgroups (Control Groups): Think of these as the “Electricity Meters”. They provide limits. They ensure one room doesn’t use up all the electricity (CPU/RAM) of the entire building.
  • Linux Bridge: Think of this as a “Virtual Network Switch” connecting all the rooms to the outside world.

Cheat Sheet

Here are some easy ways to remember the core concepts:

  1. Namespaces decide what a process can see.
  2. Cgroups decide how much a process can use.
  3. Bridges allow containers to talk to each other on a single host.
  4. IPVS is the heavy-lifter for load balancing, much faster than iptables at scale.
  5. PID 1 is the boss process; if it dies, the container dies.

Key Characteristics to Remember

  • Isolation: Achieved via Namespaces (PID, NET, MNT, UTS, IPC, USER).
  • Resource Management: Achieved via Cgroups (CPU, Memory, PIDs).
  • Packet Filtering: Managed by iptables or IPVS.
  • Process Lifecycle: Containers usually do not run systemd; they run the application directly as PID 1.
| Feature | Role | Linux Command (Try it!) | Complexity |
|---|---|---|---|
| Namespaces | Isolation (Visibility) | `unshare`, `lsns` | Medium |
| Cgroups | Resource Limiting (Usage) | `systemd-cgtop` | High |
| Bridge | Layer 2 Switching | `brctl`, `ip link` | Low |
| Iptables | Firewall & Routing rules | `iptables -L` | High |
| IPVS | High-performance Load Balancing | `ipvsadm` | Very High |
| Systemd | Host Init System (Service Manager) | `systemctl` | Medium |


Containers are not “real” physical objects; they are isolated execution environments created by combining two specific Linux kernel features: Namespaces and Control Groups (Cgroups).

Kernel Namespaces (Isolation)

Namespaces provide isolation. They trick a process into thinking it has its own dedicated system resources, separate from other processes. When a process is wrapped in a namespace, it cannot see or affect the resources of other processes outside that namespace.

Key namespaces include:

  1. PID (Process ID): The process sees itself as PID 1 inside the container, even if it is PID 12345 on the host.
  2. NET (Network): Gives the container its own network stack (IP, localhost, routing table) separate from the host.
  3. MNT (Mount): Allows the process to have its own root filesystem (/) different from the host.
  4. UTS (Unix Timesharing): Allows the container to have its own hostname.
  5. IPC (Inter-Process Communication): Isolates shared memory and semaphores.
  6. USER: Maps user IDs inside the container to different IDs on the host (e.g., root inside, non-root outside).
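The PID namespace trick can be seen first-hand with `unshare` (a sketch; requires root, or a kernel that permits unprivileged user namespaces):

```shell
# Enter new PID + mount namespaces; --fork makes the shell a child so it
# can become PID 1, and --mount-proc remounts /proc to match the new namespace.
sudo unshare --pid --fork --mount-proc bash

# Inside the new namespace:
ps aux            # bash shows up as PID 1

# Back on the host:
lsns --type pid   # list all PID namespaces and the processes that own them
```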

Cgroups (Resource Control)

While namespaces hide processes from each other, they do not stop one process from consuming all the RAM or CPU. Control Groups (Cgroups) provide resource limitation and accounting.

  1. Resource Limiting: You can set a hard limit on how much memory (e.g., 512MB) or CPU shares a container can use.
  2. Prioritization: You can guarantee that critical containers get more CPU time than background tasks.
  3. Accounting: Cgroups track exactly how much resource usage a group of processes has consumed (essential for billing and monitoring).
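A minimal sketch of setting that 512MB hard limit by hand, assuming cgroup v2 is mounted at `/sys/fs/cgroup` (the `demo` group name is arbitrary; requires root):

```shell
# Create a cgroup and cap it at 512 MB (536870912 bytes).
sudo mkdir /sys/fs/cgroup/demo
echo 536870912 | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell (and its future children) into the group.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs

# Accounting: how many bytes the group is using right now.
cat /sys/fs/cgroup/demo/memory.current
```

Container runtimes do exactly this on your behalf when you pass a flag like `docker run --memory=512m`.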

Namespaces decide what a process can see. Cgroups decide what a process can use.

DevSecOps Architect Level

The “PID 1” Problem (Systemd vs Docker Init)

In a standard Linux server, Systemd is PID 1. It initializes the system, starts services, and, crucially, “reaps” zombie processes (cleans up dead child processes).

  • The Issue: In a container, your application (e.g., Java or Python) becomes PID 1. Most apps are not written to handle zombie reaping or system signals (like SIGTERM).
  • The Consequence: If your app crashes or spawns child processes that die, they become “zombies” and fill up the process table, eventually killing the container.
  • The Solution: Use a lightweight init system like Tini (built into Docker with --init) or ensure your entrypoint script handles signals correctly.

Iptables vs. IPVS (Scaling Networking)

  • Iptables: The traditional way Kubernetes/Docker handles Service networking. It is a long list of rules. If you have 5,000 services, the Kernel has to process a massive list of rules for every packet. This is slow (O(n) complexity).
  • IPVS (IP Virtual Server): Built into the Linux Kernel for Layer 4 load balancing. It uses a hash table structure. Even with 10,000 services, lookup is instant (O(1) complexity).
  • Architect Advice: For large-scale production clusters, always tune your CNI (Container Network Interface) to use IPVS mode instead of iptables for better performance.
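One way to check and switch the proxy mode, sketched for a kubeadm-built cluster (the ConfigMap and DaemonSet names assume kubeadm defaults):

```shell
# Check the current proxy mode (empty or "iptables" means iptables mode).
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'

# After editing the ConfigMap to set mode: "ipvs", restart kube-proxy:
kubectl -n kube-system rollout restart daemonset kube-proxy

# Verify on a node: the Service virtual servers should now appear here.
sudo ipvsadm -Ln
```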

Key Characteristics

  1. Granularity: Limits can be set down to the byte of memory or the individual CPU share.
  2. Transparency: The application inside doesn’t know it’s being limited; it just sees the resources provided.
  3. Ephemerality: Network interfaces (veth pairs) are created and destroyed dynamically as containers spin up and down.

Use Case

  • Multi-tenancy: Running apps for different customers on the same server without them accessing each other’s data (Namespaces).
  • Performance Protection: Ensuring a background backup job doesn’t slow down the main web server (Cgroups).

Benefits

  • Cost Efficiency: Squeeze more applications onto fewer servers safely.
  • Security Depth: If an app is hacked, the damage is contained within the Namespace “walls.”

Limitations

  1. Kernel Dependency: Unlike Virtual Machines (VMs) which have their own Kernel, containers share the Host Kernel. If the Host Kernel crashes (Kernel Panic), all containers die.
  2. Security Boundaries: Namespaces are not as secure as the hardware virtualization used in VMs. There are “escape” vulnerabilities.

Common Issues, Problems, and Solutions

| Issue | Problem Description | Solution |
|---|---|---|
| Zombie Apocalypse | Container process table fills up with defunct processes. | Use the `--init` flag in Docker or use a base image with `tini` installed. |
| OOM Killed | Container suddenly dies with an “OOMKilled” error. | The Cgroup memory limit was reached. Increase the limit or fix memory leaks in the app code. |
| Port Conflict | “Address already in use” error. | You are trying to bind a port on the host that is already taken. Use Docker port mapping to map to a different host port. |
| Slow Networking | High latency in service discovery. | Switch from iptables mode to IPVS mode in your Kubernetes/Docker config. |



Linux Networking: Bridges, iptables, IPVS

Linux networking is the backbone of modern containerization (like Docker) and orchestration (like Kubernetes). To understand how data moves between your computer and a container, or between two containers, you need to understand three core concepts: Bridges, iptables, and IPVS.

Think of it as a busy corporate office building:

  1. Linux Bridge (The Switch/Hub): Think of this as the physical extension cords or network switches in the office. It connects different cubicles (containers) so they can talk to each other and to the outside world. Without it, the cubicles are isolated islands.
  2. iptables (The Security Guard & Mailroom): Think of this as the security guard checking ID cards (Firewall) and the mailroom sorting packages (NAT). It checks every person (packet) entering or leaving. It says, “You can pass,” “You are blocked,” or “Please deliver this package to Desk 4, not the lobby.”
  3. IPVS (The High-Speed Receptionist): Think of this as a super-efficient receptionist handling thousands of visitors. While the Security Guard (iptables) checks a long list of names one by one (slow for big crowds), the High-Speed Receptionist (IPVS) uses a quick directory hash system to instantly direct massive crowds to the right department.

Cheat Sheet

  1. Linux Bridge: A software switch that connects virtual network interfaces (like containers) to the host network.
    • Bridge: Operates at Layer 2 (Data Link Layer). It deals with MAC addresses.
  2. iptables: A rule-based firewall utility that controls packet filtering and Network Address Translation (NAT).
    • iptables: Operates mostly at Layer 3 & 4 (Network/Transport). It deals with IP addresses and Ports.
  3. IPVS (IP Virtual Server): A transport-layer load balancer inside the Linux kernel, much faster than iptables for handling many services.
    • IPVS: Operates at Layer 4 (Transport). It is purely for load balancing.

| Feature | Linux Bridge | iptables | IPVS |
|---|---|---|---|
| Primary Role | Connectivity (Switching) | Security (Firewall) & NAT | Performance Load Balancing |
| OSI Layer | Layer 2 (Data Link) | Layer 3 (Network) / Layer 4 | Layer 4 (Transport) |
| Complexity | Low | Medium (Sequential Rules) | High (Hash Tables) |
| Performance | Fast (Local switching) | Slower at scale (O(n) lookup) | Very Fast at scale (O(1) lookup) |
| Key Use Case | Docker default networking (`docker0`) | Firewalls, Docker Port Mapping | Kubernetes Service Load Balancing (kube-proxy) |

Linux Bridges

A Linux Bridge acts like a virtual Layer 2 switch.

  • Function: It connects multiple network interfaces together so they can communicate.
  • In Docker: The default docker0 interface is a bridge. When you run a container, Docker creates a pair of virtual ethernet pipes (veth pairs). One end plugs into the container, and the other plugs into the bridge on the host. This allows containers to talk to each other and the host.
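You can recreate what Docker does by hand with the `ip` tool (a sketch; the `br-demo` and `veth-*` names are arbitrary, and root is required):

```shell
# Create a software bridge, as Docker does for docker0.
sudo ip link add br-demo type bridge
sudo ip link set br-demo up

# Create the two-ended "pipe" and plug one end into the bridge.
sudo ip link add veth-host type veth peer name veth-ctr
sudo ip link set veth-host master br-demo
sudo ip link set veth-host up

# In real container setups, veth-ctr is then moved into the container's
# network namespace: sudo ip link set veth-ctr netns <pid>

# Clean up (deleting one veth end removes its peer too).
sudo ip link del veth-host
sudo ip link del br-demo
```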


iptables

iptables is the traditional utility for configuring the Linux kernel firewall (netfilter).

  • Packet Filtering: It decides whether to accept, drop, or forward network packets.
  • NAT (Network Address Translation): This is critical for containers. When a container talks to the internet, iptables uses Masquerading (NAT) to make the traffic look like it is coming from the host’s IP address.
  • Port Forwarding: When you map a port (e.g., -p 8080:80), Docker writes iptables rules to forward traffic hitting port 8080 on the host into the specific container IP on port 80.
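You can inspect the rules Docker writes for both behaviors (requires root; the container IP shown in the comment is illustrative):

```shell
# Masquerading rules: container subnets rewritten to the host's IP.
sudo iptables -t nat -L POSTROUTING -n

# Port-mapping rules Docker adds for -p 8080:80 live in the DOCKER chain;
# a typical entry looks like:
#   DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8080 to:172.17.0.2:80
sudo iptables -t nat -L DOCKER -n
```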

IPVS: IP Virtual Server

IPVS is a high-performance Layer 4 load balancer built into the Linux kernel.

  • Performance: While iptables works well for firewalls, it becomes slow when handling thousands of rules (like in large Kubernetes clusters). IPVS is designed specifically for load balancing and uses hash tables for faster lookups.
  • Use Case: It is often used in Kubernetes (kube-proxy) to direct traffic between services efficiently, replacing iptables mode in high-scale environments.
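A minimal sketch of a standalone IPVS load balancer built with `ipvsadm` (the IP addresses are placeholders; requires root and the `ip_vs` kernel module):

```shell
# Create a virtual service on 10.0.0.100:80 using round robin (-s rr).
sudo ipvsadm -A -t 10.0.0.100:80 -s rr

# Register two real backends in NAT mode (-m).
sudo ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.10:80 -m
sudo ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.11:80 -m

# List the virtual servers and their backends.
sudo ipvsadm -Ln
```

This is essentially what kube-proxy in IPVS mode does for every Kubernetes Service, with the hash-table lookup replacing a long iptables chain.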

Containers are basically isolated processes. They need a way to reach the outside world.

  • Virtual Ethernet Pairs (veth): When a container starts, Linux creates a “pipe” with two ends. One end sits inside the container (seen as eth0), and the other end sits on the host machine.
  • The Bridge: The host end of that pipe is plugged into a Linux Bridge (like docker0). This bridge collects all the cables from different containers and allows them to talk.
  • Packet Flow: When a container pings Google (8.8.8.8), the packet goes:
    1. Container eth0 -> Host veth -> Bridge docker0.
    2. iptables sees this packet and performs NAT (Masquerading). It stamps the host’s IP on the packet so the internet knows where to send the reply.
    3. Packet leaves the physical interface (eth0 or wlan0).

DevSecOps Architect

At an architectural level, we move beyond simple connectivity to performance tuning and scale.

The Scalability Problem with iptables:

  • Sequential Processing: iptables (specifically the netfilter framework) processes rules in a chain, one by one (O(n) complexity).
  • The Bottleneck: In a Kubernetes cluster with 5,000 services, kube-proxy might generate 50,000 iptables rules. Every single packet must be checked against this long list. This causes high CPU usage and latency.

IPVS as the Solution:

  • Hash Tables: IPVS uses hash tables (O(1) complexity). Whether you have 10 rules or 10,000 rules, the lookup time is nearly identical.
  • Load Balancing Algorithms: IPVS supports advanced algorithms natively:
    • rr: Round Robin
    • lc: Least Connection
    • dh: Destination Hashing
    • sh: Source Hashing
  • Direct Routing (DR): IPVS allows packets to be forwarded directly to backend servers without passing back through the load balancer, massively increasing throughput.

Interaction with Namespaces:

  • Architects must verify that net.ipv4.ip_forward is enabled in the kernel (sysctl). Without this, the kernel drops packets attempting to cross bridges.
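The check and the fix, sketched (the `sysctl.d` file name is a convention, not a requirement):

```shell
# Check: 1 means forwarding is on, 0 means the kernel drops routed packets.
cat /proc/sys/net/ipv4/ip_forward

# Enable now, and persist across reboots.
sudo sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-forward.conf
sudo sysctl --system
```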

Key Characteristics

  • Bridge: Promiscuous mode operation (listening to all traffic on the link).
  • iptables chains: PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING.
  • IPVS Schedulers: The algorithms used to decide which backend server receives the packet.

Use Cases

  • Linux Bridge: Default networking for Docker, LXC, and KVM virtualization.
  • iptables: Securing a server, setting up a VPN gateway, basic Docker port forwarding.
  • IPVS: Large-scale Kubernetes clusters, high-traffic load balancers (L4).

Benefits

  • Standardization: Bridges are standard in every Linux distro.
  • Granularity: iptables offers extremely fine-grained control over every single packet bit.
  • Efficiency: IPVS provides datacenter-grade load balancing on standard hardware.

Limitations

  • iptables: Performance degrades linearly with the number of rules. Updating thousands of rules is not atomic (can cause brief glitches).
  • Bridges: Can introduce latency if daisy-chained (bridging a bridge).
  • IPVS: Harder to debug than iptables because the rules are not as plainly visible in standard logs.

Common Issues, Problems, and Solutions



Systemd vs. Docker Init

In the world of Linux, someone has to be the boss. That boss is the Process ID 1 (PID 1). It starts first and manages everything else.

Think of it this way:

  1. Systemd: Think of Systemd as a Hotel manager. They manage the reception, housekeeping, kitchen, security, and electricity all at once. It’s heavy, powerful, and complex because running a whole hotel (Operating System) is hard work.
  2. Docker Init (Tini): Think of Docker Init as a Private Bodyguard for a VIP guest. Their only job is to protect that one specific guest (application). They don’t care about the kitchen or the laundry; they just ensure the guest is safe and leaves the building (shuts down) properly.

Cheat Sheet

Key Characteristics to Remember

  • Systemd is a “Suite of tools” for a full OS; it does too much for a simple container.
  • Docker Init solves the “Zombie Process” problem where dead processes eat up memory.
  • Signal Handling is the main reason we need an init process; otherwise, your app won’t stop when you tell it to.
| Feature | Systemd | Docker Init (Tini/Dumb-init) |
|---|---|---|
| Role | Full OS Service Manager | Lightweight Process Supervisor |
| Complexity | High (Heavyweight) | Low (Lightweight) |
| PID 1 Capability | Native, full-featured | Minimal, focused on signals |
| Zombie Reaping | Yes, handles complex trees | Yes, handles direct children |
| Best Use Case | Virtual Machines (VMs), Bare Metal | Containers (Docker, Kubernetes) |
| Signal Handling | Complex | Passthrough (sends signals to app) |

Systemd: The Heavyweight

systemd is the standard initialization system for most Linux distributions (Ubuntu, CentOS, Debian).

  • Role: It is PID 1. It boots the user space, manages services, handles logging (journald), mounts filesystems, and manages hardware changes.
  • Complexity: It is designed to manage an entire operating system with multiple background services running simultaneously.

Docker Init: The Lightweight

Containers are designed to run a single application, not a full OS. Running systemd inside a container is usually considered an anti-pattern because it adds unnecessary overhead and complexity.

However, a container still needs a PID 1 to handle specific kernel responsibilities:

  1. Reaping Zombie Processes: When a process dies, it becomes a “zombie” until its parent acknowledges it. If the application inside the container doesn’t handle this (and most don’t), the container fills up with zombie processes.
  2. Signal Handling: When you run docker stop, a SIGTERM signal is sent. A proper init process ensures this signal is passed to the application so it can shut down gracefully.

Tini (Docker Init): Docker includes a tiny init process (often tini) that can act as PID 1. It does nothing but forward signals and reap zombies, keeping the container lightweight and strictly focused on the application.
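Both approaches, sketched (the image name is illustrative, and the Dockerfile excerpt assumes an Alpine base where `apk` installs tini at `/sbin/tini`):

```shell
# Easiest: let Docker inject tini as PID 1 at run time.
docker run -d --init myapp:latest

# Or bake it into the image (Dockerfile excerpt shown as comments):
#   RUN apk add --no-cache tini
#   ENTRYPOINT ["/sbin/tini", "--"]
#   CMD ["python", "app.py"]
```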


DevSecOps Architect

At an architectural level, using Systemd in Docker breaks the “One Container, One Service” philosophy. Systemd requires high privileges (often CAP_SYS_ADMIN), which is a security risk.

  • Signal Propagation: Tini registers signal handlers for standard signals (SIGTERM, SIGINT) and forwards them to the child process group. This ensures that if you kill the container, your database or web server gets the message and closes connections safely.
  • Orchestration Impact: In Kubernetes, graceful termination is critical. If your container doesn’t handle signals (because it lacks a proper Init), K8s will wait for the terminationGracePeriodSeconds (usually 30s) and then forcefully kill the pod. This slows down deployments and can corrupt data.

Alternatives worth knowing:

  • Dumb-init: Similar to Tini, developed by Yelp.
  • S6 Overlay: If you really need multiple processes in one container (an anti-pattern, but sometimes necessary), use S6 instead of Systemd.

Key Characteristics

  • Minimal Footprint: Docker Init binaries are typically extremely small (KB size).
  • Transparency: You shouldn’t even know it’s there; it just makes standard commands work.

Use Case

  • Java/Node/Python Apps: Interpreters often don’t handle PID 1 responsibilities well. Always use --init or Tini.
  • CI/CD Agents: Jenkins agents running in Docker often spawn many child processes (git, build tools) that need reaping.

Benefits

  • Stability: Prevents “Zombie Apocalypse” inside containers.
  • Speed: Faster container shutdowns (no waiting 10s for timeout).
  • Security: Avoids giving containers dangerous privileges required by Systemd.

Limitations

  • Systemd Dependencies: Legacy applications that expect Systemd (e.g., they try to run systemctl start apache2) will fail in standard Docker containers.
  • Logging: Systemd handles logging via Journald. In Docker Init, logs must go to stdout/stderr for Docker to capture them.

Common Issues & Solutions

| Problem | Solution |
|---|---|
| Container takes 10s to stop | The app isn’t receiving SIGTERM. Use `docker run --init` or ensure your entrypoint uses `exec`. |
| “Defunct” processes in `top` | Zombies are not being reaped. Add Tini to your Dockerfile. |
| “System has not been booted with systemd” error | You are trying to use `systemctl`. Rewrite the startup to call the binary directly (e.g., `httpd -DFOREGROUND`). |
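A minimal sketch of the `exec` entrypoint pattern mentioned above (the file name and setup step are illustrative):

```shell
#!/bin/sh
# entrypoint.sh: `exec` replaces this shell with the real command, so the
# application becomes PID 1 and receives SIGTERM directly instead of the
# shell swallowing it.
set -e

# Any setup work happens before the handoff (illustrative step).
export APP_ENV="${APP_ENV:-production}"

exec "$@"   # hand PID 1 over to the command from CMD / docker run args
```

In a Dockerfile this pairs with `ENTRYPOINT ["./entrypoint.sh"]` and `CMD ["httpd", "-DFOREGROUND"]`.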


