Linux Networking: Bridges, iptables, IPVS
Linux networking is the backbone of modern containerization (like Docker) and orchestration (like Kubernetes). To understand how data moves between your computer and a container, or between two containers, you need to understand three core concepts: Bridges, iptables, and IPVS.
Think of it as a busy corporate office building:
- Linux Bridge (The Switch/Hub): Think of this as the physical extension cords or network switches in the office. It connects different cubicles (containers) so they can talk to each other and to the outside world. Without it, the cubicles are isolated islands.
- iptables (The Security Guard & Mailroom): Think of this as the security guard checking ID cards (Firewall) and the mailroom sorting packages (NAT). It checks every person (packet) entering or leaving. It says, “You can pass,” “You are blocked,” or “Please deliver this package to Desk 4, not the lobby.”
- IPVS (The High-Speed Receptionist): Think of this as a super-efficient receptionist handling thousands of visitors. While the Security Guard (iptables) checks a long list of names one by one (slow for big crowds), the High-Speed Receptionist (IPVS) uses a quick directory hash system to instantly direct massive crowds to the right department.
Cheat Sheet
- Linux Bridge: A software switch that connects virtual network interfaces (like containers) to the host network.
- Bridge: Operates at Layer 2 (Data Link Layer). It deals with MAC addresses.
- iptables: A rule-based firewall utility that controls packet filtering and Network Address Translation (NAT).
- iptables: Operates mostly at Layer 3 & 4 (Network/Transport). It deals with IP addresses and Ports.
- IPVS (IP Virtual Server): A transport-layer load balancer inside the Linux kernel, much faster than iptables for handling many services.
- IPVS: Operates at Layer 4 (Transport). It is purely for load balancing.
| Feature | Linux Bridge | iptables | IPVS |
| --- | --- | --- | --- |
| Primary Role | Connectivity (Switching) | Security (Firewall) & NAT | Performance Load Balancing |
| OSI Layer | Layer 2 (Data Link) | Layer 3 (Network) / Layer 4 | Layer 4 (Transport) |
| Complexity | Low | Medium (Sequential Rules) | High (Hash Tables) |
| Performance | Fast (local switching) | Slower at scale (O(n) lookup) | Very fast at scale (O(1) lookup) |
| Key Use Case | Docker default networking (docker0) | Firewalls, Docker Port Mapping | Kubernetes Service Load Balancing (kube-proxy) |
Linux Bridges
A bridge works at the Data Link Layer (Layer 2) of the OSI model. When a network packet arrives at a bridge, the kernel inspects the destination MAC address and decides where to send it.
- MAC Learning: Just like a physical switch, a Linux bridge maintains a Forwarding Information Base (FIB) or “MAC table.” It learns which MAC addresses are associated with which attached interfaces by inspecting the source MAC of incoming traffic.
- Forwarding: If it knows the destination MAC, it forwards the packet exclusively to that port.
- Flooding: If it doesn’t know the destination MAC, it floods the packet to all attached ports (except the one it received the packet on) until it learns the correct path.
- Spanning Tree Protocol (STP): Linux bridges support STP to prevent network loops if multiple bridges are connected in a way that creates a circular path.
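MAC learning and STP can be observed directly with the `ip` and `bridge` tools. The sketch below is illustrative, not a production recipe: the names `br0`, `veth0`, and `veth1` are arbitrary examples, and the commands require root privileges.

```shell
# Sketch: create a software bridge and watch MAC learning (requires root).
# br0/veth0/veth1 are arbitrary example names.

# Create a bridge and bring it up
ip link add br0 type bridge
ip link set br0 up

# Create a veth pair and plug one end into the bridge
ip link add veth0 type veth peer name veth1
ip link set veth0 master br0
ip link set veth0 up
ip link set veth1 up

# Inspect the forwarding database (the "MAC table") as it learns
bridge fdb show br br0

# Enable Spanning Tree Protocol to guard against loops
ip link set br0 type bridge stp_state 1
```

Once traffic flows through `veth1`, new source MACs appear in the `bridge fdb` output, which is the learning behavior described above.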
iptables
The terms iptables and Netfilter are often used interchangeably, but technically they are distinct:
- Netfilter: The packet filtering framework inside the Linux kernel. It provides a set of “hooks” at various points in the networking stack where kernel modules can register to inspect or modify packets.
- iptables: The user-space utility used by system administrators (and container runtimes) to interact with Netfilter. You use `iptables` commands to write the rules that Netfilter enforces.
The Architecture: Tables, Chains, and Rules
To master iptables, you have to understand its hierarchical structure. Rules are grouped into Chains, and Chains are categorized into Tables.
Tables (The “What”)
Tables define the broad category of the operation you want to perform on a packet.
- Filter Table (Default): Used for standard firewall duties—deciding whether to let a packet through or block it.
- NAT Table (Network Address Translation): Used to modify the source or destination IP addresses and ports. This is critical for Docker port forwarding and Kubernetes Services.
- Mangle Table: Used for specialized packet alteration (like changing TTL or marking packets for specific routing).
- Raw Table: Used to bypass connection tracking (conntrack) for specific packets.
Chains (The “When”)
Chains correspond to the specific Netfilter “hooks” in the packet’s journey. They dictate when the rules in a table are evaluated.
- PREROUTING: Triggered the moment a packet arrives at the network interface, before any routing decisions are made. (Commonly used in the NAT table for Destination NAT / Port Forwarding).
- INPUT: Triggered if the packet is destined for the local host itself.
- FORWARD: Triggered if the packet is not for the local host, but needs to be routed through it (e.g., traffic moving from the host’s physical NIC across a Linux Bridge to a container).
- OUTPUT: Triggered for packets generated by the local host itself, heading out.
- POSTROUTING: Triggered right before a packet leaves the network interface, after routing decisions. (Commonly used in the NAT table for Source NAT / Masquerading).
Rules and Targets (The “Action”)
A rule is a set of matching criteria (e.g., “protocol TCP, destination port 80”). If a packet matches, a Target dictates the action:
- ACCEPT: Let the packet through.
- DROP: Silently discard the packet.
- REJECT: Discard the packet and send an error response back.
- SNAT: Change the Source IP.
- DNAT: Change the Destination IP.
- MASQUERADE: A special form of SNAT used when the outbound IP is dynamic (this is how internal Docker containers access the external internet).
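Tables, chains, and targets come together in the NAT rules behind typical container port mapping. This is a hedged sketch, not Docker's exact rule set: the container IP `172.17.0.2`, the subnet `172.17.0.0/16`, and the interface name `docker0` are illustrative defaults, and the commands require root.

```shell
# Sketch: the NAT rules behind typical container port mapping (requires root).
# 172.17.0.2, 172.17.0.0/16, and docker0 are illustrative values.

# DNAT in the PREROUTING chain: traffic arriving on host port 8080
# has its destination rewritten to a container at 172.17.0.2:80
iptables -t nat -A PREROUTING -p tcp --dport 8080 \
  -j DNAT --to-destination 172.17.0.2:80

# MASQUERADE in the POSTROUTING chain: outbound container traffic
# gets the host's (possibly dynamic) IP stamped as its source
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

# Filter table, FORWARD chain: allow the redirected traffic through
iptables -A FORWARD -p tcp -d 172.17.0.2 --dport 80 -j ACCEPT
```

Note how each rule names its table (`-t nat` vs. the default filter), its chain (the "when"), and its target (the "action").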
IPVS: IP Virtual Server
In standard Linux networking and default Kubernetes clusters, iptables handles the routing of service traffic. However, as clusters grow to enterprise scale, iptables becomes a massive performance bottleneck.
Enter IPVS. Built directly into the Linux kernel, IPVS is a highly optimized Layer 4 (Transport Layer) load balancer. It provides extreme performance, scalability, and multiple load-balancing algorithms that iptables simply cannot match.
The Core Problem: Why Not Just Use iptables?
To understand IPVS, you must first understand the limitations of iptables at scale.
- The O(N) Problem: `iptables` was designed as a firewall, not a load balancer. It evaluates rules sequentially. If you have 10,000 Kubernetes Services, `iptables` creates a massive list of rules. When a packet arrives, the kernel must check it against rule 1, then rule 2, then rule 3, all the way down the chain until it finds a match.
- The Result: As you add more Services to a cluster, network latency increases linearly, and CPU utilization spikes just trying to route internal traffic. Updates to the rule list also become painfully slow.
The IPVS Solution: Hash Tables
IPVS solves this performance crisis through its architecture.
- The O(1) Solution: Instead of a sequential list, IPVS uses hash tables to store its routing rules. This means that whether you have 10 Services or 100,000 Services, the time it takes the kernel to find the correct routing destination stays effectively constant: near instantaneous.
- Purpose-Built: IPVS is specifically designed for load balancing. While it still hooks into the overarching Netfilter framework, it bypasses the standard `iptables` evaluation chains for the traffic it manages.
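A minimal IPVS setup can be sketched with the `ipvsadm` tool. The virtual IP `10.0.0.100` and the backend addresses are illustrative, and the commands require root and the `ip_vs` kernel module.

```shell
# Sketch: a minimal IPVS virtual server via ipvsadm (requires root).
# 10.0.0.100 and the 10.244.x.x backends are illustrative values.

# Create a virtual server on 10.0.0.100:80 using round robin (-s rr)
ipvsadm -A -t 10.0.0.100:80 -s rr

# Attach two real servers behind it in NAT/masquerade mode (-m)
ipvsadm -a -t 10.0.0.100:80 -r 10.244.1.5:8080 -m
ipvsadm -a -t 10.0.0.100:80 -r 10.244.2.7:8080 -m

# List the table; lookups against it are hash-based, not sequential
ipvsadm -Ln
```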
IPVS in Kubernetes (kube-proxy)
By default, the kube-proxy component on every Kubernetes worker node runs in “iptables mode.” However, most modern K8s distributions (including EKS, AKS, and GKE) allow you to configure kube-proxy to run in “ipvs mode.”
Here is what happens under the hood when IPVS mode is enabled in a Kubernetes cluster:
- The Dummy Interface: `kube-proxy` creates a virtual network interface on the node, typically named `kube-ipvs0`.
- Binding Service IPs: It binds all the virtual IP addresses of your Kubernetes Services (ClusterIPs) to this `kube-ipvs0` interface.
- Creating Virtual Servers: For every Service, IPVS creates a “Virtual Server.”
- Attaching Real Servers: It then attaches the actual Pod IPs (Endpoints) as “Real Servers” behind that Virtual Server.
When a Pod tries to talk to a K8s Service, the packet hits the IPVS routing table, is instantly hashed, and is forwarded directly to the backend Pod.
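On a node already running in IPVS mode, the pieces above can be inspected directly. This is an inspection sketch: it assumes shell access to the node, and the kube-proxy label selector shown is a common but not universal convention.

```shell
# Sketch: inspecting a node where kube-proxy runs in IPVS mode.
# Assumes node shell access; label selector may differ per distribution.

# The dummy interface holding all Service ClusterIPs
ip addr show kube-ipvs0

# Virtual Servers (ClusterIP:port) and their Real Servers (Pod IPs)
ipvsadm -Ln

# Check which proxy mode kube-proxy reports in its logs
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs
```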
Advanced Load Balancing Algorithms
Because iptables is a firewall, its load balancing is essentially limited to simple random distribution. IPVS, being a true load balancer, supports sophisticated algorithms out of the box.
When configuring IPVS (or letting K8s do it for you), you can utilize:
- rr (Round Robin): Distributes connections equally across all backend servers.
- lc (Least Connections): Sends new traffic to the backend server with the fewest active connections (excellent for long-lived connections like WebSockets or database queries).
- sh (Source Hashing): Hashes the source IP address to consistently route traffic from a specific client to the exact same backend server (Session Affinity / Sticky Sessions).
- wrr (Weighted Round Robin): Allows you to send more traffic to more powerful nodes and less to weaker ones.
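The schedulers above map directly onto the `-s` flag of `ipvsadm`, and `wrr` weights onto `-w` per real server. A hedged sketch with illustrative IPs (requires root):

```shell
# Sketch: selecting IPVS schedulers with ipvsadm (requires root).
# All IPs are illustrative values.

# Least connections: good for long-lived connections (WebSockets, DB)
ipvsadm -A -t 10.0.0.101:443 -s lc

# Source hashing: sticky sessions keyed on the client IP
ipvsadm -A -t 10.0.0.102:80 -s sh

# Weighted round robin: per-backend weights skew the traffic split
ipvsadm -A -t 10.0.0.103:80 -s wrr
ipvsadm -a -t 10.0.0.103:80 -r 10.244.1.5:8080 -m -w 3   # ~3x share
ipvsadm -a -t 10.0.0.103:80 -r 10.244.2.7:8080 -m -w 1
```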
Putting It Together: Container Networking
Containers are basically isolated processes. They need a way to reach the outside world.
- Virtual Ethernet Pairs (veth): When a container starts, Linux creates a “pipe” with two ends. One end sits inside the container (seen as `eth0`), and the other end sits on the host machine.
- The Bridge: The host end of that pipe is plugged into a Linux Bridge (like `docker0`). This bridge collects all the cables from different containers and allows them to talk.
- Packet Flow: When a container pings Google (8.8.8.8), the packet goes:
  - Container `eth0` -> host `veth` -> bridge `docker0`.
  - iptables sees this packet and performs NAT (Masquerading). It stamps the host’s IP on the packet so the internet knows where to send the reply.
  - The packet leaves the physical interface (`eth0` or `wlan0`).
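Each hop in that path can be verified on a Docker host. The interface names below are Docker's defaults and may differ on your system; the `iptables` query needs root.

```shell
# Sketch: tracing the container-to-internet path on a Docker host.
# docker0/eth0 are default names; yours may differ.

# The bridge and the container veth ends plugged into it
ip link show docker0
bridge link show

# The MASQUERADE rule that rewrites the container's source IP
iptables -t nat -L POSTROUTING -n -v

# IP forwarding must be enabled for the host to route between interfaces
sysctl net.ipv4.ip_forward
```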
Key Characteristics
- Bridge: Promiscuous mode operation (listening to all traffic on the link).
- iptables chains: `PREROUTING`, `INPUT`, `FORWARD`, `OUTPUT`, `POSTROUTING`.
- IPVS Schedulers: The algorithms used to decide which backend server receives the packet.
Use Cases
- Linux Bridge: Default networking for Docker, LXC, and KVM virtualization.
- iptables: Securing a server, setting up a VPN gateway, basic Docker port forwarding.
- IPVS: Large-scale Kubernetes clusters, high-traffic load balancers (L4).
Benefits
- Standardization: Bridges are standard in every Linux distro.
- Granularity: iptables offers extremely fine-grained control over every single packet.
- Efficiency: IPVS provides datacenter-grade load balancing on standard hardware.
Limitations
- iptables: Performance degrades linearly with the number of rules. Updating thousands of rules is not atomic (can cause brief glitches).
- Bridges: Can introduce latency if daisy-chained (bridging a bridge).
- IPVS: Harder to debug than iptables because the rules are not as plainly visible in standard logs.
Common Issues, Problems, and Solutions
- Docker containers cannot access the internet.
  - Solution: Check that IP forwarding is enabled (`sysctl -w net.ipv4.ip_forward=1`) and that the iptables MASQUERADE rule is active.
- Kubernetes Services are slow or timing out.
  - Solution: In clusters with thousands of Services, sequential iptables rule evaluation becomes the bottleneck; switch kube-proxy to IPVS mode.
- Ping works, but TCP fails.
  - Solution: Check MTU (Maximum Transmission Unit) sizes on the bridge. If the container MTU is larger than the host interface MTU, packets get dropped.
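The checks above can be run as quick read-only diagnostics. Interface names are illustrative; some commands need root or the relevant tools installed.

```shell
# Sketch: quick diagnostics for the common issues above.
# Interface names are illustrative; some commands need root.

# 1. No internet from containers: forwarding + masquerading
sysctl net.ipv4.ip_forward                   # expect: = 1
iptables -t nat -L POSTROUTING -n | grep -i masq

# 2. Slow Services at scale: compare rule-list size vs IPVS entries
iptables-save | wc -l
ipvsadm -Ln 2>/dev/null | wc -l

# 3. Ping works, TCP fails: compare MTUs along the path
ip link show docker0 | grep -o 'mtu [0-9]*'
ip link show eth0 | grep -o 'mtu [0-9]*'
```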
Further Reading
- Bridge Utils (official)
- Netfilter/iptables Project
- IPVS Administration (ipvsadm)
- Kubernetes IPVS Proxy Mode