
Kubernetes Architect Program

The Foundation: Infrastructure & Basics

Focus: Setting the stage before running a single container.

Linux & Networking Prerequisites

Kernel Namespaces & Cgroups read more (The technology behind containers)
  1. Namespaces (Isolation): Virtual “walls” that define what a process can see (Network, Files, PIDs).
  2. Cgroups (Limits): A “meter” that defines how much a process can use (CPU, RAM, Disk I/O).
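
Both mechanisms are visible on any Linux host; a quick read-only sketch (cgroup output assumes cgroup v2):

```shell
# Namespaces: every process's memberships are symlinked under /proc.
ls /proc/self/ns
# → entries such as: cgroup ipc mnt net pid user uts

# Cgroups: which cgroup hierarchy this shell is metered by.
cat /proc/self/cgroup
```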

Systemd vs. Docker Init read more
  • PID 1 Rule: The “boss” process must clean up Zombies (dead processes) and pass Signals (like stop/restart).
  • Systemd: Heavyweight OS manager; overkill/insecure for containers.
  • Docker Init (Tini): Tiny bodyguard binary; handles signals and zombies so your app doesn’t have to.
  • Zombie Apocalypse: When dead processes clog the system because PID 1 isn’t “reaping” them.
  • Graceful Shutdown: Using an init (or exec) ensures your app stops safely instead of being “force-killed” by the kernel.
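
A minimal sketch of the Tini pattern (the image tag and the myapp binary are placeholders): tini becomes PID 1, reaps zombies, and forwards SIGTERM so the app shuts down gracefully.

```dockerfile
FROM alpine:3.19
RUN apk add --no-cache tini
COPY myapp /usr/local/bin/myapp   # hypothetical app binary
ENTRYPOINT ["/sbin/tini", "--"]   # tini is PID 1; the app is its child
CMD ["/usr/local/bin/myapp"]
```

Alternatively, `docker run --init` injects the same tiny init without touching the Dockerfile.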

Linux Networking read more (Bridges, iptables, IPVS)
  • Linux Bridge: A virtual Layer 2 switch that connects container “pipes” (veth pairs) to each other and the host.
  • iptables: The kernel’s security guard/mailroom; it handles firewall filtering and NAT (port mapping) via sequential rules.
  • IPVS: A high-performance Layer 4 load balancer that uses hash tables to keep networking fast even with thousands of services.
  • veth pairs: The virtual ethernet cables that act as a “pipe” between a container’s namespace and the host’s bridge.
  • NAT (Masquerading): The process where iptables rewrites a container’s private IP to the host’s public IP for internet access.
  • O(n) vs. O(1): iptables gets slower as rules increase (linear); IPVS stays fast regardless of rule count (constant).


Container Fundamentals (Docker & Beyond)

Docker vs. Containerd vs. CRI-O vs. Podman read more
  • The OCI Standard: The “rulebook” ensuring any container image can run on any compliant runtime (Docker, Podman, etc.).
  • CRI (Container Runtime Interface): The standard “plug” that allows Kubernetes to talk to different container engines.
  • Docker: The “Luxury SUV”: a full, feature-rich suite for developers, now separated from the Kubernetes production core.
  • Containerd: The “Industrial Engine”: the industry-standard, lightweight runtime used by AWS EKS, GKE, and AKS.
  • CRI-O: The “K8s Purist”: a minimal runtime built strictly for Kubernetes and the default for Red Hat OpenShift.
  • Podman: The “Daemonless Rebel”: a rootless, secure Docker alternative that doesn’t require a background daemon.
  • Dockershim Deprecation: Since v1.24, Kubernetes no longer talks to Docker directly; it uses Containerd or CRI-O via the CRI.


Writing Optimized Dockerfiles (Multi-stage builds) read more
  • Multi-Stage Builds: Using a “Builder” stage for heavy tools and a “Runner” stage for the app to shrink images from GBs to MBs.
  • Base Image Selection: Always prefer Alpine or Distroless versions (e.g., python:3.9-alpine) to minimize the attack surface and disk space.
  • The COPY --from Pattern: The secret to moving only compiled binaries or static assets into the final production image.
  • Layer Caching: Order commands from least frequent to most frequent changes (e.g., COPY package.json before COPY .) to speed up builds.
  • Security & Size: Smaller images pull faster across networks and contain fewer “bloat” binaries (like curl or git) for hackers to exploit.
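
A hedged sketch of the pattern for a Go app (module layout and binary name are illustrative): the heavy toolchain lives only in the builder stage, and manifests are copied before source to maximize layer-cache hits.

```dockerfile
# Stage 1 – "Builder": full toolchain, never shipped.
FROM golang:1.22-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./            # copy manifests first for layer caching
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server .

# Stage 2 – "Runner": only the compiled binary survives.
FROM alpine:3.19
COPY --from=builder /out/server /usr/local/bin/server
ENTRYPOINT ["/usr/local/bin/server"]
```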

Image Security (Distroless images, vulnerability scanning) read more
  • The Shell Threat: Standard images include tools like curl, wget, and sh, providing a ready-made “hacker toolkit” if your app is breached.
  • Distroless Images: The ultimate defense; these images contain only your app, removing the shell, package manager, and all OS utilities.
  • Vulnerability Scanning: The practice of using tools like Trivy or Clair to detect CVEs (vulnerabilities) in your OS layers and libraries before deployment.
  • Reduced Attack Surface: By using Distroless or Alpine, you decrease the number of installed packages, drastically lowering the chance of a “High” severity exploit.
  • Runtime Security: Without a shell (/bin/sh), attackers cannot easily execute scripts or lateral movement commands within your network.
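
The same multi-stage idea taken to its limit with a distroless base (paths are illustrative); the final image contains no shell, package manager, or OS utilities for an attacker to use:

```dockerfile
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# gcr.io/distroless/static carries only CA certs and tzdata – no sh, no apk.
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot:nonroot      # the image ships a non-root user by default
ENTRYPOINT ["/app"]
```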

kubectl: The Kubernetes Commands read more

kubeadm read more


Kubernetes Architecture

Control Plane read more: The central “brain” of Kubernetes that makes all cluster management decisions without running the actual application payloads.

Data Plane read more: The execution environment (VMs or physical servers) where your actual application containers are scheduled and run.

Below are the Control Plane and Data Plane components

Kube-API Server read more: The central gateway that validates and processes all cluster requests.
  • The Central Hub: The API Server is the stateless “brain” of the control plane and the only component that communicates directly with the Etcd database.
  • The Guard & Messenger: It intercepts, validates, and processes all RESTful traffic entering the cluster before pushing the desired state to Etcd.
  • The Scaler: Because it stores no local data, it can be horizontally scaled across multiple replicas behind a load balancer for high availability.
  • The Listener: It utilizes a “Watch” mechanism to instantly push state changes to connected components (like controllers and schedulers) rather than making them ask repeatedly.

The Three Security Gates

  • Sequential Pipeline: Every incoming request must sequentially pass Authentication, Authorization, and Admission Control before being written to Etcd.
  • Gate 1 – Authentication (AuthN): Answers “Who are you?” using external Identity Providers for humans and internal Service Accounts for Pods.
    • Human Users: Kubernetes relies on external Identity Providers (OIDC like Okta/Google) for human authentication, ensuring K8s never stores passwords.
    • Machine Users: Pods and internal processes authenticate using modern, auto-rotating, time-bound Bound Service Account Tokens (JWTs).
    • Admin Access: X509 Client Certificates are heavily used by cluster admins and system components (like kubelet), though they are difficult to revoke once issued.
    • Cloud Native: Webhook Token Authentication is the standard for managed clouds (like AWS EKS or GCP GKE) to integrate seamlessly with cloud IAM.
  • Gate 2 – Authorization (AuthZ): Answers “What can you do?” by checking if the authenticated user has the necessary RBAC permissions for the requested action.
    • The Core Check: Verifies if an identified user is allowed to perform a specific action (e.g., create pods).
    • Primary Tool: Typically handled by Role-Based Access Control (RBAC) to map users to permissions.
    • The Analogy: Acts like a bouncer checking if your visa or ticket grants you access to the VIP lounge.
    • Default Deny: If a rule doesn’t explicitly allow the request, the API Server automatically rejects it.
  • Gate 3 – Admission Control: Answers “Is this request safe?” by evaluating the actual content of the request against strict organizational policies and security standards.
    • Governance as Code: Admission controllers enforce specific configurations (like requiring resource limits or blocking root access) regardless of a user’s permissions.
    • Mutating Webhooks (Runs First): Intercepts requests to automatically modify or inject “best practices” (e.g., adding a sidecar container or default CPU limits).
    • Validating Webhooks (Runs Second): Acts as the “hard deny” phase, inspecting the final request and blocking it if it fails compliance or security checks.
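
Gate 2 in practice is usually RBAC; a minimal sketch granting read-only Pod access in one namespace (all names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader            # hypothetical role name
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane                  # identity established at Gate 1 (AuthN)
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Any request from jane outside these verbs falls through to Default Deny.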

Traffic & Extensibility

  • API Priority and Fairness (APF): Categorizes and prioritizes incoming traffic to ensure low-priority tasks (like log scraping) don’t crash the server or block critical system updates.
  • Custom Resource Definitions (CRDs): Allows you to extend the API Server to manage custom objects (like databases or backups) exactly as if they were native K8s Pods.
  • Aggregation Layer: Lets you run extension API servers (like the metrics-server) behind the main API IP to handle specific specialized requests.
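
A minimal CRD sketch that teaches the API Server a brand-new kind (the group and fields are invented for illustration):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule: {type: string}   # e.g. a cron expression
```

Once applied, `kubectl get backups` works exactly like a native resource.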

Etcd read more: The cluster’s “single source of truth” and memory, storing every piece of configuration data in a distributed key-value vault.
  • The “Memory”: Etcd is a highly available, strongly consistent key-value store that acts as the absolute “truth” and shared memory of a Kubernetes cluster.
  • Exclusive Access: Only the Kubernetes API Server can communicate directly with Etcd; all other components must ask the API Server for data.
  • Data Vault: It stores every piece of cluster metadata, including Secrets, ConfigMaps, node states, and Kubernetes objects.
  • Key-Value Structure: Data is stored like a dictionary or phonebook (e.g., Key: /pods/nginx, Value: Pod configuration), rather than a traditional spreadsheet.
  • Key Use Cases: Beyond storing state, it handles service discovery and distributed locking (e.g., preventing two Schedulers from acting at once).

Consensus & Consistency

  • Raft Algorithm: Etcd relies on the Raft consensus algorithm, meaning a “Leader” node handles writes and requires a majority vote (Quorum) from “Followers” to commit data.
  • Strong Consistency: Unlike “eventually consistent” databases, Etcd guarantees that a read immediately following a write will return the newest data.
  • Versioning: Every single change gets a monotonically increasing revision number, creating a history that allows “time travel” reads of earlier cluster state (watch resumption and compaction are built on these revisions).

Deployment Topologies

  • Stacked Etcd: Etcd shares servers with the Control Plane components; it is easier and cheaper but creates resource contention and single points of failure.
  • External Etcd: Etcd runs on its own dedicated cluster; it is complex and expensive but offers maximum resilience for mission-critical enterprise environments.

Limitations & Troubleshooting

  • Manual Recovery: Dead Etcd nodes do not auto-heal like Kubernetes Pods; they must be manually removed (etcdctl member remove) and replaced.
  • Size Limits: Etcd is strictly for metadata, not big data; it has a default 2GB limit (expandable to 8GB) and will reject large files.
  • Network Sensitivity: Because it replicates data instantly to maintain consensus, it is highly sensitive to network latency.
  • Database Full: If the database exceeds its space limit, the cluster stops accepting writes until you manually compact history and defrag using the etcdctl tool.

Kube-Scheduler read more: The “matchmaker” that determines the best home for new Pods by filtering and scoring available Worker Nodes.
  • The Matchmaker: The Kube-Scheduler assigns unbound Pods to the most suitable Nodes but never actually runs the containers (the Kubelet does that).
  • Two-Step Process: Scheduling always follows two phases: Filtering (eliminating nodes that don’t meet hard constraints) and Scoring (ranking the remaining nodes to find the best fit).
  • The Lifecycle: The scheduler watches the queue, filters nodes (e.g., checking CPU limits), scores them (e.g., checking image locality), and sends a Binding object to the API server.

Advanced Architecture

  • Pluggable Framework: The scheduler is highly extensible, allowing custom logic to be injected at specific extension points like QueueSort, PreFilter, Score, and Bind.
  • Multi-Scheduler Setup: You can run multiple, custom-named schedulers in a single cluster to handle different types of workloads (like batch jobs vs. web apps).
  • The Descheduler: Because the default scheduler only places pods upon creation, you need the Descheduler tool to evict and rebalance pods if the cluster later becomes uneven.

Placement Controls

  • Taints & Tolerations (Repellents): Taints act as a “bad smell” applied to Nodes to repel pods, while Tolerations are applied to Pods to let them ignore that smell (great for reserving GPU nodes).
  • Affinity & Anti-Affinity (Magnets): Node Affinity attracts pods to specific zones, Pod Affinity groups related pods together, and Anti-Affinity forces pods apart to ensure High Availability.
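
Both controls side by side as a hedged Pod sketch (label keys, values, and the image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  # Toleration: lets this Pod ignore the "bad smell" on reserved GPU nodes.
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    # Node Affinity: attract the Pod to a specific zone (hard requirement).
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
  containers:
  - name: app
    image: nginx:1.27       # placeholder image
```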

Troubleshooting & Value

  • Business Impact: Intelligent scheduling ensures cost optimization (bin-packing), resource efficiency, and stability by preventing “noisy neighbors.”
  • Pending Pods: If a pod is stuck in “Pending,” it means the scheduler cannot find a node that satisfies all filters (check this using kubectl describe pod <pod-name>).
  • Resource Exhaustion: “Insufficient CPU” errors occur when total pod requests exceed available node capacity, requiring either lower pod requests or cluster autoscaling.

Kube-Controller Manager read more: The “enforcer” that runs continuous loops to ensure the cluster’s actual state matches your desired state.
  • Architecture: It runs as a single binary containing multiple specialized control loops to minimize system overhead and simplify operation.
  • API Efficiency: Controllers avoid polling the API server directly; instead, they use a SharedInformer and DeltaFIFO queue for an “edge-driven, level-triggered” response.
  • High Availability: Multi-master setups use the Leases API for Leader Election, ensuring only one Controller Manager actively enforces state at any given time.
  • Security & Monitoring: It runs with restricted RBAC privileges via its own Service Account and exposes a /metrics endpoint to monitor queue depths and API load.

Node Management

  • Heartbeat Monitoring: The Node Controller tracks Kubelet heartbeats (sent every 10 seconds) to determine node health.
  • Eviction Lifecycle: If a node fails, it is marked “Unknown,” tainted with NoExecute, and its pods are evicted after a default 5-minute grace period.
  • Cloud Decoupling (CCM): Cloud-specific logic is now handled by the Cloud Controller Manager, which can instantly trigger deletions if the underlying cloud VM is destroyed.
  • Thundering Herd Mitigation: Mass evictions are throttled by the --node-eviction-rate parameter to prevent overloading the scheduler during large-scale failures.

Workload Orchestration & State

  • Deployments & ReplicaSets: Deployments manage version rollouts, while ReplicaSets handle the strict enforcement of pod replica counts using label selectors.
  • StatefulSets: Ensures sequential network identities (e.g., web-0) and stable storage bindings across node rescheduling.
  • DaemonSets: Guarantees that specific utility pods run on all or label-matched nodes within the cluster.
  • Jobs & CronJobs: Manages the retry limits, parallel execution, and scheduling of finite batch tasks.
  • Garbage Collection: Relies on ownerReferences to cascade the deletion of parent objects (like Deployments) safely down to child objects (like Pods).
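
How the pieces nest in one manifest (a minimal Deployment sketch; names illustrative): the Deployment owns a ReplicaSet, which owns Pods via ownerReferences, so deleting the Deployment cascades.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # the ReplicaSet controller enforces this count
  selector:
    matchLabels: {app: web}    # label selector that ties Pods to the set
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: web
        image: nginx:1.27      # placeholder; changing it triggers a rollout
```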

Networking & Routing

  • Endpoints Mapping: The Endpoint Controller links stable Service IPs to ephemeral Pod IPs by verifying label selectors and readiness probes.
  • EndpointSlices: Replaces the legacy, monolithic Endpoints object by chunking IPs into lists of 100, drastically reducing network congestion and API load.

Storage & Volumes

  • PV/PVC Binding: The PersistentVolume Binder matches storage claims with available volumes or triggers dynamic provisioning.
  • Attach/Detach Loop: Communicates with infrastructure providers to securely attach block storage to nodes before pods can initialize.
  • Protection Controllers: Injects finalizers to prevent the accidental deletion of PVCs that are currently in use by active pods.
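
A PVC sketch the binder would match to an existing PV or hand to a dynamic provisioner (the class name is illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard   # hypothetical class; triggers dynamic provisioning
  resources:
    requests:
      storage: 10Gi
```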

Security, Identity & Resources

  • ServiceAccounts & Tokens: Automatically provisions default service identities and secret tokens for new namespaces.
  • TLS & Certificates: The CSR controllers validate, approve, and sign certificates to secure internal kubelet communication.
  • Resource Limits: The ResourceQuota controller intercepts and rejects pod creation requests that exceed predefined namespace CPU, memory, or object limits.

Cloud Controller Manager read more: The “liaison” that links your cluster to cloud provider APIs to manage Load Balancers, storage, and networking.
  • The Bridge: The CCM acts as a specialized contractor, translating generic Kubernetes requests into specific API calls for your cloud provider (AWS, Azure, GCP).
  • Out-of-Tree: It runs as a separate binary outside the core Kubernetes code, allowing cloud vendors to patch and iterate without waiting for K8s version releases.
  • Infrastructure Glue: It strictly manages cloud resources (Nodes, Routes, Load Balancers) and does not manage workloads like Pods or Deployments.
  • The Gatekeeper: It uses taints (uninitialized) to prevent the Scheduler from placing Pods on a Node until the cloud provider confirms the VM is fully prepped.

The Three Core Loops

  • Node Controller (Inventory): Syncs your Kubernetes cluster with the actual cloud inventory, instantly removing K8s Nodes if the underlying cloud VM is terminated.
  • Route Controller (Networking): Updates cloud VPC route tables to map Pod IP ranges (PodCIDRs) to specific Node VMs so packets know where to go.
  • Route Controller Limit: Because cloud providers have hard limits on route table entries, large clusters (100+ nodes) typically disable this and use an Overlay CNI (like Calico) instead.
  • Service Controller (Load Balancers): Watches for type: LoadBalancer Services and provisions physical cloud load balancing hardware to route external traffic to your Nodes.
  • Annotation King: You unlock specific, advanced cloud features (like making a load balancer internal-only or highly performant) by using annotations on your Service objects.
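
A hedged Service sketch; the annotation shown is the AWS-specific key for an internal-only load balancer (each cloud provider defines its own keys):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    # Cloud-specific tuning: this AWS key keeps the LB off the public internet.
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer           # watched by the CCM's Service Controller
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
```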

Storage & CSI

  • CSI Evolution: Storage drivers (CSI) are now separate from K8s core code, running as standalone Pods maintained by storage vendors.
  • The Topology Dependency: CSI drivers rely entirely on the CCM’s Node Controller to label incoming nodes with accurate Region and Zone metadata so disks are created in the correct physical location.
  • Architect Best Practice: Always use volumeBindingMode: WaitForFirstConsumer so storage is provisioned in the exact zone where the Scheduler decides to place your Pod.
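
That best practice as a StorageClass sketch (the provisioner string varies by CSI driver; an AWS EBS example is shown):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zoned-ssd                        # illustrative name
provisioner: ebs.csi.aws.com             # example CSI driver
volumeBindingMode: WaitForFirstConsumer  # provision in the Pod's zone, not first
```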

High Availability (HA) & Migration

  • Leader Election: To prevent duplicate cloud costs or API conflicts (“Split Brain”), multiple CCM replicas use a “Lease” lock so only one active leader makes cloud changes at a time.
  • In-Tree to Out-of-Tree: The Kubernetes migration process uses special adapters to safely transfer cloud logic from the old hardcoded loops to the external CCM without cluster downtime.

Kubelet read more (Linux Daemon): The “foreman” on every node that takes instructions from the API Server to ensure containers are running as planned.
  • Primary Job: It is a node-centric agent that ensures containers described in assigned PodSpecs are running and healthy.
  • Pull-Based Model: It constantly watches the API Server for new instructions rather than waiting for commands to be pushed.
  • The Bridge: Without the Kubelet, a worker node is offline to the cluster; it is the sole communicator between the node’s resources and the Control Plane.
  • Static Pods: It can independently run critical components by reading YAML manifests directly from the node’s disk (/etc/kubernetes/manifests) without the API Server.
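
A static Pod is just a manifest dropped on the node’s disk; the Kubelet watches the directory and starts it with no API Server involved (a sketch; the name and image tag are illustrative):

```yaml
# Saved as /etc/kubernetes/manifests/etcd-sketch.yaml on the node itself.
apiVersion: v1
kind: Pod
metadata:
  name: etcd-sketch            # control-plane pods use this same pattern
spec:
  hostNetwork: true
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.12-0   # example tag
```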

Workflow & Lifecycle Management

  • CRI Communication: It talks to the container runtime (like containerd) using gRPC via the Container Runtime Interface (CRI) to pull images and start containers.
  • Health Probing: It actively executes Liveness, Readiness, and Startup probes to monitor application health and restarts crashed containers automatically.
  • Status Reporting: It acts as a node monitor, sending health conditions (Ready, MemoryPressure, DiskPressure) back to the Control Plane every few seconds.
  • Resource Orchestration: It coordinates with CNI for networking and CSI for storage, delegating the actual execution to the runtime and respective plugins.

DevSecOps & Advanced Concepts

  • PLEG Mechanism: Instead of heavy polling, Kubelet uses the Pod Lifecycle Event Generator (PLEG) to efficiently track container states; a “PLEG not healthy” error usually means a struggling container runtime.
  • Garbage Collection: It acts as the node’s janitor, automatically deleting old images and dead containers when disk thresholds (e.g., 85%) are reached to prevent node failure.
  • Security Hardening: Critical security measures include disabling anonymous authentication on its API port (10250) and enabling Webhook authorization.
  • Certificate Rotation: For secure operations, Kubelet should be configured to automatically renew its own client certificates for API server communication.

Container Runtime read more: The “engine” (like containerd) that does the heavy lifting of pulling images and starting the actual processes.
  • The Analogy: The Kubelet is the Site Manager giving orders, and the Container Runtime is the Worker actually building and running the containers.
  • The Docker Shift: Kubernetes no longer uses Docker as the runtime; it connects directly to pluggable runtimes like containerd or CRI-O for less bloat and faster startups.
  • High-Level (CRI): Runtimes like containerd manage images, storage, and the API connection to the Kubelet.
  • Low-Level (OCI): Binaries like runc do the actual Linux kernel interaction (system calls, namespaces, cgroups) to execute the isolated process.
  • The Shim Process: A tiny middleman (containerd-shim) that keeps containers alive and running even if the main runtime daemon crashes or updates.

The Lifecycle & Flow

  • Pod Creation Sequence: The runtime creates a Sandbox (triggering CNI for networking), pulls the image, and finally starts the application container inside that sandbox.
  • Lazy Pulling: Using advanced snapshotters (like stargz), runtimes can start containers instantly by streaming image chunks on demand rather than waiting for a full 2GB download.
  • Debugging Tool: Because Docker is gone, engineers must use crictl instead of docker CLI to debug containers directly on the node.

DevSecOps & Best Practices

  • Image Verification: Runtimes like CRI-O can cryptographically verify image signatures during the pull phase, instantly rejecting tampered images before they execute.
  • RuntimeClass Routing: You can dynamically route trusted workloads to standard runc and untrusted/multi-tenant workloads to secure sandboxes (gVisor or Kata) in the same cluster.
  • Cgroup Driver Rule: The Kubelet and the Container Runtime must be configured with the same cgroup driver (systemd on systemd-based hosts) to prevent node instability and crashes.
  • Built-in Security: Modern runtimes natively reduce the blast radius by applying default Seccomp profiles that block dangerous Linux system calls.
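
RuntimeClass routing as a sketch (the handler string must match whatever the node’s runtime config defines; Kata is shown as an example):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sandboxed
handler: kata                   # e.g. Kata Containers configured in containerd
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-job
spec:
  runtimeClassName: sandboxed   # route this workload to the secure sandbox
  containers:
  - name: job
    image: busybox:1.36
    command: ["sleep", "3600"]
```

Pods without a runtimeClassName keep using standard runc.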

Kube-Proxy read more: The “traffic cop” that manages network rules on nodes to allow seamless communication between Pods and Services.
  • Definition: Kube-Proxy is a network service running on every node that maps stable Service IPs to dynamic, ever-changing Pod IPs.
  • The “Traffic Cop”: It doesn’t “touch” the data; it simply programs the node’s network rules (the “GPS”) to send traffic to the right destination.
  • Deployment: It typically runs as a DaemonSet, ensuring one instance exists on every single worker node.
  • Scope: It operates strictly at Layer 4 (TCP/UDP) and handles internal (East-West) cluster traffic.

Modes & Performance

  • iptables (Default): Simple but slow for large clusters because it searches rules sequentially (O(n) complexity).
  • IPVS (Performance): Uses hash tables for near-instant lookups (O(1) complexity), ideal for clusters with thousands of services.
  • eBPF (Modern): A “Kube-Proxy replacement” (like Cilium) that bypasses the standard Linux networking stack for maximum speed and security.
  • Statelessness: If Kube-Proxy crashes, it loses nothing; it simply re-reads the API Server and recreates the rules upon restart.
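
Switching modes is a configuration change, not a code change; a KubeProxyConfiguration sketch:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"              # "" or "iptables" (default) vs. "ipvs"
ipvs:
  scheduler: "rr"         # round-robin; other schedulers (lc, sh) exist
```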

Scaling & Troubleshooting

  • L7 Limitation: It cannot route based on URLs or headers; you need an Ingress Controller for that level of intelligence.
  • EndpointSlices: A modern optimization that sends only small “diffs” of IP changes to nodes instead of massive, cluster-wide updates.
  • The “Fake IP”: Without Kube-Proxy, a ClusterIP is just a “dead” virtual address with no actual routing path.
  • Conntrack: A common bottleneck where high traffic fills the Linux connection tracking table, requiring manual sysctl tuning.

Kubernetes Resource Controllers read more: A continuous loop inside the kube-controller-manager that watches the API server to align the cluster’s current state with your desired state.
  1. Workload Controllers (Scaling & Self-healing)
    • Deployment Controller: Manages ReplicaSets to handle seamless rolling updates, rollbacks, and scaling.
    • ReplicaSet Controller: Strictly guarantees that your specified number of Pod replicas are always running.
    • StatefulSet Controller: Manages stateful Pods requiring unique, persistent identities and stable hostnames (perfect for databases).
    • DaemonSet Controller: Ensures that a specific Pod (like a logging or monitoring agent) runs on all or selected Nodes.
    • Job Controller: Creates Pods to execute a finite, one-off task until successful completion.
    • CronJob Controller: Automates the creation of Job objects based on a defined time schedule.
  2. Infrastructure & Discovery Controllers (Traffic Routing)
    • Node Controller: Monitors node health, handles cloud evictions, and reacts to node unreachability.
    • Endpoints Controller: Connects Services to Pods by continuously updating the Endpoints object.
    • EndpointSlice Controller: A highly scalable, modernized alternative to the Endpoints controller for tracking network endpoints.
    • Service Controller: Interacts with the Cloud Provider API to manage external infrastructure like LoadBalancers.
    • Route Controller: Configures cloud network routes to ensure Pods can communicate across different nodes.
  3. Governance & Lifecycle Controllers (Policy Enforcement)
    • Namespace Controller: Guarantees all internal resources are cleanly deleted before fully removing a namespace.
    • ServiceAccount Controller: Injects default ServiceAccounts and necessary credentials into new namespaces and Pods.
    • ResourceQuota Controller: Blocks resource creation if it would exceed the namespace’s total allowed resource consumption.
    • LimitRange Controller: Enforces strict constraints on resource requests and limits for individual Containers/Pods.
    • Garbage Collector (GC) Controller: Automatically sweeps and deletes orphaned resources whose parent objects no longer exist.
    • TTL Controller: Cleans up completed Jobs automatically after a designated time-to-live period expires.
  4. Storage Controllers (Data Persistence)
    • PersistentVolume (PV) Controller: Watches for PersistentVolumeClaims (PVCs) and actively binds them to matching PVs.
    • PV Protection Controller: Blocks the deletion of a PV as long as it remains bound to a PVC.
    • PVC Protection Controller: Blocks the deletion of a PVC as long as it is actively used by a running Pod.
    • Expandable PVC Controller: Facilitates the dynamic resizing of underlying storage volumes if the provider supports it.
  5. Cloud-Specific Controllers (cloud-controller-manager)
    • Service Controller: Automates the lifecycle (create, update, delete) of native cloud load balancers.
    • Node Controller: Verifies directly with the cloud provider if an unresponsive node was permanently deleted.
    • Route Controller: Provisions and manages native routing tables within your specific cloud environment.

Kubernetes Resource Operators read more
  • Definition: An Operator is an application-specific controller that extends the Kubernetes API to manage complex, stateful applications using domain-specific knowledge.
  • The “Human” Analogy: If a Controller is an automated assembly line, an Operator is the expert supervisor who knows how to repair, back up, and tune the machines.
  • The Golden Rule: All Operators are Controllers, but not all Controllers are Operators.
  • The Formula: Operator = Custom Controller + Custom Resource Definition (CRD) + Domain Knowledge.

Architecture & Mechanisms

  • CRDs: Custom Resource Definitions define the “schema” or blueprint for your specific application (e.g., a kind: Database).
  • Reconciliation Loop: The continuous process where the Operator observes the current state, compares it to the desired state, and performs actions to bridge the gap.
  • State Management: Unlike StatefulSets which manage Pod identity, Operators manage the internal state and logic of the application itself.
  • Finalizers: Special keys in resource metadata that prevent deletion until the Operator completes specific cleanup tasks (e.g., deleting an external cloud storage volume).
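
What an Operator’s custom resource might look like once its CRD exists (the kind and every field here are invented for illustration):

```yaml
apiVersion: example.com/v1
kind: Database               # hypothetical kind defined by the Operator's CRD
metadata:
  name: orders-db
  finalizers:
  - example.com/cleanup-cloud-volume   # blocks deletion until cleanup completes
spec:
  engine: postgres
  version: "16"
  replicas: 3
  backup:
    schedule: "0 2 * * *"    # domain knowledge the Operator acts on
```

The reconciliation loop compares this desired spec against the running database and bridges any gap.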

Capabilities & Tools

  • Capability Levels: Progression from Basic Install (Level 1) to Seamless Upgrades (Level 2), Full Lifecycle (Level 3), Deep Insights (Level 4), and Auto-Pilot (Level 5).
  • Operator SDK: The CNCF standard framework for building Operators using Go, Ansible, or Helm.
  • Kubebuilder: A lower-level, Go-centric tool preferred by developers who want to stay close to upstream Kubernetes logic.
  • OperatorHub.io: The central community registry for discovering and sharing pre-built Operators for popular software like Postgres and Kafka.

DevSecOps & Operational Reality

  • Self-Healing: Operators eliminate “snowflake” configurations by automatically detecting and correcting configuration drift.
  • The Security Risk: Operators often require high-level RBAC permissions; always apply the Principle of Least Privilege to limit their blast radius.
  • Resource Overhead: Every Operator is a running pod; over-installing them can lead to high CPU/RAM consumption and API server “noise.”
  • The “Finalizer Trap”: If an Operator crashes, resources can get stuck in a Terminating state, requiring a manual metadata patch to remove the finalizer.

Kubernetes Open Standards read more
  • Interoperability: Open standards ensure that different container tools (Docker, Podman, Kubernetes) can communicate and function together seamlessly.
  • Vendor Neutrality: Standards prevent “lock-in” by allowing users to swap components (storage, networking, runtimes) without rewriting application code.
  • CNCF Governance: Most Kubernetes standards are managed under the Cloud Native Computing Foundation to ensure community-driven evolution.


Kubernetes Setting Up the Lab

Install kubectl read more

Local Setup on personal computer read more

Web-Based Kubernetes Learning read more: Play with Kubernetes, Killercoda

Cloud Setup (Real World) read more: Create an EKS Cluster

Communicating with Kubernetes: read more
  • All communication with a Kubernetes cluster happens via RESTful HTTPS requests exclusively through the API Server (the gatekeeper).
  • kubectl is the primary command-line interface used to translate your declarative commands into these API requests.
  • You never interact with the cluster’s backend directly; every action is an API call that must be verified.

Security & The 3 Checkpoints

  • Authentication (AuthN): The API server first verifies your identity using your token or certificate (Who are you?).
  • Authorization (AuthZ): It then checks Role-Based Access Control (RBAC) rules to see if you have the right permissions (Are you allowed to do this?).
  • Admission Control: Finally, it validates the request against cluster policies, like resource limits, before executing it (Is this request safe/legal?).

Identities & The Kubeconfig File

  • Humans interact using User Accounts (defined via identity providers), while automated tools/pipelines use Service Accounts (tied to the cluster).
  • Your kubeconfig file (usually at ~/.kube/config) acts as your ID card and connection guide.
  • The kubeconfig links Clusters (Where the API server is), Users (Who you are), and Contexts (Which User + Cluster pair is currently active).
  • Security Rule: Never share or commit your kubeconfig file to a repository, as it holds the keys to your cluster.
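The three sections of a kubeconfig map directly to the Clusters/Users/Contexts model above. A minimal sketch (server address, names, and credentials are illustrative placeholders):

```yaml
# Skeleton of ~/.kube/config showing how the three sections link together.
apiVersion: v1
kind: Config
clusters:                        # WHERE each API server lives
- name: dev-cluster
  cluster:
    server: https://203.0.113.10:6443
    certificate-authority-data: <base64-encoded-CA-cert>
users:                           # WHO you are
- name: alice
  user:
    client-certificate-data: <base64-encoded-cert>
    client-key-data: <base64-encoded-key>
contexts:                        # WHICH user + cluster pair
- name: alice@dev-cluster
  context:
    cluster: dev-cluster
    user: alice
current-context: alice@dev-cluster   # the active pair kubectl uses
```

Switching `current-context` (via `kubectl config use-context`) is how one kubectl binary safely targets many clusters.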

Advanced Architecture & Best Practices

  • For enterprise security, humans should authenticate using OIDC (like Azure AD or Okta) instead of static certificates.
  • CI/CD Pipelines should follow the Principle of Least Privilege, using restricted ServiceAccounts and short-lived tokens rather than human credentials.

Tooling & Troubleshooting

  • Quick Fixes: “Forbidden” errors mean you need RBAC permissions; “localhost:8080 refused” usually means kubectl cannot find your kubeconfig file.
  • Use kubectl config get-contexts to list available clusters and kubectl config use-context to switch between them.
  • For faster management, adopt tools like Lens (GUI), k9s (terminal UI), and kubectx/kubens (fast context switching).

Creating Objects in Kubernetes: read more

Imperative: typing commands directly in a terminal. Declarative: applying a YAML manifest file.
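The two styles can create the same object. A sketch (deployment name and image are illustrative):

```shell
# Imperative: tell the cluster WHAT to do, step by step
kubectl create deployment web --image=nginx:1.27 --replicas=3

# Declarative: describe the desired END STATE in a manifest and apply it;
# Kubernetes reconciles reality to match the file
kubectl apply -f web-deployment.yaml
```

Declarative manifests are preferred in production because they can be version-controlled, reviewed, and re-applied idempotently.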


kubectl: The Kubernetes Commands

  1. https://kubernetes.io/docs/reference/kubectl/quick-reference
  2. https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands


Core Workloads: Running Applications

Focus: Mastering the basic objects.

Governance & Lifecycle Controllers

Kubernetes Namespaces read more – act as logical, virtual clusters within a single physical Kubernetes cluster to organize resources and support multi-tenancy.
  • Naming: Resource names must be unique within their own namespace but can be safely reused across different namespaces.
  • DNS Resolution: Services are automatically discoverable using the naming pattern <service-name>.<namespace-name>.svc.cluster.local.
  • Global Exceptions: Hardware-level and storage resources like Nodes and PersistentVolumes exist globally and do not belong to any namespace.

Built-in Namespaces

  • default: The standard destination for resources deployed without a specifically declared namespace.
  • kube-system: Reserved for critical internal Kubernetes components (e.g., scheduler, kube-dns) and should rarely be modified by users.
  • kube-node-lease: Used by nodes to send heartbeat signals to the control plane to indicate they are healthy and active.
  • kube-public: A rarely used namespace intended for data that must be readable by all users.

Operations & Commands

  • Creation & Deletion: Create with kubectl create ns <name>; deleting a namespace destroys all resources inside it simultaneously.
  • Targeting: Append -n <namespace-name> to kubectl commands to target a specific namespace (e.g., kubectl get pods -n dev-env).
  • Setting Defaults: Avoid typing -n repeatedly by changing your context: kubectl config set-context --current --namespace=<name>.

Architecture & Governance

  • Service Mesh: Tools like Istio can be applied at the namespace level to enforce strict security protocols like mTLS.
  • ResourceQuotas: Prevent “noisy neighbors” by strictly limiting the total amount of CPU and Memory a namespace can consume.
  • LimitRanges: Establish the default, minimum, and maximum resource limits for individual pods created within the namespace.
  • Network Policies: Namespaces can communicate by default; use Network Policies to enforce strict traffic isolation between them.
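Since namespaces can talk to each other by default, cross-namespace isolation has to be declared explicitly. A minimal sketch, assuming a hypothetical `dev` namespace:

```yaml
# Allow pods in "dev" to receive traffic only from pods in the same
# namespace; ingress from every other namespace is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
  namespace: dev
spec:
  podSelector: {}        # empty selector = applies to every pod in "dev"
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}    # a bare podSelector matches same-namespace pods only
```

Note that NetworkPolicies are enforced by the CNI plugin; on a CNI without policy support (e.g. basic flannel) this object is silently ignored.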

The Fundamentals

  • Definition: A Namespace is a logical partition that turns one physical cluster into multiple virtual ones.
  • The Analogy: Think of it as separate folders on a laptop or dedicated departments in an office building.
  • Naming Scope: You can reuse resource names (like “mysql-db”) as long as they are in different namespaces.
  • Default State: Kubernetes starts with default, kube-system (core components), kube-public, and kube-node-lease.
  • DNS Pattern: Internal communication follows the FQDN: <service>.<namespace>.svc.cluster.local.

Administration & Governance

  • Resource Quotas: Acts as a “monthly budget” for CPU and Memory to prevent one team from starving others.
  • LimitRanges: Sets constraints on the size of individual Pods to ensure developers write efficient code.
  • RBAC Scope: RoleBindings allow you to give a user “Owner” rights to Dev without letting them touch Prod.
  • Network Policies: Namespaces are not isolated by default; you must apply policies to block cross-namespace traffic.
  • Global Resources: Remember that Nodes, PVs, and ClusterRoles exist outside of any namespace.

The Architect’s Pro-Tips

  • Security Reality: A namespace is a soft boundary; for hard physical isolation, use sandboxed runtimes like gVisor.
  • Automation: Never create namespaces manually in production; use GitOps (ArgoCD/Flux) and IaC (Terraform).
  • Policy as Code: Use Kyverno or OPA to automatically reject any pod that doesn’t have required labels or resource limits.
  • The “Terminating” Bug: If a namespace won’t delete, it’s usually a Finalizer blocking it; you must patch the JSON to clear it.
  • Cost Tracking: Use namespaces as the primary unit for labeling and billing (e.g., via Kubecost) to see which team is spending the most.

ResourceQuota read more – acts as a strict, aggregate budget for an entire Namespace to prevent “Noisy Neighbor” cluster crashes.

  • Quotas enforce hard limits; the Kubernetes API server will immediately reject any deployment that exceeds the mathematical sum of the budget.
  • If a namespace has a Compute Quota, every new Pod must explicitly define resource requests in its YAML, otherwise it gets rejected with a 403 Forbidden error.
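A minimal ResourceQuota sketch for a hypothetical `dev` namespace, covering compute, object counts, and the load-balancer cap discussed below:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-budget
  namespace: dev
spec:
  hard:
    requests.cpu: "4"            # sum of all Pod CPU requests in "dev"
    requests.memory: 8Gi
    limits.cpu: "8"              # sum of all Pod CPU limits
    limits.memory: 16Gi
    pods: "20"                   # object-count cap protects etcd
    services.loadbalancers: "1"  # caps expensive cloud load balancers
```

Once this is active, any Pod created in `dev` without explicit resource requests is rejected, which is exactly why it should be paired with a LimitRange that injects defaults.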

Architecture & DevSecOps Strategy

  • Always pair a ResourceQuota (Namespace-wide limit) with a LimitRange (individual Pod defaults) to auto-inject missing fields so valid pods aren’t instantly blocked.
  • Put a hard cap on services.loadbalancers to prevent developers from accidentally racking up massive cloud provider bills (AWS, GCP, Azure).
  • Enforce Object Count Quotas (limiting the number of pods, secrets, and configmaps) to protect the cluster’s etcd database from resource exhaustion attacks.

Advanced Scoping & Granular Control

  • Use ScopeSelectors and PriorityClasses to conditionally apply strict quotas to low-priority batch jobs while leaving critical system daemons uncapped.
  • Prevent local node disk exhaustion by enforcing requests.ephemeral-storage and limits.ephemeral-storage.
  • Control storage budgets by restricting expensive SSD StorageClasses while allowing higher limits for cheaper HDD tiers.
  • Limit the creation of standard API objects (like Deployments, StatefulSets, or CronJobs) using the highly flexible count/<resource>.<group> syntax.
  • Manage AI/ML hardware usage and costs by capping physical GPU requests (e.g., requests.nvidia.com/gpu).
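The scoping techniques above can be combined in one object. A sketch, assuming a hypothetical PriorityClass named `batch-low`:

```yaml
# Quota that applies ONLY to low-priority batch workloads in "dev";
# critical workloads with other PriorityClasses are untouched.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-low-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "2"
    count/jobs.batch: "10"       # count/<resource>.<group> syntax
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["batch-low"]
```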

Ecosystem Integration

  • Proactively monitor kube_resourcequota metrics in Prometheus to trigger alerts before a team hits their absolute ceiling.
  • Automate the injection of standard Quotas and LimitRanges for every new Namespace using policy engines like Kyverno or OPA Gatekeeper.

LimitRange read more: enforces specific Min, Max, and Default resource boundaries (CPU, Memory, Storage) for individual Pods, Containers, and PVCs.

  • Purpose: LimitRange enforces specific Min, Max, and Default resource boundaries (CPU, Memory, Storage) for individual Pods, Containers, and PVCs.
  • The Rule of Thumb: ResourceQuota manages the total budget for the whole Namespace, while LimitRange dictates the limits for an individual Pod.
  • The Analogy: If ResourceQuota is your total bank balance, LimitRange is your ATM daily withdrawal limit.

How It Works

  • Auto-Defaulting: It serves as a safety net by automatically injecting default CPU and Memory values if developers omit them from their YAML.
  • Admission Control: It acts as an API gatekeeper in two phases: mutating (injecting missing defaults) and validating (rejecting out-of-bounds requests).
  • Quality of Service (QoS): By injecting defaults, it prevents Pods from falling into the highly killable “BestEffort” QoS class.
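The mutating (defaulting) and validating (bounds) behaviors both live in one LimitRange. A sketch for a hypothetical `dev` namespace:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    default:              # injected as the LIMIT if the Pod YAML omits one
      cpu: 500m
      memory: 256Mi
    defaultRequest:       # injected as the REQUEST if omitted
      cpu: 250m
      memory: 128Mi
    max:                  # requests above this are rejected ("Forbidden")
      cpu: "2"
      memory: 1Gi
    min:
      cpu: 100m
      memory: 64Mi
    maxLimitRequestRatio:
      cpu: "4"            # a limit may be at most 4x its request
```

The injected defaults are what keep unannotated Pods out of the “BestEffort” QoS class.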

Security & Strategy

  • Blast Radius: Capping resources prevents DoS attacks and limits the impact of compromised containers (e.g., stopping cryptojacking by hitting CPU limits).
  • Architectural Standard: A production Namespace should never exist without a LimitRange, often enforced globally via policy engines like Kyverno or OPA Gatekeeper.
  • Burst Control: The maxLimitRequestRatio prevents developers from requesting tiny minimums but bursting to massive maximums.

Limitations & Troubleshooting

  • Non-Retroactive: Applying a new LimitRange only affects newly created Pods; it does not resize or kill existing ones.
  • Node Blindness: It mathematically validates your limits against the rules, but it does not check if your physical worker nodes actually have that much capacity.
  • Common Fixes: “Forbidden” creation errors mean you broke Min/Max rules; “OOMKilled” errors mean the default memory limit is too small for your application.

Kubernetes Namespace-Level vs. Global Cluster-Level Objects

Pods: Core Concepts & Architecture

  1. The Pod: The smallest deployable unit in Kubernetes, acting as a “logical host” for one or more containers.
    • Pod vs. Container: You never deploy containers directly; you deploy Pods that encapsulate containers, storage, and network.
    • Shared Network: All containers in a Pod share the same IP address and Network Namespace, communicating via localhost.
    • Shared Storage: Containers in a Pod can share data by mounting the same Volumes defined at the Pod level.
    • Pause Container: A hidden infra-container that holds the Network Namespace open so app containers can restart without losing their IP.
  2. Pod Lifecycle
    • Pod Phases: High-level status summary: Pending (waiting), Running (active), Succeeded (finished), Failed (crashed), or Unknown.
  3. Static Pods: Pods managed directly by a node’s Kubelet (via local files) rather than the API Server, used primarily for bootstrapping control planes.
  4. Multi-Container Patterns
    • Sidecar Pattern: An “assistant” container that extends the main app (e.g., a log shipper or config reloader).
    • Adapter Pattern: A “translator” that standardizes app output (e.g., converting custom logs to a format a monitoring tool understands).
    • Ambassador Pattern: A “proxy” that handles outbound connections (e.g., a database proxy allowing the app to connect via localhost).
    • Init Containers: Specialized containers that run to completion sequentially before the main application containers start.
  5. Restart Policy: Instructions for the Kubelet on whether to restart containers: Always (default), OnFailure, or Never.
  6. Pod Health Probes
    • Liveness Probe: A health check that restarts the container if it becomes “stuck” or deadlocked.
    • Readiness Probe: A traffic gatekeeper that removes the Pod from the Load Balancer if the app is temporarily unable to serve requests.
    • Startup Probe: A “bodyguard” that disables Liveness/Readiness checks for slow-starting apps until they are fully initialized.
  7. Pod Requests and Limits: Requests are guaranteed resources used for scheduling; Limits are the maximum resources a container can consume.
  8. Pod Security Contexts
    • Security Context: A set of rules defining user IDs, privileges, and kernel-level permissions for Pods and containers.
    • runAsNonRoot: A safety check that prevents the Pod from starting if the container tries to run as the root user.
    • readOnlyRootFilesystem: A hardening technique that makes the container’s disk immutable to prevent hackers from installing malware.
    • fsGroup: A Pod-level setting that automatically changes volume ownership so non-root users can read and write to mounts.
    • Privilege Escalation: Setting allowPrivilegeEscalation: false prevents child processes from gaining more power than their parent.
    • Capabilities: Allows “dropping all” Linux superpowers and adding back only the specific ones needed (like NET_BIND_SERVICE).
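The probe and security-context concepts above combine in a single Pod spec. A hedged sketch (image, ports, and paths are illustrative, not a reference configuration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-web
spec:
  securityContext:               # Pod-level settings
    runAsNonRoot: true
    fsGroup: 2000                # volume mounts become group-writable
  containers:
  - name: web
    image: nginx:1.27-alpine
    ports:
    - containerPort: 8080
    startupProbe:                # shields slow starts from liveness kills
      httpGet: {path: /healthz, port: 8080}
      failureThreshold: 30
      periodSeconds: 2
    livenessProbe:               # restarts the container if deadlocked
      httpGet: {path: /healthz, port: 8080}
    readinessProbe:              # pulls the Pod out of Service endpoints
      httpGet: {path: /ready, port: 8080}
    securityContext:             # container-level hardening
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```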

Workload Controllers

Controllers that manage Pods for specific requirements:

  1. Stateless vs Stateful Application
  2. Stateless Workload Controllers
  3. StatefulSet – Stateful Workload Controller
  4. Batch & Scheduled Workload Controllers
  5. Labels and Selectors
  6. Annotations

State & Configuration

Focus: Managing data and dynamic configuration.

  1. Decoupling Configuration
  2. Kubernetes Storages

Kubernetes Networking

Focus: Controlling every packet.

  1. Kubernetes Foundational Network Model
  2. Service (ClusterIP, NodePort, LoadBalancer, ExternalName)
  3. Kubernetes Ingress
  4. Kubernetes Ingress Controller
  5. Kubernetes Gateway API
  6. CoreDNS Service Discovery
  7. Network Policies
  8. TLS Termination & Managing Certificates
  9. Kube-Proxy
  10. CNI: Container Network Interface

Security & Governance (DevSecOps)

Focus: Locking down the fortress.

Authentication & Authorization

Role-Based Access Control read more

ServiceAccount read more – acts as the identity card for Pods to securely authenticate with the Kubernetes API server.

  • Every Namespace automatically creates a default SA, but custom applications should always use dedicated ones.
  • Since Kubernetes v1.24, SAs use short-lived, auto-rotating ephemeral JWTs via the TokenRequest API instead of permanent static Secrets.
  • The Kubelet securely injects the SA token, namespace info, and CA cert into the Pod at /var/run/secrets/kubernetes.io/serviceaccount/.
  • Legacy applications that cache the API token in memory will crash with a 401 error when the modern ephemeral token rotates.

Permissions & RBAC

  • ServiceAccounts have zero permissions by default until you explicitly link them to a Role or ClusterRole using a Binding.
  • While a ServiceAccount object is strictly tied to its home Namespace, it can be granted cross-namespace permissions via RoleBindings.
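Linking an SA to permissions takes a Role plus a RoleBinding. A sketch granting read-only Pod access in `dev` (the SA name `my-app` is a hypothetical placeholder):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]                # "" = the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-pod-reader
  namespace: dev
subjects:
- kind: ServiceAccount
  name: my-app                   # hypothetical SA name
  namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Binding a ClusterRole with a RoleBinding (instead of a ClusterRoleBinding) is the trick for granting an SA cross-namespace permissions without making them cluster-wide.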

DevSecOps & Best Practices

  • SAs can map directly to external cloud IAM roles (like AWS IRSA or GCP Workload Identity) to grant Pods secure, passwordless access to cloud resources.
  • Always enforce one dedicated ServiceAccount per application to maintain granular security and accurate audit logs.
  • Set automountServiceAccountToken: false on the SA to prevent token injection unless the application actively needs K8s API access.
  • Link registry credentials to an SA using imagePullSecrets so Pods can automatically pull private images without hardcoded Docker credentials.
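Two of the best practices above are simple fields on the SA object itself. A sketch (SA and secret names are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app                     # one dedicated SA per application
  namespace: dev
automountServiceAccountToken: false  # no token injection unless needed
imagePullSecrets:
- name: private-registry-creds     # hypothetical docker-registry Secret
```

A Pod that genuinely needs API access can override the automount setting in its own spec with `automountServiceAccountToken: true`.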

OIDC – Connecting Kubernetes to Google, Okta, and Active Directory read more:

Kubernetes Certificate Signing Requests (CSR) read more:

Metrics Server: https://github.com/kubernetes-sigs/metrics-server
