3. Docker Image Engineering

Posted December 26, 2021

Updated January 11, 2026

Author: Rajkumar Aute

Advanced Image Engineering is the art of building “Gold Images”: containers that are small, fast, and secure because they contain nothing except the application and what it needs to run.

Think of the Construction Site: Imagine you are building a new house. You need heavy cranes, trucks, and 100 workers (Build Environment). Once the house is finished, you don’t leave the crane in the living room! You take only the Finished House (Production Image) and hand over the keys. This keeps the house clean, safe, and efficient.

3.1. Layering & The Union File System: Storage Strategy

You must understand the Copy-on-Write (CoW) strategy. Docker uses a Union File System in which each RUN, COPY, and ADD instruction creates a new read-only layer.

The “Weight” Problem: If you install a package in one RUN command and delete it in another, the image size does not decrease. The file is simply “hidden” by the upper layer but is still stored in the lower layer.
The Solution (Chaining): Always combine commands that belong together.
- Bad:
  - RUN apt-get update
  - RUN apt-get install -y git
- Architect’s Way:
  - RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
Architect Note: By cleaning up in the same layer, you prevent “Ghost Weight” from bloating your production images.

Docker uses a storage technology called Union File System (UnionFS). This allows files from separate filesystems (layers) to be transparently overlaid into a single coherent view.

—

3.1.1 How Layers are Created

Every instruction in your Dockerfile that modifies the filesystem creates a new layer.

#Dockerfile
# Creates the Base Layer (Read-Only).
FROM ubuntu:22.04
# Adds a new Layer with Python binaries (Read-Only).
RUN apt-get update && apt-get install -y python3 && rm -rf /var/lib/apt/lists/*
# Adds a new Layer with your source code (Read-Only).
COPY . /app
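To make the "single coherent view" idea concrete, here is a minimal Python sketch that models each layer as a dictionary and merges them from bottom to top. It is a toy model, not how the storage driver is implemented; the layer names and file contents are made up for illustration.

```python
def union_view(layers):
    """Merge layers bottom-to-top; when two layers hold the same path, the upper one wins."""
    view = {}
    for layer in layers:
        view.update(layer)
    return view


# Illustrative layers, roughly matching FROM / RUN / COPY in a Dockerfile.
base = {"/bin/sh": "shell", "/etc/motd": "welcome to ubuntu"}
python_layer = {"/usr/bin/python3": "interpreter"}
app_layer = {"/app/main.py": "print('hi')", "/etc/motd": "welcome to my app"}

merged = union_view([base, python_layer, app_layer])
print(sorted(merged))      # every path from every layer appears in one view
print(merged["/etc/motd"])  # the upper layer's copy shadows the base copy
print(base["/etc/motd"])    # the base layer itself is untouched
```

The point of the sketch is that merging never modifies a lower layer, which is what lets many images share the same base layer on disk.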

—

3.1.2 The “Read-Write” Container Layer

Images are Immutable: Once an image is built, its layers are locked as Read-Only. You cannot change them.
Containers are Mutable: When you run a container (docker run), Docker adds a thin, empty Read-Write Layer on the very top.
Action: Any file you create, modify, or delete inside a running container happens only in this thin top layer. When you delete the container, this top layer is destroyed, but the underlying image remains untouched.

—

3.1.3. The “Copy-on-Write” (CoW) Strategy

This is the magic that makes Docker fast.

Suppose you want to edit a config file, /etc/nginx/nginx.conf, that exists in the Base Image.
The Process:
1. Docker sees the file is in a Read-Only layer (locked).
2. It Copies the file up to the Read-Write layer.
3. It allows you to Write changes to this copy.
4. The original file in the bottom layer is hidden but safe.
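The four steps above can be sketched in a few lines of Python. This is a simplified model under the assumption that files are plain strings and layers are dictionaries; the Container class and its method names are hypothetical and exist only for this illustration.

```python
class Container:
    """Toy copy-on-write container: read-only image layers plus one writable layer."""

    def __init__(self, image_layers):
        self.image_layers = image_layers  # list of dicts, bottom to top; never mutated
        self.rw_layer = {}                # thin writable layer, empty at start

    def read(self, path):
        # The writable layer is checked first, then image layers from top to bottom.
        if path in self.rw_layer:
            return self.rw_layer[path]
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, transform):
        # Copy-up: the first write copies the file into the writable layer.
        if path not in self.rw_layer:
            self.rw_layer[path] = self.read(path)
        self.rw_layer[path] = transform(self.rw_layer[path])


image = [{"/etc/nginx/nginx.conf": "worker_processes 1;"}]
c = Container(image)
c.write("/etc/nginx/nginx.conf", lambda text: text.replace("1", "4"))

print(c.read("/etc/nginx/nginx.conf"))    # the container sees the edited copy
print(image[0]["/etc/nginx/nginx.conf"])  # the image layer is unchanged
```

A second container created from the same image would start with an empty writable layer and read the original file, which is why deleting a container never affects the image.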

As an architect, your goal is to minimize image size and maximize build speed (Caching).

—

3.1.4 The “Ghost Weight” Problem

This is the first mistake many engineers make.

The Mistake:

#Dockerfile
# Layer 1: Adds 100MB
RUN wget http://big-file.zip
# Layer 2: Adds extracted files
RUN unzip big-file.zip
# Layer 3: Marks 100MB as "deleted"
RUN rm big-file.zip

The Reality: The image size is Still 100MB+. Why? Because Layer 1 is Read-Only. Layer 3 just “hides” the file from view. The data is still physically in the image history and will be downloaded by every server.
The Fix (Chaining): Do it all in one layer.

#Dockerfile
RUN wget http://big-file.zip && unzip big-file.zip && rm big-file.zip
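The arithmetic behind "Ghost Weight" can be checked with a small Python sketch. The sizes are illustrative assumptions (a 100MB archive that extracts to 250MB), not measurements from a real build.

```python
def image_size(layers):
    """Total image size is the sum of what each layer adds; a deletion adds roughly 0."""
    return sum(layer["added_mb"] for layer in layers)


# Three separate RUN instructions (sizes are illustrative assumptions).
unchained = [
    {"cmd": "wget", "added_mb": 100},   # archive stored in layer 1
    {"cmd": "unzip", "added_mb": 250},  # extracted files stored in layer 2
    {"cmd": "rm", "added_mb": 0},       # deletion marker only; layer 1 still holds 100MB
]

# One chained RUN: only what exists when the instruction finishes is stored.
chained = [{"cmd": "wget && unzip && rm", "added_mb": 250}]

print(image_size(unchained), image_size(chained))
```

In this model the chained version is smaller by exactly the size of the archive, because the archive never survives to the end of any layer.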

—

3.1.5. Caching Strategy (Layer Ordering)

Docker caches layers. If Layer 2 hasn’t changed, it reuses the cached version. If Layer 2 changes, all subsequent layers (3, 4, 5…) must be rebuilt.

Architect’s Rule: Place the least changing instructions at the top and most changing at the bottom.
Example:

#Dockerfile
# BAD Architecture
# Source code changes daily -> Cache breaks here!
COPY . .
# Heavy install runs every time code changes.
RUN npm install

# GOOD Architecture
# Changes rarely.
COPY package.json .
# Cached until dependencies change.
RUN npm install
# Code changes frequently, but installs are skipped!
COPY . .
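The "domino effect" of cache invalidation can be modelled in Python by chaining hashes: each step's cache key depends on the key of the step before it. This is a sketch of the idea, not Docker's actual cache key format; the "inputs" strings are hypothetical stand-ins for checksums of the files each COPY reads.

```python
import hashlib


def cache_keys(steps):
    """Each step's key depends on its parent's key, its instruction, and its inputs."""
    keys, parent = [], ""
    for instruction, inputs in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{inputs}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old, new):
    """Return the instructions whose cache key changed between two builds."""
    return [s[0] for s, a, b in zip(new, cache_keys(old), cache_keys(new)) if a != b]


# Only the source code changes between v1 and v2; package.json stays the same.
bad_v1 = [("COPY . .", "src-v1+pkg-v1"), ("RUN npm install", "")]
bad_v2 = [("COPY . .", "src-v2+pkg-v1"), ("RUN npm install", "")]

good_v1 = [("COPY package.json .", "pkg-v1"), ("RUN npm install", ""), ("COPY . .", "src-v1+pkg-v1")]
good_v2 = [("COPY package.json .", "pkg-v1"), ("RUN npm install", ""), ("COPY . .", "src-v2+pkg-v1")]

print(rebuilt_steps(bad_v1, bad_v2))    # both steps rerun, including npm install
print(rebuilt_steps(good_v1, good_v2))  # only the final COPY reruns
```

Because every key includes its parent's key, a change early in the file propagates to every later step, which is the reason for ordering instructions from stable to volatile.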

—

Technical Challenges

Information Leakage: If you accidentally COPY a file containing passwords (like .env) in Layer 3, and then delete it in Layer 4, a hacker can pull the image, find the offending step with docker history, unpack the layers (for example, after docker save), and extract the secret from Layer 3.
- Fix: Use .dockerignore files or Multi-Stage Builds.
Inode Exhaustion: Creating thousands of tiny layers (or very small files in one layer) can exhaust the filesystem Inodes on the host, even if disk space is available.
Storage Drivers: Docker relies on the Host OS storage driver (Overlay2, fuse-overlayfs, btrfs). If your kernel is old or incompatible, performance (Disk I/O) will suffer significantly.
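The Information Leakage risk is easy to demonstrate with a toy model in Python. The sketch below assumes layers are dictionaries and uses None as a made-up deletion marker; the secret value is a placeholder, and the function names are hypothetical.

```python
DELETED = None  # illustrative marker for "this layer deleted the path"

layers = [
    {"/app/server.js": "console.log('up')"},    # application code
    {"/app/.env": "DB_PASSWORD=example-only"},  # accidental COPY of a secret
    {"/app/.env": DELETED},                     # later "rm .env"
]


def final_view(layers):
    """What a running container sees: deletions hide files from lower layers."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is DELETED:
                view.pop(path, None)
            else:
                view[path] = content
    return view


def search_layers(layers, path):
    """What someone unpacking each layer sees: every stored copy of the path."""
    return [(i, layer[path]) for i, layer in enumerate(layers)
            if layer.get(path) is not DELETED]


print("/app/.env" in final_view(layers))  # hidden from the running container
print(search_layers(layers, "/app/.env"))  # still recoverable from the earlier layer
```

The file is invisible at runtime yet still present in the stored layers, which is why the fix has to keep the secret out of the build in the first place.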

—

Docker Storage Drivers: docs.docker.com/storage/storagedriver/
Best Practices for Dockerfiles: docs.docker.com/develop/develop-images/dockerfile_best-practices/

Cheat Sheet: Image Layering

Concept	Analogy	Technical Function	Architect’s Goal
UnionFS	Transparent Sheets.	Merges multiple directories into one view.	Allows massive reuse of base OS layers.
Base Image	The Canvas.	The bottom-most Read-Only layer (OS).	Keep it small (Alpine/Distroless).
Container Layer	Scratchpad.	The top thin Read-Write layer.	Temporary storage only. Use Volumes for data.
Copy-on-Write	Tracing Paper.	Copies file to top layer before editing.	Ensures base images never get corrupted.
Chaining	One Drawing.	Joining commands with `&&`.	Prevents “Ghost Weight” (hidden junk files).
Cache Busting	Domino Effect.	Changing a layer invalidates all layers after it.	Order Dockerfile from Stable to Volatile.

3.2. Multi-Stage Builds: The Architect’s Secret Weapon

Think of a Multi-Stage Build like a Restaurant Kitchen vs. The Customer’s Table.

Stage 1 (The Kitchen/Builder): This is where the cooking happens. It is messy. There are knives, ovens, vegetable peelings, and raw ingredients (Source Code). You need all these tools (Compilers) to make the food, but you don’t want the customer to see them.
Stage 2 (The Table/Runner): You take only the finished dish (Compiled Binary) out of the kitchen and serve it on a clean plate.
The Result: The customer gets a perfect meal. They do not get the dirty pans, the knives, or the vegetable scraps. If a customer (or a hacker) tries to find a knife to hurt someone, there are none on the table; they were left back in the kitchen (which was destroyed).

Before Multi-Stage builds, developers used two Dockerfiles (one for building, one for running) and a shell script to glue them together. Multi-Stage builds allow you to do this in a single file using FROM ... AS ....

—

3.2.1. The “Builder” Stage

Purpose: To compile code and handle dependencies.
Contents: Heavyweight tools and build inputs such as gcc (C compiler), mvn (Maven), git, node_modules, and SSH keys.
Fate: Once the build is done, this entire environment is discarded. It does not exist in the final image.

—

3.2.2. The “Runner” Stage

Purpose: To run the application. The container requires only the compiled binary (artifact) and the minimal runtime libraries (like libc).
The COPY --from=builder instruction acts like a teleportation device. It grabs a specific file from the Builder stage and pulls it into the Runner stage.
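As a rough mental model, COPY --from selects named paths from one filesystem and places them in another, leaving everything else behind. The Python sketch below illustrates that selection; the file lists are invented for the example and copy_from is a hypothetical helper, not a Docker API.

```python
def copy_from(stage, paths):
    """Model of COPY --from=<stage>: take only the named paths, nothing else."""
    return {p: stage[p] for p in paths}


# Illustrative builder filesystem: tools, source code, and the build artifact.
builder = {
    "/usr/local/go/bin/go": "compiler",
    "/usr/bin/git": "vcs",
    "/app/main.go": "source",
    "/app/main": "compiled-binary",
}

# The runner starts from a minimal base and receives only the artifact.
runner = {"/etc/ssl/certs/ca-certificates.crt": "ca-bundle"}
runner.update(copy_from(builder, ["/app/main"]))

print(sorted(runner))
```

Nothing from the builder reaches the runner unless it is named explicitly, so the compiler, git, and the source code never appear in the final image.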

—

DevSecOps Architect Perspective

This is your primary tool for Hardening.

Attack Surface Reduction
- The Threat: Hackers often use “Living off the Land” attacks. They use tools already installed on your server (like wget, curl, compiler) to download malware or compile exploits.
- The Defense: In a multi-stage image, these tools simply don’t exist.
  - No git? They can’t clone malicious code.
  - No gcc? They can’t compile a privilege escalation exploit.
  - No shell? (If using Distroless) They can’t even run commands.
Massive Size Reduction
- Example: Go Application:
  - Single Stage: Uses golang:1.21 image → Size: ~850MB.
  - Multi-Stage: Uses alpine or scratch → Size: ~10MB.
- Benefit: Smaller images mean faster deployments, faster scaling, and lower storage costs.
The “Non-Root” Mandate
- The Risk: By default, the container’s process runs as root. If a hacker compromises the app, they have root access inside the container.
- The Fix: You must manually create a user.
  #Dockerfile
  # Create user
  RUN adduser -D appuser
  # Switch context
  USER appuser
- Note: Place the USER instruction after every step that needs root, such as installing packages. RUN instructions that follow USER execute as that user, so switching too early makes those steps fail with permission errors.

—

Technical Challenges

Debugging is Hard: The more secure you make the image (e.g., removing the shell), the harder it is to debug. You can’t just docker exec -it my-container bash because bash doesn’t exist.
- Fix: Use “Ephemeral Debug Containers” (Kubernetes allows this) that attach a temporary shell side-car.
Dynamic Linking: If you compile a C++ app in Stage 1 (Ubuntu) and try to run it in Stage 2 (Alpine), it will fail to start because Alpine uses musl libc while Ubuntu uses glibc.
- Fix: Link statically (for example, CGO_ENABLED=0 for Go or -static for C/C++) or ensure base OS compatibility.
Certificate Roots: If your app talks to HTTPS endpoints (like AWS S3 or Stripe), a “Scratch” (empty) image will fail because it lacks CA Certificates.
- Fix: You must manually copy /etc/ssl/certs/ from the builder stage.

—

Docker Multi-Stage Builds: docs.docker.com/build/building/multi-stage/
Google Distroless Images: github.com/GoogleContainerTools/distroless
OWASP Container Security: cheatsheetseries.owasp.org

—

Cheat Sheet (Multi-Stage Strategy)

Feature	Single Stage (The “Fat” Image)	Multi-Stage (The “Lean” Image)	Architect’s Verdict
Size	Huge (800MB+). Includes source code & tools.	Tiny (10MB-50MB). Binary only.	Lean. Saves money & bandwidth.
Tools	Includes `git`, `curl`, `gcc`, `make`.	Removes ALL build tools.	Secure. “Featureless Desert” for hackers.
Secrets	Risk of leaving `.env` or SSH keys in layers.	Secrets stay in Stage 1; Stage 2 is clean.	Secure. Prevents accidental leaks.
Complexity	Simple Dockerfile.	Requires `AS alias` and `COPY --from`.	Necessary complexity for production.
Debugging	Easy (Has full Shell/Tools).	Hard (Missing Shell/Tools).	Trade-off. Security > Convenience.

—

Practical Lab: The Perfect Dockerfile

Here is the annotated, Architect-level Dockerfile:

# --- STAGE 1: The Builder ---
# Naming the stage "builder" allows us to reference it later
FROM golang:1.21-alpine AS builder

# Security: Set working directory so we don't pollute root
WORKDIR /app

# Optimization: Copy dependency files FIRST to leverage Docker Cache
COPY go.mod go.sum ./
RUN go mod download

# Build: Copy source and build the binary
COPY . .
# CGO_ENABLED=0 ensures we create a "Static Binary" (no external library dependency)
RUN CGO_ENABLED=0 go build -o main .


# --- STAGE 2: The Production Runner ---
# Use "Alpine" for a small footprint, or "Scratch" for zero footprint
FROM alpine:3.18

# Security: Install CA Certs for HTTPS calls
RUN apk --no-cache add ca-certificates

# Security: Create a non-root system user
RUN adduser -D appuser

# Security: Switch to the non-root user
USER appuser

# Architecture: Copy ONLY the binary from the "builder" stage
COPY --from=builder /app/main .

# Run the app
CMD ["./main"]

3.4. The .dockerignore File: The Gatekeeper

Imagine you are moving to a new house (Building a Docker Image).

With .dockerignore: You make a strict list of things NOT to pack. You leave the trash, the heavy old furniture, etc.
- Result: You only need 1 small truck. The move is 10x faster. Your secrets stay safe in the old house.
Without .dockerignore: You blindly pack everything in your current house including the trash under the sink, old receipts etc.
- Result: It takes 10 trucks to move. The move is slow.

When you run docker build . the Docker CLI (Client) doesn’t just start building. First, it packages the entire current directory (called the Build Context) and uploads it to the Docker Daemon (Engine).

Speed (The 1GB Problem): If you have a .git folder or node_modules totaling 1GB, Docker uploads that 1GB to the daemon before looking at the Dockerfile. .dockerignore stops this upload.
Security (The .env Problem): If you run COPY . . and you have a .env file with passwords, that file is baked into the image. Even if you rm .env in the next line, the secret remains in the “Image History” layer forever.
Reliability (The OS Mismatch): If you copy local node_modules (compiled for Mac/Windows) into a Linux container, native modules can crash the app with cryptic errors like “invalid ELF header”.

Most people list what to ignore. An Architect lists what to keep. This is the “Deny All, Allow Some” whitelist strategy.

The Perfect .dockerignore:

#.dockerignore
# 1. First, ignore EVERYTHING (The "Deny All" rule)
**

# 2. Then, "Un-ignore" only what you strictly need (The "Allow" rule)
!src/
!package.json
!package-lock.json
!go.mod
!go.sum
!main.go

# 3. Explicitly block dangerous files (Safety Net)
# Just in case you remove the "**" rule later
.git
.env*
**/*.log
docker-compose*
Dockerfile*
README.md
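The "Deny All, Allow Some" behaviour depends on rule order: the last rule that matches a path decides its fate. The Python sketch below models that with a deliberately simplified matcher. It is an approximation written for this article, not Docker's implementation, and it supports only `*`, `?`, `**`, and a leading `!`.

```python
import re


def to_regex(pattern):
    """Translate a simplified .dockerignore pattern into a compiled regex."""
    pattern = pattern.rstrip("/")
    out, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out += "(?:.*/)?"   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out += ".*"         # anything, across directories
            i += 2
        elif pattern[i] == "*":
            out += "[^/]*"      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out += "[^/]"
            i += 1
        else:
            out += re.escape(pattern[i])
            i += 1
    return re.compile(out)


def is_ignored(path, rules):
    """Last matching rule wins; a rule also applies to everything under a matched directory."""
    parts = path.split("/")
    candidates = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    ignored = False
    for rule in rules:
        negate = rule.startswith("!")
        regex = to_regex(rule[1:] if negate else rule)
        if any(regex.fullmatch(c) for c in candidates):
            ignored = not negate
    return ignored


rules = ["**", "!src/", "!package.json", ".git", ".env*", "**/*.log"]

for path in ["src/index.js", "package.json", ".env.production",
             "node_modules/x/index.js", "src/debug.log"]:
    print(path, "->", "ignored" if is_ignored(path, rules) else "sent")
```

Note how src/debug.log is re-included by `!src/` and then excluded again by the later `**/*.log` rule, which is why the safety-net rules are placed after the allow rules.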

—

Technical Challenges

These are the hidden risks that separate a “Docker User” from a “DevSecOps Architect.”

Challenge	The Architect’s Fix	Why it Matters
Dependency Confusion	Private Scoped Registries. Use scoped names (e.g., `@mycorp/auth-lib`) and strict lockfiles.	Prevention. Stops hackers from uploading a malicious package with the same name to public npm/pip, tricking your build into pulling it.
Image Poisoning	Binary Authorization (Cosign/Sigstore). Sign images in CI/CD. Production rejects unsigned images.	Trust. Ensures that only images built by your trusted pipeline can run. If a hacker pushes a fake image, K8s blocks it.
Privilege Escalation	User Namespaces (`userns-remap`). Map “Root” inside the container to “Nobody” on the Host.	Containment. Even if a hacker becomes root inside the container, the Host kernel sees them as a powerless user.
Exploiting Syscalls	Seccomp Profiles. Whitelist only necessary kernel calls (e.g., block `ptrace`, `reboot`).	Hardening. Stops malware from making dangerous requests to the Host Kernel.
Stale Image Risks	Automated Patching (Renovate/Dependabot). Daily scans for base image updates.	Hygiene. Ensures you never run an image with a “Critical” CVE discovered after deployment.
Container Drift	Read-Only Root Filesystem. Run with `--read-only`.	Immutability. Prevents hackers (or devs) from installing tools (`apt install`) or modifying files at runtime.

—

The Gold Standard: Dockerfile Best Practices (Official)
Security Tooling: Hadolint (Dockerfile Linter) – Every Architect should enforce this in CI/CD.

3.5 Docker Security Commands Cheat Sheet.

Think of your CI/CD Pipeline like an Airport Security Checkpoint.

Trivy/Docker Scout: This is the X-Ray Machine. It looks inside your luggage (Image) to find dangerous items like guns or knives (Vulnerabilities/CVEs).
Cosign: This is the Passport Stamp. It proves who you are and that your luggage hasn’t been opened or tampered with since you checked it in.

Trivy (The Scanner)
- The most popular open-source scanner by Aqua Security. It is fast, comprehensive, and scans OS packages (Alpine/RedHat) and language dependencies (Node/Python).
- The Command: trivy image <image-name>
- DevSecOps Usage: You run this before pushing to the registry. If it finds “Critical” issues, you fail the build.
Docker Scout (The Native / Legacy “Scan”)
- docker scan (based on Snyk) is deprecated. The new standard is Docker Scout.
- Docker’s native tool to analyze image contents. It gives you a “Health Score” and remediation advice.
- Command: docker scout quickview <image-name>
Cosign (The Signer)
- A tool from the “Sigstore” project. It allows you to digitally sign your container images.
- Use this to prevent “Supply Chain Attacks.” Even if a hacker steals your registry credentials, they can’t sign a malicious image without your private key.
- The Process:
  - Sign: You sign the image after building.
  - Verify: Your Kubernetes cluster verifies the signature before running it.

—

DevSecOps Architect Perspective

As an Architect, your job is Automated Policy Enforcement.

The “Blocker” Rule: Do not just “report” vulnerabilities. Block the pipeline.
- Command: trivy image --exit-code 1 --severity CRITICAL my-app:latest
- Result: If a critical bug is found, the script exits with error 1, stopping Jenkins/GitHub Actions instantly.
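The “Golden Image” Strategy: Do not rely only on scanning every single build from scratch. Scan your Base Images (e.g., your corporate Java base) nightly. If the base is clean, most of the OS-level risk in the apps built on top is already covered; the application’s own dependencies still need their own scan.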
Provenance (SLSA): Use Cosign not just to sign, but to attach an “SBOM” (Software Bill of Materials). It’s like an ingredients list for your software.
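For pipelines that need more control than the --exit-code flag in the “Blocker” rule, the same decision can be made by a small script over a saved JSON report. The sketch below assumes the report has a top-level "Results" list whose entries may contain a "Vulnerabilities" list with "VulnerabilityID" and "Severity" fields; check that against the output of your scanner version. The sample report and its CVE IDs are invented for illustration.

```python
import json

# Made-up sample in the assumed report shape; the CVE IDs are placeholders.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {"Target": "my-app:latest (alpine 3.18)",
     "Vulnerabilities": [
       {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
       {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
     ]},
    {"Target": "app/package-lock.json"}
  ]
}
""")


def blocking_findings(report, severities=("CRITICAL",)):
    """Collect vulnerability IDs whose severity is in the blocking set."""
    found = []
    for result in report.get("Results") or []:
        # A target with no findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in severities:
                found.append(vuln["VulnerabilityID"])
    return found


def gate(report, severities=("CRITICAL",)):
    """Return the exit code a pipeline step should use: 1 blocks, 0 passes."""
    return 1 if blocking_findings(report, severities) else 0


print(blocking_findings(SAMPLE_REPORT), gate(SAMPLE_REPORT))
```

In a real pipeline the script would load the report from a file and pass the result of gate() to sys.exit(), so that a non-zero code stops the job in the same way the CLI flag does.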

—

Technical Challenges

False Positives (The Noise): Scanners often report bugs that don’t affect you (e.g., a bug in a kernel module you deleted).
- Fix: Use a .trivyignore file to acknowledge and suppress known non-issues so your dashboard stays green.
Zero-Day Lag: A vulnerability is announced today (for example, Log4j). It can take hours for the scanner database to update. During this window, you are blind.
Rate Limiting: If you scan 1000 times an hour against Docker Hub, you will get blocked. Use a local caching proxy or Harbor Registry (which has Trivy built-in).

—

Trivy Docs: aquasecurity.github.io/trivy
Cosign (Sigstore): docs.sigstore.dev/cosign/overview/
Docker Scout: docs.docker.com/scout/

—

Cheat Sheet (Security Commands)

Tool	Command	What it does	When to use
Trivy	`trivy image myapp:v1`	Basic Scan. Lists all vulnerabilities.	Local development.
Trivy	`trivy image --severity CRITICAL --exit-code 1 myapp`	Gatekeeper. Fails build if critical bugs found.	CI/CD Pipeline (Must Have).
Trivy	`trivy fs .`	Filesystem Scan. Scans code/config before building.	Pre-commit hook.
Scout	`docker scout cves myapp`	Deep Dive. Detailed CVE list by Docker.	Debugging specific Docker layers.
Cosign	`cosign generate-key-pair`	Setup. Creates `cosign.key` and `cosign.pub`.	One-time setup.
Cosign	`cosign sign --key cosign.key myrepo/myapp:v1`	Sign. Attaches digital signature to registry.	After Build & Scan pass.
Cosign	`cosign verify --key cosign.pub myrepo/myapp:v1`	Verify. Checks if image is trusted.	Before Deployment (Kubernetes).
