
Kubernetes Jobs and CronJobs

In Kubernetes, we usually want our applications to run forever, like a web server or a database. But sometimes we have tasks that need to run once and then stop, like performing a database backup, processing a batch of files, or sending out emails.

This is where Jobs and CronJobs come in.

  • Job: Use this when you have a specific task to do now. Once the task finishes successfully, the Job is considered complete.
  • CronJob: Use this when you have a task that needs to happen repeatedly on a schedule (like “every day at 5 PM” or “every Monday”).
Feature | Job | CronJob
Primary Goal | Run a task once until completion. | Run a task periodically on a schedule.
Trigger | Manual (kubectl apply) or external trigger. | Time-based (Unix cron format).
Pod Lifecycle | Pods terminate (exit 0) after success. | Creates a Job object, which in turn creates Pods.
Restart Policy | OnFailure or Never (cannot be Always). | OnFailure or Never.
Key Parameter | completions (how many times to succeed). | schedule (when to run).
Failure Handling | Retries based on backoffLimit. | Retries via the Job it creates.
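
For a quick experiment, both objects can also be created imperatively with kubectl; the names and image below are illustrative, not from the labs:

Bash
# One-off Job: runs 'date' once to completion
kubectl create job demo-once --image=busybox -- date

# Scheduled CronJob: runs 'date' every minute
kubectl create cronjob demo-cron --image=busybox --schedule="*/1 * * * *" -- date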

When we talk about Workload Controllers like Deployments or StatefulSets, we are talking about Long-Running Processes. However, Jobs and CronJobs handle Batch Processes.

The Job Controller:

When you create a Job, the Job Controller starts a Pod. It watches that Pod closely. If the Pod crashes (exit code non-zero), the controller starts a new one to replace it. It keeps doing this until the Pod finishes successfully (exit code 0).

The CronJob Controller:

The CronJob Controller is actually a manager of Jobs. It does not touch Pods directly. Each time the schedule fires (e.g., at midnight), the CronJob creates a new Job object, and that Job creates the Pods. This separation is important for stability.

  • Exit Codes Matter: In a Job, your container must send an “Exit Code 0” to tell Kubernetes “I finished successfully.” If your script crashes or returns exit code 1, Kubernetes thinks it failed and will retry it.
  • Restart Policy: You cannot use restartPolicy: Always for a Job. Why? Because Always means “if it stops, start it again,” but a Job wants to stop when it is done. So we use OnFailure (restart only if it crashes) or Never (create a totally new Pod if it fails). A runnable sketch of the retry behavior follows below.
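
To watch the retry loop in action, here is a minimal sketch of a Job whose container always fails; the name and backoffLimit are illustrative:

YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: always-fails              # illustrative name
spec:
  backoffLimit: 2                 # give up after 2 retries
  template:
    spec:
      containers:
      - name: failer
        image: busybox
        command: ["sh", "-c", "echo 'simulating a crash'; exit 1"]
      restartPolicy: Never        # each attempt gets a brand-new Pod

With restartPolicy: Never, each failed attempt leaves its Pod behind for inspection; once the backoffLimit is exhausted, the Job is marked Failed with reason BackoffLimitExceeded.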

DevSecOps Architect Level

For a production-grade DevSecOps environment, simply running a Job isn’t enough. You must handle resources, security, and cleanup.

1. Automatic Cleanup (TTL Controller)

One common problem is that completed Jobs stay in your cluster forever, cluttering up your kubectl get jobs list.

  • Solution: Use .spec.ttlSecondsAfterFinished.
  • Architect Note: Set this to a small value, e.g., 100 seconds. This automatically deletes the Job and its Pods after they finish; see the snippet below.
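
As a minimal sketch, the field sits at the top level of the Job spec (the value is up to you):

YAML
spec:
  ttlSecondsAfterFinished: 100  # Job and its Pods are deleted 100s after finishing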

2. Handling “Sidecars” in Jobs

  • The Problem: If you use a service mesh (like Istio or Linkerd) or a log shipper sidecar, the main application container might finish, but the sidecar keeps running. Because one container is still running, the Pod never “completes,” and the Job hangs forever.
  • The Solution: You often need a script wrapper to kill the sidecar once the main app is done, or use native Kubernetes sidecar support (the SidecarContainers feature gate, available since 1.28 and enabled by default from 1.29); see the sketch below.
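
With native sidecar support, the sidecar is declared as an init container with restartPolicy: Always, and Kubernetes lets the Pod complete once the regular containers finish. A minimal sketch, assuming a 1.28+ cluster with the feature enabled; all names here are illustrative:

YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-sidecar          # illustrative name
spec:
  template:
    spec:
      initContainers:
      - name: log-shipper         # native sidecar: restartable init container
        image: busybox
        restartPolicy: Always     # this is what marks it as a sidecar
        command: ["sh", "-c", "while true; do echo shipping logs; sleep 5; done"]
      containers:
      - name: main-task
        image: busybox
        command: ["sh", "-c", "echo 'doing the real work'; sleep 10"]
      restartPolicy: Never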

3. Concurrency Policy in CronJobs

This is critical for data integrity.

  • Allow (Default): If the 1:00 PM backup is slow and takes 2 hours, and the 2:00 PM backup starts, both run at the same time. This might crash your database!
  • Forbid: If the 1:00 PM backup is still running, the 2:00 PM backup is skipped entirely. This is usually the safest for heavy ops.
  • Replace: The 1:00 PM backup is killed, and the 2:00 PM starts.

Lab 1: The Robust “Pi” Calculator Job

This version includes resource limits, retry logic, and cleanup strategies suitable for a shared cluster.

1: Create file pi-job-robust.yaml

YAML
apiVersion: batch/v1  # The API version for Batch workloads (Jobs/CronJobs)
kind: Job             # The type of resource we are creating
metadata:
  name: pi-calculator-robust
  labels:
    app: math-processing
    owner: devsecops-team
spec:
  # --- RETRY STRATEGY ---
  # If the Pod fails, how many times should K8s try again?
  # Default is 6. We set it to 4 to save resources if code is broken.
  backoffLimit: 4

  # --- CLEANUP STRATEGY (Cost Saving) ---
  # Critical Feature: Automatically delete this Job (and its Pods)
  # 60 seconds after it finishes successfully.
  # This prevents thousands of "Completed" pods from clogging your cluster.
  ttlSecondsAfterFinished: 60

  # --- DEADLINE (Safety Valve) ---
  # If the job takes longer than 5 minutes (300s), kill it.
  # This prevents a "zombie" job from running forever.
  activeDeadlineSeconds: 300

  template:
    metadata:
      name: pi-calculator
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]

        # --- RESOURCE LIMITS (Best Practice) ---
        # Always set these so one job doesn't eat all cluster memory.
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

      # --- RESTART POLICY ---
      # 'OnFailure': If the pod crashes, restart the container on the same node.
      # 'Never': If it crashes, create a totally NEW pod (good for debugging).
      # Note: 'Always' is NOT allowed for Jobs.
      restartPolicy: OnFailure

2: Run Command:

Bash
kubectl apply -f pi-job-robust.yaml

3: Verify Job Creation & Status

First, check if the Job has been accepted and is currently running.

Bash
kubectl get jobs -l owner=devsecops-team
  • Expected Output: You should see pi-calculator-robust with COMPLETIONS as 0/1 (running) or 1/1 (finished).

4: View the Output (The Value of Pi)

Bash
kubectl logs job/pi-calculator-robust
  • Expected Output: A long string of numbers starting with 3.14....

5: Verify the Pod Details

Check the Pod to see if the Resource Limits and Restart Policy were applied correctly.

Bash
# List pods associated with this specific job
kubectl get pods -l job-name=pi-calculator-robust

# Describe the pod to see Events and Resource Limits
kubectl describe pod -l job-name=pi-calculator-robust

6: What to look for:

  • Under Limits, verify cpu: 500m and memory: 128Mi.
  • Under Events, ensure there are no OOMKilled (Out of Memory) errors, which would mean our limits were too tight. (A quicker jsonpath check follows below.)
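
If you prefer not to scan the full describe output, a jsonpath query (standard kubectl output formatting) pulls out just the resources block:

Bash
# Print only the resources section of the first matching Pod
kubectl get pod -l job-name=pi-calculator-robust \
  -o jsonpath='{.items[0].spec.containers[0].resources}'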

7: Test the “Cleanup Strategy” (TTL)

Your YAML included ttlSecondsAfterFinished: 60. This is a critical feature to test.

  1. Ensure the job shows COMPLETIONS: 1/1.
  2. Wait for 60 seconds.
  3. Run the get command again:
Bash
kubectl get jobs pi-calculator-robust
kubectl get pods -l job-name=pi-calculator-robust
  • Expected Result: kubectl returns Error from server (NotFound) for the Job and No resources found for the Pods, confirming that both were automatically garbage collected to save cluster space.

8: Troubleshooting (If it fails)

If the job fails or gets stuck (perhaps due to activeDeadlineSeconds), look at the events:

Bash
kubectl describe job pi-calculator-robust

Look for: DeadlineExceeded (if it took > 300s) or BackoffLimitExceeded (if the code crashed more than 4 times).
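
For a machine-readable version of the same information, you can read the Job's status conditions directly; this is a sketch using standard kubectl jsonpath against the Job API:

Bash
# Print each condition's type and reason (e.g., Failed / BackoffLimitExceeded)
kubectl get job pi-calculator-robust \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\n"}{end}'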


Lab 2: The Production-Grade Nightly Backup CronJob

This version adds history limits, starting deadlines, and concurrency controls to ensure your backups are reliable and don’t crash the server.

1: Create file backup-cron-robust.yaml

YAML
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup-secure
spec:
  # --- SCHEDULING ---
  # Run at 00:00 (Midnight) every day.
  # Syntax: Minute | Hour | Day of Month | Month | Day of Week
  schedule: "0 0 * * *"

  # --- TIMEZONE (New in K8s 1.27+) ---
  # Optional: Ensures the job runs at midnight YOUR time, not UTC.
  # timeZone: "Asia/Kolkata"

  # --- MISSED SCHEDULE HANDLING ---
  # If the cluster is down at midnight and comes up at 00:30,
  # should it run the missed job?
  # If the delay is > 200s, skip it. Prevents old jobs from piling up.
  startingDeadlineSeconds: 200

  # --- CONCURRENCY (Data Safety) ---
  # 'Forbid': If the previous backup is still running, SKIP this new one.
  # 'Allow': Run both (Dangerous for backups!).
  # 'Replace': Kill the old one, start new.
  concurrencyPolicy: Forbid

  # --- HISTORY (Log Management) ---
  # Keep the last 3 successful jobs so we can check logs if needed.
  successfulJobsHistoryLimit: 3
  # Keep only 1 failed job so we can debug, but don't clutter the list.
  failedJobsHistoryLimit: 1

  jobTemplate:
    spec:
      template:
        spec:
          # --- SECURITY CONTEXT (DevSecOps) ---
          # Run as non-root user for security.
          securityContext:
            runAsUser: 1000
            runAsGroup: 3000
            fsGroup: 2000

          containers:
          - name: backup-tool
            image: busybox
            # Simulate a backup process
            args:
            - /bin/sh
            - -c
            - "echo 'Starting secure backup...'; sleep 10; echo 'Backup Complete'"
            
            # --- RESOURCES ---
            resources:
              requests:
                memory: "100Mi"
                cpu: "100m"
              limits:
                memory: "200Mi"
                cpu: "200m"
          
          restartPolicy: OnFailure

2: Run command:

Bash
kubectl apply -f backup-cron-robust.yaml

3: Verify the CronJob is Active

First, confirm the scheduler has registered your CronJob.

Bash
kubectl get cronjob nightly-backup-secure
  • Expected Output: You should see SCHEDULE: 0 0 * * * and SUSPEND: False. The LAST SCHEDULE column will likely be <none> since it hasn’t run yet. (SUSPEND is a flag you can toggle yourself; see the snippet below.)
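
If you ever need to pause the schedule without deleting the CronJob (for example during a maintenance window), patch the suspend flag; Jobs that are already running are not affected:

Bash
# Pause the schedule
kubectl patch cronjob nightly-backup-secure -p '{"spec":{"suspend":true}}'

# Resume it later
kubectl patch cronjob nightly-backup-secure -p '{"spec":{"suspend":false}}'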

4: Manually Trigger a Job (The “Test Run”)

Instead of waiting for midnight, we can manually create a Job from the CronJob template. This tests if the permissions, image, and commands work.

Bash
# Create a manual job named 'manual-test-1' from the CronJob
kubectl create job --from=cronjob/nightly-backup-secure manual-test-1
  • Why do this? This validates your jobTemplate logic without changing the actual CronJob schedule.

5: Verify Execution & Logs

Now watch the manual job execute.

Bash
# Watch the pod status until it shows 'Completed'
kubectl get pods -w

# Once completed (or running), check the logs
kubectl logs -l job-name=manual-test-1
  • Expected Output:
Bash
Starting secure backup...
(10 second pause)
Backup Complete

6: Verify History Limits

Your YAML has successfulJobsHistoryLimit: 3. Trigger the job 4 or 5 times rapidly to see how completed Jobs accumulate in the list.

Bash
# Trigger multiple manual jobs quickly
kubectl create job --from=cronjob/nightly-backup-secure manual-test-2
kubectl create job --from=cronjob/nightly-backup-secure manual-test-3
kubectl create job --from=cronjob/nightly-backup-secure manual-test-4
kubectl create job --from=cronjob/nightly-backup-secure manual-test-5

# Check the list of jobs
kubectl get jobs
  • Note: The history limits only apply to Jobs that the CronJob controller creates on schedule. Jobs created manually with --from are not owned by the CronJob, so it never garbage-collects them (see step 8). To exercise the limit itself, temporarily shorten the schedule, as sketched below.
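
A more faithful test lets the controller do the creating. A minimal sketch, assuming you restore the original schedule afterwards:

Bash
# Temporarily run every minute so the CronJob controller itself creates Jobs
kubectl patch cronjob nightly-backup-secure -p '{"spec":{"schedule":"*/1 * * * *"}}'

# After ~5 minutes, only the 3 most recent scheduled Jobs remain (manual jobs aside)
kubectl get jobs

# Restore the nightly schedule
kubectl patch cronjob nightly-backup-secure -p '{"spec":{"schedule":"0 0 * * *"}}'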

7: Verify Security Context

Ensure the Pod is actually running as the non-root user (User ID 1000) as specified in your YAML.

Bash
# The simulated backup only runs for ~10 seconds, so trigger a fresh job
# and check the UID while the Pod is still Running
kubectl create job --from=cronjob/nightly-backup-secure manual-test-6
kubectl exec -it job/manual-test-6 -- id
  • Expected Output: uid=1000 gid=3000 groups=2000,3000
  • If you see uid=0 (root), the securityContext was ignored or configured incorrectly.

8: Clean Up Manual Tests

Since manual jobs created via kubectl create job are not managed by the CronJob’s history limit, you should delete them manually to keep the cluster clean.

Bash
kubectl delete job manual-test-1 manual-test-2 manual-test-3 manual-test-4 manual-test-5 manual-test-6
