Kubernetes TTL Controller
Imagine you are baking cookies. You use a timer. When the timer goes off (the job is done), you don’t leave the dirty trays in the oven forever; you clean them up.
In Kubernetes, Jobs create Pods to do work (like a backup or a calculation). Once the work is finished (Completed or Failed), these Pod and Job objects normally stay around forever, cluttering your cluster like dirty dishes. The TTL Controller is the automatic dishwasher: you set a timer (ttlSecondsAfterFinished), and once the Job is done, the controller waits that long and then automatically deletes the Job and its Pods.
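In manifest form, that single field looks like this. A minimal sketch, closely following the example in the Kubernetes docs; the pi-calculating container is just a stand-in workload:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100   # delete this Job (and its Pods) 100 seconds after it finishes
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.34
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never       # Job Pods must use Never or OnFailure
```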
Key Characteristics
- “If your completed Jobs are piling up and cluttering kubectl get jobs, use ttlSecondsAfterFinished.”
- “Setting ttlSecondsAfterFinished: 0 triggers immediate deletion after completion.”
- “This controller does NOT terminate running Pods; it only cleans up dead ones.”
| Feature | Description |
| --- | --- |
| Primary Goal | Automatically clean up finished Jobs (and their Pods) to save resources. |
| Key Field | .spec.ttlSecondsAfterFinished |
| Trigger | Fires only when the resource reaches a terminal state (Complete or Failed). |
| Scope | Jobs (stable since Kubernetes v1.23); a Job's Pods are removed with it via cascading deletion. |
| Component | Runs inside the kube-controller-manager. |
The TTL (Time-To-Live) Controller is a control loop within the kube-controller-manager that manages the lifecycle of resource objects that have finished execution. Its primary function is to enforce a TTL policy on Jobs, ensuring that a Job (and, through cascading deletion, its Pods) is garbage collected a user-defined duration after it reaches a terminal state (Completed or Failed).
This mechanism solves the “resource leak” problem where thousands of old, completed Job objects remain in the API server, consuming etcd storage and slowing down API responses. Unlike the standard Garbage Collector (which handles owner-dependent relationships), the TTL controller handles time-based cleanup for finished resources.
- The “Zombie” Job Problem: By default, a Kubernetes Job stays in the API server until you delete it. If you create a Job every minute, after 24 hours you have up to 1,440 dead Job objects. (CronJobs do prune their own history via successfulJobsHistoryLimit and failedJobsHistoryLimit, but standalone Jobs are never cleaned up; see the CronJob sketch after this list.)
- The Fix: You add one line to your YAML: ttlSecondsAfterFinished: 100.
- The Result: 100 seconds after the Job says “I’m done!”, it vanishes.
- Supported Resources: Jobs. (The controller does not act on bare Pods; a Job’s Pods are deleted together with the Job via cascading deletion.)
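When the Jobs come from a CronJob, the field belongs inside jobTemplate.spec so every spawned Job inherits it. A sketch; the name, schedule, and busybox command are illustrative:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: minutely-report            # illustrative name
spec:
  schedule: "* * * * *"            # runs every minute
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 100 # each spawned Job is deleted 100 seconds after it finishes
      template:
        spec:
          containers:
            - name: report
              image: busybox:1.36
              command: ["sh", "-c", "echo generating report"]
          restartPolicy: Never
```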
Use Cases
- CI/CD Runners: Ephemeral build agents spawned as Jobs.
- Machine Learning Training: Massive batch jobs where keeping metadata for 10,000 completed runs slows dashboards and kubectl to a crawl.
- Database Migrations: One-off tasks that run on deploy.
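For the migration case, a sketch of a one-off Job that keeps itself around for an hour so you can read its logs before it disappears; the image and command are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                  # illustrative name, e.g. stamped per release
spec:
  ttlSecondsAfterFinished: 3600     # keep the finished Job for 1 hour so logs stay inspectable
  backoffLimit: 0                   # a failed migration should not be retried blindly
  template:
    spec:
      containers:
        - name: migrate
          image: my-registry/myapp-migrations:latest   # placeholder image
          command: ["./migrate", "up"]                  # placeholder command
      restartPolicy: Never
```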
Best Practices
- Logging: If you rely on kubectl logs to debug failures, do not set TTL to 0. Give yourself a buffer (e.g., 3600 seconds / 1 hour) to inspect logs before they are deleted.
- Centralized Logging: If you ship logs to Elastic/Splunk/Datadog, you can safely set a low TTL (e.g., 60 seconds) since you don’t need the Pod for logs.
- Default Policy: Use a Mutating Admission Webhook (like Kyverno) to inject a default ttlSecondsAfterFinished: 86400 (24 h) into all Jobs to prevent clutter (sketched below).
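One way to implement that default policy, assuming Kyverno is installed in the cluster, is a mutate rule using Kyverno's +() “add if not present” anchor, so Jobs that already set their own TTL are left alone; the policy and rule names here are illustrative:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-job-ttl        # illustrative policy name
spec:
  rules:
    - name: set-ttl-if-missing
      match:
        any:
          - resources:
              kinds:
                - Job
      mutate:
        patchStrategicMerge:
          spec:
            # +() anchor: only add the field when the Job does not set it itself
            +(ttlSecondsAfterFinished): 86400
```

Jobs created by CronJobs also pass through admission, so a policy like this catches those as well.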
https://kubernetes.io/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically
https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates