Kubernetes CoreDNS Service Discovery
CoreDNS
Hello everyone! Welcome back to DevSecOpsGuru.in. Today, we are taking a deep dive into something every Kubernetes engineer must know: CoreDNS Service Discovery. If you are learning Kubernetes, you already know that Pods are constantly dying and spinning back up with new IP addresses. So, how do your frontend microservices find your backend microservices if the IPs keep changing? That is exactly what CoreDNS solves! It acts as the automatic “address book” for your cluster, allowing your apps to talk to each other using simple names instead of confusing, ever-changing IP addresses. Let’s master this concept step-by-step!
Imagine a huge corporate office building where employees are constantly moving desks. If you want to talk to the “Finance Department,” you don’t memorize their desk numbers because they change every day. Instead, you call the friendly office receptionist. You simply say, “Connect me to Finance,” and the receptionist looks up their current desk and routes your call. In Kubernetes, CoreDNS is that friendly receptionist. The “Finance Department” is your Service, and the individual employees at the desks are your Pods.
Quick Reference
- Default IP: The CoreDNS service (often named `kube-dns`) usually lives at IP `10.96.0.10`.
- DNS Name Format: `<service-name>.<namespace>.svc.cluster.local`
- Headless Services: Return the IPs of all the backend Pods directly, bypassing the Service IP.
- Corefile: The main configuration file for CoreDNS, stored as a ConfigMap in the `kube-system` namespace.
| Record Type | What it resolves | Format / Example |
| --- | --- | --- |
| A | Service Name to ClusterIP (IPv4) | my-svc.my-namespace.svc.cluster.local |
| AAAA | Service Name to ClusterIP (IPv6) | my-svc.my-namespace.svc.cluster.local |
| A (Headless) | Service Name to Pod IPs | my-svc.my-namespace.svc.cluster.local |
| SRV | Named Port, Protocol, and Target | _http._tcp.my-svc.my-namespace.svc.cluster.local |
| PTR | Reverse Lookup (IP to Name) | 10.0.96.10.in-addr.arpa (octets of 10.96.0.10 reversed) |
CoreDNS is a fast, flexible, Go-based DNS server that runs as a Deployment within the kube-system namespace. When you create a Service, Kubernetes automatically updates CoreDNS. Simultaneously, the Kubelet on each worker node configures every new Pod’s /etc/resolv.conf file to point to the CoreDNS service IP. Whenever a Pod wants to communicate with another Service, it queries CoreDNS using the Service’s name, gets the stable ClusterIP back, and traffic is routed perfectly.
Architecture and Mechanism
CoreDNS watches the Kubernetes API for changes to Services, Endpoints, and EndpointSlices. The translation from application query to network resolution follows a well-defined chain:
- The Kubelet provisions Pods with `/etc/resolv.conf`, pointing the `nameserver` at the ClusterIP of the `kube-dns` Service.
- The Pod issues a DNS query (UDP/TCP 53), which the `kube-dns` Service intercepts and load-balances across the active CoreDNS Pods.
- CoreDNS processes the query through its plugin chain. If the query suffix matches the cluster domain (default `cluster.local`), the `kubernetes` plugin intercepts it and resolves it against the cached Kubernetes API state.
- Non-matching queries (external domains) are handled by the `forward` plugin, which passes them to the node's upstream DNS resolvers.
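You can see these pieces on a live cluster with a couple of kubectl commands. This is a quick sanity check, assuming the common setup where the Service keeps the legacy name `kube-dns` and the CoreDNS Pods carry the `k8s-app=kube-dns` label:

```bash
# The stable ClusterIP that every Pod's resolv.conf points at
kubectl -n kube-system get svc kube-dns

# The CoreDNS Pods behind the Service (label may differ by distribution)
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# The Endpoints object listing the actual CoreDNS Pod IPs
kubectl -n kube-system get endpoints kube-dns
```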
To really master the foundation, you need to know exactly what happens inside a Pod:
- The `/etc/resolv.conf` file: If you run `cat /etc/resolv.conf` inside any standard Kubernetes Pod, you will see `nameserver 10.96.0.10` (or similar — see the sample file after this list). This forces the container to ask CoreDNS for directions.
- The Search Path: The file also contains a `search` line (e.g., `default.svc.cluster.local svc.cluster.local cluster.local`). This is a massive time-saver! Because of this search path, if two Pods are in the same namespace, they don't need the full domain name. A frontend Pod can just say `curl http://backend-service`, and the resolver will automatically append the search domains so CoreDNS can find it.
- Endpoints Object: CoreDNS doesn't do the routing; it just gives out the IP. When a Service is created, an `Endpoints` object is also created, maintaining the active list of healthy Pod IPs that belong to that Service.
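For illustration, here is roughly what that file looks like inside a Pod running in the `default` namespace; the nameserver IP and exact search domains vary by cluster, so treat this as a representative sample rather than a guaranteed layout:

```text
# cat /etc/resolv.conf (inside a Pod in the "default" namespace)
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```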
Record Mapping
CoreDNS provisions specific DNS records based on the Service type:
- Normal Services (ClusterIP): Yield an A record mapping `<service>.<namespace>.svc.cluster.local` to the Service's stable ClusterIP.
- Headless Services (`clusterIP: None`): Yield multiple A records mapping directly to the individual Pod IP addresses, allowing client-side load balancing (see the manifest sketch after this list).
- SRV Records: Created for named ports to expose protocol and port combinations, formatted as `_<port>._<proto>.<service>.<namespace>.svc.cluster.local`.
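Here is a minimal sketch of a Headless Service; the name `backend-headless` and the `app: backend` selector are illustrative placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-headless    # resolves as backend-headless.<namespace>.svc.cluster.local
  namespace: default
spec:
  clusterIP: None           # headless: no virtual IP, DNS returns the Pod IPs directly
  selector:
    app: backend            # Pods carrying this label become the A records
  ports:
    - name: http
      port: 8080
      targetPort: 8080
```

A query for `backend-headless.default.svc.cluster.local` then returns one A record per ready Pod, which is exactly what StatefulSets and client-side load balancers rely on.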
For Architects designing enterprise-grade clusters, CoreDNS must be customized and highly available:
- Plugin Chaining: CoreDNS operates on a chain of plugins. The default `Corefile` uses plugins like `errors`, `health`, `kubernetes`, `prometheus`, `forward`, and `cache`.
- Custom ConfigMaps: In managed services like Azure Kubernetes Service (AKS) or Amazon EKS, you should never edit the default `coredns` ConfigMap directly, as it gets overwritten. Instead, you create a `coredns-custom` ConfigMap and use `.server` or `.override` block extensions to configure stub domains (routing on-premise DNS queries through a VPN) or custom upstream forwarders (see the sketch after this list).
- Horizontal Autoscaling: Production clusters should use the `cluster-proportional-autoscaler` to scale CoreDNS replicas dynamically based on the number of nodes and CPU cores in the cluster, preventing DNS timeouts during massive scaling events.
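As a sketch of that pattern, here is what a stub-domain extension can look like. This follows the AKS `coredns-custom` convention; the `corp.example.com` domain and the `10.0.0.53` resolver are placeholders, and the exact key naming differs between providers, so verify against your platform's documentation:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom       # merged into the managed Corefile instead of editing it
  namespace: kube-system
data:
  onprem.server: |           # a ".server" key adds an extra server block (stub domain)
    corp.example.com:53 {
        errors
        cache 30
        forward . 10.0.0.53  # on-premise DNS server reachable over the VPN
    }
```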
Corefile Configuration and Plugins
The operational logic of CoreDNS is defined in the Corefile, injected via a ConfigMap. A standard Kubernetes Corefile leverages specific plugins:
- `errors` & `log`: For diagnostics and stdout logging.
- `health` & `ready`: Expose endpoints for Kubernetes liveness and readiness probes.
- `kubernetes`: The core logic defining the domain zone and instructing CoreDNS to watch the API.
- `prometheus`: Exposes metrics on port 9153, crucial for monitoring `coredns_dns_request_duration_seconds` and cache hits.
- `forward`: Defines upstream resolution for non-cluster domains.
- `cache`: Caches responses (default 30 seconds) to mitigate load.
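Putting the chain together, a typical Corefile (shown inside its ConfigMap, closely resembling the kubeadm default; managed distributions tweak this slightly) looks like the following sketch:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors                       # log errors to stdout
        health {
            lameduck 5s              # keep serving briefly during shutdown
        }
        ready                        # readiness endpoint for the kubelet probe
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153             # metrics: request duration, cache hits, etc.
        forward . /etc/resolv.conf   # send non-cluster queries to the node's resolvers
        cache 30                     # cache answers for 30 seconds
        loop
        reload
        loadbalance
    }
```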
Troubleshooting and Operational Edge Cases
- The `ndots:5` Impact: Kubernetes sets the `ndots` search parameter to 5. An external query (e.g., `api.stripe.com`) will trigger sequential lookups appending cluster domains (e.g., `api.stripe.com.default.svc.cluster.local`) before attempting the absolute domain, resulting in an NXDOMAIN storm and high CoreDNS CPU utilization. Mitigation involves overriding `dnsConfig` at the Pod spec level (see the sketch after this list).
- Conntrack / Source Port Exhaustion: High-throughput UDP DNS queries can cause SNAT port collisions or race conditions in Netfilter/iptables. Transitioning to NodeLocal DNSCache (running a DNS caching agent as a DaemonSet on every node) is the standard architectural fix for large-scale clusters to bypass iptables NAT overhead for DNS.
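A minimal sketch of that Pod-level override (the Pod name and image are placeholders): lowering `ndots` means most external names are tried as absolute domains first, eliminating the wasted cluster-suffix lookups.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-client                    # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/api-client:1.0   # placeholder image
  dnsPolicy: ClusterFirst             # keep using cluster DNS, only tune resolver options
  dnsConfig:
    options:
      - name: ndots
        value: "2"                    # names with two or more dots resolve as absolute first
```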
Key Components
kubelet, Corefile (ConfigMap), CoreDNS Deployment, kube-dns Service, kube-proxy, and the Kubernetes API server.
Key Characteristics
Written in Go, highly modular (plugin-based chain), real-time API watching, supports forward/stub domains, and caches responses.
Use case
A payment processing Pod needs to verify a transaction with the fraud-detection Pod. Instead of querying a database for an IP, it simply makes an HTTP POST request to http://fraud-service.security.svc.cluster.local, and the infrastructure handles the rest.
Benefits
Ensures zero-downtime rolling updates (since IP changes are hidden behind the DNS name), simplifies cross-team communication, and removes hardcoded configuration files from your application code.
Best practices
- Always use the `loadbalance` and `cache 30` plugins in your `Corefile`.
- Run at least 2 replicas of CoreDNS for high availability.
- Configure explicit `forward` rules for custom corporate domains so queries don't get lost.
- Use FQDNs (ending with a dot, e.g., `api.external.com.`) in application code to bypass the `ndots` search penalty.
Limitations
Sticky connections. Because Kubernetes uses connection tracking (conntrack) for ClusterIPs, long-lived client connections can sometimes cause an uneven distribution of DNS queries, overloading one CoreDNS pod while the other sits idle.
Common issues
- `CrashLoopBackOff` on CoreDNS pods due to syntax errors after manually editing the `Corefile`.
- External domains failing to resolve because the upstream DNS server (like 8.8.8.8) is blocked by corporate firewalls.
Problems and solutions
- Problem: An application is experiencing random 5-second timeouts when trying to reach an external API.
- Solution: This is the classic 5-second DNS timeout: the `ndots:5` query amplification combined with a dropped UDP packet leaves the resolver waiting for its retry timer. Modify the application's deployment manifest to include `dnsConfig` with `ndots: 2` to reduce the search domain iteration, or append a trailing dot to the external domain string in the application code.
DNS Latency Debugging
In Kubernetes, applications communicate via domain names. If translating those names into IP addresses takes too long, your entire microservice architecture slows down, leading to random timeouts and angry users. Let's learn how to spot, debug, and eliminate these mysterious delays!
Quick Reference
- The 5-Second Rule: If your application experiences random timeouts exactly 5 seconds long, it is almost always a DNS UDP packet drop issue.
- ndots:5: The default Kubernetes setting that forces your Pod to guess a domain name up to 5 times before searching the public internet.
- Alpine Linux: Uses `musl` libc, which handles DNS queries differently than glibc-based distributions and is notorious for triggering DNS race conditions.
- The Tooling: Always keep a `dnsutils` Pod handy to run `nslookup` and `dig` from inside the cluster.
| Debugging Tool | Command Example | What it Reveals |
| --- | --- | --- |
| dig | dig +stats google.com | Exact DNS query time in milliseconds |
| nslookup | nslookup my-svc.default | Verifies if CoreDNS can resolve internal names |
| resolv.conf | cat /etc/resolv.conf | Shows your Pod's ndots limit and nameserver IP |
| Metrics | coredns_dns_request_duration_seconds | Prometheus metric showing server-side latency |
Mitigation vectors for the SNAT race condition:
- Application Level: Inject `options single-request-reopen` into the Pod's `dnsConfig` to force the resolver to use distinct sockets for A and AAAA queries (see the sketch after this list).
- Image Level: Migrate base images from Alpine (`musl`) to Debian/Ubuntu variants (`glibc`), which process queries sequentially or handle collisions more gracefully.
- Architecture Level: Implement NodeLocal DNSCache.
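A minimal sketch of the application-level option (Pod name and image are placeholders; note that this resolver flag is honored by glibc-based images, so it will not help a musl/Alpine container):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-worker               # illustrative name
spec:
  containers:
    - name: app
      image: debian:stable-slim       # glibc-based base image
      command: ["sleep", "infinity"]
  dnsConfig:
    options:
      - name: single-request-reopen   # separate sockets for the A and AAAA lookups
```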
NodeLocal DNSCache Architecture
To circumvent conntrack table exhaustion and SNAT race conditions, NodeLocal DNSCache deploys a lightweight DNS caching agent as a DaemonSet on every worker node. It listens on a dummy interface using a local IP (e.g., 169.254.20.10). Pods are configured to send UDP DNS queries to this local agent instead of the remote kube-dns ClusterIP. If a cache miss occurs, the NodeLocal agent forwards the query to the central CoreDNS pods using TCP instead of UDP. TCP establishes a reliable connection stream, completely bypassing the UDP conntrack collision bugs and eliminating the 5-second timeout retries.
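Once the add-on is installed (the upstream manifest deploys a `node-local-dns` DaemonSet, and 169.254.20.10 is the conventional listen address), a rough sanity check looks like the snippet below; note that in iptables mode some clusters keep the kube-dns IP in resolv.conf and intercept it transparently:

```bash
# The caching agent should be scheduled on every worker node
kubectl -n kube-system get daemonset node-local-dns

# Inside a fresh Pod, inspect which nameserver the resolver is using
kubectl run dnscheck --rm -it --image=nicolaka/netshoot -- cat /etc/resolv.conf
```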
Cloud-Provider ENI PPS Limits and DNS Throttling
In high-throughput environments like AWS EKS, worker nodes utilize Elastic Network Interfaces (ENIs). Cloud providers enforce strict hardware-level rate limits on Packets Per Second (PPS). For instance, traffic destined for local proxy services (like the CoreDNS ClusterIP) may be hard-limited (e.g., 1024 PPS). When a cluster experiences a burst of microservice scaling or a cronjob execution, the sheer volume of ndots:5 multiplied UDP queries can easily exceed this ENI limit. Packets are dropped at the hypervisor level. This manifests as application timeouts, yet CoreDNS metrics (coredns_dns_requests_total) will not register the queries, and CoreDNS CPU/Memory will appear perfectly healthy. Debugging this requires querying cloud-native metrics, such as the linklocal_allowance_exceeded or conntrack_allowance_exceeded metrics in AWS CloudWatch, rather than Kubernetes native metrics.
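On EKS worker nodes that use the ENA network driver, one way to confirm this class of silent drop is to read the driver's allowance counters directly on the node (counter names come from the ENA driver; the interface name eth0 is an assumption):

```bash
# Run on the worker node, or from a privileged debug pod with host networking
ethtool -S eth0 | grep allowance_exceeded
# Steadily increasing linklocal_allowance_exceeded or conntrack_allowance_exceeded
# counters indicate packets being dropped by instance-level limits.
```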
The ndots:5 Penalty
POSIX resolver conventions dictate that domains containing fewer dots than the `ndots` threshold (default 5 in K8s) will be treated as relative names. The resolver will append the local search domains defined in `/etc/resolv.conf` (`<namespace>.svc.cluster.local`, `svc.cluster.local`, `cluster.local`) before querying the absolute domain. Querying an external API like `api.stripe.com` (2 dots) triggers sequential UDP queries:
1. `api.stripe.com.<namespace>.svc.cluster.local` -> NXDOMAIN
2. `api.stripe.com.svc.cluster.local` -> NXDOMAIN
3. `api.stripe.com.cluster.local` -> NXDOMAIN
4. `api.stripe.com` -> OK

This roughly quadruples DNS traffic for external calls (four lookups where one would do). Mitigation requires explicitly terminating FQDN strings in application code with a trailing dot (`api.stripe.com.`) to force an immediate absolute resolution.
To start debugging, you need to know how to isolate the problem.
- Check the CoreDNS Logs: Run `kubectl logs -n kube-system -l k8s-app=kube-dns`. Look for errors like `SERVFAIL` or `i/o timeout`.
- Deploy a Debug Pod: You cannot troubleshoot DNS blindly. Deploy a Pod with networking tools: `kubectl run dnsutils --rm -it --image=nicolaka/netshoot -- bash`.
- Use dig: From inside your debug pod, run `dig @10.96.0.10 your-service.namespace.svc.cluster.local` (assuming `10.96.0.10` is your CoreDNS IP). Check the "Query time" at the bottom of the output. If it is under 5ms, the server is healthy, and your latency is happening elsewhere.
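For reference, the statistics footer you are looking for in the dig output looks roughly like this (the record, timing, and IPs are illustrative):

```text
;; ANSWER SECTION:
your-service.namespace.svc.cluster.local. 30 IN A 10.109.44.12

;; Query time: 2 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; MSG SIZE  rcvd: 106
```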
Best practices
- Always use Fully Qualified Domain Names (FQDNs) by adding a trailing dot in your application code (e.g., `api.stripe.com.`).
- Switch from Alpine base images to Debian/Ubuntu (which use `glibc`) if you cannot control the DNS configuration.
- Set `ndots: 2` in your Pod `dnsConfig` if you don't need deep cluster domain searching.
Problems and solutions
- Problem: External API calls to `api.github.com` are taking 100ms longer than they should.
- Solution: The `ndots:5` setting is forcing the Pod to query `api.github.com.default.svc.cluster.local` and other local domains first. Append a trailing dot (`api.github.com.`) in your code to tell the resolver it is an absolute name, bypassing the local search list instantly.
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution