
Kubernetes CoreDNS Service Discovery

CoreDNS

Hello everyone! Welcome back to DevSecOpsGuru.in. Today, we are taking a deep dive into something every Kubernetes engineer must know: CoreDNS Service Discovery. If you are learning Kubernetes, you already know that Pods are constantly dying and spinning back up with new IP addresses. So, how do your frontend microservices find your backend microservices if the IPs keep changing? That is exactly what CoreDNS solves! It acts as the automatic “address book” for your cluster, allowing your apps to talk to each other using simple names instead of confusing, ever-changing IP addresses. Let’s master this concept step-by-step!

Imagine a huge corporate office building where employees are constantly moving desks. If you want to talk to the “Finance Department,” you don’t memorize their desk numbers because they change every day. Instead, you call the friendly office receptionist. You simply say, “Connect me to Finance,” and the receptionist looks up their current desk and routes your call. In Kubernetes, CoreDNS is that friendly receptionist. The “Finance Department” is your Service, and the individual employees at the desks are your Pods.

Quick Reference

  • Default IP: The CoreDNS service (often named kube-dns) usually lives at IP 10.96.0.10.
  • DNS Name Format: <service-name>.<namespace>.svc.cluster.local
  • Headless Services: Return the IPs of all the backend Pods directly, bypassing the Service IP.
  • Corefile: The main configuration file for CoreDNS, stored as a ConfigMap in the kube-system namespace.

| Record Type | What it resolves | Format / Example |
|---|---|---|
| A | Service Name to ClusterIP (IPv4) | my-svc.my-namespace.svc.cluster.local |
| AAAA | Service Name to ClusterIP (IPv6) | my-svc.my-namespace.svc.cluster.local |
| A (Headless) | Service Name to Pod IPs | my-svc.my-namespace.svc.cluster.local |
| SRV | Named Port, Protocol, and Target | _http._tcp.my-svc.my-namespace.svc.cluster.local |
| PTR | Reverse Lookup (IP to Name) | 10.0.96.10.in-addr.arpa (octets reversed, here for 10.96.0.10) |
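
To see the SRV row above in action, you can query a named port from a debug Pod. The service and namespace names below are placeholders, and the priority/weight values in the answer depend on the endpoints behind the Service:

```
# From a debug Pod: resolve the SRV record for the named port "http" on Service
# "my-svc" in namespace "my-namespace" (placeholder names).
dig _http._tcp.my-svc.my-namespace.svc.cluster.local SRV +short

# The answer takes the shape "<priority> <weight> <port> <target>", for example:
# 0 100 80 my-svc.my-namespace.svc.cluster.local.
```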

CoreDNS is a fast, flexible, Go-based DNS server that runs as a Deployment within the kube-system namespace. When you create a Service, Kubernetes automatically updates CoreDNS. Simultaneously, the Kubelet on each worker node configures every new Pod’s /etc/resolv.conf file to point to the CoreDNS service IP. Whenever a Pod wants to communicate with another Service, it queries CoreDNS using the Service’s name, gets the stable ClusterIP back, and traffic is routed perfectly.

Architecture and Mechanism

CoreDNS watches the Kubernetes API for changes to Services, Endpoints, and EndpointSlices. The translation from application query to network resolution follows a fixed chain:

  1. Kubelet provisions Pods with /etc/resolv.conf pointing the nameserver to the ClusterIP of the kube-dns service.
  2. The Pod issues a DNS query (UDP/TCP 53) to the kube-dns Service ClusterIP, which kube-proxy load-balances (via iptables or IPVS rules) across the active CoreDNS Pods.
  3. CoreDNS processes the query through its plugin chain. If the query suffix matches the cluster domain (default cluster.local), the kubernetes plugin intercepts it and resolves it against the cached Kubernetes API state.
  4. Non-matching queries (external domains) are handled by the forward plugin, passing the query to the node’s upstream DNS resolvers.

To really master the foundation, you need to know exactly what happens inside a Pod:

  • The /etc/resolv.conf file: If you run cat /etc/resolv.conf inside any standard Kubernetes Pod, you will see nameserver 10.96.0.10 (or similar). This forces the container to ask CoreDNS for directions.
  • The Search Path: The file also contains a search line (e.g., default.svc.cluster.local svc.cluster.local cluster.local). This is a massive time-saver! Because of this search path, if two Pods are in the same namespace, they don’t need the full domain name. A frontend Pod can just say curl http://backend-service; the Pod’s resolver appends the search domains, and CoreDNS finds the Service (see the sample resolv.conf after this list).
  • Endpoints Object: CoreDNS doesn’t do the routing; it just gives out the IP. When a Service is created, an Endpoints object is also created, maintaining the active list of healthy Pod IPs that belong to that Service.
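
For concreteness, the resolv.conf generated for a Pod in the default namespace with the default ClusterFirst DNS policy typically looks like this (the nameserver IP and cluster domain vary by cluster):

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```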

Record Mapping

CoreDNS provisions specific DNS records based on the Service type:

  • Normal Services (ClusterIP): Yield an A record mapping <service>.<namespace>.svc.cluster.local to the Service’s stable ClusterIP.
  • Headless Services (clusterIP: None): Yield multiple A records mapping directly to the individual Pod IP addresses, allowing client-side load balancing (see the sample manifest after this list).
  • SRV Records: Created for named ports to expose protocol and port combinations, formatted as _<port>._<proto>.<service>.<namespace>.svc.cluster.local.
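
To make the headless and SRV cases concrete, here is a minimal sketch of a headless Service with a named port; the backend name and default namespace are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend               # resolves as backend.default.svc.cluster.local
  namespace: default
spec:
  clusterIP: None             # headless: DNS returns the Pod IPs directly
  selector:
    app: backend
  ports:
    - name: http              # named port -> SRV record _http._tcp.backend.default.svc.cluster.local
      port: 80
      protocol: TCP
```

A lookup of backend.default.svc.cluster.local now returns one A record per ready Pod, so the client decides which instance to contact.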

For Architects designing enterprise-grade clusters, CoreDNS must be customized and highly available:

  • Plugin Chaining: CoreDNS operates on a chain of plugins. The default Corefile uses plugins like errors, health, kubernetes, prometheus, forward, and cache.
  • Custom ConfigMaps: In managed services like Azure Kubernetes Service (AKS) or Amazon EKS, you should never edit the default coredns ConfigMap directly, as the platform can overwrite it. On AKS, for example, you create a coredns-custom ConfigMap and use .server or .override block extensions to configure stub domains (routing on-premises DNS queries through a VPN) or custom upstream forwarders; other providers document their own override mechanisms. See the example after this list.
  • Horizontal Autoscaling: Production clusters should use the cluster-proportional-autoscaler to scale CoreDNS replicas dynamically based on the number of nodes and CPU cores in the cluster, preventing DNS timeouts during massive scaling events.
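
As an illustration of the AKS-style mechanism, a stub domain for an on-premises zone might be declared as below. The zone name and resolver IPs are placeholders, and the exact key conventions (.server / .override) should be confirmed against your provider’s documentation:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom        # supplements the managed Corefile instead of replacing it
  namespace: kube-system
data:
  onprem.server: |            # ".server" keys add extra server blocks (AKS convention)
    corp.example.internal:53 {
        errors
        cache 30
        forward . 10.10.0.53 10.10.0.54   # on-premises resolvers reachable over the VPN
    }
```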

Corefile Configuration and Plugins

The operational logic of CoreDNS is defined in the Corefile, injected via a ConfigMap. A standard Kubernetes Corefile leverages specific plugins (a full sample follows the list):

  • errors & log: For diagnostics and stdout logging.
  • health & ready: Expose endpoints for Kubernetes liveness and readiness probes.
  • kubernetes: The core logic defining the domain zone and instructing CoreDNS to watch the API.
  • prometheus: Exposes metrics on port 9153, crucial for monitoring coredns_dns_request_duration_seconds and cache hits.
  • forward: Defines upstream resolution for non-cluster domains.
  • cache: Caches responses (default 30 seconds) to mitigate load.
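
Put together, a stock Corefile from a kubeadm-style install looks roughly like this (details vary by distribution and CoreDNS version):

```
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
```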

Troubleshooting and Operational Edge Cases

  • The ndots:5 Impact: Kubernetes sets the ndots search parameter to 5. An external query (e.g., api.stripe.com) will trigger sequential lookups appending cluster domains (e.g., api.stripe.com.default.svc.cluster.local) before attempting the absolute domain, resulting in an NXDOMAIN storm and high CoreDNS CPU utilization. Mitigation involves overriding dnsConfig at the Pod spec level.
  • Conntrack / Source Port Exhaustion: High-throughput UDP DNS queries can cause SNAT port collisions or race conditions in Netfilter/iptables. Transitioning to NodeLocal DNSCache (running a DNS caching agent as a DaemonSet on every node) is the standard architectural fix for large-scale clusters to bypass iptables NAT overhead for DNS.

Key Components

kubelet, Corefile (ConfigMap), CoreDNS Deployment, kube-dns Service, kube-proxy, and the Kubernetes API server.

Key Characteristics

Written in Go, highly modular (plugin-based chain), real-time API watching, supports forward/stub domains, and caches responses.

Use case

A payment processing Pod needs to verify a transaction with the fraud-detection Pod. Instead of querying a database for an IP, it simply makes an HTTP POST request to http://fraud-service.security.svc.cluster.local, and the infrastructure handles the rest.

Benefits

Ensures zero-downtime rolling updates (since IP changes are hidden behind the DNS name), simplifies cross-team communication, and removes hardcoded configuration files from your application code.

Best practices

  • Always use loadbalance and cache 30 plugins in your Corefile.
  • Run at least 2 replicas of CoreDNS for high availability.
  • Configure explicit forward rules for custom corporate domains so queries don’t get lost.
  • Use FQDNs (ending with a dot, e.g., api.external.com.) in application code to bypass the ndots search penalty.

Limitations

Sticky connections. Because kube-proxy relies on connection tracking (conntrack) for ClusterIP traffic, clients that keep long-lived connections or reuse the same source port can be pinned to a single backend, causing an uneven distribution of DNS queries that overloads one CoreDNS Pod while the others sit idle.

Common issues

  • CrashLoopBackOff on CoreDNS pods due to syntax errors after manually editing the Corefile.
  • External domains failing to resolve because the upstream DNS server (like 8.8.8.8) is blocked by corporate firewalls.

Problems and solutions

  • Problem: An application is experiencing random 5-second timeouts when trying to reach an external API.
  • Solution: This is the classic combination of ndots:5 query amplification and dropped UDP packets (the conntrack race), where the resolver waits out its 5-second timeout before retrying. Modify the application’s Deployment manifest to include dnsConfig with ndots: 2 to reduce the search-domain iteration, or append a trailing dot to the external domain string in the application code.
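
A minimal sketch of that dnsConfig override on a Deployment’s Pod template; the workload name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api                    # placeholder workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: app
          image: registry.example.com/payments-api:1.0   # placeholder image
      dnsConfig:
        options:
          - name: ndots
            value: "2"                  # only names with fewer than 2 dots go through the search list first
```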

DNS Latency Debugging

In Kubernetes, applications communicate via domain names. If translating those names into IP addresses takes too long, your entire microservice architecture slows down, leading to random timeouts and angry users. Let’s learn how to spot, debug, and eliminate these mysterious delays!

Quick Reference

  • The 5-Second Rule: If your application experiences random timeouts exactly 5 seconds long, it is almost always a DNS UDP packet drop issue.
  • ndots:5: The default Kubernetes setting that forces your Pod to guess a domain name up to 5 times before searching the public internet.
  • Alpine Linux: Uses musl libc, which handles DNS queries differently than standard Linux and is notorious for triggering DNS race conditions.
  • The Tooling: Always keep a dnsutils Pod handy to run nslookup and dig from inside the cluster.

| Debugging Tool | Command Example | What it Reveals |
|---|---|---|
| dig | dig +stats google.com | Exact DNS query time in milliseconds |
| nslookup | nslookup my-svc.default | Verifies if CoreDNS can resolve internal names |
| resolv.conf | cat /etc/resolv.conf | Shows your Pod’s ndots limit and nameserver IP |
| Metrics | coredns_dns_request_duration_seconds | Prometheus metric showing server-side latency |

Mitigation vectors for the SNAT race condition (the parallel A and AAAA UDP lookups colliding in conntrack, described under Troubleshooting above):

  1. Application Level: Inject the single-request-reopen option into the Pod’s dnsConfig to force the resolver to use distinct sockets for the A and AAAA queries (see the sketch after this list).
  2. Image Level: Migrate base images from Alpine (musl) to Debian/Ubuntu variants (glibc), which process queries sequentially or handle collisions more gracefully.
  3. Architecture Level: Implement NodeLocal DNSCache.
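
The application-level mitigation from step 1 can be expressed directly in the Pod spec. A sketch, assuming a glibc-based image (musl ignores this resolver option):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-client                      # placeholder name
spec:
  containers:
    - name: app
      image: debian:bookworm-slim       # glibc-based image that honours the option below
      command: ["sleep", "infinity"]
  dnsConfig:
    options:
      - name: single-request-reopen     # use a fresh socket for the A and the AAAA lookup
```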

NodeLocal DNSCache Architecture

To circumvent conntrack table exhaustion and SNAT race conditions, NodeLocal DNSCache deploys a lightweight DNS caching agent as a DaemonSet on every worker node. It listens on a dummy interface using a local IP (e.g., 169.254.20.10). Pods are configured to send UDP DNS queries to this local agent instead of the remote kube-dns ClusterIP. If a cache miss occurs, the NodeLocal agent forwards the query to the central CoreDNS pods using TCP instead of UDP. TCP establishes a reliable connection stream, completely bypassing the UDP conntrack collision bugs and eliminating the 5-second timeout retries.
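
How Pods end up pointing at the node-local cache depends on your installer and kube-proxy mode; one common approach is to set the kubelet’s clusterDNS to the link-local address so that new Pods receive it in their resolv.conf. A sketch of the relevant KubeletConfiguration fragment, assuming the default 169.254.20.10 listen address:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.20.10           # node-local cache address instead of the kube-dns ClusterIP
clusterDomain: cluster.local
```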

Cloud-Provider ENI PPS Limits and DNS Throttling

In high-throughput environments like AWS EKS, worker nodes use Elastic Network Interfaces (ENIs), and the cloud provider enforces strict per-ENI rate limits on Packets Per Second (PPS). For instance, traffic to link-local services, such as the Amazon-provided VPC DNS resolver that CoreDNS forwards external queries to, is hard-limited (roughly 1024 PPS per ENI), and instances also carry aggregate PPS allowances. When a cluster experiences a burst of microservice scaling or a CronJob execution, the sheer volume of ndots:5-multiplied UDP queries can exceed these limits, and packets are dropped at the hypervisor level. This manifests as application timeouts, yet CoreDNS metrics (coredns_dns_requests_total) will show no corresponding failures, and CoreDNS CPU/memory will appear perfectly healthy. Debugging this requires cloud-native metrics, such as the linklocal_allowance_exceeded or conntrack_allowance_exceeded counters surfaced in AWS CloudWatch, rather than Kubernetes-native metrics.

The ndots:5 Penalty

POSIX resolver conventions dictate that names containing fewer dots than the ndots threshold (default 5 in Kubernetes) are treated as relative names. The resolver appends the search domains defined in /etc/resolv.conf (<namespace>.svc.cluster.local, svc.cluster.local, cluster.local) before querying the absolute domain. Querying an external API like api.stripe.com (2 dots) triggers sequential UDP queries:

  1. api.stripe.com.<namespace>.svc.cluster.local -> NXDOMAIN
  2. api.stripe.com.svc.cluster.local -> NXDOMAIN
  3. api.stripe.com.cluster.local -> NXDOMAIN
  4. api.stripe.com -> OK

This roughly quadruples DNS query volume for external calls (and the amplification doubles again when A and AAAA records are requested in parallel). Mitigation requires explicitly terminating FQDN strings in application code with a trailing dot (api.stripe.com.) to force an immediate absolute resolution.

To start debugging, you need to know how to isolate the problem.

  • Check the CoreDNS Logs: Run kubectl logs -n kube-system -l k8s-app=kube-dns. Look for errors like SERVFAIL or i/o timeout.
  • Deploy a Debug Pod: You cannot troubleshoot DNS blindly. Deploy a Pod with networking tools: kubectl run dnsutils --rm -it --image=nicolaka/netshoot -- bash.
  • Use dig: From inside your debug pod, run dig @10.96.0.10 your-service.namespace.svc.cluster.local. (Assuming 10.96.0.10 is your CoreDNS IP). Check the “Query time” at the bottom of the output. If it is under 5ms, the server is healthy, and your latency is happening elsewhere.
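
The statistics footer you are looking for sits at the bottom of the dig output; illustrative values only:

```
;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; MSG SIZE  rcvd: 117
```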

Best practices

  • Always use Fully Qualified Domain Names (FQDNs) by adding a trailing dot in your application code (e.g., api.stripe.com.).
  • Switch from Alpine base images to Debian/Ubuntu (which use glibc) if you cannot control the DNS configuration.
  • Set ndots: 2 in your Pod dnsConfig if you don’t need deep cluster domain searching.

Problems and solutions

  • Problem: External API calls to api.github.com are taking 100ms longer than they should.
  • Solution: The ndots:5 setting is forcing the Pod to query api.github.com.default.svc.cluster.local and other local domains first. Append a trailing dot (api.github.com.) in your code to tell the resolver it is an absolute name, bypassing the local search list instantly.

https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution
