What this usually means
The core issue is that traffic from the source Pod never reaches a healthy backend behind the Service's ClusterIP and port. The fault can sit at several layers: the Service may have no ready Pod endpoints (for example, a mismatched selector), DNS resolution may fail or return the wrong IP, a NetworkPolicy (enforced by the CNI plugin, such as Calico) may be blocking the traffic, or kube-proxy may not be running correctly on the node hosting the source Pod. The target container may also not be listening on the expected port, or, for NodePort access, the wrong port or a firewall may be in the way.
The first ten minutes — establish facts before touching code.
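1. From the source Pod, run `nslookup my-service.default.svc.cluster.local` to verify DNS resolution.
2. Check Service endpoints: `kubectl get endpoints my-service`. If the list is empty, no ready Pod matches the Service's selector.
3. Verify Pod-to-Pod connectivity: ping the target Pod's IP directly from the source Pod using `kubectl exec <source-pod> -- ping <target-pod-ip>`.
4. Inspect network policies: `kubectl get networkpolicies -n <namespace>` and check whether any policy selects the source or target Pods without allowing this traffic.
5. Check kube-proxy logs for the node where the source Pod runs. If kube-proxy runs as a DaemonSet, find its Pod on that node with `kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide`, then run `kubectl logs -n kube-system <kube-proxy-pod> --tail=50`. If it runs as a systemd service, run `journalctl -u kube-proxy -n 50` on the node.
6. Curl the Service IP directly from the source Pod to bypass DNS: `curl http://<cluster-ip>:<port>`.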
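The first few checks above can be collected into one helper so they run the same way every time. This is a minimal sketch, not a complete diagnostic: `triage_service` and the `KUBECTL` override are hypothetical names, and it assumes the source Pod and the Service share a namespace, the cluster domain is `cluster.local`, and the Pod's image ships `nslookup`.

```shell
# triage_service <namespace> <source-pod> <service>
# Runs the DNS, endpoints, and NetworkPolicy checks in order and keeps going
# when one of them fails. Set KUBECTL to override the binary (assumption:
# handy for adding a --context flag).
triage_service() {
    k=${KUBECTL:-kubectl}
    ns=$1
    pod=$2
    svc=$3

    echo "== 1. DNS lookup from $pod"
    $k exec -n "$ns" "$pod" -- nslookup "$svc.$ns.svc.cluster.local" \
        || echo "lookup failed (or nslookup is missing from the image)"

    echo "== 2. Endpoints for $svc"
    $k get endpoints -n "$ns" "$svc" || echo "could not read endpoints"

    echo "== 3. NetworkPolicies in $ns"
    $k get networkpolicies -n "$ns" || echo "could not list network policies"
}

# Example:
#   triage_service default order-pod-5f4d7 payment-svc
```

The helper only gathers facts; interpreting an empty endpoints list or an unexpected DNS answer is still up to you.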
The specific files, logs, configs, and dashboards that usually own this bug.
- Service definition: `kubectl get svc my-service -o yaml`; check selector, ports, type.
- Endpoints object: `kubectl get endpoints my-service -o yaml`; verify addresses and ports.
- Target Pod logs and readiness: `kubectl logs <pod>` and `kubectl describe pod <pod>` for readinessProbe.
- kube-proxy config: `kubectl get configmap -n kube-system kube-proxy -o yaml`.
- Node iptables rules: `iptables-save | grep <cluster-ip>` on the node where the source Pod runs.
- CNI plugin status: e.g., `calicoctl get ippool` for Calico, or `kubectl -n kube-system logs <flannel-pod>`.
- Network policy YAML: `kubectl get networkpolicies -n <namespace> -o yaml`.
Practical causes, not theory. These are the things you will actually find.
- Service selector does not match Pod labels, which leads to zero endpoints.
- Pod readiness probe failing, so the Pod is not considered ready and is removed from endpoints.
- kube-proxy not running or not syncing, leaving stale iptables rules.
- NetworkPolicy denies egress from source namespace or ingress to target namespace.
- DNS not resolving the Service name: CoreDNS Pod issues or search domain misconfiguration.
- Target container not listening on the Service's targetPort (e.g., app listens on 3000 but targetPort is 8080).
- NodePort service accessed via wrong port or firewall blocking NodePort range.
Concrete fix directions. Pick the one that matches your root cause.
- Fix selector: update the Service's `selector` block to match the Pod labels exactly.
- Fix readiness probe: adjust probe parameters or fix the application to respond correctly.
- Restart kube-proxy on the affected node: `kubectl delete pod -n kube-system -l k8s-app=kube-proxy --field-selector spec.nodeName=<node>`. The DaemonSet recreates the Pod. Without the field selector, the command restarts kube-proxy on every node.
- Add network policy to allow traffic: create a NetworkPolicy in the target namespace that allows ingress from the source namespace.
- Fix DNS: restart CoreDNS Pods (`kubectl rollout restart -n kube-system deployment/coredns`) or check the CoreDNS configuration.
- Correct port mapping: ensure the Service's targetPort matches the container's listening port, and that clients connect to the Service's port.
- Use a headless Service for stateful workloads if ClusterIP behavior is not needed.
A fix you cannot prove is a guess. Close the loop.
- Run `kubectl exec <source-pod> -- curl http://my-service:8080` and get a successful response.
- Check endpoints again: `kubectl get endpoints my-service` shows the target Pod IPs.
- Test from another namespace with a simple busybox Pod: `kubectl run test -n <other-namespace> --image=busybox --rm -it -- sh`. busybox has no curl, so use `wget -qO- http://my-service.<namespace>.svc.cluster.local:8080` inside it.
- Verify DNS from within the Pod: `nslookup my-service.default.svc.cluster.local` returns the correct ClusterIP.
- Check iptables rules on the node: `iptables -t nat -L KUBE-SERVICES -n | grep <cluster-ip>` shows a rule that jumps to the Service's KUBE-SVC chain.
- Monitor kube-proxy logs: no errors like 'can't find endpoints'.
Things that make this bug worse or harder to find.
- Assuming the short Service name resolves from any Pod; it only resolves from Pods in the Service's own namespace.
- Only testing from the host node; the host does not go through the Pod's network namespace or its network policies, so a success there proves little about Pod traffic.
- Changing the Service's targetPort without updating the container's listening port, or the reverse.
- Applying a default-deny network policy without first adding and testing the allow rules the workloads need.
- Restarting the target Pod but not the source Pod (the source may cache DNS results in some cases).
- Overlooking that the target Service might be in a different namespace; it must then be referenced as <service>.<namespace>.svc.cluster.local.
Payment Service Unreachable from Order Service
Timeline
- 09:15 Alert: Order service returning 502 errors for payment API calls.
- 09:20 Exec into order-pod-5f4d7; `curl http://payment-svc:8080/healthz` hangs for 30s, then times out.
- 09:22 `kubectl exec order-pod-5f4d7 -- nslookup payment-svc` returns 10.96.0.1 (wrong IP; that is the kubernetes Service).
- 09:25 Check DNS: `kubectl exec -n kube-system coredns-xxx -- nslookup payment-svc.default` fails.
- 09:30 `kubectl get svc payment-svc -o yaml`: found selector 'app: payment, version: v1' but Pods have 'app: payment, version: v2'.
- 09:32 Check endpoints: `kubectl get endpoints payment-svc` is empty.
- 09:35 Update Service selector to version: v2.
- 09:37 Endpoints now show two Pod IPs.
- 09:38 curl from order-pod succeeds. Incident resolved.
I started my day with a PagerDuty alert: the order service was returning 502 errors when calling the payment service. I immediately jumped into the cluster. First, I exec'd into the order pod and tried curling the payment service: `curl http://payment-svc:8080/healthz`. It hung for 30 seconds and timed out. Classic symptom.
I checked DNS by running `nslookup payment-svc` from inside the pod. It returned 10.96.0.1, which is the kubernetes service itself, not the payment service. That meant DNS was either misconfigured or the service didn't exist. I double-checked: `kubectl get svc payment-svc` showed it was there, and I noticed it was in the default namespace. From the order pod (also in default), the short name should work. But the wrong IP suggested CoreDNS was failing.
I then checked endpoints: `kubectl get endpoints payment-svc` — empty. That's the smoking gun. I described the service and saw the selector: 'app: payment, version: v1'. But the payment pods had labels 'app: payment, version: v2'. Someone had bumped the version label but forgot to update the service. I updated the selector to v2, endpoints populated immediately, and curl worked. Lesson: always verify selectors match when scaling or updating deployments.
Root cause
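Service selector 'version: v1' did not match Pod labels 'version: v2', resulting in zero endpoints, so connections to the Service's ClusterIP had no backend to reach. The 10.96.0.1 answer from nslookup is not explained by the selector mismatch: cluster DNS returns a Service's own ClusterIP whether or not the Service has endpoints.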
The fix
Updated the Service's selector from 'app: payment, version: v1' to 'app: payment, version: v2'. Endpoints appeared instantly and connectivity was restored.
The lesson
When deploying new versions of a microservice, ensure the Service selector is updated if the labels change. Automate this with CI/CD to avoid manual mismatches.
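One way to automate that check is a post-deploy gate that fails the pipeline when a Service has no ready endpoints. The following is a minimal sketch under assumptions: `require_endpoints` and the `KUBECTL` override are hypothetical names, and it relies on the Endpoints object exposing ready addresses under `.subsets[*].addresses[*].ip`.

```shell
# require_endpoints <namespace> <service>
# Succeeds only if the Service currently has at least one ready endpoint IP.
# Set KUBECTL to override the binary (assumption: useful in CI wrappers).
require_endpoints() {
    ips=$(${KUBECTL:-kubectl} get endpoints -n "$1" "$2" \
        -o jsonpath='{.subsets[*].addresses[*].ip}')
    if [ -z "$ips" ]; then
        echo "FAIL: $1/$2 has no ready endpoints" >&2
        return 1
    fi
    echo "OK: $1/$2 -> $ips"
}

# Example pipeline step, run after the rollout finishes:
#   require_endpoints default payment-svc || exit 1
```

A gate like this would have flagged the incident above at deploy time, because the v2 Pods never appeared behind the v1 selector.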
When a Pod can't resolve a Service name, the first thing to check is CoreDNS. Run `kubectl get pods -n kube-system -l k8s-app=kube-dns` to see if CoreDNS pods are running. If they are, check their logs: `kubectl logs -n kube-system <coredns-pod>`. Common errors include 'NXDOMAIN' (service not found) or 'connection refused'. In many cases, the issue is that the Service doesn't exist or the namespace is wrong.
Another common pitfall is the Pod's DNS policy. By default, Pods use 'ClusterFirst', which sends queries to CoreDNS. If the Pod has `dnsPolicy: Default`, it inherits the node's DNS configuration and may not resolve cluster-internal names. Verify with `kubectl get pod <pod> -o yaml | grep dnsPolicy`. Also check that the 'search' domain list in the Pod's `/etc/resolv.conf` includes <namespace>.svc.cluster.local. This is set automatically, but if you override it, you may lose resolution.
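To see why a short name resolves inside one namespace but not across namespaces, it can help to list the names the resolver tries. This is a rough sketch, assuming the default cluster domain `cluster.local`, the default Pod search list, `ndots:5`, and glibc-style resolver behavior. `dns_candidates` is a hypothetical helper, and it ignores any extra search domains inherited from the node.

```shell
# dns_candidates <name> <pod-namespace>
# Prints, in order, the names a resolver would try for <name> from a Pod in
# <pod-namespace>. Assumed search list: "<ns>.svc.cluster.local
# svc.cluster.local cluster.local" with "options ndots:5".
dns_candidates() {
    name=$1
    ns=$2
    # A trailing dot marks a fully qualified name: no search list is applied.
    case $name in
        *.) echo "$name"; return 0 ;;
    esac
    # Count the dots in the name (tr -d ' ' strips padding some wc builds add).
    dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
    if [ "$dots" -ge 5 ]; then
        echo "$name."    # at least ndots dots: tried as-is first
    fi
    for suffix in "$ns.svc.cluster.local" svc.cluster.local cluster.local; do
        echo "$name.$suffix."
    done
    if [ "$dots" -lt 5 ]; then
        echo "$name."    # fewer than ndots dots: tried as-is last
    fi
}

# Example:
#   dns_candidates payment-svc.payments orders
```

In that example the second candidate, `payment-svc.payments.svc.cluster.local.`, is the one that matches a Service named `payment-svc` in the `payments` namespace, which is why `<service>.<namespace>` works across namespaces while the bare name does not.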
Network policies are often forgotten. They are namespace-scoped and can block traffic even if DNS and endpoints are fine. Use `kubectl get networkpolicies --all-namespaces` to list them. Once any policy selects a Pod for a direction (ingress or egress), all traffic in that direction that no policy explicitly allows is denied; Pods that no policy selects stay fully open. For example, a policy in the 'payment' namespace might only allow ingress from Pods with label 'app: order', so if the order Pod doesn't have that label, its traffic is dropped.
To test, temporarily delete the network policy (if safe) or add a permissive policy to allow all ingress from the source namespace. Alternatively, you can run a test pod in the same namespace with no network policy restrictions to isolate the issue. I've seen cases where a 'default-deny' policy was applied to a namespace without anyone remembering.
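A permissive test policy of the kind described above might look like the following. This is a sketch under assumptions: `allow_ingress_policy` is a hypothetical helper that only prints a manifest, the policy name `debug-allow-from-<namespace>` is made up, and the namespace selector relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set on every namespace.

```shell
# allow_ingress_policy <target-namespace> <source-namespace>
# Prints a NetworkPolicy manifest that allows ingress to every Pod in
# <target-namespace> from every Pod in <source-namespace>. It prints only;
# nothing is applied to the cluster.
allow_ingress_policy() {
    cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-allow-from-$2
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: $2
EOF
}

# Review the output, then apply it if it looks right:
#   allow_ingress_policy payment default | kubectl apply -f -
# Remove it after the test:
#   kubectl delete networkpolicy -n payment debug-allow-from-default
```

Because network policies are additive, this only widens what is allowed. It covers ingress on the target side; an egress policy in the source namespace can still block the same traffic.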
kube-proxy (usually running as a DaemonSet) watches Services and endpoints and installs iptables rules on each node, so that a packet a Pod sends to a ClusterIP is DNATed to a healthy Pod IP. If kube-proxy is down or its rules are stale, the packet may be dropped or sent to a Pod that no longer exists. Check the logs of the kube-proxy instance on the node where the source Pod runs: `kubectl logs -n kube-system <kube-proxy-pod>` for a DaemonSet, or `journalctl -u kube-proxy -f` on the node if it runs as a systemd service. Look for errors like 'can't find endpoints' or 'cannot connect to apiserver'.
You can also dump iptables rules: `iptables -t nat -L -n | grep <cluster-ip>`. If you see no rules, kube-proxy hasn't programmed them (in IPVS mode, check `ipvsadm -Ln` instead). Restart kube-proxy by deleting its Pod: `kubectl delete pod -n kube-system -l k8s-app=kube-proxy`. The DaemonSet recreates it; note that this label selector restarts kube-proxy on every node. Also verify that the kube-proxy ConfigMap has the correct cluster CIDR and mode (iptables vs IPVS).
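The iptables check can be made a little less error-prone by classifying what the rules say about one ClusterIP. This is a rough sketch for iptables mode only. `service_rule_status` is a hypothetical helper, and the rule shapes it looks for (a jump to a `KUBE-SVC-` chain when the Service is programmed, a `REJECT` when it has no endpoints) are assumptions about typical kube-proxy output that may vary by version.

```shell
# service_rule_status <cluster-ip>
# Reads `iptables-save` output on stdin and reports how the given ClusterIP
# appears in it: programmed, no-endpoints, missing, or unknown.
service_rule_status() {
    # Match the exact destination; the "/32 " suffix keeps 10.96.12.3 from
    # matching 10.96.12.34.
    rules=$(grep -F -e "-d $1/32 " || true)
    if [ -z "$rules" ]; then
        echo "missing"          # no rule mentions this ClusterIP
    elif printf '%s\n' "$rules" | grep -q -e '-j KUBE-SVC-'; then
        echo "programmed"       # traffic is handed to a per-Service chain
    elif printf '%s\n' "$rules" | grep -q -e '-j REJECT'; then
        echo "no-endpoints"     # connections are rejected: nothing to send them to
    else
        echo "unknown"
    fi
}

# Example, as root on the node where the source Pod runs:
#   iptables-save | service_rule_status <cluster-ip>
```

A result of `missing` points at kube-proxy itself, while `no-endpoints` points back at the Service's selector or the Pods' readiness.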
An empty endpoints list means no ready Pod matches the Service's selector. A selector mismatch is the most common cause. Use `kubectl get pods -l <selector-from-service>` to see if any Pods match. If they do but endpoints are still empty, check whether the Pods are failing their readiness probes. Run `kubectl describe pod <pod>` and look at the 'Ready' condition. If it is False, the Pod is removed from endpoints. Fix the readiness probe or the application.
Also note that Services with `type: ExternalName` don't have endpoints. Headless services (clusterIP: None) have endpoints but are not load-balanced. For normal ClusterIP services, ensure `clusterIP` is not set to None.
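The selector comparison is easy to get wrong by eye, so here is a small sketch that does it mechanically. `selector_matches` is a hypothetical helper. It takes both arguments in the comma-separated `key=value` form that `kubectl get svc -o wide` (SELECTOR column) and `kubectl get pods --show-labels` (LABELS column) print, and it assumes plain equality selectors.

```shell
# selector_matches <selector> <pod-labels>
# Succeeds if every key=value pair in the selector appears in the labels.
# Prints the first missing pair otherwise. Runs in a subshell so the IFS
# change does not leak into the caller.
selector_matches() (
    IFS=,
    for pair in $1; do
        case ",$2," in
            *",$pair,"*) ;;                       # this pair is present
            *) echo "missing: $pair"; exit 1 ;;   # this pair is not
        esac
    done
)

# Example, using the labels from the incident above:
#   selector_matches "app=payment,version=v1" "app=payment,version=v2"
```

For that example it prints `missing: version=v1` and returns non-zero, which is the mismatch that left the endpoints list empty.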
A Service maps its `port` (what clients connect to) to a `targetPort` (what the container must listen on). Declaring `port: 8080` and `targetPort: 3000` is fine as long as the container listens on 3000. The Pod spec's `containerPort` is only informational unless targetPort refers to a port by name: if it says 8080 while the application listens on 3000 and targetPort is 3000, traffic still flows. The real problem is a container that is not listening on the targetPort at all. Verify with `kubectl exec <pod> -- netstat -tlnp` in the target Pod to see which ports are actually open (use `ss -tlnp` if the image lacks netstat).
Also check that the application is bound to 0.0.0.0, not 127.0.0.1. If it binds to localhost, it won't accept connections from outside the pod's network namespace. This is common in development containers.
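Both checks (is anything listening on the targetPort, and is it bound beyond loopback) can be read off the socket listing. This is a minimal sketch: `listen_status` is a hypothetical helper, and it assumes the usual column layout of `netstat -tln` or `ss -tln`, where the local address is the fourth field.

```shell
# listen_status <port>
# Reads `netstat -tln` or `ss -tln` output on stdin and prints one of:
#   reachable      something listens on <port> on a non-loopback address
#   loopback-only  listeners exist, but only on 127.x.x.x or ::1
#   not-listening  nothing listens on <port>
listen_status() {
    awk -v port="$1" '
        # ss puts the state in field 1, netstat in field 6.
        $1 == "LISTEN" || $6 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] != port) { next }
            # Everything before the final ":<port>" is the bind address.
            addr = substr($4, 1, length($4) - length(parts[n]) - 1)
            if (addr ~ /^127\./ || addr == "::1" || addr == "[::1]") {
                loopback = 1
            } else {
                exposed = 1
            }
        }
        END {
            if (exposed) { print "reachable" }
            else if (loopback) { print "loopback-only" }
            else { print "not-listening" }
        }'
}

# Example, assuming the target image ships netstat:
#   kubectl exec <pod> -- netstat -tln | listen_status 3000
```

A result of `loopback-only` means the application needs to bind to 0.0.0.0; `not-listening` means the targetPort and the application disagree.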
Frequently asked questions
Why does curl to the Service hang for 30 seconds before timeout?
A hang that ends in a timeout, rather than an immediate 'connection refused', is typical of a TCP connect whose SYN packets are silently dropped; the 30 seconds is simply the connect timeout the client applies. This can happen if a network policy drops the packet, or if the destination IP (ClusterIP) is not reachable because kube-proxy hasn't installed DNAT rules. It's not an application-level timeout; it's a network-level timeout.
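curl's exit code already tells you which of these failure modes you hit, so it is worth capturing instead of just watching the hang. Below is a minimal sketch. `explain_curl_exit` is a hypothetical helper, the mapping covers only the curl exit codes 6, 7, and 28, and the hints attached to each are rules of thumb rather than a diagnosis.

```shell
# explain_curl_exit <exit-code>
# Maps a curl exit code to the layer most likely at fault.
explain_curl_exit() {
    case $1 in
        0)  echo "ok: connected and got a response" ;;
        6)  echo "dns: could not resolve the host name" ;;
        7)  echo "refused: could not connect (rejected or unreachable)" ;;
        28) echo "timeout: no reply in time, packets are probably being dropped" ;;
        *)  echo "other: see the EXIT CODES section of the curl man page" ;;
    esac
}

# Example, assuming the source Pod's image ships curl:
#   kubectl exec <source-pod> -- curl -sS -o /dev/null --connect-timeout 5 http://my-service:8080
#   explain_curl_exit $?
```

Setting `--connect-timeout` keeps the test short, so a dropped SYN shows up in five seconds instead of thirty.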
How do I check if a network policy is blocking traffic?
First, list all network policies in the source and target namespaces: `kubectl get netpol -n <namespace>`. Then, use `kubectl describe netpol <name> -n <namespace>` to see the rules. To test, you can temporarily create a permissive policy that allows all ingress/egress, or use a tool like `kubectl netpol` (if installed) to simulate traffic flow.
What does 'no route to host' mean in this context?
'no route to host' indicates that the source Pod cannot find a network path to the destination IP. This could be because the destination IP is in a different subnet not reachable due to CNI misconfiguration, or because the target Pod is not running on any node (e.g., all replicas are pending or evicted). Check that the target Pods are running and have IP addresses assigned.
Can I use a headless service to bypass DNS issues?
A headless service (clusterIP: None) does not have a ClusterIP, so it cannot be used with a simple ClusterIP-based connection. Instead, it returns DNS A records for all Pod IPs. You would need to use a client that handles multiple IPs (like a database driver). Headless services are useful for stateful workloads but do not solve typical connectivity issues; they change the behavior.
Why does the Service work from the host node but not from a Pod?
From the host node, you are bypassing Pod network isolation. The host can directly reach the ClusterIP via the node's network stack. From a Pod, traffic goes through the Pod's network namespace, which may be subject to different iptables rules (managed by kube-proxy) and network policies. Also, the host may have direct access to the target Pod's IP if they are on the same node, bypassing the Service abstraction.