What this usually means
The pod has a required affinity rule (`requiredDuringSchedulingIgnoredDuringExecution`, shortened to requiredDuringScheduling in this guide) that cannot be satisfied by any node. For pod affinity, this means no pod matching the label selector is scheduled in any topology domain the new pod could join, or the nodes lack the topology key (e.g., kubernetes.io/hostname). For anti-affinity, it means every candidate topology domain already contains at least one pod from the anti-affinity group, so the rule prohibits placement. The scheduler evaluates these constraints in its filter phase (called predicates in older releases) and will not schedule the pod until the constraint can be met or the pod is recreated with an updated spec. This is not a resource issue; it's a topology mismatch.
The first ten minutes — establish facts before touching code.
- 1. Run `kubectl describe pod <pending-pod>` and look at the Events section; specifically note the node counts on the 'didn't match pod affinity/anti-affinity' lines.
- 2. Run `kubectl get pods --all-namespaces -l <your-affinity-selector> -o wide` to see if the target pods actually exist and have been scheduled onto a node (ideally Running).
- 3. Check the topology key used in the affinity rule (e.g., kubernetes.io/hostname, topology.kubernetes.io/zone). Verify that the nodes carry that label and note which topology values the target pods occupy.
- 4. If anti-affinity, run `kubectl get pods -o wide --all-namespaces -l <your-selector>` and count pods per node. The rule requires no more than one pod per node (or per topology domain).
- 5. For node affinity, check node labels: `kubectl get nodes --show-labels | grep <key>=<value>`. Ensure at least one node has the matching label.
- 6. Use `kubectl get events --field-selector involvedObject.name=<pod-name>` to see scheduler events with reasons like 'FailedScheduling'.
The specific files, logs, configs, and dashboards that usually own this bug.
- `kubectl describe pod <pod>`: the Events section shows the exact filter failures.
- `kubectl get pods -o wide --all-namespaces -l <selector>`: see which nodes host the target pods.
- `kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.io/hostname}{"\n"}{end}'`: check topology key values.
- Scheduler logs (if running as a pod): `kubectl logs -n kube-system <scheduler-pod> --tail=100 | grep <pod-name>`
- Pod YAML: `kubectl get pod <pod> -o yaml`: view the full affinity spec under spec.affinity.
- Cluster topology: `kubectl get nodes -o jsonpath='{.items[*].metadata.labels}' | jq`: see all labels on nodes.
Practical causes, not theory. These are the things you will actually find.
- Target pods for the affinity rule don't exist or have not been scheduled onto a node yet. The scheduler matches pods that are already assigned to a node, so a Pending target does not count.
- Topology key mismatch: the rule uses a key the nodes don't carry (e.g., the deprecated 'failure-domain.beta.kubernetes.io/zone' when nodes are labelled 'topology.kubernetes.io/zone').
- Anti-affinity too strict: requiredDuringScheduling with topologyKey 'kubernetes.io/hostname' but there are fewer nodes than desired replicas.
- Node affinity labels missing: required node label not present on any node due to misconfiguration or node pool change.
- Multiple affinity rules conflict: both pod affinity and anti-affinity rules that are mutually exclusive (e.g., require same node but also anti-affinity on same node).
- Namespace mismatch: pod affinity rules by default only match pods in the same namespace as the incoming pod; cross-namespace matching requires the `namespaces` (or `namespaceSelector`) field on the affinity term, next to labelSelector.
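For the node-affinity cause in the list above, the rule in question usually has the following shape. This is a minimal sketch: the label key `example.com/node-pool` and the value `batch` are hypothetical placeholders, not names from any real cluster.

```yaml
# Pod spec fragment: required node affinity.
# The pod stays Pending unless at least one node carries the label
# example.com/node-pool=batch (hypothetical key and value).
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: example.com/node-pool
                operator: In
                values:
                  - batch
```

If `kubectl get nodes --show-labels` shows no node with that key and value, the rule cannot be satisfied regardless of free capacity.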
Concrete fix directions. Pick the one that matches your root cause.
- Change requiredDuringScheduling to preferredDuringScheduling to soften the constraint. Preferred terms have a different shape: each entry needs a `weight` and wraps the rule in `podAffinityTerm`.
- Add more target pods (scale up the deployment) to satisfy affinity matching on different topology domains.
- Fix the topology key to match the actual labels on nodes, or use a well-known key such as 'kubernetes.io/hostname' for per-node spread.
- Adjust anti-affinity to use 'preferred' instead of 'required', or reduce the number of replicas to fit the available topology domains.
- Add missing labels to nodes: `kubectl label node <node> <key>=<value>`.
- If cross-namespace, add the `namespaces` field to the affinity term, as a sibling of labelSelector rather than inside it.
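As an illustration of the first and fourth fix directions, here is a hedged sketch of a softened anti-affinity rule. The `app: worker` label is an assumed example; substitute the labels your pods actually carry.

```yaml
# Pod template fragment: soft (preferred) anti-affinity.
# The scheduler tries to keep matching pods on separate nodes but will
# still place the pod if no such node exists.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100              # 1-100; higher means a stronger preference
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: worker        # assumed label
            topologyKey: kubernetes.io/hostname
```

Because the rule only influences scoring, a replica count larger than the node count no longer leaves pods Pending; the trade-off is that co-location becomes possible.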
A fix you cannot prove is a guess. Close the loop.
- After changes, run `kubectl get pods -w <pod-name>` and see it transition to Running.
- Run `kubectl describe pod <pod>` and confirm Events show no affinity-related failures.
- Check that the pod landed on the expected node: `kubectl get pod <pod> -o wide`.
- If using preferred affinity, verify the pod's node matches the intended topology (e.g., same zone as target).
- Delete the pending pod and reapply the YAML to force fresh scheduling with new rules.
- Use `kubectl wait --for=condition=Ready pod/<pod>` to script verification.
Things that make this bug worse or harder to find.
- Don't ignore the '0/n nodes available' count breakdown; read the exact reason given for each group of nodes.
- Don't assume all pods are in the same namespace; default affinity only matches within the same namespace.
- Don't use 'requiredDuringScheduling' without testing with 'preferred' first in development.
- Don't forget that anti-affinity works only if there are enough topology domains (e.g., nodes) for each pod.
- Don't overlook `namespaces` or `namespaceSelector` on an anti-affinity term; when they widen the scope, a pod in another namespace with matching labels can block placement.
- Don't try to patch the affinity of a pod that is still pending; change the controller's pod template (or the manifest), then delete and recreate the pod.
The case of the stuck worker: two deployments, same zone, no placement
Timeline
- 09:15 Deploy new 'worker' service with 3 replicas; all remain Pending.
- 09:17 Run `kubectl describe pod worker-xxxx`; see '0/5 nodes available: 5 node(s) didn't match pod affinity/anti-affinity'.
- 09:20 Check target pods: `kubectl get pods -l app=queue` shows only 1 replica, running on node-2.
- 09:22 Inspect worker YAML: pod affinity with requiredDuringScheduling, labelSelector matchLabels: {app: queue}, topologyKey: kubernetes.io/hostname.
- 09:25 Realize: only 1 queue pod exists, so only node-2 matches the affinity. But anti-affinity also exists: same topologyKey, requiredDuringScheduling, max 1 per node.
- 09:27 Explain: affinity wants node-2 (queue pod), anti-affinity prohibits more than 1 worker on node-2. No other node has a queue pod, so impossible.
- 09:30 Fix: change both affinity and anti-affinity to preferredDuringScheduling. Scale queue to 2 replicas on different nodes.
- 09:32 Delete worker pods; they schedule within 10 seconds across two nodes.
I was deploying a new worker service that needed to run on the same node as a queue pod (for low-latency communication). The YAML had a required pod affinity to match the queue's label, and also a required anti-affinity to ensure no two workers on the same node. I thought this was fine — I'd have multiple queue pods spread across nodes.
But the queue deployment had only one replica running on node-2. Every worker pod tried to schedule: affinity said 'must be on node-2', anti-affinity said 'no other worker on node-2'. That's a catch-22. The scheduler correctly rejected all nodes. The events showed 5 nodes didn't match, but the breakdown was generic; I had to manually compute the overlap.
I changed both rules to preferred, scaled the queue to 2 replicas on different nodes, and deleted the pending pods. They scheduled immediately. The lesson: required rules are absolute; always verify the target pod distribution before applying strict affinity. Also, the event messages don't show the combinatorial failure; you have to reason through the constraints yourself.
Root cause
Mutually exclusive required pod affinity and anti-affinity rules: affinity forced a single node, anti-affinity forbade that node for more than one pod.
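The conflicting pair of rules can be sketched as follows. This is a reconstruction from the timeline, not the original manifest: the `app: queue` selector and the hostname topology key come from the case description, while the `app: worker` label on the anti-affinity term is an assumption.

```yaml
# Worker pod template fragment: two required rules on the same topology key.
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: queue           # only node-2 hosts a queue pod
          topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: worker          # assumed label: at most one worker per node
          topologyKey: kubernetes.io/hostname
```

With a single queue pod, the affinity term narrows the candidates to one node and the anti-affinity term caps that node at one worker, so the number of schedulable workers can never exceed the number of nodes hosting a queue pod.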
The fix
Changed both rules to preferredDuringScheduling and scaled the target queue deployment to multiple replicas.
The lesson
RequiredDuringScheduling rules are dangerous when combined; always test with preferred first, and verify the topology distribution of target pods.
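In current releases the scheduler implements both checks in a single filter plugin, InterPodAffinity (older releases called these checks predicates). For each rule it finds the already-scheduled pods that match the labelSelector in the applicable namespaces (the incoming pod's own namespace unless the term sets `namespaces` or `namespaceSelector`) and reads the topology key value (e.g., hostname) from the nodes hosting them. For affinity, a candidate node passes only if its own value for that key equals the value of a node hosting at least one matching pod. For required anti-affinity, a candidate node passes only if no matching pod runs anywhere in its topology domain.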
The key insight: the scheduler evaluates all nodes independently. If a node has no matching target pods, it fails affinity. If it already has a pod from the same anti-affinity group, it fails anti-affinity. The combination can produce the impossible scenario where no node passes both. The event message '0/n nodes available' lists all predicates that failed, but it doesn't tell you that the same node failed both affinity and anti-affinity—you have to cross-reference.
A frequent cause is using a topology key that doesn't exist on nodes. For example, `failure-domain.beta.kubernetes.io/zone` is deprecated and may be absent in newer clusters (replaced by `topology.kubernetes.io/zone`). A node without the key does not belong to any topology domain for that rule, so it can never satisfy a required pod affinity term that uses the key. For anti-affinity the effect is less obvious: the Kubernetes documentation warns that missing or inconsistent topology labels lead to unintended behavior, and the rule may silently fail to separate pods.
Always verify your node labels with `kubectl get nodes -o yaml` and check for the exact key. In GKE, zone labels are `topology.kubernetes.io/zone`. Also, custom topology keys (like a rack label) must be consistent across all nodes.
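The scheduler exposes Prometheus metrics on port 10259 (HTTPS, authenticated). Useful ones include `scheduler_pod_scheduling_duration_seconds`, `scheduler_pod_scheduling_attempts` (a histogram of the attempts needed to schedule a pod), `scheduler_schedule_attempts_total` (labelled by `result`, including `unschedulable`), and, in recent releases, `scheduler_unschedulable_pods` (labelled by `plugin`). These are cluster-wide aggregates rather than per-pod series, so a stuck pod shows up as a rising unschedulable count, for example `scheduler_unschedulable_pods{plugin="InterPodAffinity"}`. Metric names and labels change between Kubernetes versions, so confirm them against your scheduler's /metrics output. The plugin name is often 'InterPodAffinity' or 'NodeAffinity'.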
Scheduler logs at higher verbosity (--v=4 or higher, depending on version) include more detail about which filter rejected which node. The exact message format varies by Kubernetes version, so search for the pod name, the node name, and the plugin name rather than a fixed string. These logs are verbose but invaluable for pinpointing which node failed which rule. Use `kubectl logs -n kube-system kube-scheduler-<node> --tail=500 | grep <pod-name>`.
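Raising the verbosity depends on how the scheduler is deployed. The fragment below is a sketch that assumes a kubeadm-style control plane, where the scheduler runs as a static pod defined in `/etc/kubernetes/manifests/kube-scheduler.yaml`; managed offerings generally do not expose this file.

```yaml
# Fragment of the kube-scheduler static pod manifest (kubeadm layout assumed).
# Only the added flag is shown; leave the existing flags untouched.
spec:
  containers:
    - name: kube-scheduler
      command:
        - kube-scheduler
        # ...existing flags stay as they are...
        - --v=4                    # increase log verbosity
```

The kubelet restarts the static pod when the file changes. Lower the verbosity again once the investigation is done, since the extra logging is noisy on busy clusters.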
For many use cases, Pod Topology Spread Constraints (stable since 1.19) are a better alternative to anti-affinity. They allow you to specify a maximum skew across topology domains, avoiding the all-or-nothing nature of required anti-affinity. For example, `maxSkew: 1, topologyKey: kubernetes.io/hostname, whenUnsatisfiable: DoNotSchedule` is similar to required anti-affinity but more flexible.
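A minimal sketch of such a constraint, using the values quoted above; the `app: worker` label is an assumed example.

```yaml
# Pod template fragment: spread matching pods evenly across nodes.
spec:
  topologySpreadConstraints:
    - maxSkew: 1                         # max difference in pod count between any two domains
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule   # hard rule; ScheduleAnyway makes it a preference
      labelSelector:
        matchLabels:
          app: worker                    # assumed label
```

Unlike required anti-affinity on the hostname key, this still allows a second pod on a node once every node already holds one, because the skew stays within the limit.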
However, spread constraints can still conflict with affinity. The scheduler evaluates all constraints together. If you combine required pod affinity (must be on node with target) with a spread constraint that forces a distribution, you might again hit an impossible situation. `kubectl explain pod.spec.affinity` documents the fields but does not simulate scheduling, so always test the combined rules in a non-production cluster.
Frequently asked questions
Why does my pod stay Pending even though I have plenty of CPU and memory?
The scheduler may be blocked by pod affinity or anti-affinity rules. Run `kubectl describe pod` and look for events that mention 'didn't match pod affinity/anti-affinity'. The scheduler will not schedule the pod if required rules cannot be satisfied, regardless of resource availability.
Can I see which specific node didn't match the affinity?
Not directly from `kubectl describe` events. The event only lists the number of nodes rejected for each reason, and `kubectl get events` returns the same summary. For node-level detail, raise the scheduler's log verbosity (--v=4 or higher) and inspect the scheduler pod logs, then cross-check against `kubectl get pods -o wide` and the node labels.
What's the difference between requiredDuringScheduling and preferredDuringScheduling?
requiredDuringScheduling (in full, `requiredDuringSchedulingIgnoredDuringExecution`) is a hard constraint: the rule must be satisfied for the pod to be scheduled. If not, the pod stays Pending. preferredDuringScheduling (`preferredDuringSchedulingIgnoredDuringExecution`) is a soft constraint: the scheduler tries to satisfy it but will schedule the pod elsewhere if it cannot. Use preferred for resilience and required only when absolutely necessary.
How do I check if my topology key is correct?
Run `kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.<your-key>}{"\n"}{end}'`, escaping any dots in the key with a backslash (e.g., `topology\.kubernetes\.io/zone`). If the output shows empty values, the key doesn't exist on those nodes. List all node labels with `kubectl get nodes --show-labels` to see available keys.
Can pod affinity rules match pods across namespaces?
Yes, but you must explicitly set the `namespaces` field (or `namespaceSelector`) on the affinity term, alongside `labelSelector` rather than inside it. By default, affinity only matches pods in the same namespace as the incoming pod. Example: `podAffinity: { requiredDuringSchedulingIgnoredDuringExecution: [ { labelSelector: { matchLabels: { app: queue } }, namespaces: ["queue-ns"], topologyKey: "kubernetes.io/hostname" } ] }`.
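Written out as block YAML, a cross-namespace term looks roughly like this. The namespace `queue-ns` and the `app: queue` label are the example values from the answer above; note that `namespaces` is a sibling of `labelSelector` within the term.

```yaml
# Pod spec fragment: required pod affinity that matches pods in another namespace.
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: queue
          namespaces:                  # sibling of labelSelector, not nested in it
            - queue-ns
          topologyKey: kubernetes.io/hostname
```

A `namespaceSelector` can be used instead of a fixed list when the target namespaces are identified by labels.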