What this usually means
The HPA relies on a metrics pipeline: kubelet → metrics-server (or a custom metrics API) → HPA controller. When scaling stalls, the break is usually somewhere in that chain. Most common: metrics-server is not running or is misconfigured (for example, it cannot verify the kubelet serving certificate and runs with neither the right CA nor --kubelet-insecure-tls). Second: resource requests are missing on containers, and the HPA cannot compute utilization without them. Third: target utilization is set too high, so measured utilization never crosses the threshold. For custom metrics, the custom or external metrics API may be unreachable or returning empty results. Non-obvious: the HPA controller syncs every 15 seconds by default but only acts when the ratio of current to target utilization differs from 1.0 by more than a tolerance (default 0.1, i.e. 10% of target). If utilization hovers near the target, the replica count stays put.
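The tolerance rule is easier to see as arithmetic. The following is a minimal Python sketch of the documented replica calculation (desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within tolerance). The function name and signature are illustrative, not part of any Kubernetes API, and the sketch ignores min/max replicas, unready pods, and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization, tolerance=0.1):
    """Simplified HPA replica calculation for a single utilization metric."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 80% measured against an 80% target: no change.
    print(desired_replicas(3, 80, 80))
    # 100% measured against an 80% target: ratio 1.25, scale up.
    print(desired_replicas(3, 100, 80))
```

With 3 replicas and an 80% target, anything between roughly 72% and 88% measured utilization leaves the count at 3.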
The first ten minutes — establish facts before touching code.
- 1. kubectl get hpa -n <namespace> -o wide: check the TARGETS column; if it shows <unknown> or <unavailable>, the HPA is not receiving the metric (usually metrics-server, sometimes missing requests).
- 2. kubectl describe hpa <name> -n <namespace>: look at Events at the bottom; any FailedGetResourceMetric or FailedComputeMetricsReplicas tells you exactly where the chain breaks.
- 3. kubectl top pods -n <namespace>: if this fails, metrics-server is not working; check pod logs with kubectl logs -n kube-system -l k8s-app=metrics-server.
- 4. If using custom metrics: kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.' and verify that the API lists the metric you're using.
- 5. Check that containers have resource requests: kubectl describe pod <pod> -n <namespace> | grep -A 5 Requests; if they are missing, the HPA cannot compute utilization.
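Step 5 can be scripted when a pod has many containers. Below is a hedged Python sketch that inspects a pod manifest (the shape produced by kubectl get pod <pod> -o json) and lists containers without a request for a given resource. The helper name and the sample pod are made up for illustration.

```python
def containers_missing_requests(pod, resource="cpu"):
    """Return names of containers in a pod manifest lacking a request for `resource`."""
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        # `resources` or `requests` may be absent or empty on a container.
        requests = (container.get("resources") or {}).get("requests") or {}
        if resource not in requests:
            missing.append(container["name"])
    return missing


# Hypothetical pod: the sidecar sets a limit but no request.
sample_pod = {
    "spec": {
        "containers": [
            {"name": "app",
             "resources": {"requests": {"cpu": "500m", "memory": "256Mi"}}},
            {"name": "log-shipper",
             "resources": {"limits": {"cpu": "100m"}}},
        ]
    }
}

if __name__ == "__main__":
    print(containers_missing_requests(sample_pod))            # ['log-shipper']
    print(containers_missing_requests(sample_pod, "memory"))  # ['log-shipper']
```

To run it against a live pod, load the JSON from kubectl with json.load(sys.stdin) in place of the sample dictionary. Note that the API server may default requests from limits at admission, so inspect the live pod rather than the Deployment template.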
The specific files, logs, configs, and dashboards that usually own this bug.
- kubectl describe hpa <name> -n <namespace>: events and status conditions
- kubectl logs -n kube-system -l k8s-app=metrics-server: metrics-server pod logs
- kubectl get apiservice v1beta1.metrics.k8s.io -o yaml: check that the API service is available and points to a healthy endpoint
- kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1: for the custom metrics API
- kubectl describe pod <hpa-target-pod>: check resource requests under containers
- kubectl logs -n kube-system <kube-controller-manager-pod>: the HPA controller runs inside kube-controller-manager, so its logs live there (rarely needed, and only reachable on self-managed control planes; more detail with -v=4)
Practical causes, not theory. These are the things you will actually find.
- Metrics-server not installed or not collecting metrics (check kubelet certificate or network policy)
- Containers missing CPU/memory resource requests: the HPA needs requests to calculate utilization percentage
- Target utilization set too high (e.g., 90%), so load never pushes utilization past the threshold (with the default 10% tolerance, a 90% target needs more than 99%)
- Custom metrics returning stale data or carrying the wrong labels: the HPA expects an exact label match on the pod or object
- HPA targeting a Deployment that has a different label selector than the pods running
- kubelet certificate rotation breaks metrics-server, which scrapes each kubelet's /metrics/resource endpoint over TLS
- Resource quota or limit range prevents scaling: the HPA raises the replica count, but quota rejects the new pods, so the actual count never catches up
Concrete fix directions. Pick the one that matches your root cause.
- If metrics-server is missing: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml (add --kubelet-insecure-tls to the container args if needed for self-signed kubelet certs)
- If resource requests are missing: add resources.requests.cpu and resources.requests.memory to every container in your Deployment's pod template; the change rolls out new pods
- If target utilization is too high: lower it to 50-70%, or use averageValue instead of averageUtilization for absolute metrics
- If custom metrics are not matching: verify that the metric's labels match the selector in the HPA metric spec (metric.selector). Use kubectl get --raw ... to examine metric labels
- If kubelet cert issues: update the metrics-server deployment args with --kubelet-preferred-address-types=InternalIP and --kubelet-insecure-tls as a workaround
- If quota is blocking: kubectl describe quota -n <namespace> and check whether quota is exhausted; increase quota or reduce replica count
A fix you cannot prove is a guess. Close the loop.
- kubectl get hpa -w: watch the REPLICAS column update (scale-up can lag a minute or more while metrics propagate; scale-down waits out the 5-minute stabilization window by default)
- Generate load: kubectl run -i --tty load-generator --rm --restart=Never --image=busybox -- sh -c 'while true; do wget -q -O- http://<service>; done'
- kubectl describe hpa: check Events for a 'SuccessfulRescale' message
- kubectl top pods: verify metrics-server is returning fresh data for target pods
- Check HPA status: kubectl get hpa -o jsonpath='{.items[*].status.conditions[?(@.type=="ScalingActive")].status}' should print True
- If custom metrics: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric>" | jq '.items[0]'
Things that make this bug worse or harder to find.
- Setting HPA target utilization to 100%: with the default 10% tolerance, scale-up needs utilization above 110% of requests, which is unreachable when limits equal requests
- Forgetting to add resource requests to all containers in the pod: the HPA sums requests across containers, and a container without a request makes the utilization calculation fail ('missing request for ...')
- Relying solely on kubectl top without checking if metrics-server is healthy: it may return stale data
- Using averageUtilization with custom metrics that are not a percentage: use averageValue instead
- Not accounting for the HPA's stabilization windows (default 0 seconds for scale-up, 5 minutes for scale-down): judging a scale-down before 5 minutes have passed leads to false negatives
- Applying HPA to a StatefulSet without considering pod identity: the HPA works, but scale-down removes the highest-ordinal pods, and their PersistentVolumeClaims are kept by default and deleted only if the StatefulSet's retention policy says so
The 3 AM HPA Stalemate: When Metrics Go Silent
Timeline
- 02:15 PagerDuty alert: payment-service latency > 2s, CPU at 80%, but HPA shows 3 replicas (desired 3, current 3)
- 02:18 kubectl describe hpa payment-hpa -n payments shows 'Conditions: ScalingActive=True, AbleToScale=True, ScalingLimited=False'; no apparent error
- 02:22 kubectl top pods -n payments shows CPU usage ~400m per pod, but requests are 500m: utilization 80% (target 80%)
- 02:27 Check metrics-server logs: no errors, but notice metrics are 60 seconds stale
- 02:30 kubectl get hpa payment-hpa -o yaml shows targetCPUUtilizationPercentage: 80, but behavior has stabilizationWindowSeconds: 300 for scale up
- 02:35 Realize: the HPA holds at 3 replicas because utilization sits exactly at target, inside the tolerance band, and the scale-up stabilization window would delay any increase further; the HPA is avoiding flapping
- 02:38 Lower target to 70% in HPA spec; within 2 minutes, HPA scales to 5 replicas
- 02:42 Latency drops to 200ms; create PR to add behavior section with shorter stabilization for scale-up
I got paged at 2 AM because payment latency spiked. The HPA looked healthy (no errors, scaling active), but it refused to go above 3 replicas even though CPU was pegged at 80%. My first instinct was to check metrics-server, but it was running fine. I saw that utilization was exactly 80%, which was the target. The HPA was stuck because of its default tolerance of 10% of target: a scale-up only triggers once utilization exceeds 88% (80 × 1.1). We were sitting right at the target, squarely inside that dead zone.
I then noticed the stabilization window: 5 minutes for scale-up. Even if utilization spiked above 88%, the HPA would wait 5 minutes before acting. But we never crossed the threshold. The fix was twofold: lower the target to 70% to give headroom, and customize the behavior section to reduce the stabilization window for scale-up to 60 seconds. That way, the HPA would respond faster to load spikes.
The real lesson: HPA's default behavior is conservative by design. The 10% tolerance and stabilization windows are there to prevent flapping, but they can also cause a stalemate when utilization hovers around the target. Always set your target lower than you think you need, and consider adding a behavior section for faster scale-up in latency-sensitive services.
Root cause
HPA target utilization set to exact same as observed utilization (~80%), combined with default 10% tolerance and 5-minute stabilization window, prevented any scale-up action.
The fix
Reduced targetCPUUtilizationPercentage from 80% to 70%, and added behavior.scaleUp.stabilizationWindowSeconds: 60.
The lesson
HPA tolerances and stabilization windows can create a dead zone around the target utilization. Always set target below expected peak utilization, and customize behavior sections for your workload's sensitivity.
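The dead zone in this incident can be worked out directly. The Python sketch below assumes the default tolerance of 0.1 and uses a hypothetical helper name; it only computes the band of utilization values in which the HPA takes no action.

```python
def dead_zone(target_utilization, tolerance=0.1):
    """Return the (low, high) utilization band in which the HPA does not act."""
    low = round(target_utilization * (1 - tolerance), 2)
    high = round(target_utilization * (1 + tolerance), 2)
    return low, high


def in_dead_zone(utilization, target_utilization, tolerance=0.1):
    low, high = dead_zone(target_utilization, tolerance)
    return low <= utilization <= high


if __name__ == "__main__":
    print(dead_zone(80))          # (72.0, 88.0)
    print(dead_zone(70))          # (63.0, 77.0)
    print(in_dead_zone(80, 80))   # True: the stalemate from the incident
    print(in_dead_zone(80, 70))   # False: the lowered target lets HPA act
```

With the target at 80%, an observed 80% sits in the middle of the band; with the target at 70%, the same 80% is above the upper edge of 77%, so the HPA scales up.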
HPA doesn't fetch metrics directly from pods. It queries a metrics API: either the Resource Metrics API (metrics-server) for CPU/memory or the Custom Metrics API (e.g., Prometheus Adapter) for custom metrics. The chain is: kubelet exposes the /metrics/resource endpoint → metrics-server scrapes kubelets → HPA controller calls metrics-server's API. A break anywhere causes 'unknown' metrics.
Most common break: metrics-server cannot reach kubelets due to network policies, node firewall rules, or kubelet certificate issues. On clusters where the kubelet serving certificate is self-signed (the kubeadm default, for example), metrics-server needs --kubelet-insecure-tls or the proper CA. Check metrics-server logs for 'x509: certificate signed by unknown authority'. Also check kubelet logs on nodes for authentication errors.
Custom metrics HPA requires exact label matching between the metric returned by the API and the pod/object you're targeting. If your metric has extra labels (e.g., instance, job), the HPA won't match. Use kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric> to see the exact format. The metric must be associated with a specific pod (pod name) or object (e.g., deployment name).
Prometheus Adapter often needs configuration to drop unwanted labels. Use the 'seriesQuery' and 'resources' sections in adapter config to map metric labels to Kubernetes resources. Also ensure the adapter can reach Prometheus — check adapter logs for 'connection refused'.
The HPA behavior field, available since Kubernetes 1.18 (autoscaling/v2beta2) and stable in autoscaling/v2 since 1.23, allows customizing scale-up/down policies, stabilization windows, and selectPolicy. The default stabilization window for scale-up is 0 seconds (immediate), but for scale-down it's 5 minutes. However, the tolerance is still 10% of target. This means that if utilization is within 10% of target, the HPA does nothing.
To avoid stalemates, check behavior.scaleUp.stabilizationWindowSeconds: if someone has raised it above the default of 0, bring it back down (e.g., to 60 or less), and consider a 'Pods' or 'Percent' policy for faster scaling. You can also set a lower target utilization that leaves headroom.
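How a stabilization window holds a replica count can be sketched in a few lines. This is a simplified Python model of the documented behavior (for scale-up the controller applies the lowest recommendation seen inside the scale-up window, for scale-down the highest inside the scale-down window). The function and argument names are illustrative assumptions, and scaling policies and min/max replicas are left out.

```python
def stabilize(current_replicas, desired_now, history, now,
              up_window=0, down_window=300):
    """Apply stabilization windows to a fresh recommendation.

    history: list of (timestamp_seconds, recommended_replicas) from past syncs.
    """
    up_rec = desired_now
    down_rec = desired_now
    for timestamp, recommendation in history:
        age = now - timestamp
        if age < up_window:
            up_rec = min(up_rec, recommendation)      # most conservative scale-up
        if age < down_window:
            down_rec = max(down_rec, recommendation)  # most conservative scale-down
    result = current_replicas
    if result < up_rec:
        result = up_rec
    if result > down_rec:
        result = down_rec
    return result


if __name__ == "__main__":
    history = [(0, 3), (60, 5)]
    # 300 s scale-up window: the older recommendation of 3 still holds.
    print(stabilize(3, 5, history, now=120, up_window=300))  # 3
    # 60 s scale-up window: old recommendations have aged out.
    print(stabilize(3, 5, history, now=120, up_window=60))   # 5
```

The same function shows why scale-down feels slow by default: a single recommendation of 5 within the last 300 seconds keeps the count at 5 even when the current recommendation is 2.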
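When the HPA scales up, it raises the replica count on the target's scale subresource, and the ReplicaSet controller then creates the new pods. If the namespace has a ResourceQuota that blocks pod creation (e.g., CPU quota exhausted), the API server rejects the pods at admission and the ReplicaSet records FailedCreate events. The HPA itself reports no error: it simply shows a desired replica count higher than the current one. (ScalingLimited=True means the desired count was capped, typically by minReplicas or maxReplicas, not that quota blocked it.) Check kubectl describe quota and kubectl get events for 'FailedCreate' or 'exceeded quota'.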
LimitRanges can also cause issues if they set default requests that conflict with your pod spec. HPA uses the requests from the running pods, but if the limit range mutates them, the actual request may be different. Always verify the effective requests on the pod.
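To sanity-check effective requests against observed usage, the utilization figure can be reproduced by hand. The Python sketch below assumes CPU quantities in the forms kubectl usually prints ('400m', '0.5', '2') and follows the documented approach of dividing total usage by total requests across the target pods. The helper names and pod names are hypothetical.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as '500m', '0.5' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def utilization_percent(usage_by_pod, requests_by_pod):
    """Total usage as an integer percentage of total requests."""
    total_usage = sum(parse_cpu(value) for value in usage_by_pod.values())
    total_requests = sum(parse_cpu(requests_by_pod[pod]) for pod in usage_by_pod)
    return total_usage * 100 // total_requests


if __name__ == "__main__":
    usage = {"pay-1": "400m", "pay-2": "400m", "pay-3": "400m"}
    requests = {"pay-1": "500m", "pay-2": "500m", "pay-3": "500m"}
    print(utilization_percent(usage, requests))  # 80
```

If a LimitRange has injected a different default request than the one in your manifest, feeding the pod's live requests into this calculation shows the utilization the HPA actually sees.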
Frequently asked questions
Why does my HPA show 'unknown' for current CPU utilization even though metrics-server is running?
This usually means metrics-server can't reach the kubelet on the node where the pod is running. Check metrics-server logs for errors like 'dial tcp: lookup <node> on <dns>: no such host' or 'x509 certificate error'. Ensure metrics-server has the right --kubelet-preferred-address-types (use InternalIP) and --kubelet-insecure-tls if needed. Also verify network policies allow traffic to kubelet port (10250).
Can I use HPA with custom metrics that are not CPU or memory?
Yes, HPA supports custom metrics via the custom.metrics.k8s.io API and external metrics via external.metrics.k8s.io. You need a metrics adapter (like Prometheus Adapter, Datadog Cluster Agent, or Google Stackdriver) that exposes your metrics in the expected format. Be careful with label matching — the metric must be associated with the target object (pod or namespace).
How long does HPA take to scale after metrics exceed the target?
HPA controller runs every 15 seconds (by default). It reads metrics, calculates desired replicas, and then applies the result. However, there's a default cooldown: scale-up happens immediately (no stabilization window by default), but scale-down has a 5-minute stabilization window. You can customize this with the behavior field. Also, the 10% tolerance means it only acts when utilization deviates more than 10% from target.
What does 'missing request for ...' mean in HPA events?
It means one or more containers in the pods selected by the HPA do not have CPU or memory resource requests defined. HPA needs requests to compute utilization percentage. Add resource requests to the container spec in your Deployment or StatefulSet. Even if you set limits, you must set requests separately.
My HPA scales up but never scales down. Why?
Common reasons: (1) the default scale-down stabilization window is 5 minutes, so wait longer. (2) Resource requests are too low relative to idle usage, so utilization never drops below target. (3) The HPA's behavior section sets scaleDown.selectPolicy: Disabled, which turns scale-down off. (4) minReplicas is set too high. Check kubectl describe hpa for conditions: if ScalingActive is False, something is wrong. If it is True, check whether the AbleToScale condition carries the ScaleDownStabilized reason, which means the HPA is waiting out the stabilization window.