Kubernetes HPA Not Scaling Debug Guide

What this usually means

The HPA relies on a metrics pipeline: kubelet → metrics-server (or a custom metrics adapter) → HPA controller. When scaling stalls, the break is usually somewhere in that chain. Most common: metrics-server is not running or is misconfigured. A typical example is that it cannot verify the kubelets' serving certificates and has neither the right CA nor --kubelet-insecure-tls. Second: containers are missing resource requests, and the HPA cannot compute utilization without them. Third: the target utilization is set so high that measured utilization never crosses it. For custom metrics, the custom or external metrics API may be unreachable or returning empty results. The non-obvious one: the HPA evaluates every 15 seconds, but it only acts when the ratio of current to target utilization differs from 1.0 by more than a tolerance (default 0.1, i.e. 10% of the target). If utilization hovers near the target, nothing happens.
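A minimal sketch of the replica calculation and its tolerance check. It assumes the formula from the Kubernetes HPA documentation (desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)) and the default tolerance of 0.1. It deliberately ignores details the real controller handles, such as unready or missing pods, min/max bounds, and behavior policies. desired_replicas is a hypothetical helper, not a Kubernetes API.

```python
import math


def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float, tolerance: float = 0.1) -> int:
    """Simplified HPA recommendation for a single utilization metric.

    Returns the current replica count unchanged when the usage ratio is
    within the tolerance band, otherwise ceil(current * ratio).
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        # Inside the dead band (roughly 72%..88% for an 80% target): no change.
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Three replicas with an 80% target: 85% is ignored, 100% asks for 4.
for util in (80, 85, 100, 120):
    print(f"{util}% -> {desired_replicas(3, util, 80)} replicas")
```

The same arithmetic explains the war story below: at exactly 80% utilization against an 80% target, the ratio is 1.0 and the recommendation never moves.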

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

1. kubectl get hpa -n <namespace> -o wide: check the TARGETS column. If it shows <unknown> or <unavailable> instead of a value, the HPA cannot read the metric. metrics-server or missing resource requests are the usual causes.
2. kubectl describe hpa <name> -n <namespace>: read the Events at the bottom. A FailedGetResourceMetric or FailedComputeMetricsReplicas event tells you exactly where the chain breaks.
3. kubectl top pods -n <namespace>: if this fails, metrics-server is not working. Check its logs with kubectl logs -n kube-system -l k8s-app=metrics-server.
4. If using custom metrics: kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.' (or /apis/external.metrics.k8s.io/v1beta1 for External metrics). Verify that the API returns data for the metric your HPA uses.
5. Check that containers have resource requests: kubectl describe pod <pod> -n <namespace> | grep -A 5 Requests. If they are missing, the HPA cannot compute utilization.
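Step 5 can be automated across every pod behind an HPA. The sketch below assumes kubectl is installed and pointed at the right cluster. load_pods and containers_missing_requests are hypothetical helper names. Because it reads live pods, any requests defaulted at admission (from limits or a LimitRange) are already filled in.

```python
import json
import subprocess


def load_pods(namespace: str, selector: str) -> dict:
    """Fetch pods as JSON via kubectl (assumes kubectl is on PATH and configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def containers_missing_requests(pod_list: dict, resource: str = "cpu") -> list:
    """Return (pod, container) pairs whose spec has no request for `resource`."""
    missing = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"].get("containers", []):
            requests = container.get("resources", {}).get("requests", {})
            if resource not in requests:
                missing.append((pod_name, container["name"]))
    return missing

# Example (hypothetical namespace and label selector):
#   print(containers_missing_requests(load_pods("payments", "app=payment-service")))
```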

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- kubectl describe hpa <name> -n <namespace>: events and status conditions
- kubectl logs -n kube-system -l k8s-app=metrics-server: metrics-server pod logs
- kubectl get apiservice v1beta1.metrics.k8s.io -o yaml: check that the APIService reports Available=True and points to a healthy endpoint
- kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1: the custom metrics API
- kubectl describe pod <hpa-target-pod>: resource requests under each container
- kube-controller-manager logs: the HPA controller runs inside kube-controller-manager, so its messages appear there. Use, for example, kubectl logs -n kube-system <kube-controller-manager-pod>, ideally with the controller running at -v=4 or higher. On managed control planes such as EKS, enable control-plane logging for the controller manager instead.
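The APIService check can be scripted. The sketch below assumes kubectl access, and the function names are hypothetical. If the Available condition has status False (its reason and message explain why, for example FailedDiscoveryCheck), the aggregation layer cannot reach metrics-server. In that case every HPA resource metric shows <unknown>.

```python
import json
import subprocess


def parse_available(apiservice: dict) -> tuple:
    """Return (is_available, detail) from an APIService object's Available condition."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            detail = f"{cond.get('reason', '')}: {cond.get('message', '')}"
            return cond.get("status") == "True", detail
    return False, "no Available condition reported"


def check_metrics_apiservice(name: str = "v1beta1.metrics.k8s.io") -> tuple:
    """Fetch the APIService with kubectl and report whether it is Available."""
    result = subprocess.run(
        ["kubectl", "get", "apiservice", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return parse_available(json.loads(result.stdout))

# Example: ok, detail = check_metrics_apiservice(); print(ok, detail)
```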

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

- Metrics-server not installed or not collecting metrics (check kubelet certificates and network policies)
- Containers missing CPU/memory resource requests: the HPA needs requests to calculate a utilization percentage
- Target utilization set too high (e.g., 90%), so load never exceeds the threshold (the HPA ignores deviations within its default 10% tolerance)
- Custom metrics returning stale data or carrying the wrong labels: the adapter must associate each value with the pod or object the HPA queries, and any metric.selector must match exactly
- HPA scaleTargetRef pointing at the wrong workload (name, kind, or apiVersion), or the target's label selector also matching pods from another workload, which skews the metrics
- kubelet certificate rotation breaks metrics-server: metrics-server scrapes each kubelet's /metrics/resource endpoint over TLS
- ResourceQuota or LimitRange prevents scaling: the HPA raises the replica count, but the ReplicaSet controller cannot create the pods (FailedCreate events)

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- If metrics-server is missing: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml (add --kubelet-insecure-tls to the metrics-server container args if the kubelets use self-signed certificates)
- If resource requests are missing: add requests.cpu and requests.memory to your Deployment's container spec. Applying the change rolls out new pods with the requests set.
- If the target utilization is too high: lower it to 50-70%, or use averageValue instead of averageUtilization for absolute metrics
- If custom metrics are not matching: verify that the metric's labels match the HPA's metric.selector. Use kubectl get --raw ... to examine the labels the API returns.
- If kubelet certificates are the issue: add --kubelet-preferred-address-types=InternalIP to the metrics-server deployment args, plus --kubelet-insecure-tls as a workaround
- If a quota is blocking: kubectl describe quota -n <namespace> shows whether it is exhausted. Increase the quota, lower per-pod requests, or lower maxReplicas.
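Appending the metrics-server flags above naively can duplicate a flag that is already set. The upstream components.yaml, for instance, may already pass --kubelet-preferred-address-types. The sketch below builds an RFC 6902 JSON Patch instead. It assumes metrics-server is the first container and that its spec already has an args list. metrics_server_arg_patch is a hypothetical helper.

```python
import json


def metrics_server_arg_patch(current_args: list, wanted: list,
                             container_index: int = 0) -> list:
    """Build JSON Patch ops that make each `wanted` flag present in the args.

    Flags are compared by name (the part before '='), so an existing
    --kubelet-preferred-address-types=... is replaced, not duplicated.
    """
    base = f"/spec/template/spec/containers/{container_index}/args"
    index_by_name = {arg.split("=", 1)[0]: i for i, arg in enumerate(current_args)}
    ops = []
    for flag in wanted:
        name = flag.split("=", 1)[0]
        if name not in index_by_name:
            ops.append({"op": "add", "path": f"{base}/-", "value": flag})
        elif current_args[index_by_name[name]] != flag:
            ops.append({"op": "replace",
                        "path": f"{base}/{index_by_name[name]}",
                        "value": flag})
    return ops


current = ["--cert-dir=/tmp",
           "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"]
wanted = ["--kubelet-preferred-address-types=InternalIP", "--kubelet-insecure-tls"]
print(json.dumps(metrics_server_arg_patch(current, wanted)))
```

You can pass the printed list to kubectl -n kube-system patch deployment metrics-server --type=json -p '<patch>'. Running the helper again against the patched args yields an empty list, so it is safe to re-run.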

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

- kubectl get hpa -w: watch the REPLICAS column update. Allow a minute or two, because metrics-server scrapes on an interval and scale-down waits out its 5-minute stabilization window.
- Generate load: kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- sh -c 'while true; do wget -q -O- http://<service>; done'
- kubectl describe hpa: check Events for a SuccessfulRescale message
- kubectl top pods: verify that metrics-server is returning fresh data for the target pods
- Check HPA status: kubectl get hpa -o jsonpath='{.items[*].status.conditions[?(@.type=="ScalingActive")].status}' should print True
- If custom metrics: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric>" | jq '.items[] | {pod: .describedObject.name, value: .value}'
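You can combine the status checks above into one pass over kubectl get hpa <name> -o json. The sketch below assumes kubectl serves the autoscaling/v2 version, which is the default on recent clusters and keeps conditions under status.conditions. The helper names and the wording of the hints are illustrative. This is not a Kubernetes tool.

```python
import json
import subprocess


def summarize_hpa(hpa: dict) -> list:
    """Turn an autoscaling/v2 HPA object into short diagnostic hints."""
    status = hpa.get("status", {})
    current = status.get("currentReplicas")
    desired = status.get("desiredReplicas")
    conds = {c["type"]: c for c in status.get("conditions", [])}
    hints = [f"replicas: current={current} desired={desired}"]

    active = conds.get("ScalingActive", {})
    if active.get("status") == "False":
        hints.append(f"scaling inactive: {active.get('reason')}")
    limited = conds.get("ScalingLimited", {})
    if limited.get("status") == "True":
        hints.append(f"clamped: {limited.get('reason')}")
    able = conds.get("AbleToScale", {})
    if able.get("reason") in ("ScaleUpStabilized", "ScaleDownStabilized"):
        hints.append(f"held by stabilization window: {able.get('reason')}")
    if current is not None and desired is not None and desired > current:
        # Normal right after a decision; suspicious if it persists.
        hints.append("desired > current: if this persists, check pod creation (quota, capacity)")
    return hints


def load_hpa(name: str, namespace: str) -> dict:
    """Fetch one HPA as JSON via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "hpa", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

# Example: print("\n".join(summarize_hpa(load_hpa("payment-hpa", "payments"))))
```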

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

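- Setting the target utilization to 100% when CPU limits equal requests: throttling caps usage at the limit, so utilization can never exceed the 110% needed to clear the tolerance, and the HPA never scales up
- Forgetting resource requests on some containers in the pod: utilization is computed against the sum of all containers' requests, so a single container without a request makes the HPA report 'missing request for ...' instead of any utilization
- Relying solely on kubectl top without checking that metrics-server itself is healthy: it only shows what metrics-server last scraped, so also check its logs and the APIService status
- Using averageUtilization with custom metrics that are not percentages: use averageValue instead
- Not accounting for HPA stabilization (defaults: 0 seconds for scale-up, 5 minutes for scale-down): checking for a scale-down before 5 minutes have passed leads to false negatives
- Applying an HPA to a StatefulSet without considering pod identity: scale-down removes the highest-ordinal pods, and their PersistentVolumeClaims are retained by default (unless persistentVolumeClaimRetentionPolicy.whenScaled is Delete). Storage therefore lingers, and a later scale-up reattaches the old volumes.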
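You can check the first mistake before deploying. The minimal sketch below assumes a single-container pod with CPU values in millicores and the default 0.1 tolerance. It only models the hard ceiling that a CPU limit imposes, not how close real workloads get to it. Both function names are hypothetical.

```python
from typing import Optional


def max_cpu_utilization_pct(request_m: int, limit_m: Optional[int]) -> Optional[float]:
    """Highest CPU utilization a pod can report, in percent of its request.

    With a CPU limit, throttling caps usage at the limit. Without one,
    there is no fixed ceiling, so None is returned.
    """
    if limit_m is None:
        return None
    return 100.0 * limit_m / request_m


def can_trigger_scale_up(target_pct: float, request_m: int,
                         limit_m: Optional[int], tolerance: float = 0.1) -> bool:
    """True if utilization can rise above target * (1 + tolerance)."""
    ceiling = max_cpu_utilization_pct(request_m, limit_m)
    if ceiling is None:
        return True
    return ceiling > target_pct * (1 + tolerance)


print(can_trigger_scale_up(100, 500, 500))   # limits == requests -> False
print(can_trigger_scale_up(100, 500, 1000))  # limit is 2x the request -> True
```

The same check also catches less obvious cases. With limits equal to requests, any target above roughly 90% can never be exceeded by the required margin.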

( 07 ) War story

The 3 AM HPA Stalemate: When Metrics Go Silent

Role: Platform Engineer. Environment: Kubernetes 1.26 on AWS EKS, metrics-server v0.6.3, custom Prometheus adapter

Timeline

02:15 PagerDuty alert: payment-service latency > 2s, CPU at 80%, but the HPA shows 3 replicas (desired 3, current 3)
02:18 kubectl describe hpa payment-hpa -n payments shows 'Conditions: ScalingActive=True, AbleToScale=True, ScalingLimited=False': no apparent error
02:22 kubectl top pods -n payments shows CPU usage of ~400m per pod against 500m requests: utilization 80%, exactly the 80% target
02:27 Check metrics-server logs: no errors, but the metrics are 60 seconds stale
02:30 kubectl get hpa payment-hpa -o yaml shows targetCPUUtilizationPercentage: 80, and the behavior section sets stabilizationWindowSeconds: 300 for scale-up
02:35 Realization: utilization sits exactly at the target, inside the 10% tolerance, so the HPA holds at 3 replicas to avoid flapping; the 300-second scale-up window would delay any change anyway
02:38 Lower the target to 70% in the HPA spec; within 2 minutes, the HPA scales to 5 replicas
02:42 Latency drops to 200ms; open a PR that adds a behavior section with a shorter scale-up stabilization window

I got paged at 2 AM because payment latency spiked. The HPA looked healthy, with no errors and scaling active, but it refused to go above 3 replicas even though CPU was pegged at 80%. My first instinct was to check metrics-server, but it was running fine. Then I saw that utilization was exactly 80%, which was the target. The HPA was stuck because of its default tolerance of 10% of target: it only scales up once utilization exceeds 88% (80 * 1.1). We were sitting right inside that band.

I then noticed the stabilization window: 5 minutes for scale-up. Even if utilization spiked above 88%, the HPA would wait 5 minutes before acting. But we never crossed the threshold. The fix was twofold: lower the target to 70% to give headroom, and customize the behavior section to reduce the stabilization window for scale-up to 60 seconds. That way, the HPA would respond faster to load spikes.

The real lesson: HPA's default behavior is conservative by design. The 10% tolerance and stabilization windows are there to prevent flapping, but they can also cause a stalemate when utilization hovers around the target. Always set your target lower than you think you need, and consider adding a behavior section for faster scale-up in latency-sensitive services.

Root cause

The HPA target utilization was set to the same value as the observed utilization (~80%). Combined with the default 10% tolerance and a 5-minute scale-up stabilization window, this prevented any scale-up.

The fix

Reduced targetCPUUtilizationPercentage from 80% to 70%, and added behavior.scaleUp.stabilizationWindowSeconds: 60.
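The sketch below writes the fix against autoscaling/v2, where targetCPUUtilizationPercentage becomes metrics[].resource.target.averageUtilization, and emits the manifest as JSON. The Deployment name payment-service and the replica bounds are assumptions, because the story does not give them.

```python
import json


def hpa_manifest(name: str, namespace: str, deployment: str,
                 min_replicas: int, max_replicas: int,
                 cpu_target_pct: int, scale_up_window_s: int) -> dict:
    """autoscaling/v2 HPA with a CPU utilization target and a scale-up window."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # v2 equivalent of targetCPUUtilizationPercentage
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_target_pct},
                },
            }],
            # Only the window is overridden; default scale-up policies still apply.
            "behavior": {
                "scaleUp": {"stabilizationWindowSeconds": scale_up_window_s},
            },
        },
    }


# Deployment name and replica bounds are hypothetical.
manifest = hpa_manifest("payment-hpa", "payments", "payment-service",
                        min_replicas=3, max_replicas=10,
                        cpu_target_pct=70, scale_up_window_s=60)
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML, so you can pipe the output straight into kubectl apply -f -.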

The lesson

HPA tolerances and stabilization windows can create a dead zone around the target utilization. Always set target below expected peak utilization, and customize behavior sections for your workload's sensitivity.

( 08 ) The Metrics Pipeline: Where Data Goes Missing

The HPA doesn't fetch metrics directly from pods. Instead, it queries a metrics API. CPU and memory come from the Resource Metrics API (metrics.k8s.io, served by metrics-server). Custom metrics come from the Custom Metrics API (custom.metrics.k8s.io, served by an adapter such as Prometheus Adapter). The chain is: each kubelet exposes the /metrics/resource endpoint → metrics-server scrapes the kubelets → the HPA controller calls the metrics API, which kube-apiserver proxies to metrics-server through its APIService registration. A break anywhere shows up as <unknown> metrics.

The most common break is that metrics-server cannot reach the kubelets. Typical causes are network policies, node firewall or security-group rules, and kubelet certificate issues. Kubelets serve self-signed certificates unless serving-certificate bootstrap is enabled. In that case, metrics-server needs --kubelet-insecure-tls or a CA that can verify them. Check the metrics-server logs for 'x509: certificate signed by unknown authority', and check the kubelet logs on the nodes for authentication errors.
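When the metrics-server logs are long, a small triage script can group the failures. This sketch rests on a few assumptions. The signatures are the error fragments quoted in this guide plus the common Go network error "i/o timeout". The cause labels are heuristics rather than authoritative diagnoses. classify is a hypothetical helper. Feed it the output of kubectl logs -n kube-system -l k8s-app=metrics-server.

```python
import re
from collections import Counter

# Heuristic signatures (hypothetical mapping): error fragment -> likely cause.
SIGNATURES = [
    (re.compile(r"x509:"),
     "TLS: kubelet certificate not verifiable (fix the CA or use --kubelet-insecure-tls)"),
    (re.compile(r"no such host"),
     "DNS: node name not resolvable (try --kubelet-preferred-address-types=InternalIP)"),
    (re.compile(r"connection refused|i/o timeout"),
     "network: kubelet port 10250 unreachable (network policies, firewall rules)"),
]


def classify(lines) -> Counter:
    """Count log lines per likely cause; each line counts toward its first match."""
    counts = Counter()
    for line in lines:
        for pattern, cause in SIGNATURES:
            if pattern.search(line):
                counts[cause] += 1
                break
    return counts

# Usage sketch (hypothetical script name):
#   kubectl logs -n kube-system -l k8s-app=metrics-server | python triage.py
# where triage.py ends with:
#   import sys
#   for cause, n in classify(sys.stdin).most_common():
#       print(n, cause)
```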

( 09 ) Resource Requests: The Hidden Prerequisite

The HPA computes utilization as total usage / total requests * 100, summed over the containers of all the pods it averages. If any container lacks a request for the resource being scaled on, the HPA cannot compute utilization at all and reports 'missing request for ...' in its events. This is especially tricky with sidecar containers. If the main app has requests but the sidecar doesn't, the whole metric fails, not just the sidecar's share.

Fix: ensure every container in the pod has a request for each resource the HPA scales on (typically CPU and memory). You can set requests equal to limits if you need Guaranteed QoS, but the HPA only needs requests. Use kubectl describe pod to verify.
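Here is a minimal sketch of that calculation, modeled on how the HPA sums requests per pod and fails when any container has none. PodSample and cpu_utilization_pct are hypothetical names. CPU values are in millicores, and each pod's usage is assumed to be already summed across its containers.

```python
from dataclasses import dataclass, field


@dataclass
class PodSample:
    name: str
    usage_m: int  # total CPU usage of the pod, in millicores
    # container name -> CPU request in millicores (None if unset)
    requests_m: dict = field(default_factory=dict)


def cpu_utilization_pct(pods: list) -> float:
    """Average utilization as sum(usage) / sum(requests) * 100 across pods."""
    total_usage = 0
    total_request = 0
    for pod in pods:
        for container, request in pod.requests_m.items():
            if request is None:
                # Mirrors the HPA event text; one gap fails the whole metric.
                raise ValueError(
                    f"missing request for cpu in container {container} of Pod {pod.name}")
            total_request += request
        total_usage += pod.usage_m
    return 100.0 * total_usage / total_request


pods = [
    PodSample("web-1", 400, {"app": 500}),
    PodSample("web-2", 450, {"app": 500}),
]
print(cpu_utilization_pct(pods))  # 85.0
```

A sidecar with its own request also enlarges the denominator, which lowers the reported utilization even when the main container is busy.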

( 10 ) Custom Metrics: Label Selectors and API Quirks

A custom-metrics HPA needs the metrics API to return values associated with the exact pods or object it targets. Your series may carry extra labels (e.g., instance, job) that the adapter doesn't aggregate away. Those series may then not map cleanly to a single pod or object, and the HPA finds nothing to match. Use kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric>" to see the exact format. Each value must be associated with a specific pod (by pod name) or object (e.g., a Deployment name).

Prometheus Adapter often needs configuration to handle unwanted labels. Each adapter rule has three relevant parts: seriesQuery selects the series, resources maps labels such as namespace and pod to Kubernetes resources, and metricsQuery aggregates away any other labels. Also make sure the adapter can reach Prometheus by checking its logs for 'connection refused'.
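The sketch below checks which target pods the adapter actually maps a value to. It assumes you saved the kubectl get --raw output above as JSON. It relies only on describedObject and value, which appear in both v1beta1 and v1beta2 responses. The quantity parser handles only a small subset of Kubernetes quantity suffixes, and all three helper names are hypothetical.

```python
def parse_quantity(q: str) -> float:
    """Parse a small subset of Kubernetes quantities: plain numbers, m, k, M, G."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000
    multipliers = {"k": 1e3, "M": 1e6, "G": 1e9}
    if q and q[-1] in multipliers:
        return float(q[:-1]) * multipliers[q[-1]]
    return float(q)


def per_pod_values(response: dict) -> dict:
    """Map pod name -> metric value from a custom metrics MetricValueList."""
    values = {}
    for item in response.get("items", []):
        obj = item.get("describedObject", {})
        if obj.get("kind") == "Pod":
            values[obj["name"]] = parse_quantity(item["value"])
    return values


def pods_without_metric(response: dict, expected_pods: list) -> list:
    """Target pods the adapter returned no value for (label mapping gaps)."""
    found = per_pod_values(response)
    return sorted(p for p in expected_pods if p not in found)
```

Compare expected_pods against the names from kubectl get pods -l <selector>. Any pod in the result is one the HPA cannot see a value for.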

( 11 ) HPA Behavior and Stabilization Windows

The behavior field lets you customize scale-up and scale-down policies, stabilization windows, and selectPolicy. It was introduced in autoscaling/v2beta2 in Kubernetes 1.18 and has been stable in autoscaling/v2 since 1.23. The default stabilization window is 0 seconds (immediate) for scale-up and 5 minutes for scale-down. The 10% tolerance applies regardless: if the ratio of current to target utilization is within 10% of 1.0, the HPA does nothing.
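The sketch below is a simplified model of how stabilization windows combine with the raw recommendation. It assumes the behavior documented for autoscaling/v2. Within the scale-down window, the HPA uses the highest recent recommendation. Within the scale-up window, it uses the lowest. stabilize is a hypothetical function. The real controller also applies rate-limit policies and min/max bounds afterward.

```python
def stabilize(current: int, desired: int, history: list, now: float,
              up_window_s: float = 0, down_window_s: float = 300) -> int:
    """Simplified model of HPA stabilization.

    history holds (timestamp_s, recommendation) pairs from earlier syncs.
    Scale-up goes no higher than the lowest recommendation in the
    scale-up window; scale-down goes no lower than the highest
    recommendation in the scale-down window.
    """
    up_limit = desired
    down_limit = desired
    for ts, rec in history:
        if ts > now - up_window_s:
            up_limit = min(up_limit, rec)
        if ts > now - down_window_s:
            down_limit = max(down_limit, rec)
    result = current
    if result < up_limit:
        result = up_limit
    if result > down_limit:
        result = down_limit
    return result
```

Consider a 300-second scale-up window, as in the war story above. A fresh recommendation of 5 stays held at 3 as long as any recommendation in the last 5 minutes was 3. With the default window of 0, the same call returns 5.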

To avoid stalemates, keep behavior.scaleUp.stabilizationWindowSeconds short (e.g., 60 or less) if it has been raised above the default. Consider Pods or Percent policies for faster scaling. You can also lower the target utilization to leave headroom.
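The sketch below shows how Pods and Percent policies bound each step. It assumes the documented scale-up defaults: 100% or 4 pods per 15-second period, with selectPolicy Max. For simplicity, it also assumes no other scaling happened earlier in the same period. The helper names are hypothetical.

```python
import math

# Kubernetes default scale-up policies: +100% or +4 pods per 15 s, larger wins.
DEFAULT_SCALE_UP = [("Percent", 100), ("Pods", 4)]


def scale_up_limit(current: int, policies: list, select_policy: str = "Max") -> int:
    """Upper bound on replicas for one policy period (no prior scale events)."""
    if select_policy == "Disabled":
        return current
    proposals = []
    for kind, value in policies:
        if kind == "Pods":
            proposals.append(current + value)
        elif kind == "Percent":
            proposals.append(math.ceil(current * (1 + value / 100)))
    return max(proposals) if select_policy == "Max" else min(proposals)


def scale_up_path(current: int, desired: int, policies=DEFAULT_SCALE_UP) -> list:
    """Replica counts after each successive period until `desired` is reached."""
    path = []
    while current < desired:
        nxt = min(desired, scale_up_limit(current, policies))
        if nxt <= current:  # no policy allows growth
            break
        current = nxt
        path.append(current)
    return path


print(scale_up_path(3, 20))  # [7, 14, 20]
```

Each list entry is one 15-second period. Even with default policies, going from 3 to 20 replicas takes three periods, and a tighter policy such as ("Pods", 1) stretches that much further.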

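( 12 ) Resource Quotas and LimitRanges: Silent Blockers

When the HPA scales up, it raises the replica count on the target's scale subresource, and the Deployment's ReplicaSet then tries to create the new pods. The namespace may have a ResourceQuota that rejects them (e.g., the CPU quota is exhausted). In that case admission denies the pods, and they are never created. Pods that pass admission but don't fit on any node stay Pending instead. The HPA does not treat either case as an error: its desired replica count rises while the current count stays flat. Check kubectl describe quota and kubectl get events for 'FailedCreate' or 'exceeded quota'.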
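To see how much scale-up room a CPU quota leaves, compare its hard and used values with the per-pod request. The minimal sketch below assumes three things: the quota constrains requests.cpu, pod_request is the sum of the pod's container requests, and quantities use only the m suffix or plain decimals. Both helper names are hypothetical.

```python
def cpu_millicores(quantity: str) -> int:
    """Parse CPU quantities like '500m', '2', or '1.5' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def pods_that_fit(hard: str, used: str, pod_request: str) -> int:
    """How many more pods with `pod_request` CPU the quota still admits."""
    free = cpu_millicores(hard) - cpu_millicores(used)
    return max(free, 0) // cpu_millicores(pod_request)


# Values as they might appear under status.hard / status.used["requests.cpu"]
# in kubectl get resourcequota -n <namespace> -o json (hypothetical numbers).
print(pods_that_fit("4", "3500m", "500m"))
```

If the result is below maxReplicas minus the current replica count, the HPA can ask for more pods than the quota will admit.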

LimitRanges can also cause surprises. At admission, they inject default requests (and limits) into containers that don't specify them, so the effective requests can differ from your manifest. The HPA uses the requests on the running pods, so always verify the effective requests on the pod.

Frequently asked questions

Why does my HPA show 'unknown' for current CPU utilization even though metrics-server is running?

This usually means metrics-server can't reach the kubelet on the node where the pod is running. Check metrics-server logs for errors like 'dial tcp: lookup <node> on <dns>: no such host' or 'x509 certificate error'. Ensure metrics-server has the right --kubelet-preferred-address-types (use InternalIP) and --kubelet-insecure-tls if needed. Also verify network policies allow traffic to kubelet port (10250).

Can I use HPA with custom metrics that are not CPU or memory?

Yes. The HPA supports custom metrics via the custom.metrics.k8s.io API and external metrics via external.metrics.k8s.io. You need a metrics adapter (like Prometheus Adapter, Datadog Cluster Agent, or Google Stackdriver) that exposes your metrics in the expected format. Be careful with association. A Pods metric must be associated with each target pod, and an Object metric with the named object (e.g., a Service or Ingress). External metrics are not tied to a Kubernetes object.

How long does HPA take to scale after metrics exceed the target?

The HPA controller runs every 15 seconds by default (--horizontal-pod-autoscaler-sync-period). Each pass reads metrics, calculates the desired replicas, and applies the result. By default, scale-up happens immediately because there is no stabilization window. Scale-down waits out a 5-minute stabilization window. You can customize both with the behavior field. The 10% tolerance also means the HPA only acts when utilization deviates from the target by more than 10% of the target; for an 80% target, that means outside roughly 72-88%.

What does 'missing request for ...' mean in HPA events?

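It means one or more containers in the pods selected by the HPA have no request for the resource the HPA scales on (CPU or memory). The HPA needs requests to compute a utilization percentage. Add resource requests to the container spec in your Deployment or StatefulSet. If a container sets only a limit, Kubernetes defaults its request to that limit. This error therefore means the container has neither a request nor a limit for that resource, and no LimitRange default applies.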

My HPA scales up but never scales down. Why?

Common reasons:
(1) The default scale-down stabilization window is 5 minutes, so wait longer.
(2) Resource requests are too low (or the baseline load is high), so utilization never falls far enough below the target.
(3) The behavior section sets scaleDown.selectPolicy: Disabled, which turns off scale-down entirely.
(4) minReplicas is set too high.
Check the conditions in kubectl describe hpa. If ScalingActive is False, the HPA cannot compute the metric. If it is True, look at AbleToScale for reason ScaleDownStabilized, meaning the stabilization window is holding. Also look at ScalingLimited for reason TooFewReplicas, meaning minReplicas is the floor.
