Docker Healthcheck Always Failing: Debugging Guide

What this usually means

The healthcheck command itself is failing, not the application. Common reasons: the CMD is not a proper shell command (e.g., using 'curl' without the full path or without the shell form), the command requires a dependency not installed in the container (e.g., curl missing in Alpine), the healthcheck runs before the application is ready (race condition), or the command is timing out due to a slow startup. Also, the healthcheck may be checking a port that is not yet listening, or the check is too aggressive (interval too short, retries too few). Another subtle cause: the healthcheck uses a TCP check but the application only listens on a specific interface (e.g., 127.0.0.1 instead of 0.0.0.0).

( 01 )Fast diagnosis

The first ten minutes — establish facts before touching code.

1Run `docker inspect <container> | jq '.[].State.Health'` to see the exact failure logs and exit codes
2Exec into the container and run the exact healthcheck command manually: `docker exec -it <container> /bin/sh -c "<your-command>"`
3Check if the required tools (curl, wget, netcat) are installed: `docker exec <container> which curl`
4Verify the healthcheck command syntax in the Dockerfile or docker-compose.yml: ensure it uses CMD-SHELL or exec form correctly
5Look at the application logs around the time healthcheck fails: `docker logs <container> --tail 50`
6Check the container's entrypoint script: if it runs a long initialization, the healthcheck may trigger before the app is ready

( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

searchDockerfile: HEALTHCHECK instruction line
searchdocker-compose.yml: healthcheck section under a service
searchContainer logs: `docker logs <container>`
searchDocker inspect output: `docker inspect <container> | jq '.[].State.Health'`
searchApplication entrypoint script (e.g., entrypoint.sh, docker-entrypoint.sh)
searchSystem logs on the host: `journalctl -u docker.service` if healthcheck causes container restarts
searchNetwork namespace: if healthcheck uses TCP, verify the port is listening on 0.0.0.0: `docker exec <container> ss -tlnp`

( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

warningHealthcheck command uses shell form incorrectly: e.g., `CMD curl ...` instead of `CMD-SHELL curl ...` or `["curl","..."]`
warningMissing dependencies: e.g., `curl` not installed in Alpine-based images
warningRace condition: healthcheck starts before the application is ready (especially with long initialization)
warningHealthcheck check interval too short: application needs more time to start
warningHealthcheck checks a port that is only listening on localhost (127.0.0.1) instead of 0.0.0.0
warningHealthcheck command returns exit code 0 but the check expects a different condition (e.g., HTTP 200 vs 503)

( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

buildUse the exec form of HEALTHCHECK: `HEALTHCHECK --interval=30s --timeout=5s --retries=3 CMD ["curl", "-f", "http://localhost:8080/health"]`
buildInstall required tools in Dockerfile: `RUN apt-get update && apt-get install -y curl` or `RUN apk add --no-cache curl`
buildAdd a startup delay: use `--start-period=30s` in the HEALTHCHECK instruction to give the app time to initialize
buildIf using a custom entrypoint, ensure the healthcheck does not run before the entrypoint completes (e.g., use a flag file)
buildChange the healthcheck to use a simpler method: e.g., `CMD ["stat", "/tmp/healthy"]` if the app creates a file when ready
buildFor TCP checks, use `CMD ["nc", "-z", "localhost", "8080"]` and ensure netcat is installed

( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

verifiedAfter fix, rebuild and run the container, then watch `docker ps` until it shows 'healthy'
verifiedRun `docker inspect <container> | jq '.[].State.Health'` to see the new healthcheck logs and confirm no failures
verifiedSimulate a failure: temporarily break the app and confirm healthcheck turns 'unhealthy'
verifiedCheck the healthcheck history: `docker inspect --format='{{json .State.Health}}' <container>`
verifiedTest the command manually inside the container to ensure it returns exit code 0

( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

warningDon't use `curl` without `-f` flag: it returns success on 404 unless -f is present
warningDon't set interval too low (e.g., 1s) – it can cause unnecessary load and false failures
warningDon't ignore the start-period: without it, healthcheck may fail during initialization and be marked unhealthy permanently
warningDon't use `CMD` in the healthcheck instruction incorrectly: `HEALTHCHECK CMD curl ...` is wrong; use `CMD-SHELL` or exec form
warningDon't assume `curl` is present in minimal images (Alpine, distroless) – install it or use an alternative like `wget` or a custom script
warningDon't set retries too low (e.g., 1) – transient failures can cause unnecessary restarts

( 07 )War story

Healthcheck fails on Alpine-based Node.js container

DevOps EngineerDocker 20.10, Node.js 14, Alpine 3.13, Docker Compose 1.29

Timeline

09:00Deploy new version of Node.js microservice to staging
09:02Container shows 'unhealthy' in `docker ps`
09:05Inspect health log: 'exit code 1' from curl
09:07Exec into container: `curl http://localhost:3000/health` works fine
09:10Check Dockerfile: HEALTHCHECK uses `CMD curl -f http://localhost:3000/health`
09:12Check if curl is installed: `which curl` returns nothing
09:15Install curl in Dockerfile and rebuild
09:18New container shows 'healthy' after start period

I had just pushed a new version of our Node.js microservice. The build passed, the image was pushed, and the staging environment picked it up. Within seconds, the container was showing 'unhealthy' in `docker ps`. The service was critical for our API gateway, so alarms started firing.

I ran `docker inspect <container> | jq '.[].State.Health'` and saw the healthcheck log: exit code 1. The command was `curl -f http://localhost:3000/health`. I exec'd into the container and ran the same command manually – it returned 0 and the JSON response was fine. So why was the healthcheck failing?

Then I checked the Dockerfile. The HEALTHCHECK instruction used the shell form: `HEALTHCHECK --interval=30s CMD curl -f http://localhost:3000/health`. But this was an Alpine-based image, and Alpine doesn't include curl by default. The shell form runs the command via `/bin/sh -c 'curl ...'` which fails because curl is missing. The manual exec worked because my shell had a different environment? No, actually the manual exec used `/bin/sh -c 'curl ...'` too, but wait – I ran `docker exec -it <container> curl ...` which runs the command directly, not through a shell. The healthcheck ran as `CMD curl ...` which is actually the exec form in the Dockerfile? No, I had `CMD curl` which is the shell form in HEALTHCHECK context – it runs as `/bin/sh -c 'curl ...'` and curl is not found. The fix: add `RUN apk add --no-cache curl` to the Dockerfile or change the healthcheck to use wget or a Node.js script. I added curl, rebuilt, and the container became healthy.

Root cause

The healthcheck command used `curl` but the base image (Alpine) did not have curl installed. The shell form of HEALTHCHECK runs the command via `/bin/sh -c`, which failed because curl was missing.

The fix

Added `RUN apk add --no-cache curl` to the Dockerfile before the HEALTHCHECK instruction. Alternatively, could have used `wget` or a Node.js script to check the endpoint.

The lesson

Always verify that the tools used in healthcheck commands are present in the final image. Minimal images like Alpine often lack common utilities. Also, prefer the exec form of HEALTHCHECK to avoid shell interpretation issues.

( 08 )Understanding HEALTHCHECK Syntax and Execution

Docker HEALTHCHECK has two forms: the shell form (`CMD command args`) and the exec form (`CMD ["executable", "arg"]`). The shell form runs the command via `/bin/sh -c`, which requires the shell to exist and the command to be in the PATH. The exec form runs the command directly, without shell parsing. This is critical: if your container lacks a shell (e.g., distroless images) or the command is not found, the exec form will fail immediately.

The healthcheck is executed as a separate process inside the container. Its exit code determines the health status: 0 = healthy, 1 = unhealthy, 2 = reserved (not used). The healthcheck can also be a TCP check if you use `CMD ["nc", "-z", "localhost", "port"]` or a custom script. Also, note the `--start-period`: this delays the first healthcheck to give the application time to initialize. Without it, the healthcheck may run before the app is ready and mark the container unhealthy permanently.

( 09 )Common Pitfalls with Shell Form and Environment Variables

When using the shell form, environment variables are expanded because the command runs through a shell. This can be helpful but also dangerous: if the variable is not set, the command may become syntactically incorrect. For example, `CMD curl -f $HEALTHCHECK_URL` will fail if `HEALTHCHECK_URL` is empty. The exec form does not expand variables, so you must pass them explicitly or use a shell wrapper.

Another issue is the working directory. The healthcheck command runs in the container's WORKDIR, which may not be where you expect. If your healthcheck script uses relative paths, they may fail. Always use absolute paths or ensure the working directory is correct.

( 11 )Healthcheck and Container Lifecycle

The healthcheck runs continuously at the specified interval. If the healthcheck itself causes side effects (e.g., creating logs, consuming memory), it can degrade performance. Avoid heavy healthchecks that do complex computations. Also, if the healthcheck command hangs, it will timeout after `--timeout` (default 30s), and that counts as a failure.

Container orchestrators like Kubernetes use liveness probes, which are similar but have different defaults. When porting healthchecks to Kubernetes, remember that the probe's success threshold defaults to 1, and failure threshold defaults to 3. Also, Kubernetes supports three types: HTTP, TCP, and Exec. Docker's healthcheck is equivalent to an Exec probe.

( 12 )Debugging with Docker Inspect and Logs

To get the full healthcheck history, use: `docker inspect --format='{{json .State.Health}}' <container>`. This shows a list of `Log` entries with `Start`, `End`, `ExitCode`, and `Output`. The output is truncated to 4096 bytes. If you need more, check the container logs: `docker logs <container>`.

Another useful command: `docker events --filter 'event=health_status'` to stream health status changes in real time. This helps correlate healthcheck failures with other container events.

Frequently asked questions

Why does my healthcheck pass when I exec into the container but fail when Docker runs it?

This usually happens because the healthcheck command is run in a different environment. When you exec, you run the command directly (or via a shell), but the healthcheck uses the form specified in the Dockerfile. If you used the shell form, it runs via `/bin/sh -c`. The PATH or environment variables might differ. Also, the healthcheck may run before the application is fully started – exec manually is done after the container is already running. Check the healthcheck's start period and ensure the command is correct.

Can I use a script as a healthcheck command?

Yes, you can. For example, `HEALTHCHECK CMD /usr/local/bin/healthcheck.sh`. Ensure the script is present in the image and has execute permissions. Use the exec form: `HEALTHCHECK CMD ["/usr/local/bin/healthcheck.sh"]` to avoid shell issues. The script should exit with 0 for healthy, 1 for unhealthy.

What is the difference between HEALTHCHECK and restart policies?

HEALTHCHECK only monitors the health status and exposes it via the container state. It does not automatically restart the container. Restart policies (e.g., `--restart=always`) restart the container if it exits, but they don't consider health. To restart on unhealthy, you need an external orchestrator like Docker Compose with `restart: always` combined with `depends_on` conditions, or use Docker Swarm/Kubernetes which can restart unhealthy containers.

How do I set environment variables for the healthcheck command?

If using the shell form, environment variables are automatically available. If using the exec form, you must set them explicitly via the Dockerfile's ENV, or use a wrapper script that sources the environment. Alternatively, you can use the shell form to expand variables. Example: `HEALTHCHECK CMD curl -f http://$HOST:8080/health`.

My healthcheck fails with 'timeout' but the command runs quickly when I exec. Why?

The default timeout for healthcheck is 30 seconds. If your command takes longer, it will timeout. But if it's quick manually, the timeout might be caused by a deadlock or the command waiting for a resource. Also, check if the healthcheck command is hitting a different endpoint or using a different DNS resolution. Use `--timeout=10s` to set a lower timeout and see if the command completes.

Docker Healthcheck Always Failing: Diagnosis and Fixes

What this usually means

Frequently asked questions