What this usually means
Docker builds are slow when the layer cache is invalidated earlier than it needs to be. Each Dockerfile instruction creates a layer, and once one instruction misses the cache, every instruction after it is rebuilt from scratch. The most common culprit is a COPY . . placed early in the Dockerfile: the source tree changes on every commit, so every later step, including heavy ones like apt-get install, re-executes. Another cause is a build context or build args that differ between runs, which cause cache misses on the instructions that depend on them. Once the cache is invalidated, package managers such as apt re-download everything. The key is to structure your Dockerfile so that infrequently changed layers come first.
The first ten minutes — establish facts before touching code.
1. Run docker build with the cache enabled (the default) and note which steps are reported as cached ('Using cache' with the legacy builder, 'CACHED' with BuildKit) and which re-execute. Compare to a fresh build with --no-cache.
2. Inspect the Dockerfile order: list all COPY and ADD instructions. Check if COPY . . appears before RUN apt-get update. If so, that's your bug.
3. Use docker history --no-trunc <image> to see layer sizes. Look for layers that are unexpectedly large (e.g., 500 MB for a source copy).
4. Echo build args inside the RUN commands that follow them: RUN echo "build-arg=$MY_ARG" && apt-get update. If that step re-executes on every build and prints a different value each time, a changing ARG is invalidating the cache.
5. Run docker system df to check build cache size. If it's larger than about 10 GB, you likely have many stale layers.
The specific files, logs, configs, and dashboards that usually own this bug.
- Dockerfile: check order of instructions, especially COPY vs RUN
- .dockerignore: ensure it exists and excludes node_modules, .git, etc.
- Build context directory: run 'du -sh .' to see if you're sending gigabytes to the daemon
- CI pipeline logs: look for the context size ('Sending build context to Docker daemon' with the legacy builder, 'transferring context' with BuildKit)
- docker history output: identify layers that are larger than expected
- docker build output: grep for 'Using cache' (legacy builder) or 'CACHED' (BuildKit) to see which steps are cached
- docker system df output: see total build cache size
Practical causes, not theory. These are the things you will actually find.
- COPY . . placed before apt-get install, causing cache invalidation on every source change
- No .dockerignore file, so the entire project (including node_modules, .git) is sent as context
- Build arguments (ARG) that change frequently, causing a cache miss on the RUN commands that follow them
- Using ADD instead of COPY for local files (ADD also fetches URLs and auto-extracts archives, which makes its behavior harder to predict)
- RUN apt-get update in a layer of its own: while cached, it holds a stale package index, and whenever an earlier layer is invalidated, every package is downloaded again
- Multiple RUN commands instead of chaining (each RUN creates a separate layer, and a change to one invalidates all that follow)
- Floating base image tags (using :latest instead of a specific version or digest), so the base layer can change between builds
Concrete fix directions. Pick the one that matches your root cause.
- Reorder Dockerfile: put COPY package.json and RUN npm install (or RUN apt-get update && apt-get install) before COPY . .
- Use a .dockerignore file to exclude unnecessary files from build context
- Pin base image digest: FROM node:18-alpine@sha256:abc...
- Chain RUN commands: RUN apt-get update && apt-get install -y ... && rm -rf /var/lib/apt/lists/*
- Use multi-stage builds to separate build dependencies from runtime
- Leverage BuildKit cache mounts: RUN --mount=type=cache,target=/var/cache/apt apt-get update && apt-get install -y ... (on Debian and Ubuntu images, also disable the docker-clean apt hook, or downloaded packages are deleted before they can be reused)
A fix you cannot prove is a guess. Close the loop.
- Run docker build twice with no code changes; second build should use cache for all steps
- Change a single source file then rebuild; only layers after the COPY . . should rebuild, not apt-get install
- Compare build times before and after the fix; expect a reduction of more than 50% on builds where only source files changed
- Inspect docker build output for 'Using cache' (legacy builder) or 'CACHED' (BuildKit) on expected steps
- Check docker system df before and after; build cache size should stabilize
Things that make this bug worse or harder to find.
- Adding RUN apt-get update && apt-get install -y after COPY . . (always move it before)
- Forgetting to add .dockerignore; it's as important as the Dockerfile
- Using ADD instead of COPY for local files (ADD has extra behavior like auto-extracting archives)
- Not cleaning up apt lists in the same RUN layer, which bloats the image
- Assuming the cache is actually in use: caching is on by default, but some CI jobs pass --no-cache explicitly, and ephemeral CI runners often start with an empty cache
CI Pipeline Times Out After Every Commit
Timeline
- 09:15 Deploy to staging fails due to timeout after 15 minutes
- 09:20 Check CI log: 'Sending build context to Docker daemon 1.2GB' (project is only 50 MB)
- 09:25 Inspect Dockerfile: COPY . . is before RUN npm install
- 09:30 Check .dockerignore: doesn't exist, node_modules and .git included
- 09:35 Add .dockerignore with node_modules, .git, dist
- 09:40 Reorganize Dockerfile: copy package.json first, run npm install, then copy rest
- 09:45 Push fix; rebuild time drops from 12 min to 2 min
- 09:50 Verify: second build with no code change uses cache for npm install
I was the on-call engineer when the CI pipeline started timing out after every commit. The build step for a Node.js microservice was taking 12+ minutes, pushing us over the 15-minute limit. I dove into the GitHub Actions logs and saw the first red flag: 'Sending build context to Docker daemon 1.2GB'. Our TypeScript source was only 50 MB. The daemon was receiving a massive context that also included node_modules (500 MB) and .git (600 MB).
I looked at the Dockerfile and found the classic mistake: COPY . . was the second instruction, right before RUN npm install. Every commit changed a source file, invalidating the COPY layer, which forced npm install to re-run and re-download all dependencies. Even worse, there was no .dockerignore. The build context was huge, and the cache was basically useless.
I added a .dockerignore file excluding node_modules, .git, and dist. Then I reordered the Dockerfile: copy package.json and package-lock.json first, run npm install, then copy the rest of the source. This way, npm install only reruns when dependencies change. Build time dropped to 2 minutes, and the next build with no dependency change reused the cached npm install layer. The lesson: always treat your Dockerfile like a caching hierarchy, with stable layers first.
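The reordering described above can be sketched as the following Dockerfile. It is a minimal illustration, not the actual file from the incident: the node:18-alpine tag and the /app working directory are assumptions, and the build and start commands are left out.

```dockerfile
# Sketch only; base image tag and WORKDIR are assumptions.
# Assumes a .dockerignore next to this file listing: node_modules, .git, dist
FROM node:18-alpine
WORKDIR /app

# Dependency manifests change rarely, so this layer and the install
# below stay cached across ordinary source edits.
COPY package.json package-lock.json ./
RUN npm install

# Source changes on nearly every commit; only this layer and any
# instructions after it are rebuilt.
COPY . .
```

With this order, a commit that touches only source files should reuse the cached npm install layer, and a change to either manifest file is what triggers a reinstall.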
Root cause
COPY . . placed before RUN npm install, combined with no .dockerignore, causing massive build context and cache invalidation on every source change.
The fix
Added .dockerignore and reordered Dockerfile to copy package.json first, run npm install, then copy the rest.
The lesson
Structure your Dockerfile to maximize layer cache reuse: put infrequently changing instructions (package installs) before frequently changing ones (source code). Always include a .dockerignore.
Docker builds each instruction in a Dockerfile as a separate layer. When you run docker build, Docker checks if a cached layer exists for each instruction. The cache key is based on the instruction text, base image, and the hash of files copied in COPY/ADD. If any of those change, the layer is invalidated and all subsequent layers are rebuilt.
The critical insight: cache invalidation cascades. If COPY . . changes, every RUN instruction after it must re-execute, even if they don't depend on the copied files. That's why you often see apt-get update re-running on every build. To avoid this, you must order instructions from least to most likely to change.
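As a minimal sketch of least-to-most-volatile ordering, assuming a Debian-based image and using python3 as a stand-in for whatever system packages the project actually needs:

```dockerfile
# Sketch only; base image and package name are assumptions.
FROM debian:bookworm-slim

# Stable: system packages. Update and install share one RUN so the
# package index is never older than the install that relies on it.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Volatile: application source. A change here invalidates this layer
# and anything after it, but not the package layer above.
COPY . .
```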
Also note that for COPY and ADD the cache key comes from the contents of the files being copied, so with COPY . . the entire build context (the files sent to the Docker daemon) feeds the hash. A large context takes time to send, and the more files it contains, the more often the hash changes. Use .dockerignore to trim the context to only what's needed.
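A common but subtle cause of cache invalidation is the ARG instruction. A build argument (e.g., --build-arg VERSION=1.2.3) becomes part of the cache key for the instructions that use it, and every RUN after the ARG declaration sees it as an environment variable. If the value differs between builds, the first such instruction misses the cache, and so does everything after it. This is often seen in CI where the commit SHA is passed as a build arg.
To avoid this, only use ARG for values that should affect the build (like version numbers), and declare each ARG as late as possible. Switching to ENV does not help for metadata like the commit SHA: a changed ENV value invalidates the cache from the ENV instruction onward. Use labels for metadata instead. Dockerfile instructions do not perform shell command substitution, so LABEL commit="$(git rev-parse HEAD)" will not yield the commit hash. Either pass the SHA as a build arg declared at the very end of the Dockerfile and reference it in a LABEL, or set the label at build time with docker build --label commit=$(git rev-parse HEAD).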
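One way to attach per-commit metadata without disturbing earlier layers is to declare the build arg as the last step. The sketch below assumes a hypothetical arg named GIT_SHA, a Debian base image, and curl as a placeholder package; none of these come from the original setup.

```dockerfile
# Sketch only; GIT_SHA is a hypothetical arg name, curl is a placeholder.
FROM debian:bookworm-slim

# Expensive step placed before the ARG, so a changing value cannot
# invalidate it.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

# Declared last: the only instruction that sees the changing value is
# the metadata-only LABEL below.
ARG GIT_SHA=unknown
LABEL commit=$GIT_SHA
```

Built with docker build --build-arg GIT_SHA=$(git rev-parse HEAD) ., only the final LABEL step should differ between commits.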
Multi-stage builds help reduce final image size but can also improve cache efficiency. By separating the build environment from the runtime, you can cache heavy build dependencies in an intermediate stage. For example, in a Go project: first stage compiles the binary, second stage copies only the binary. The first stage's layers are cached independently.
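The Go case mentioned above might look like the following sketch. The image tags, the /src and /out paths, and the assumption that go.mod, go.sum, and the main package sit at the repository root are all illustrative.

```dockerfile
# Sketch only; tags, paths, and module layout are assumptions.

# Stage 1: build environment with the full Go toolchain.
FROM golang:1.22 AS build
WORKDIR /src

# Module files first, so downloaded dependencies stay cached until
# go.mod or go.sum changes.
COPY go.mod go.sum ./
RUN go mod download

# Source last; only the compile step reruns on a code change.
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: runtime image that receives only the compiled binary.
FROM alpine:3.20
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Disabling cgo produces a statically linked binary, which is what lets it run on a runtime base image different from the build image.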
Docker BuildKit introduces cache mounts (RUN --mount=type=cache). These allow you to persist package manager caches across builds. For example: RUN --mount=type=cache,target=/var/cache/apt apt-get update && apt-get install -y python3. The cache mount is not part of the layer, so it doesn't bloat the image, but it speeds up subsequent builds significantly. On Debian and Ubuntu base images, apt is configured to delete downloaded packages after each install, so that configuration has to be disabled for the mount to retain anything. Cache mounts are especially useful for apt, npm, and pip caches.
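A fuller version of the apt cache mount, following the pattern shown in Docker's own documentation, is sketched below. The base image and the python3 package are assumptions; the docker-clean file is the apt hook that official Debian and Ubuntu images ship.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only; base image and package name are assumptions.
FROM debian:bookworm-slim

# Remove the hook that deletes downloaded .deb files after install,
# and tell apt to keep them so the cache mount has something to retain.
RUN rm -f /etc/apt/apt.conf.d/docker-clean \
 && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
    > /etc/apt/apt.conf.d/keep-cache

# Both apt directories are mounted from the build cache; sharing=locked
# makes concurrent builds take turns, since apt expects exclusive access.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update \
 && apt-get install -y --no-install-recommends python3
```

Because the package lists live in the cache mount rather than in the layer, the usual rm -rf /var/lib/apt/lists/* cleanup is not needed in this variant.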
Frequently asked questions
Why does my apt-get update run every build even if I haven't changed the Dockerfile?
This usually happens because a COPY . . instruction before the RUN apt-get update invalidates the cache. Check your Dockerfile order: ensure COPY . . comes after all RUN commands that install packages. Also verify you have a .dockerignore to avoid sending unnecessary files that change the COPY hash.
How do I check which layers are cached in Docker?
Run 'docker build .' and look at the output. The legacy builder prints 'Using cache' for cached steps, while BuildKit (the default builder since Docker Engine 23.0) marks them 'CACHED'; steps that are not cached show the command output instead. You can also use 'docker history --no-trunc <image>' to see the layers and their sizes. For a deeper inspection, enable BuildKit with 'DOCKER_BUILDKIT=1 docker build .' and use '--progress=plain' to see the full output of each step.
Does the order of ARG and ENV matter for caching?
Yes. A changed ENV value invalidates the cache from the ENV instruction onward. A changed ARG value invalidates the cache from the first instruction that uses it, and every RUN after the ARG declaration counts as a use because the value is exposed to it as an environment variable. If you have an ARG that changes every build (like a build number), place it as late as possible in the Dockerfile, ideally just before the instructions that actually need it.
What is the difference between ADD and COPY for caching?
For local files there is little difference: both COPY and ADD derive the cache key from the contents of the files being copied. ADD also accepts URLs and auto-extracts local tar archives, which makes its results less predictable. Use COPY for local files unless you need the auto-extraction.
Can I speed up npm install caching in Docker?
Yes. Copy package.json and package-lock.json first, then run npm install. This way, npm install only re-runs when dependencies change. Also, use a cache mount: 'RUN --mount=type=cache,target=/root/.npm npm install'. This caches the npm cache across builds, speeding up installs even when cache invalidation happens.
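Combining both techniques from this answer could look like the sketch below. The base image tag and working directory are assumptions; /root/.npm is npm's default cache location when running as root.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only; base image tag and WORKDIR are assumptions.
FROM node:18-alpine
WORKDIR /app

# Layer cache: the install reruns only when a manifest changes.
COPY package.json package-lock.json ./

# Cache mount: when the install does rerun, previously downloaded
# packages are reused from /root/.npm instead of fetched again.
RUN --mount=type=cache,target=/root/.npm npm install

COPY . .
```

The two caches are independent: the layer cache decides whether npm install runs at all, and the cache mount decides how much it has to download when it does.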