What this usually means
The CDN is respecting cache headers or purge configurations that don't match your expectations. Usually, this is because the origin server returns cached responses with stale ETags/Last-Modified headers, or the CDN purge API call didn't actually reach all edge nodes. Another common pattern: the deployment changed file names (e.g., webpack hash) but old files are still served because the CDN has a long TTL on the root index.html or asset manifest.
The first ten minutes — establish facts before touching code.
- 1. curl -sI https://cdn.example.com/static/js/main.a1b2c3.js | grep -iE '(cache-control|etag|last-modified|age)' (inspect the actual response headers from the CDN)
- 2. curl -sI -H 'Cache-Control: no-cache' https://origin.example.com/static/js/main.a1b2c3.js (compare with the origin's headers; make sure the origin is not itself behind another cache)
- 3. curl -X POST -H 'Fastly-Key: ...' -H 'Accept: application/json' https://api.fastly.com/purge/cdn.example.com/static/js/main.a1b2c3.js (test the purge API manually; Fastly's single-URL purge takes the cached host and path in the request path, not in a JSON body)
- 4. Check the CDN dashboard purge history: go to the Fastly/Cloudflare/Akamai logs and look for 'purge' events matching your deploy timestamp
- 5. curl -o /dev/null -s -w '%{http_code} %{time_total}s %{size_download}B' https://cdn.example.com/static/js/main.a1b2c3.js (measure status, response time, and size; if the size is unchanged after a deploy that altered the file, the old file is still being served)
- 6. Add a query parameter such as ?v=2 to the URL and check whether the response changes; if it does, the edge is holding a stale object under the original cache key and your purges are not reaching it
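The header checks in steps 1 and 2 can be wrapped in a small helper so the CDN and origin output are directly comparable. This is a minimal sketch: the function names are hypothetical, and cdn.example.com and origin.example.com are the placeholder hosts used in the checklist.

```shell
# Keep only the caching-related headers from a raw HTTP header dump on stdin.
cache_headers() {
  tr -d '\r' |
    grep -iE '^(cache-control|surrogate-control|etag|last-modified|age|x-cache):'
}

# Fetch the headers for one URL and print the caching subset.
show_cache_headers() {
  curl -sI "$1" | cache_headers
}

# Example (not run here): compare the edge and the origin side by side.
# show_cache_headers https://cdn.example.com/static/js/main.a1b2c3.js
# show_cache_headers https://origin.example.com/static/js/main.a1b2c3.js
```

Anchoring the pattern at the start of the line avoids false matches on unrelated headers that merely contain the word "age".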
The specific files, logs, configs, and dashboards that usually own this bug.
- Origin server response headers (curl -I): focus on Cache-Control, ETag, Last-Modified, and Surrogate-Control
- CDN provider's purge API logs / dashboard: check for 'purge request accepted' vs 'purge completed'
- Web server configuration (nginx/apache): ensure add_header directives override defaults and are not conditional on file existence
- Deployment pipeline logs: verify that the purge API call was executed and received a 200/202 response
- CDN edge logs (e.g., Fastly real-time log streaming or Cloudflare Logpush): look for the cache status (HIT/MISS) on the affected assets
- Browser DevTools Network tab: inspect the 'Age' header on stale assets; if it is greater than 0, the CDN served the response from cache
Practical causes, not theory. These are the things you will actually find.
- Origin server returns Cache-Control: public, max-age=31536000 on static assets, and the CDN respects that TTL instead of the configured short TTL
- Purge API call is made before the new files are fully propagated to all origin servers (e.g., load balancer with multiple backends)
- CDN configuration has a 'stale-while-revalidate' or 'stale-if-error' directive that serves stale content during revalidation
- The purge request was a soft purge (e.g., sent with Fastly's Fastly-Soft-Purge: 1 header), which marks content as stale but doesn't evict it immediately
- Deployment doesn't change file URLs (no content hashing), so the CDN serves the old file based on the URL cache key without checking origin
- CDN edge nodes are geographically distributed and the purge hasn't propagated to all PoPs yet (especially with Akamai/CloudFront)
Concrete fix directions. Pick the one that matches your root cause.
- Set the Cache-Control header to 'no-cache, no-store, must-revalidate' on HTML pages and use content hashing (e.g., webpack [contenthash]) for static assets
- Implement a 'cache-busting' version parameter in the URL (e.g., /static/js/main.js?v=2) and update references on deploy
- After deploy, issue a hard purge (not soft) through the CDN API. Fastly: POST /service/{service_id}/purge_all; Cloudflare: POST /zones/{zone_id}/purge_cache with purge_everything set to true; CloudFront: CreateInvalidation with the path /*
- Add a 'Surrogate-Control' header on the origin to override CDN cache behavior independently of browser cache directives
- Configure a shorter CDN TTL (e.g., 60 seconds) during deployment windows, then revert after verification
- Use 'stale-while-revalidate' with a short max-age to allow the CDN to serve stale content while fetching fresh, but ensure revalidation triggers quickly
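For a targeted hard purge on Fastly, a post-deploy step might look roughly like the sketch below. It assumes Fastly's single-URL purge endpoint (POST to /purge/ followed by the cached host and path); the function names and the FASTLY_API_TOKEN environment variable are hypothetical, and other CDNs need a different call.

```shell
# Build the Fastly single-URL purge endpoint from an asset URL.
# The scheme is stripped: the endpoint takes host and path after /purge/.
fastly_purge_endpoint() {
  printf 'https://api.fastly.com/purge/%s\n' "${1#*://}"
}

# Hard-purge one URL and fail loudly on a non-2xx response.
# No Fastly-Soft-Purge header is sent, so this is a hard purge.
purge_url() {
  status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H "Fastly-Key: ${FASTLY_API_TOKEN}" \
    "$(fastly_purge_endpoint "$1")")
  case "$status" in
    2??) echo "purge accepted for $1 (HTTP $status)" ;;
    *)   echo "purge FAILED for $1 (HTTP $status)" >&2; return 1 ;;
  esac
}

# Example (not run here): purge the HTML entry point as well as the assets.
# purge_url https://cdn.example.com/index.html
```

Returning a non-zero status lets the pipeline stop instead of announcing a successful deploy on top of a failed purge.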
A fix you cannot prove is a guess. Close the loop.
- After the fix, run curl -I against the HTML entry point and verify that the CDN-facing TTL is short (for example max-age or s-maxage <= 60 seconds) and that the Age header resets to 0 right after a purge
- Deploy a test file with a unique name (e.g., /static/test.$(date +%s).js) and curl the CDN URL; it should return the new content immediately
- Monitor CDN logs for MISS cache statuses after deploy; if the first post-deploy request on every edge is a MISS, invalidation worked
- Use a global CDN checker (like www.cdnperf.com) to fetch the asset from multiple PoPs and compare ETags
- Automated test: curl -s -o /dev/null -w '%{http_code}' "https://cdn.example.com/static/js/main.js?cachebuster=$(date +%s)" should print 200; the unique query string forces a cache miss, so this confirms that the path through the CDN to the origin is healthy, not that the cached copy of the plain URL is fresh
Things that make this bug worse or harder to find.
- Issuing a purge for a specific URL pattern that doesn't match the actual CDN cache key (e.g., missing query parameters or a trailing slash)
- Assuming a 202 response from the purge API means the content is immediately invalidated on all edges; propagation takes seconds to minutes
- Verifying the purge from a single vantage point: one request only proves that one edge was invalidated, so confirm by fetching from a different IP or through a proxy
- Setting Cache-Control: no-cache on the origin when the CDN has its own 'default_ttl' that overrides it (especially with Varnish-based CDNs)
- Relying on browser DevTools alone; browsers cache aggressively and may show old content even if the CDN purged correctly
- Making multiple purge calls in quick succession without waiting for propagation; the second call may be ignored as a duplicate
The Stale Bundle That Cost 12% Revenue
Timeline
- 14:00 Deploy v2.3.1 to production via Jenkins. Webpack bundles with contenthash are uploaded to S3 and served via Express proxy.
- 14:02 Post-deploy script calls Fastly purge API for /static/js/*. Response 200 OK. Teams message says deploy complete.
- 14:05 Support pings: 'Users reporting page broken, cannot checkout'. Slack channel #prodalerts shows JS error: 'React is not defined'.
- 14:08 I check browser DevTools: main.b3d2f4e.js served from disk cache, but the new deploy renamed it to main.e5f6g7h.js. The old file is gone from origin.
- 14:10 curl -I against the main.* bundle URLs on cdn.example.com shows the old files still being served with Age: 1200 (20 minutes). Purge didn't work.
- 14:12 I manually trigger purge_all via Fastly dashboard. Still stale after 30 seconds.
- 14:15 Check origin headers: curl -I http://origin/static/js/main.e5f6g7h.js returns 200 with Cache-Control: max-age=31536000. The origin Express app has a static middleware with long maxAge.
- 14:18 Realize: Fastly respects origin Cache-Control for non-HTML content. Our 'purge' only invalidates the CDN cache, but the origin still returns the old ETag causing 304.
- 14:20 Hotfix: Override Cache-Control on origin to max-age=0, then purge_all again. After 2 minutes, new content served.
- 14:30 Revenue lost estimated $12k due to 30-minute outage during peak shopping hour.
I was on-call when the deploy went out. The Jenkins pipeline reported success, and the Fastly purge API returned 200. But within minutes, user reports flooded in—the checkout page was completely broken. I opened DevTools and saw that the browser was referencing main.b3d2f4e.js, but that file no longer existed on our origin. The new bundle was main.e5f6g7h.js. The HTML had been updated to point to the new file, but the CDN still served the old HTML from cache. The purge call was made for /static/js/*, but the HTML page itself was also cached at the CDN with a long TTL.
I immediately issued a full cache purge via the Fastly dashboard, and the content was still stale. I checked the origin headers: the Express static middleware was configured with a one-year maxAge (the response header showed max-age=31536000). That meant that even after a purge, the origin was telling the CDN to cache each asset for a year, and Fastly was respecting that. My first theory was that the next request after the purge would get a 304 Not Modified because the origin's ETag had not changed. That did not hold up: the new bundle was a new file, so its ETag was different. What I had missed was that we never removed old assets on deploy, so the old URL (main.b3d2f4e.js) still resolved on origin and the CDN kept serving it. The deploy had produced a bundle under a new name and the new HTML referenced that name, but the CDN had cached the old HTML with the old script references.
The root cause was twofold: 1) The HTML page was cached at the CDN with a long TTL, so users got the old HTML referencing old bundles. 2) The purge only targeted /static/js/*, not the HTML index page. We fixed it by adding a short Cache-Control on the HTML (no-cache), and ensuring the deploy pipeline purges the root HTML as well. We also added a content hash to the HTML filename (index.a1b2c3.html) to force cache busting. Lesson: always cache-bust the entry point, not just the assets.
Root cause
HTML page was cached at CDN with long TTL (max-age=3600). Purge API call only invalidated /static/js/*, not the HTML. Additionally, old assets were not removed from origin, so CDN served stale HTML that referenced old bundles.
The fix
1) Set Cache-Control: no-cache on HTML responses. 2) Add index.html to the purge pattern. 3) Implement content hashing on the HTML filename. 4) Automatically remove old assets from origin after deploy.
The lesson
Always ensure the entry point (HTML) is not cached for long. Purge the entire site on deploy, not just asset directories. And verify cache headers at every layer.
The most common misconception is that setting Cache-Control on the origin is enough. CDNs often have their own TTL configuration that can override or be overridden by origin headers. For example, Fastly's fallback (default) TTL is used only if the origin doesn't send caching headers. If the origin sends Cache-Control: max-age=31536000, Fastly will respect it unless a cache setting or custom VCL overrides the TTL (for example by setting beresp.ttl).
Key headers to inspect: Cache-Control (public/private, max-age, s-maxage), Surrogate-Control (CDN-specific override), ETag/Last-Modified (for conditional requests), and Age (how long the CDN has held the object). Use curl -I to see the full picture. A common fix is to set Surrogate-Control: max-age=60 on the origin to tell the CDN to cache for only 60 seconds while keeping a longer browser cache.
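To reason about which header wins, it can help to compute the TTL the edge is likely to apply. The sketch below assumes the precedence Fastly documents (Surrogate-Control max-age, then Cache-Control s-maxage, then Cache-Control max-age); other CDNs may rank these differently, and both function names are hypothetical.

```shell
# Extract the integer value of one directive (e.g. max-age) from a header value.
# $1 = header value, $2 = directive name
directive_value() {
  printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | tr 'A-Z' 'a-z' |
    sed -n "s/^$2=\([0-9][0-9]*\)\$/\1/p" | head -n 1
}

# Print the TTL a surrogate-aware CDN would likely apply, or "unset".
# $1 = Cache-Control value, $2 = Surrogate-Control value (may be empty)
effective_cdn_ttl() {
  ttl=$(directive_value "$2" max-age)
  [ -n "$ttl" ] || ttl=$(directive_value "$1" s-maxage)
  [ -n "$ttl" ] || ttl=$(directive_value "$1" max-age)
  printf '%s\n' "${ttl:-unset}"
}

effective_cdn_ttl 'public, max-age=31536000' 'max-age=60'
```

With a one-year browser max-age and Surrogate-Control: max-age=60, the call at the end reports the 60-second edge TTL, which is the split the paragraph above describes.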
Not all purges are created equal. Fastly offers 'soft purge' (marks the object as stale so it can still be served while revalidating) vs 'hard purge' (immediate eviction). Cloudflare has 'purge everything' vs 'purge by URL'. Akamai purges can target CP codes. Always check whether your purge request is synchronous or asynchronous. Many purge APIs acknowledge the request immediately (often with 200 or 202) but take seconds to minutes to propagate to all edge nodes.
To test purge propagation, use a script that fetches the URL from multiple geographic locations (e.g., using curl from different regions via a service like Check-host.net). Compare the Age header and last-modified times. If some edges serve old content, the purge hasn't fully propagated.
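If you know the IP addresses of a few edge nodes, curl's --resolve option can pin a request to each one, and the collected ETags can then be compared. This is a rough sketch: the function names are hypothetical and the edge IPs in the commented example are placeholders from the documentation address range.

```shell
# Print the ETag that one specific edge returns for a path.
# $1 = hostname, $2 = edge IP, $3 = path
etag_from_edge() {
  curl -sI --resolve "$1:443:$2" "https://$1$3" | tr -d '\r' |
    awk -F': ' 'tolower($1) == "etag" { print $2 }'
}

# Read "<location> <etag>" lines on stdin; succeed only if exactly one
# distinct ETag was seen across all locations.
etags_consistent() {
  distinct=$(awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')
  [ "$distinct" -eq 1 ]
}

# Example (not run here):
# for ip in 203.0.113.10 203.0.113.20; do
#   echo "$ip $(etag_from_edge cdn.example.com "$ip" /static/js/main.js)"
# done | etags_consistent || echo "purge has not reached every edge"
```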
Content hashing (e.g., webpack [contenthash]) is the gold standard for static assets because changing the file content changes the URL. But this only works if the HTML that references these assets is also cache-busted or served with no-cache. Otherwise, the old HTML (cached) will reference old asset URLs.
For HTML, use a version parameter in the URL (e.g., /index.html?v=2) or serve with Cache-Control: no-cache. Some teams use a dynamic script that appends a build ID to all asset URLs. Another approach: use a service worker to intercept requests and force refresh on deploy.
When users report issues, immediately check CDN access logs for the affected assets. Look for the 'Cache Status' field (HIT/MISS/STALE). If you see many HITs after deploy, invalidation failed. Also check the 'Age' field: if it's greater than the time since deploy, the edge hasn't refreshed.
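The Age comparison is simple arithmetic, sketched here as a hypothetical helper that takes the Age value and the deploy time in epoch seconds.

```shell
# Succeed if the edge object was fetched from origin after the deploy.
# $1 = Age header value (seconds), $2 = deploy time (epoch seconds),
# $3 = current time (epoch seconds, defaults to now)
edge_refreshed_since_deploy() {
  now=${3:-$(date +%s)}
  since_deploy=$((now - $2))
  [ "$1" -le "$since_deploy" ]
}

# Ten minutes after a deploy, an object with Age: 1200 predates the deploy.
if edge_refreshed_since_deploy 1200 1000 1600; then
  echo "edge copy is newer than the deploy"
else
  echo "edge copy predates the deploy"
fi
```

This mirrors the incident timeline: at 14:10 the assets reported Age: 1200, which is longer than the 600 seconds since the 14:00 deploy, so the edge could not have refetched them.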
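Use browser DevTools with 'Disable cache' checked (or open an incognito window) to rule out the browser cache. If the problem persists, the CDN is the likely culprit. For deeper inspection on Fastly, send the Fastly-Debug: 1 request header (curl -sI -H 'Fastly-Debug: 1' ...); alongside X-Cache and X-Served-By, the response then includes debug headers such as Fastly-Debug-Path and Fastly-Debug-TTL that show which cache nodes handled the request.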
Don't rely on manual checks. Add a post-deploy step that fetches a known asset from the CDN and asserts that its content matches the expected hash. For example, after deploy, run: curl -s https://cdn.example.com/static/js/main.e5f6g7h.js | sha256sum -c expected.sha256, where expected.sha256 lists '-' as the file name so that sha256sum checks stdin. If the check fails, roll back automatically.
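An alternative that avoids maintaining a checksum file is to compare the digest of the CDN response with the digest of the local build artifact. The sketch assumes GNU sha256sum is available; the function name and the dist/ path in the commented example are hypothetical.

```shell
# Compare the SHA-256 of stdin with an expected hex digest.
# $1 = expected digest
body_matches_sha256() {
  actual=$(sha256sum | awk '{ print $1 }')
  [ "$actual" = "$1" ]
}

# Example (not run here): fail the pipeline if the CDN serves different bytes
# than the file that was just built.
# expected=$(sha256sum < dist/static/js/main.e5f6g7h.js | awk '{ print $1 }')
# curl -sf https://cdn.example.com/static/js/main.e5f6g7h.js |
#   body_matches_sha256 "$expected" || exit 1
```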
Also add a smoke test that loads the HTML page and checks that all asset URLs return 200. This catches cases where the HTML references a missing asset. Use a tool like Lighthouse CI or a simple Puppeteer script.
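Where a full browser run is more than you need, the same check can be approximated with curl. The sketch below uses hypothetical function names and assumes the HTML references its assets with root-relative, double-quoted src and href attributes; it is a coarse text match, not an HTML parser.

```shell
# Print the src/href values that point at .js or .css files in HTML on stdin.
asset_urls() {
  grep -oE '(src|href)="[^"]+\.(js|css)"' |
    sed -E 's/^(src|href)="//; s/"$//'
}

# Fetch the entry page and confirm every referenced asset returns 200.
# $1 = site origin, e.g. https://cdn.example.com
check_assets() {
  failed=0
  for path in $(curl -s "$1/" | asset_urls); do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1$path")
    [ "$code" = 200 ] || { echo "$path returned $code" >&2; failed=1; }
  done
  return "$failed"
}

# Example (not run here): check_assets https://cdn.example.com || exit 1
```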
Frequently asked questions
Why did a hard refresh in the browser not fix the issue?
Hard refresh (Ctrl+F5) bypasses the browser cache but does not bypass the CDN. The browser sends a request to the CDN with 'Cache-Control: no-cache' header, but the CDN may still return a cached response if it doesn't respect that header or if the CDN has a stale object. Only clearing the CDN cache or waiting for TTL expiration will fix it.
What is the difference between soft purge and hard purge on Fastly?
A soft purge in Fastly marks the cached object as stale but allows it to be served during revalidation (stale-while-revalidate). A hard purge immediately removes the object from cache, forcing a fetch from origin. If you use soft purge, the CDN may still serve the old content for a short period while it revalidates. For immediate invalidation, use a hard purge, which is the default: omit the Fastly-Soft-Purge: 1 request header.
Should I use Surrogate-Control or Cache-Control for CDN configuration?
Surrogate-Control is a header specifically designed for CDNs to set cache behavior independent of the browser. It allows you to tell the CDN to cache for 60 seconds while telling the browser to cache for a year. However, not all CDNs support it. Fastly does, CloudFront does not (uses Cache-Control with s-maxage). Check your CDN documentation. A safe approach: set both Cache-Control: public, max-age=31536000, s-maxage=60 and Surrogate-Control: max-age=60.
Can I force the CDN to ignore origin Cache-Control headers?
Yes, most CDNs allow you to override origin headers via configuration. For example, in Fastly you can use VCL to 'unset beresp.http.Cache-Control' and set your own. In CloudFront, you can create a Cache Policy that overrides origin headers. However, be careful: overriding Cache-Control may break browser caching or cause other issues. Usually, it's better to fix the origin headers.
Why does adding a query parameter (?v=2) sometimes work to get fresh content?
CDNs typically cache based on the full URL, including query parameters (unless configured otherwise). Adding a unique query parameter creates a different cache key, so the CDN sees it as a new object and fetches from origin. This is a quick workaround but not scalable. It also breaks any existing client-side caching of the URL.
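For ad hoc debugging, a small helper can append the parameter correctly whether or not the URL already has a query string. The function and parameter names are hypothetical, and this only helps if the CDN includes the query string in its cache key.

```shell
# Append a cache-busting query parameter to a URL.
# $1 = URL, $2 = token (defaults to the current epoch seconds)
with_cache_buster() {
  token=${2:-$(date +%s)}
  case "$1" in
    *\?*) printf '%s&cachebuster=%s\n' "$1" "$token" ;;
    *)    printf '%s?cachebuster=%s\n' "$1" "$token" ;;
  esac
}

# Example (not run here):
# curl -sI "$(with_cache_buster https://cdn.example.com/static/js/main.js)"
```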