LEARN · DEBUGGING GUIDE

Debugging JavaScript Garbage Collection Pauses That Kill Responsiveness

Long garbage collection pauses can freeze your application for seconds. This guide shows exactly how to detect, diagnose, and eliminate them using real-world tools and techniques.

Advanced · Memory · 8 min read

What this usually means

JavaScript engines (V8, SpiderMonkey, JavaScriptCore) use a generational garbage collector. The young generation scavenges frequently with low pause times. The problem is the old generation: when it requires a major (full) GC, the engine performs a 'stop-the-world' pause that can last hundreds of milliseconds to seconds. The pause duration scales with the size of the old generation and the number of live objects that must be marked and swept. Common triggers: accumulating too many long-lived objects, large arrays or Maps that are never cleared, or excessive allocation rate that forces the GC to reclaim memory aggressively. In Node.js, a common cause is leaking object references in caches or event listeners that prevent collection.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  1. Record a performance trace in Chrome DevTools (Performance tab): click record, reproduce the stutter, stop. Look for a long yellow 'GC' or 'Major GC' bar and note its duration (e.g., 400 ms).
  2. Run Node.js with --trace-gc: `node --trace-gc app.js > gc.log`. Filter the major collections with `grep 'Mark-sweep' gc.log | awk '{print $2,$4,$7,$10}'` to see the timestamp, used heap before and after, and pause time. Field positions vary across Node.js versions, so check them against one line of your own log first.
  3. Take a heap snapshot before and after a pause event: in the Chrome DevTools Memory tab, compare the snapshots to find objects that accumulate between GCs.
  4. Use `process.memoryUsage()` in Node.js to log heapUsed every 100 ms: `setInterval(() => console.log(JSON.stringify(process.memoryUsage())), 100)`. Plot heapUsed against time to see the sawtooth pattern.
  5. Check for large retained objects in the 'Containment' view of a heap snapshot: filter by shallow size > 1 MB and look for arrays, Maps, or custom objects that are retained by the global object or by event listeners.
  6. Profile CPU time in Node with `--prof`: run `node --prof --trace-gc app.js`, then `node --prof-process isolate-*.log > prof.txt`. `--prof` samples CPU ticks rather than allocations, so check what share of ticks the summary attributes to GC and which functions dominate the rest.
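
The heap logging in step 4 can be sketched as a small sampler that emits CSV rows, which are easier to plot than raw JSON. This is a minimal sketch: `formatSample`, `startHeapSampler`, and the column names are hypothetical names chosen for this example, not part of any library.

```javascript
// Format one sample as a CSV row: elapsed time plus heap figures in MB.
// The column layout is an assumption made for this example.
function formatSample(elapsedMs, usage) {
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  return [elapsedMs, toMb(usage.heapUsed), toMb(usage.heapTotal), toMb(usage.rss)].join(',');
}

// Start sampling process.memoryUsage() every `intervalMs` milliseconds.
// Returns a function that stops the sampler.
function startHeapSampler(intervalMs = 100, write = console.log) {
  const startedAt = Date.now();
  write('elapsed_ms,heap_used_mb,heap_total_mb,rss_mb');
  const timer = setInterval(() => {
    write(formatSample(Date.now() - startedAt, process.memoryUsage()));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return () => clearInterval(timer);
}
```

Calling `const stop = startHeapSampler(100)` at startup and redirecting stdout to a file gives a series you can chart. A heap that climbs and then drops sharply at the same moments as the stalls points at major GC.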

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Chrome DevTools Performance recording: examine the 'Main' thread for long 'GC' tasks
  • Chrome DevTools Memory tab: heap snapshots, allocation instrumentation, and 'GC' events in the timeline
  • Node.js GC log (via --trace-gc): parse with grep/awk to find major GC durations and heap sizes
  • Node.js process memory usage: logged via `process.memoryUsage()` to see the heapUsed pattern
  • Source code: look for global caches (Map, Set, arrays), closures retaining large objects, and event listeners that are never cleaned up
  • On-demand heap snapshots (e.g., via the `heapdump` npm module): take them when memory peaks
  • Application logs near pause times: correlate with GC events from --trace-gc to see whether pauses cause request timeouts

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Global cache (e.g., `const cache = new Map()`) that never evicts entries and grows without bound
  • Large objects such as arrays or strings that are kept alive by a closure or a global variable
  • High allocation rate causing frequent young generation scavenges that promote objects to old space prematurely
  • Third-party libraries that retain large data structures (e.g., ORM query results, logging buffers)
  • Event listeners attached to global objects (e.g., `window`, `process`) that are never removed, keeping a large subtree reachable
  • Heavy use of `eval` or `new Function`, which can hinder V8's optimizations and may leave extra compiled code on the heap
  • Large JSON.parse results that are stored in a variable and never dereferenced

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Implement cache eviction: use an LRU cache (e.g., the `lru-cache` npm package) with a max size and a TTL
  • Nullify references to large objects when done: `largeData = null;` so they become unreachable
  • Avoid keeping large data in global scope; use local variables or weak references (WeakMap, WeakSet) when possible
  • Reduce allocation rate by reusing objects (object pooling) or using typed arrays for numeric data
  • Detach event listeners with `removeEventListener` (or `emitter.off` in Node.js), or register them with an `AbortController` signal so they can all be removed with one `abort()` call
  • Increase the Node.js max-old-space-size to defer GC: `node --max-old-space-size=4096 app.js`. This is a band-aid, not a fix
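
To make the cache eviction pattern concrete, here is a minimal sketch of a bounded cache with least-recently-used eviction and a TTL, built on the insertion order of a plain `Map`. `BoundedCache` and its options are hypothetical names for this example; it is a stand-in to show the idea, not the API of the `lru-cache` package.

```javascript
// Minimal bounded cache: LRU eviction plus a per-entry TTL.
// Hypothetical illustration, not the lru-cache package's API.
class BoundedCache {
  constructor({ max = 10000, ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly for testing
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it so it can be collected
      return undefined;
    }
    // Re-insert so the key moves to the "most recently used" end.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.max) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  get size() {
    return this.entries.size;
  }
}
```

The point of the design is that the number of live entries has a hard ceiling, so old space cannot grow with traffic. Expired entries are only removed when they are read or pushed out, which keeps the sketch short; a production cache would usually purge them proactively.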

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Re-run the Chrome DevTools performance recording after the fix: confirm no GC bars longer than 16 ms
  • Run Node.js with --trace-gc and grep 'Mark-sweep': verify pause times are now under 50 ms
  • Monitor heapUsed vs time: a shallow sawtooth is normal, but the baseline after each GC should stay flat and the steep multi-hundred-megabyte drops should be gone
  • Load test with artillery or k6: compare p99 latency before and after the fix (for example, a drop from >500 ms to <100 ms)
  • Take a heap snapshot after the fix: confirm the old generation size is stable and not growing without bound
  • Run for 1 hour with production traffic: verify memory usage plateaus (e.g., around 300 MB) instead of climbing
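
One way to turn "memory plateaus instead of climbing" into a number is to fit a line through the logged heap samples and look at its slope. The sketch below assumes samples shaped as `{ timeMs, heapUsedBytes }`; that shape and the function name `heapGrowthMbPerMinute` are hypothetical choices for this example.

```javascript
// Least-squares slope of heapUsed over time, expressed in MB per minute.
// `samples` is an array of { timeMs, heapUsedBytes } (hypothetical shape).
function heapGrowthMbPerMinute(samples) {
  if (samples.length < 2) return 0;
  const n = samples.length;
  const meanT = samples.reduce((sum, s) => sum + s.timeMs, 0) / n;
  const meanH = samples.reduce((sum, s) => sum + s.heapUsedBytes, 0) / n;
  let covariance = 0;
  let variance = 0;
  for (const s of samples) {
    covariance += (s.timeMs - meanT) * (s.heapUsedBytes - meanH);
    variance += (s.timeMs - meanT) ** 2;
  }
  if (variance === 0) return 0; // all samples share one timestamp
  const bytesPerMs = covariance / variance;
  return (bytesPerMs * 60000) / (1024 * 1024);
}
```

Because the raw series is a sawtooth, the slope is only meaningful over a window that spans many GC cycles. A value near zero over an hour suggests a plateau; a steady positive value suggests something is still accumulating.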

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Blindly increasing max-old-space-size without understanding the root cause: it delays the problem, it doesn't fix it
  • Using global variables for caching without eviction: the most common mistake
  • Ignoring young generation GCs: frequent minor GCs can also cause jank if they take >16 ms (usually they do not, but check)
  • Assuming the garbage collector is the enemy: often it's your code that forces it to work hard
  • Not profiling under production-like load: GC behavior differs under load, so reproduce with realistic traffic
  • Fixing the symptom (the pause) by adding more memory: memory is finite, and a leak will eventually crash the process

( 07 ) War story

Multi-second response time spikes in a Node.js chat server due to GC pauses

Backend Engineer · Node.js 14, Express, MongoDB, Redis, deployed on AWS EC2 (c5.large)

Timeline

  1. 10:00 Deploy new chat server that stores message history in a global Map for quick access
  2. 10:15 PagerDuty alerts for p99 latency > 2s on /messages endpoint
  3. 10:20 Check New Relic: p50 latency 50ms, p99 2.5s, spikes every ~10 seconds
  4. 10:25 SSH into instance, run `top`: CPU ~100% every 10s for 1-2 seconds
  5. 10:30 Run `node --trace-gc app.js` on staging with same traffic
  6. 10:35 GC log shows Mark-sweep pauses of 800ms-1.2s, old space size 600MB
  7. 10:40 Take heap snapshot with heapdump, find a Map with 500k entries retained by module global
  8. 10:45 Replace Map with lru-cache (max 10000 entries, ttl 5 min)
  9. 10:50 Deploy fix, latency spikes drop to <100ms p99, no more alerts

We had a global Map in a chat module that stored every message sent in the last hour. The map grew to hundreds of thousands of entries. Every 10 seconds, V8's old generation GC would mark and sweep this huge map, freezing the event loop for nearly a second. Our p99 latency shot up because requests queued behind the pause.

At first I suspected a database issue or a slow third-party API. But after seeing the CPU spike pattern and the GC trace, I knew it was a memory problem. The heap snapshot confirmed: 80% of the old space was that single Map. The fix was to limit the cache size and add a TTL. I used the lru-cache package with a max of 10,000 entries and a 5-minute TTL.

After deploying, the GC pauses dropped to under 30ms, and p99 latency went back to 80ms. The lesson: never use a global Map as an unbounded cache. Always add eviction. Also, monitor heap size trends — if it keeps growing, you've got a leak or a cache that needs limits.

Root cause

An unbounded global Map retained hundreds of thousands of message objects, causing major GC pauses of 800+ ms.

The fix

Replaced the global Map with an LRU cache capped at 10,000 entries with a 5-minute TTL.

The lesson

Always limit data structures that can grow unbounded. Use heap snapshots to find large retainers. Profile before you optimize.

( 08 ) Understanding V8's Generational GC and Stop-the-World Pauses

V8 divides the heap into a young generation (new space) and an old generation (old space). The young generation uses a semi-space scavenge algorithm: two spaces of equal size, a few megabytes each by default. Objects are allocated in the active semi-space; when it fills up, live objects are copied to the other semi-space, and objects that have already survived an earlier scavenge are promoted to old space instead. This is fast (usually <5 ms) because only a small fraction of young objects are still live.

The old generation uses mark-sweep (major GC), with compaction when pages become fragmented. It marks all live objects starting from the roots (globals, stack, etc.), then sweeps the dead ones. V8 marks incrementally and, since V8 6.4 (Chrome 64, Node.js 10), also concurrently on helper threads, and it sweeps mostly in the background. That shortens the main-thread pause but does not remove it: finalizing marking and compacting memory still stop JavaScript execution, and with a large object graph those steps can take hundreds of milliseconds. Pause time grows roughly with the number of live objects in old space. A heap snapshot can show you which objects dominate.
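
To see the two generations in a running Node.js process, you can read the per-space figures from the built-in `v8` module. A minimal sketch follows; `summarizeHeapSpaces` is a hypothetical helper name, while `v8.getHeapSpaceStatistics()` and its `space_name`, `space_size`, and `space_used_size` fields come from Node's `v8` module.

```javascript
const v8 = require('v8');

// Summarize each heap space in MB, rounded to one decimal place.
function summarizeHeapSpaces(spaces = v8.getHeapSpaceStatistics()) {
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return spaces.map((space) => ({
    name: space.space_name,
    usedMb: toMb(space.space_used_size),
    sizeMb: toMb(space.space_size),
  }));
}
```

Printing `console.table(summarizeHeapSpaces())` periodically shows `new_space` staying small while `old_space` carries the long-lived data. If `old_space` keeps growing between major GCs, that is the space to investigate with a heap snapshot.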

( 09 ) Using Node.js --trace-gc to Identify Pause Duration and Frequency

Run `node --trace-gc app.js > gc.log`. The output includes lines like: `[28333:0x102008000] 10.123: Mark-sweep 456.5 (789.2) -> 234.1 (678.9) MB, 1234.5 ms`. Reading left to right: heap before GC (456.5 MB used, 789.2 MB total), heap after GC (234.1 MB used, 678.9 MB total), and pause time (1234.5 ms). The exact layout varies between Node.js versions, and newer ones append extra fields after the pause time. Look for Mark-sweep events with pauses over 100 ms, and check their frequency: if they recur every 10 seconds, you have a problem.

Use `grep 'Mark-sweep' gc.log | awk '{print $2,$4,$7,$10}'` to extract the timestamp, used heap before and after, and pause time for the line format above; adjust the field numbers if your Node.js version prints a different layout. If many pauses are >200 ms, you need to reduce old space size or allocation rate. Also look for 'Scavenge' events, which should be <10 ms. If a scavenge takes >50 ms, your young space might be too large (set via `--max-semi-space-size`).
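
When awk field numbers become brittle, the same extraction can be done with a small parser. This is a sketch that assumes the line shape shown in the sample above; the regular expression, `parseGcLine`, and `slowMajorGcs` are hypothetical and may need adjusting for the layout your Node.js version prints (the pattern also accepts the label `Mark-Compact`, on the assumption that some versions use it for major GCs).

```javascript
// Matches: <type> <used> (<total>) -> <used> (<total>) MB, <pause>
// The layout is an assumption based on the sample line in this guide.
const GC_LINE =
  /(Scavenge|Mark-sweep|Mark-Compact)\s+([\d.]+)\s+\(([\d.]+)\)\s+->\s+([\d.]+)\s+\(([\d.]+)\)\s+MB,\s+([\d.]+)/;

// Parse one --trace-gc line; returns null for lines that do not match.
function parseGcLine(line) {
  const m = GC_LINE.exec(line);
  if (m === null) return null;
  return {
    type: m[1],
    usedBeforeMb: Number(m[2]),
    totalBeforeMb: Number(m[3]),
    usedAfterMb: Number(m[4]),
    totalAfterMb: Number(m[5]),
    pauseMs: Number(m[6]),
  };
}

// Return every major GC in the log whose pause exceeds `thresholdMs`.
function slowMajorGcs(logText, thresholdMs = 100) {
  return logText
    .split('\n')
    .map(parseGcLine)
    .filter((gc) => gc !== null && gc.type !== 'Scavenge' && gc.pauseMs > thresholdMs);
}
```

For example, `slowMajorGcs(require('fs').readFileSync('gc.log', 'utf8'), 200)` lists the collections worth correlating with your latency logs.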

( 10 ) Heap Snapshot Analysis: Finding Large Retained Objects

In Chrome DevTools, take a heap snapshot and switch to 'Containment' view. Expand `(window)` or `(global)` to see global objects. Look for large arrays, Maps, or Sets. Sort by 'Shallow Size' to see biggest objects by direct memory. Then look at 'Retained Size' to see total memory including children. A large retained size often indicates a cache or a data structure that holds many objects.

Allocation instrumentation timeline: record allocation samples in DevTools. After a pause, look at which functions allocated the most memory. This can pinpoint code that creates temporary objects that survive to old space. In Node, use `heapdump` module: `const heapdump = require('heapdump'); heapdump.writeSnapshot();` then load the snapshot in Chrome DevTools. Compare two snapshots taken at different times to see which objects grew.
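
If you would rather not add a dependency, Node.js 11.13 and later can write the same snapshot format with the built-in `v8.writeHeapSnapshot()`. The sketch below is one possible arrangement; the helper names and the file naming scheme are hypothetical, and the signal handler only applies on POSIX systems.

```javascript
const v8 = require('v8');
const path = require('path');

// Build a timestamped file name such as "before-2024-01-02T03-04-05-678Z.heapsnapshot".
// The naming scheme is a hypothetical convention for this example.
function snapshotFileName(label, date = new Date()) {
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `${label}-${stamp}.heapsnapshot`;
}

// Write a snapshot into `dir` and return its path.
// This blocks the event loop while the snapshot is generated.
function captureSnapshot(label, dir = process.cwd()) {
  return v8.writeHeapSnapshot(path.join(dir, snapshotFileName(label)));
}

// Optional: `kill -USR2 <pid>` then triggers a snapshot without a restart.
function installSnapshotSignal() {
  process.on('SIGUSR2', () => {
    console.log('heap snapshot written to', captureSnapshot('manual'));
  });
}
```

Take one snapshot shortly after startup and another when memory peaks, then load both in the DevTools Memory tab and use the comparison view. Writing a snapshot can need a lot of extra memory and time on a large heap, so try it on staging before relying on it in production.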

( 11 ) Mitigating GC Pauses with Code Changes and Engine Tuning

First, reduce the size of the old generation by nullifying large references when they're no longer needed. For caches, use a bounded data structure. For large arrays, release the contents once they are processed, either by reassigning the variable or by setting `length = 0`, and process data in chunks rather than holding it all at once. Use WeakMap/WeakSet for ephemeral data that should not prevent GC.

Engine tuning: `--max-old-space-size` gives more headroom, but a larger heap with more live data also means longer major GC pauses, so raise it only as far as you need. Consider `--max-semi-space-size=64` (in MB) to reduce young generation scavenge frequency. Use `--optimize-for-size` to favor memory over speed. Treat all of these as last resorts after fixing the code.
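
As an illustration of the WeakMap advice, the sketch below attaches metadata to objects without extending their lifetime. The names `sessionMeta`, `attachMeta`, and `readMeta`, and the session object itself, are hypothetical.

```javascript
// Metadata keyed by object identity. A WeakMap entry does not keep its key alive,
// so the metadata becomes collectable as soon as the key object is unreachable.
const sessionMeta = new WeakMap();

function attachMeta(session, meta) {
  sessionMeta.set(session, meta);
}

function readMeta(session) {
  return sessionMeta.get(session);
}

// Usage: the session object is the key (WeakMap keys must be objects).
let session = { id: 'abc' };
attachMeta(session, { lastSeen: Date.now() });
console.log(readMeta(session) !== undefined); // true while `session` is referenced

// Dropping the last reference lets both the session and its metadata be collected.
session = null;
```

With a regular `Map`, the same code would keep every session and its metadata alive until someone remembered to call `delete`. The trade-off is that a WeakMap cannot be iterated or sized, so it suits side tables rather than caches you need to inspect.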
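
After changing heap flags, it is worth confirming what limit the process actually runs with. A minimal sketch using `v8.getHeapStatistics()` follows; `heapHeadroom` is a hypothetical helper name, and the reported limit is usually somewhat higher than the `--max-old-space-size` value because it covers more than old space.

```javascript
const v8 = require('v8');

// Report the effective heap limit and how much of it is currently used.
function heapHeadroom(stats = v8.getHeapStatistics()) {
  const toMb = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    limitMb: toMb(stats.heap_size_limit),
    usedMb: toMb(stats.used_heap_size),
    usedRatio: stats.used_heap_size / stats.heap_size_limit,
  };
}
```

Logging `heapHeadroom()` once at startup shows whether the flag took effect (for example, when it is passed through a process manager or `NODE_OPTIONS`). A `usedRatio` that stays high after major GCs means the heap is full of live data, which more headroom will not shrink.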

( 12 ) Monitoring GC in Production and Setting Up Alerts

In Node.js, observe GC events with a `PerformanceObserver` from the `perf_hooks` module (entry type `'gc'`), which reports the kind and duration of each collection without extra flags. Avoid relying on `--expose-gc` in production; it only exposes a manual `global.gc()` call. `performance.memory` is a non-standard Chrome API for browsers and is not a Node.js monitoring tool. APM tools like New Relic or Datadog also track GC pause time. Set alerts: if p99 latency spikes correlate with GC events, alert on 'major GC pause > 200ms'.

For browser apps, use `performance.now()` in requestAnimationFrame to detect frame drops. Log when a frame takes >50 ms. Use `performance.measureUserAgentSpecificMemory()` (the successor to the experimental `performance.measureMemory()`, available in Chromium on cross-origin isolated pages) to estimate memory usage. Integrate with Real User Monitoring (RUM) to catch GC pauses affecting real users.
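
A GC monitor for Node.js can be sketched with the `perf_hooks` module, which emits a performance entry for each collection. In the sketch below, `describeGcEntry`, `startGcMonitor`, and the callback shape are hypothetical names; `PerformanceObserver`, the `'gc'` entry type, and the `NODE_PERFORMANCE_GC_*` constants come from `perf_hooks`.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Map the numeric GC kind to a readable label.
const GC_KIND_NAMES = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weakcb',
};

// Newer Node.js versions report the kind on entry.detail.kind, older ones on entry.kind.
function describeGcEntry(entry) {
  const kind =
    entry.detail && entry.detail.kind !== undefined ? entry.detail.kind : entry.kind;
  return { kind: GC_KIND_NAMES[kind] || 'unknown', durationMs: entry.duration };
}

// Call `onSlowGc` for every collection longer than `thresholdMs`.
// Returns a function that stops observing.
function startGcMonitor(thresholdMs, onSlowGc) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      const gc = describeGcEntry(entry);
      if (gc.durationMs > thresholdMs) onSlowGc(gc);
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return () => observer.disconnect();
}
```

A typical use would be `startGcMonitor(200, (gc) => metrics.increment('gc.slow', { kind: gc.kind }))`, where `metrics` stands for whatever metrics client you already have. Keeping the callback cheap matters, since it runs on the same event loop you are trying to protect.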
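
The frame-drop check can be sketched as a small monitor that compares consecutive animation-frame timestamps. `createFrameMonitor` and the `report` callback are hypothetical names; the browser wiring at the bottom only runs where `requestAnimationFrame` exists.

```javascript
// Returns a function to call once per frame with the current timestamp.
// Reports any gap between consecutive frames longer than `thresholdMs`.
function createFrameMonitor(thresholdMs, report) {
  let previous = null;
  return function onFrame(now) {
    if (previous !== null && now - previous > thresholdMs) {
      report({ gapMs: now - previous, at: now });
    }
    previous = now;
  };
}

// Browser wiring: requestAnimationFrame passes a timestamp on the same
// clock as performance.now(), so it can be fed straight into the monitor.
if (typeof requestAnimationFrame === 'function') {
  const onFrame = createFrameMonitor(50, (frame) => console.warn('long frame', frame));
  const loop = (now) => {
    onFrame(now);
    requestAnimationFrame(loop);
  };
  requestAnimationFrame(loop);
}
```

A long gap only tells you the main thread was blocked, not why. Treat these reports as a signal to record a DevTools trace, where GC shows up as its own labeled task, and send the gaps to your RUM backend in place of `console.warn` if you want to see them from real users.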

Frequently asked questions

How can I tell if a freeze is caused by GC vs. other CPU-intensive work?

Use Chrome DevTools Performance recording: GC bars are yellow/orange and labeled 'GC' or 'Major GC'. In Node, run with --trace-gc and correlate pauses with latency logs. Also, GC pauses usually show a sawtooth memory pattern: usage climbs then drops sharply.

Is increasing max-old-space-size a good fix for GC pauses?

No — it's a temporary bandage. Larger heap means longer pauses because marking takes more time. The real fix is to reduce heap size by finding and eliminating large object retention. Use heap snapshots to find the culprit.

Why do GC pauses happen at regular intervals (e.g., every 10 seconds)?

V8 triggers a major GC when old generation usage reaches a limit that it recomputes after each collection from the amount of live data and the allocation rate, capped by max-old-space-size. After collection, usage drops, then grows again until it hits the next limit. With a steady allocation rate, that produces pauses at a regular interval. To fix, reduce the allocation rate or the amount of live data; raising the heap limit only spaces the pauses out (but see above).

Can Web Workers help avoid GC pauses in the browser?

Yes. Workers run in a separate V8 isolate with its own heap and GC. If you offload heavy processing to a Worker, GC pauses in the Worker won't freeze the main thread. However, the worker's pauses affect its own responsiveness. Use Workers for long-running tasks that can tolerate jank.

What tools can I use to debug GC in Node.js without --expose-gc?

Use `--trace-gc` (low overhead, just logging). Use heapdump to take snapshots. Use the `--prof` profiler to see time spent in GC. In production, consider using the `v8` module's `getHeapStatistics()` to monitor heap sizes, observe `'gc'` entries with a `PerformanceObserver` from `perf_hooks`, or use APM tools that report GC pause time.