What this usually means
A Node.js memory leak happens when live objects are unintentionally retained, preventing garbage collection. This usually means references to data—often via closures, caches, global arrays, or event listeners—stick around across requests. Leaks are hard to catch because usage patterns often mask them in dev, and only long-running or production workloads surface the problem. Diagnosing leaks requires tracking real object retention, not just watching RSS in top.
The first ten minutes — establish facts before touching code.
1. Log process.memoryUsage() at regular intervals (e.g., every minute).
2. In a test or staging environment, start the process with node --expose-gc, call global.gc(), and compare heapUsed before and after.
3. If the heap keeps growing, capture heap snapshots with v8.writeHeapSnapshot(), the --heapsnapshot-signal flag, or Chrome DevTools.
4. Run node --inspect=0.0.0.0:9229 (only on a trusted network, since the inspector port allows code execution), connect DevTools, and use the Memory tab to compare heap snapshots.
5. Check for accumulating listeners with emitter.listenerCount(eventName) on long-lived EventEmitters.
6. Use lsof -p <pid> to check for file/socket handle leaks alongside memory growth.
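Step 2 can be sketched roughly as follows. This is a minimal illustration, not a definitive tool: the helper names `toMB` and `compareHeapAroundGc` are made up for this example, and `global.gc` is assumed to exist only when the process was started with `--expose-gc`.

```javascript
// Run with: node --expose-gc gc-check.js
// Assumption: global.gc is defined only when --expose-gc was passed.

function toMB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 100) / 100;
}

// Reads heapUsed, forces a collection if possible, then reads heapUsed again.
// When GC is not exposed, the two readings are simply taken back to back.
function compareHeapAroundGc() {
  const before = process.memoryUsage().heapUsed;
  const forced = typeof global.gc === 'function';
  if (forced) {
    global.gc();
  }
  const after = process.memoryUsage().heapUsed;
  return { forced, beforeMB: toMB(before), afterMB: toMB(after) };
}

console.log(compareHeapAroundGc());
```

If `afterMB` keeps climbing between runs taken minutes apart under steady load, the growth is retained memory rather than garbage waiting to be collected.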
The specific files, logs, configs, and dashboards that usually own this bug.
- Heap snapshots (open chrome://inspect in Chrome to reach DevTools)
- process.memoryUsage() logs for heapUsed and rss
- PM2, forever or systemd logs for OOM exits
- Custom telemetry dashboards (e.g., DataDog, New Relic, or Grafana panels for heap size)
- Any explicit or implicit caches in your codebase (in-memory LRU, global Maps, etc.)
- Third-party library usage: grep for modules with known leak issues
- Event listener registration points (e.g., emitter.on) in your app code
Practical causes, not theory. These are the things you will actually find.
- Accumulating data in global arrays/objects that are never cleared
- Unremoved event listeners (e.g., emitter.on without corresponding .off/.removeListener)
- Closures capturing references to large objects in long-lived async callbacks
- Leaky in-memory caches with missing eviction logic
- Unresolved or stuck Promises holding references in closure scope
- Misbehaving third-party modules (e.g., old versions of request, ws, or mongoose)
Concrete fix directions. Pick the one that matches your root cause.
- Audit all global and module-scoped variables for unbounded growth; add logging for their length/size
- Ensure every .on/.addListener call has a matching .off/.removeListener when no longer needed
- Use weak references (WeakMap, WeakSet) when the data is keyed by an object, so the entry can be collected once that object is unreachable
- Limit cache sizes with LRU or TTL patterns (e.g., the lru-cache package with a strict max)
- Refactor long-lived closures to avoid capturing unnecessary objects
- Upgrade or patch known leaky npm dependencies; check their GitHub issues for memory leaks
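The first fix direction (logging the size of module-scoped collections) might look something like the sketch below. The registry and the names `trackCollection`, `reportSizes`, `pendingJobs`, and `sessionCache` are hypothetical, introduced only for illustration. Note that the registry holds strong references, so it should only track collections that are meant to live for the whole process anyway.

```javascript
// Hypothetical registry of long-lived collections to watch.
const tracked = new Map();

function trackCollection(name, collection) {
  tracked.set(name, collection);
}

// Map and Set expose .size; arrays expose .length; plain objects fall back to key count.
function sizeOf(collection) {
  if (typeof collection.size === 'number') return collection.size;
  if (typeof collection.length === 'number') return collection.length;
  return Object.keys(collection).length;
}

// Returns a plain object of { name: currentSize } suitable for periodic logging.
function reportSizes() {
  const report = {};
  for (const [name, collection] of tracked) {
    report[name] = sizeOf(collection);
  }
  return report;
}

// Example usage with two module-scoped structures (both made up).
const pendingJobs = [];
const sessionCache = new Map();
trackCollection('pendingJobs', pendingJobs);
trackCollection('sessionCache', sessionCache);

pendingJobs.push({ id: 1 }, { id: 2 });
sessionCache.set('abc', { user: 'demo' });

console.log(reportSizes()); // { pendingJobs: 2, sessionCache: 1 }
```

Logging this report next to heapUsed makes it easier to see which structure grows in step with the heap.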
A fix you cannot prove is a guess. Close the loop.
- After fix, run the process under production-like load for several hours and confirm heapUsed stabilizes in logs
- Take before/after heap snapshots and verify old-generation objects do not accumulate
- Use autocannon or wrk for stress-testing and monitor process.memoryUsage() output for plateauing memory
- Check that PM2/systemd no longer restarts your service for OOM events over days
- Run leak detection tools (e.g., memwatch-next, node-memleak) and ensure they report no new leaks after fix
Things that make this bug worse or harder to find.
- Assuming garbage collection is broken instead of looking for lingering references
- Ignoring process warnings like 'Possible EventEmitter memory leak detected'
- Using process.exit() to mask symptoms rather than identifying the leak
- Focusing only on RSS instead of heapUsed or heapTotal in metrics
- Trusting that 'it works locally' without running sustained or production-level load tests
- Leaving diagnostic code (e.g., forced GC, heavy logging) active in production
Unbounded WebSocket Event Listeners in Node.js API
Timeline
- 10:00 PagerDuty alert: API response times spiking, memory usage >1GB
- 10:10 Observed PM2 restarting container due to OOM every 90 minutes
- 10:20 Captured heap snapshot: 700k retained WebSocket objects
- 10:35 Traced accumulating listeners on each connection: wsServer.on('message', handler)
- 10:45 Patched code to explicitly remove listeners on close, deployed hotfix
- 11:30 Heap growth halted at 350MB under load; no further PM2 restarts
When our API started hitting 1GB+ resident memory and PM2 began auto-restarting the container roughly every 90 minutes, I immediately suspected a leak, since our typical steady state is 250MB. Heap snapshots showed a massive buildup of objects tied to WebSocket connections.
Comparing listener counts before and after each connection, it became clear we were never cleaning up event listeners. Each disconnection left behind closures holding onto the original socket and request context. This only showed up under real traffic, not our test harness.
I quickly patched the handler to ensure .removeListener was called on 'close' events. After a rapid hotfix deployment, memory plateaued, even with thousands of concurrent sockets. The lesson: always pair on/off for events, especially in high-throughput systems.
Root cause
Unremoved event listeners on the ws WebSocket server caused closures to persist connections in memory.
The fix
Added explicit .removeListener on 'close' to clean up per-connection event handlers.
The lesson
Every .on must be paired with .off; leaks hide in closures and event graphs under load, not in isolated tests.
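The shape of the fix can be sketched without the real ws library. In the example below, plain EventEmitter instances stand in for the WebSocket server and its sockets, and `handleConnection` is a hypothetical name; the actual production handler is not shown in this write-up, so treat this as an illustration of the pattern rather than the patch itself.

```javascript
const { EventEmitter } = require('node:events');

// Stand-ins: `wsServer` plays the long-lived server,
// each `socket` plays one connection.
const wsServer = new EventEmitter();

function handleConnection(socket) {
  // This closure captures `socket`. While it stays registered on the
  // long-lived server, the socket cannot be garbage collected.
  const handler = (msg) => socket.emit('send', msg);
  wsServer.on('message', handler);

  // The fix: remove the per-connection listener when the socket closes.
  socket.once('close', () => {
    wsServer.off('message', handler);
  });
}

const sockets = [];
for (let i = 0; i < 5; i++) {
  const socket = new EventEmitter();
  handleConnection(socket);
  sockets.push(socket);
}
console.log(wsServer.listenerCount('message')); // 5

for (const socket of sockets) socket.emit('close');
console.log(wsServer.listenerCount('message')); // 0
```

Without the `close` handler, the listener count would stay at 5 and grow by one with every new connection.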
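Don't attach DevTools to your main prod process; clone traffic to a staging container instead. Enable --inspect=0.0.0.0:9229 there, keep the port reachable only from a trusted network (anyone who can connect to the inspector can run code in the process), and run identical load. This avoids breaking live traffic with inspector pauses.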
In prod, use lightweight periodic logging: setInterval(() => log(process.memoryUsage()), 60000). Plot heapUsed over time. A flat line, or a sawtooth that keeps returning to the same baseline, is healthy; a baseline that climbs steadily under constant load points to a leak.
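A slightly fuller version of that one-liner might look like this. It is a sketch under assumptions: `memorySample` and `startMemoryLogging` are illustrative names, and `console.log` stands in for whatever logger the service already uses.

```javascript
function toMB(bytes) {
  return Math.round(bytes / 1024 / 1024);
}

// Builds one log record from process.memoryUsage().
function memorySample() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  return {
    ts: new Date().toISOString(),
    rssMB: toMB(rss),
    heapTotalMB: toMB(heapTotal),
    heapUsedMB: toMB(heapUsed),
    externalMB: toMB(external),
  };
}

// `log` defaults to console.log here; swap in your own logger (assumption).
function startMemoryLogging(intervalMs = 60000, log = console.log) {
  const timer = setInterval(() => log(JSON.stringify(memorySample())), intervalMs);
  // unref() so this timer alone never keeps the process alive.
  timer.unref();
  return timer;
}

const memoryTimer = startMemoryLogging();
```

Emitting one JSON record per line keeps the output easy to ship to a log pipeline and plot later.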
Take a snapshot at startup, then after sustained load. In DevTools, sort objects by retained size. Look for app-specific classes, global arrays, or 'Closure' entries with hundreds of thousands of retained objects.
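Pay attention to large Array, Map, and closure entries whose retainer chains lead back to module-scoped variables or forgotten caches. DevTools also has a 'Detached DOM tree' category, but it is browser-specific and does not apply to a Node process.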
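When attaching DevTools is not practical, snapshots can also be written from inside the process with the built-in `v8.writeHeapSnapshot()` and loaded into the DevTools Memory tab afterwards. The sketch below assumes a hypothetical `writeSnapshot` helper, writes to the OS temp directory, and uses SIGUSR2 as the trigger; all three are choices made for this example. Writing a snapshot blocks the event loop and needs a substantial amount of extra memory, so this belongs in staging.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Writes a .heapsnapshot file and returns its path.
// The label, directory, and file naming scheme are assumptions.
function writeSnapshot(label) {
  const file = path.join(
    os.tmpdir(),
    `${label}-${process.pid}-${Date.now()}.heapsnapshot`
  );
  return v8.writeHeapSnapshot(file);
}

// SIGUSR2 is not available on Windows, so only register it elsewhere.
if (process.platform !== 'win32') {
  process.on('SIGUSR2', () => {
    console.log('heap snapshot written to', writeSnapshot('on-demand'));
  });
}
```

Usage: call `writeSnapshot('startup')` once after boot, send `kill -USR2 <pid>` after sustained load, then load both files in DevTools and use the comparison view.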
By default, Node prints a 'MaxListenersExceededWarning' when more than 10 listeners are added for a single event on one emitter (the limit can be changed with emitter.setMaxListeners()). Don't ignore this: it's often the only warning before disaster.
Use process.on('warning', ...) to capture these at runtime (log them centrally). If you see this warning, check all .on/.addListener sites and ensure you remove listeners after use.
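A minimal sketch of such a handler follows. `formatWarning` is a hypothetical helper and `console.error` stands in for a central log sink. The `type` and `count` fields are read defensively because only listener warnings carry them.

```javascript
const { EventEmitter } = require('node:events');

// Turns a warning object into a flat record for structured logging.
function formatWarning(warning) {
  return {
    name: warning.name,
    message: warning.message,
    // Present on MaxListenersExceededWarning; null for other warnings.
    eventName: warning.type !== undefined ? String(warning.type) : null,
    listenerCount: typeof warning.count === 'number' ? warning.count : null,
  };
}

process.on('warning', (warning) => {
  // console.error is a placeholder for your central logger (assumption).
  console.error(JSON.stringify(formatWarning(warning)));
});

// Demo: the 11th listener for one event exceeds the default limit of 10.
// The warning is delivered asynchronously, after the current tick.
const emitter = new EventEmitter();
for (let i = 0; i < 11; i++) {
  emitter.on('data', () => {});
}
```

The logged `eventName` tells you which `.on` call sites to audit first.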
In-memory caches are notorious for leaks. LRU caches must have a hard size limit (e.g., new LRU({max: 10000})). Avoid unbounded Map/Object caches: even a tiny leak rate adds up over days.
If your cache should only hold per-request or session data, clear it on completion. Use WeakMap for ephemeral relationships, so objects can be collected when unused.
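To make the hard size limit concrete, here is a dependency-free sketch of an LRU cache built on Map insertion order. `BoundedCache` is a hypothetical class written for this example, not the API of any published package; in a real service a maintained library is usually the better choice.

```javascript
class BoundedCache {
  constructor(max) {
    if (!Number.isInteger(max) || max <= 0) {
      throw new RangeError('max must be a positive integer');
    }
    this.max = max;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}

const cache = new BoundedCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');    // 'a' is now the most recently used entry
cache.set('c', 3); // evicts 'b', the least recently used entry
console.log(cache.size, cache.get('b'), cache.get('a')); // 2 undefined 1
```

However many keys are written, `size` never exceeds `max`, which is the property an unbounded Map lacks.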
Frequently asked questions
Can I use global.gc() to fix my memory leak in production?
No. Forcing GC only hides the leak briefly and degrades performance; you must find and eliminate lingering references in code.
Is high RSS always a leak in Node.js?
Not necessarily. Watch heapUsed and external memory, not just RSS—large native buffers or mmap’d files can inflate RSS without a true JS leak.
What tools are best for finding leaks in Node.js?
Use heap snapshots via Chrome DevTools or clinic.js. For continuous detection, try memwatch-next or heapdump in staging. Always verify under real load.
How do third-party modules cause leaks?
Bugs in modules may retain data in closure or cache, or never remove event listeners. Always check module issue trackers if you suspect a leak after an upgrade.
How can I detect a memory leak before it becomes critical?
Set up automated memory plotting for heapUsed and heapTotal. Alert if the slope increases consistently over hours at steady load.
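One way to turn "the slope increases" into a number is a least-squares fit over recent heapUsed samples. The sketch below is an assumption-laden illustration: the sample shape `{ t, heapUsed }`, the function name `heapSlopeMBPerHour`, and the 5 MB/hour threshold are all invented for this example and should be tuned per service.

```javascript
const MB = 1024 * 1024;
const MS_PER_HOUR = 60 * 60 * 1000;

// samples: [{ t: timestamp in ms, heapUsed: bytes }]
// Returns the least-squares slope of heapUsed over time, in MB per hour.
function heapSlopeMBPerHour(samples) {
  const n = samples.length;
  if (n < 2) return 0;

  const meanT = samples.reduce((sum, s) => sum + s.t, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.heapUsed, 0) / n;

  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.t - meanT) * (s.heapUsed - meanY);
    den += (s.t - meanT) ** 2;
  }
  if (den === 0) return 0; // all samples share one timestamp

  const bytesPerMs = num / den;
  return (bytesPerMs * MS_PER_HOUR) / MB;
}

// Hypothetical alert threshold; choose one that fits your workload.
const THRESHOLD_MB_PER_HOUR = 5;

const samples = [
  { t: 0, heapUsed: 100 * MB },
  { t: 1 * MS_PER_HOUR, heapUsed: 110 * MB },
  { t: 2 * MS_PER_HOUR, heapUsed: 120 * MB },
];
const slope = heapSlopeMBPerHour(samples);
console.log(slope.toFixed(1), slope > THRESHOLD_MB_PER_HOUR); // 10.0 true
```

Fitting over a window of several hours smooths out the normal GC sawtooth, so short-lived spikes do not trigger the alert.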