What this usually means
Memory leaks in long-running processes happen when memory is allocated but never released. In Node.js, the garbage collector frees memory that is no longer referenced. A leak occurs when references to objects are kept alive unintentionally: a global array that grows without bound, an event listener that is never removed, a closure that holds a reference to a large object, or a stream that is never closed. Each job processed adds a little more to the memory footprint until the process exhausts available memory.
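To make the mechanism concrete, here is a minimal sketch of the most common shape of this leak: module-level state that keeps every job reachable. The names `processedJobs`, `handleJobLeaky`, and `handleJob` are hypothetical and only serve the illustration.

```javascript
// Hypothetical worker module; all names here are illustrative.

// Module-level state lives as long as the process does.
const processedJobs = [];

// Leaky: every job payload stays reachable through the array,
// so the garbage collector can never free it.
function handleJobLeaky(job) {
  const result = { id: job.id, size: job.payload.length };
  processedJobs.push({ job, result });
  return result;
}

// Non-leaky: nothing outlives the call except the small return value.
function handleJob(job) {
  return { id: job.id, size: job.payload.length };
}

for (let i = 0; i < 1000; i++) {
  handleJobLeaky({ id: i, payload: 'x'.repeat(1024) });
  handleJob({ id: i, payload: 'x'.repeat(1024) });
}

console.log(processedJobs.length); // 1000 entries retained, one per job
```

Both handlers return the same result; the only difference is that the leaky one leaves a reference behind, which is why this kind of leak rarely shows up in functional tests.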
The first ten minutes: establish facts before touching code.
1. Check the process memory over time. Use `process.memoryUsage()` in a periodic log or the platform's memory graph.
2. Look at the heap. If `heapUsed` grows without bound, the leak is most likely in JavaScript objects. If `external` grows instead, look at buffers, streams, or native addons.
3. Take a heap snapshot early and another after many jobs. Compare them in Chrome DevTools to see which objects are accumulating.
4. Check for global state that grows with job count: arrays that are pushed to but never cleared, Maps that accumulate keys.
5. Check for event listeners. If jobs add listeners to a shared EventEmitter without removing them, the emitter keeps every listener, and everything the listener closes over, alive.
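
The first two checks can be sketched as a small periodic sampler. This is an illustration under assumptions: `sampleMemory` and `classifyGrowth` are hypothetical helper names, and the 50 MB threshold and 30 second interval are arbitrary values to tune for your workload.

```javascript
// Sketch: sample process.memoryUsage() and report which field is growing.
// `sampleMemory` and `classifyGrowth` are hypothetical helper names.

const MB = 1024 * 1024;

function sampleMemory() {
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return { time: Date.now(), rss, heapUsed, external, arrayBuffers };
}

// Compare two samples and point at the likely leak location.
// The 50 MB default threshold is an assumption, not a recommendation.
function classifyGrowth(first, last, thresholdBytes = 50 * MB) {
  const heapGrowth = last.heapUsed - first.heapUsed;
  const externalGrowth = last.external - first.external;
  if (heapGrowth >= thresholdBytes && heapGrowth >= externalGrowth) {
    return 'heap'; // JavaScript objects are accumulating
  }
  if (externalGrowth >= thresholdBytes) {
    return 'external'; // buffers, streams, or native addons
  }
  return 'stable';
}

// Usage: log one JSON line every 30 seconds. unref() keeps this timer
// from holding the process open on its own.
const baseline = sampleMemory();
const timer = setInterval(() => {
  const now = sampleMemory();
  console.log(JSON.stringify({ ...now, trend: classifyGrowth(baseline, now) }));
}, 30_000);
timer.unref();
```

For the snapshot step, the built-in `v8.writeHeapSnapshot()` from `node:v8` writes a `.heapsnapshot` file that Chrome DevTools can load, which avoids attaching a debugger to the running worker.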
The specific files, logs, configs, and dashboards that usually own this bug.
- Process memory metrics: `process.memoryUsage()`, platform memory graphs
- Heap snapshots: take with `--inspect` and Chrome DevTools, the built-in `v8.writeHeapSnapshot()`, or the `heapdump` module
- Global variables and module-level state: arrays, Maps, Sets that grow without bound
- Event listeners: `emitter.listenerCount(event)` to check for accumulation
- Streams and buffers: unclosed file handles, network sockets, database connections
- Third-party libraries: some have known memory leaks in certain versions
- Worker process code: the main job processing loop and per-job cleanup
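
The listener check from the list above can be sketched as follows. The shared emitter `bus` and the event names are hypothetical; `listenerCount`, `on`, and `off` are standard `EventEmitter` methods.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical shared emitter that every job subscribes to.
const bus = new EventEmitter();

// Leaky pattern: each job adds a listener and never removes it.
function runJobLeaky(jobId) {
  bus.on('config-changed', () => jobId);
}

// Fixed pattern: the listener is removed when the job finishes.
function runJob(jobId) {
  const onTick = () => jobId;
  bus.on('tick', onTick);
  try {
    // ... job work would happen here ...
  } finally {
    bus.off('tick', onTick);
  }
}

for (let i = 0; i < 5; i++) {
  runJobLeaky(i);
  runJob(i);
}

console.log(bus.listenerCount('config-changed')); // 5, grows with job count
console.log(bus.listenerCount('tick'));           // 0, cleaned up per job
```

Logging `listenerCount` alongside the job counter is a cheap way to confirm or rule out this cause before reaching for heap snapshots.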
Practical causes, not theory. These are the things you will actually find.
- Global array or Map accumulates data from every job without ever being cleared
- Event listener added per job but never removed, so the EventEmitter holds a reference to the job's scope
- A new database connection pool or HTTP agent is created per job and the old ones are never closed
- Stream (file, network) is opened but not destroyed after the job completes
- Closure captures a large object that outlives the job
- Timer is set per job but never cleared (typically a `setInterval`, or a `setTimeout` with a long delay)
- Cache or memoisation layer grows without an eviction policy
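
The last cause in the list, a cache without eviction, is usually contained by capping its size. Below is a minimal LRU sketch built on `Map` insertion order; the class name `LruCache` is hypothetical, and a maintained library may be a better fit in production code.

```javascript
// Minimal LRU sketch: a Map iterates keys in insertion order, so the
// first key is always the least recently used one.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}

const cache = new LruCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');              // 'a' is now the most recently used entry
cache.set('c', 3);           // evicts 'b'
console.log(cache.size);     // 2
console.log(cache.get('b')); // undefined
```

The cap turns unbounded growth into a fixed memory budget, at the cost of occasional cache misses.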
Concrete fix directions. Pick the one that matches your root cause.
- Add explicit cleanup per job: clear timers, close streams, remove listeners in a `finally` block
- Use `WeakMap` or `WeakRef` for caches that should not prevent garbage collection
- Set a max size on any in-memory cache with LRU eviction
- Restart worker processes after N jobs, or above a memory threshold, as a safety net (e.g. `pm2` with `--max-memory-restart`)
- Add memory monitoring and alerting: log `process.memoryUsage()` periodically and alert on upward trend
- Profile in production with `--inspect` and take heap snapshots periodically to track object count growth
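
The first fix direction, per-job cleanup in a `finally` block, might look like the sketch below. The emitter `shutdownBus`, the `processJob` wrapper, and the job shape are assumptions made for the example, not part of any particular queue library.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical shared emitter and job shape, for illustration only.
const shutdownBus = new EventEmitter();

async function processJob(job, work) {
  let cancelled = false;
  const onShutdown = () => { cancelled = true; };

  // Per-job resources: one listener and one heartbeat timer.
  shutdownBus.on('shutdown', onShutdown);
  const heartbeat = setInterval(() => {
    job.lastHeartbeat = Date.now();
  }, 1000);

  try {
    // `work` receives the job and a function to poll for cancellation.
    return await work(job, () => cancelled);
  } finally {
    // Runs on success and on failure, so nothing outlives the job.
    clearInterval(heartbeat);
    shutdownBus.off('shutdown', onShutdown);
  }
}

// Usage: after the job settles, no listener is left on the emitter.
processJob({ id: 1 }, async () => 'done').then((result) => {
  console.log(result, shutdownBus.listenerCount('shutdown')); // done 0
});
```

Keeping acquisition and release in the same function makes it hard to add a new per-job resource without also adding its cleanup.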
A fix you cannot prove is a guess. Close the loop.
- Run the worker with 1000 jobs and observe memory usage. It should stabilise, not grow indefinitely.
- Take heap snapshots at job 100 and job 1000. Compare them: no object type should grow proportionally to job count.
- Run the worker under a memory profiler and verify no objects are retained after each job completes.
- Set a memory limit in the test environment and verify the worker does not hit it over a long run.
- Add an automated test that processes N jobs and asserts memory is within expected bounds.
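
The automated check in the last item could start from something like this sketch. `measureHeapGrowth` and the two job functions are hypothetical, and heap measurements are noisy: the numbers are only meaningful when Node is started with `--expose-gc` so the forced collections actually run, and any threshold you assert against is an assumption to calibrate on your own workload.

```javascript
// Sketch of a memory regression check; all names are illustrative.

function measureHeapGrowth(runJob, jobCount) {
  // global.gc only exists when Node is started with --expose-gc.
  // Without it the measurement still runs but includes uncollected garbage.
  const collect = typeof global.gc === 'function' ? global.gc : () => {};
  collect();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < jobCount; i++) runJob(i);
  collect();
  return process.memoryUsage().heapUsed - before;
}

// A job that allocates and lets go, and one that retains every allocation.
const cleanJob = (i) => { new Array(20000).fill(i); };
const retained = [];
const leakyJob = (i) => { retained.push(new Array(20000).fill(i)); };

const MB = 1024 * 1024;
const cleanGrowth = measureHeapGrowth(cleanJob, 300);
const leakyGrowth = measureHeapGrowth(leakyJob, 300);
console.log('clean growth MB:', (cleanGrowth / MB).toFixed(1));
console.log('leaky growth MB:', (leakyGrowth / MB).toFixed(1));
```

In a test suite, the assertion would compare the measured growth for the real job handler against a budget, for example failing when growth over N jobs exceeds a few megabytes.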
Things that make this bug worse or harder to find.
- Restarting the worker every hour as the only fix: the underlying leak still exists
- Not measuring memory usage in production: you will not know about the leak until it crashes
- Adding items to a global cache without an eviction policy
- Not cleaning up event listeners and timers in async job handlers
- Assuming the garbage collector will handle everything: it cannot free objects that are still referenced