LEARN · DEBUGGING GUIDE

Worker process memory leak: how to find and fix memory leaks in background jobs

Your worker process runs fine for an hour. Then memory climbs from 100 MB to 500 MB. Then 1 GB. Then the OOM killer terminates it. The process restarts, and the cycle repeats.

Advanced · JavaScript/Node runtime debugging

What this usually means

Memory leaks in long-running processes happen when memory is allocated but never released. In Node.js, the garbage collector frees memory that is no longer referenced. A leak occurs when references to objects are kept alive unintentionally: a global array that grows without bound, an event listener that is never removed, a closure that holds a reference to a large object, or a stream that is never closed. Each job processed adds a little more to the memory footprint until the process exhausts available memory.
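
As a minimal sketch of the first pattern, the snippet below contrasts a handler that pushes every result into a module-level array with one that lets its result go out of scope. The handler names and the job shape are hypothetical and exist only for illustration.

```javascript
// Module-level state lives as long as the process does.
const processedJobs = [];

// Leaky: every result stays referenced by `processedJobs`,
// so the garbage collector can never free it.
function handleJobLeaky(job) {
  const result = { id: job.id, payload: Buffer.alloc(1024) };
  processedJobs.push(result);
  return result.id;
}

// Clean: `result` becomes unreachable once the function returns.
function handleJobClean(job) {
  const result = { id: job.id, payload: Buffer.alloc(1024) };
  return result.id;
}

for (let i = 0; i < 1000; i++) {
  handleJobLeaky({ id: i });
  handleJobClean({ id: i });
}

console.log(processedJobs.length); // 1000: one retained result per job
```

Both handlers allocate the same amount per job. Only the retained reference makes the difference.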

( 01 ) Fast diagnosis

The first ten minutes: establish facts before touching code.

  • 1. Check the process memory over time. Log `process.memoryUsage()` periodically or use the platform's memory graph.
  • 2. Look at the heap. If `heapUsed` grows without bound, the leak is in JavaScript objects. If `external` or `arrayBuffers` grows, the leak is in buffers, streams, or native addons.
  • 3. Take a heap snapshot early and another after many jobs. Compare them in Chrome DevTools to see which objects are accumulating.
  • 4. Check for global state that grows with job count: arrays that are pushed to but never cleared, Maps that accumulate keys.
  • 5. Check for event listeners. If jobs add listeners to a shared EventEmitter without removing them, the emitter holds references.
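
The first three steps can be sketched with Node's built-in APIs. The function names, the one-minute interval, and the snapshot file naming below are assumptions for illustration, not a prescribed setup.

```javascript
const v8 = require('node:v8');

const MB = 1024 * 1024;

// Convert the byte counts from process.memoryUsage() to megabytes (one decimal).
function memorySnapshot() {
  const { rss, heapTotal, heapUsed, external, arrayBuffers } = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / MB) * 10) / 10;
  return {
    rssMB: toMB(rss),
    heapTotalMB: toMB(heapTotal),
    heapUsedMB: toMB(heapUsed),       // JavaScript objects
    externalMB: toMB(external),       // memory held by C++ objects bound to JS
    arrayBuffersMB: toMB(arrayBuffers),
  };
}

// Log a snapshot every `intervalMs`. unref() stops this timer from
// keeping the process alive on its own. Returns a stop function.
function startMemoryLogging(intervalMs = 60_000, log = console.log) {
  const timer = setInterval(() => {
    log(JSON.stringify({ ts: Date.now(), ...memorySnapshot() }));
  }, intervalMs);
  timer.unref();
  return () => clearInterval(timer);
}

// Write a heap snapshot that Chrome DevTools can load.
// This blocks the event loop while it runs, so call it sparingly.
function dumpHeap(label) {
  return v8.writeHeapSnapshot(`${label}-${Date.now()}.heapsnapshot`);
}
```

One possible use is to call `dumpHeap('early')` shortly after startup and `dumpHeap('late')` after many jobs, then load both files in the DevTools Memory tab and use the comparison view.
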
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Process memory metrics: `process.memoryUsage()`, platform memory graphs
  • Heap snapshots: take with `--inspect` and Chrome DevTools, the built-in `v8.writeHeapSnapshot()`, or the `heapdump` module
  • Global variables and module-level state: arrays, Maps, Sets that grow without bound
  • Event listeners: `emitter.listenerCount(event)` to check for accumulation
  • Streams and buffers: unclosed file handles, network sockets, database connections
  • Third-party libraries: some have known memory leaks in certain versions
  • Worker process code: the main job processing loop and per-job cleanup
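
To make the listener check concrete, here is a small sketch that reports listener counts for every event on a shared emitter. The `bus` emitter, the `shutdown` event, and `runJobLeaky` are hypothetical stand-ins for whatever your jobs subscribe to.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical emitter shared by all jobs.
const bus = new EventEmitter();

// Leaky job: subscribes on every run and never unsubscribes.
function runJobLeaky(jobId) {
  bus.on('shutdown', () => console.log(`job ${jobId} saw shutdown`));
}

// Build a { eventName: listenerCount } report for an emitter.
function listenerReport(emitter) {
  const report = {};
  for (const name of emitter.eventNames()) {
    report[String(name)] = emitter.listenerCount(name);
  }
  return report;
}

for (let i = 0; i < 5; i++) runJobLeaky(i);

console.log(listenerReport(bus)); // { shutdown: 5 }
```

A count that tracks the number of jobs processed, rather than the number of jobs in flight, points at a missing `off()` call.
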
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Global array or Map accumulates data from every job without ever being cleared
  • Event listener added per job but never removed, so the EventEmitter holds a reference to the job's scope
  • Database client or HTTP agent is created per job, opening a new connection pool each time without closing the old one
  • Stream (file, network) is opened but not destroyed after the job completes
  • Closure captures a large object that outlives the job
  • Timer (setInterval, setTimeout) is set per job but never cleared
  • Cache or memoisation layer grows without an eviction policy
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Add explicit cleanup per job: clear timers, close streams, remove listeners in a `finally` block
  • Use `WeakMap` or `WeakRef` for caches that should not prevent garbage collection
  • Set a max size on any in-memory cache with LRU eviction
  • Restart worker processes after N jobs or above a memory threshold as a safety net (e.g. `pm2` with `--max-memory-restart`, which restarts on memory usage rather than job count)
  • Add memory monitoring and alerting: log `process.memoryUsage()` periodically and alert on upward trend
  • Profile in production with `--inspect` and take heap snapshots periodically to track object count growth
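
The per-job cleanup pattern might look like the sketch below. The shared `bus` emitter, the `cancel` event, the heartbeat timer, and the `runJob` signature are all assumptions made for illustration. The point is only that everything acquired before `try` is released in `finally`.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical emitter shared across jobs.
const bus = new EventEmitter();

// Run one job with a listener and a timer that must not outlive it.
async function runJob(job, work) {
  const onCancel = () => { job.cancelled = true; };
  const heartbeat = setInterval(() => { job.lastBeat = Date.now(); }, 1000);
  bus.on('cancel', onCancel);
  try {
    return await work(job);
  } finally {
    // Runs whether `work` resolves or throws, so the timer and the
    // listener are always released before the next job starts.
    clearInterval(heartbeat);
    bus.off('cancel', onCancel);
  }
}
```

Keeping a named reference to the listener (`onCancel`) matters: `off()` can only remove the exact function that was passed to `on()`. Streams opened by the job could be destroyed in the same `finally` block.
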
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Run the worker with 1000 jobs and observe memory usage. It should stabilise, not grow indefinitely.
  • Take heap snapshots at job 100 and job 1000. Compare them: no object type should grow in proportion to job count.
  • Run the worker under a memory profiler and verify no objects are retained after each job completes.
  • Set a memory limit in the test environment and verify the worker does not hit it over a long run.
  • Add an automated test that processes N jobs and asserts memory is within expected bounds.
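
One way to approach the automated check is to measure heap growth across a batch of jobs after a warm-up. The helper names, the warm-up and job counts, and the idea of a byte budget are assumptions in this sketch. Heap measurements are noisy, so treat any threshold as a rough guard, not an exact figure.

```javascript
// Run `job` `count` times after a warm-up and return the change in
// heapUsed (bytes). If the process was started with --expose-gc,
// a collection is forced before each measurement to reduce noise.
async function measureHeapGrowth(job, { warmup = 100, count = 1000 } = {}) {
  const collect = () => {
    if (typeof global.gc === 'function') global.gc();
  };
  for (let i = 0; i < warmup; i++) await job(i);
  collect();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < count; i++) await job(warmup + i);
  collect();
  return process.memoryUsage().heapUsed - before;
}

// Throw if heap growth across the batch exceeds `budgetBytes`.
async function assertHeapGrowthWithin(job, budgetBytes, options) {
  const growth = await measureHeapGrowth(job, options);
  if (growth > budgetBytes) {
    throw new Error(`heap grew by ${growth} bytes, budget was ${budgetBytes}`);
  }
  return growth;
}
```

Running the test with `node --expose-gc` makes `global.gc` available. Without it the helper still runs, but uncollected garbage inflates the numbers, so the budget needs more headroom.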
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Restarting the worker every hour as the only fix: the underlying leak still exists
  • Not measuring memory usage in production: you will not know about the leak until it crashes
  • Adding items to a global cache without an eviction policy
  • Not cleaning up event listeners and timers in async job handlers
  • Assuming the garbage collector will handle everything: it cannot free objects that are still referenced
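
For the cache mistake in particular, a bounded cache with least-recently-used eviction can be sketched on top of a plain `Map`, which iterates keys in insertion order. The `LruCache` class name and its API are hypothetical; a maintained library may be a better fit in production.

```javascript
// Minimal LRU cache: the Map's first key is always the least recently used.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark this key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}

const cache = new LruCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');    // 'a' is now the most recently used
cache.set('c', 3); // evicts 'b'
console.log([...cache.map.keys()]); // [ 'a', 'c' ]
```

The cache never holds more than `maxEntries` items, so its memory footprint is bounded regardless of how many jobs run.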