Observability · 12 min read

Tracking Down a 200 MB Leak with Python Memory Profilers

A production API was silently leaking 200 MB of RAM every hour. Here's how memory profilers found the culprit—a forgotten NumPy array reference—and how you can apply the same techniques.

Tags: memory profiling, Python, memray, tracemalloc, objgraph, debugging, performance

A few months ago, one of our Flask APIs started gradually consuming more and more memory. After about 12 hours, the process was sitting at 2.5 GB and the OOM killer would step in. We'd restart the pod, and the cycle repeated. Standard metrics (RSS, heap size) told us something was leaking, but they didn't say what or where.

This is the point where most engineers reach for a generic 'memory profiler' and run it for a few minutes. That rarely works. The leak was slow—hundreds of MB over hours—so we needed tools that could capture allocation traces over time and then narrow down the retained objects. Here's exactly how we did it.

Step 1: Get a Baseline with tracemalloc

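We turned on Python's built-in tracemalloc at application startup. It records a traceback for every memory block allocated through Python's allocators. In a Flask app started directly with `python app.py`, you can enable it in the main block; under a WSGI server such as gunicorn that block never runs, so call `tracemalloc.start()` at import time instead: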

Enable tracemalloc with frame depth 25 to capture sufficient context.
import tracemalloc

if __name__ == '__main__':
    tracemalloc.start(25)  # 25 frames deep
    app.run()

We then used a periodic snapshot comparison—every 30 seconds, we took a snapshot and computed the difference. The top entries showed that 95% of new allocations came from a single module: our data processing pipeline. Specifically, a function called `normalize_vectors`.

Interpreting the Snapshot

Comparing two snapshots to see which lines allocated the most memory over 5 minutes.
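import time
import tracemalloc

# Assumes tracemalloc.start() was already called at startup.
snapshot1 = tracemalloc.take_snapshot()
time.sleep(300)  # let the leak accumulate for 5 minutes
snapshot2 = tracemalloc.take_snapshot()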

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
for stat in top_stats[:10]:
    print(stat)

The output pointed to a line that created a list of NumPy arrays: `arrays = [np.array(...) for ...]`. That list was being appended to a global cache and never cleared. But we still didn't know why the garbage collector wasn't cleaning it up.
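Grouping by `'lineno'` only names the allocating line. A minimal sketch of how the same comparison could be grouped by full traceback and restricted to one file follows; the helper name `top_growth_by_traceback` and its `path_pattern` parameter are illustrative assumptions, not code from our service.

```python
import tracemalloc


def top_growth_by_traceback(before, after, path_pattern="*", limit=3):
    """Return the largest positive size diffs, grouped by full traceback."""
    # Keep only traces whose most recent frame matches the filename pattern.
    filters = [tracemalloc.Filter(True, path_pattern)]
    before = before.filter_traces(filters)
    after = after.filter_traces(filters)
    stats = after.compare_to(before, "traceback")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start(25)
    first = tracemalloc.take_snapshot()
    retained = [bytearray(1024) for _ in range(1000)]  # stand-in for the leak
    second = tracemalloc.take_snapshot()
    for stat in top_growth_by_traceback(first, second):
        print(f"+{stat.size_diff / 1024:.0f} KiB in {stat.count_diff} blocks")
        print("\n".join(stat.traceback.format()))
    tracemalloc.stop()
```

In a real service you would pass something like `path_pattern="*/data_pipeline.py"` so that allocations from unrelated modules drop out of the report.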

Warning: tracemalloc's overhead is non-trivial. In production, only run it for short bursts or enable it conditionally (e.g., via a config flag). We used a 10-minute window and then disabled it.
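One way to get a bounded, opt-in window is sketched below. The environment variable `MEMTRACE_SECONDS` and both function names are hypothetical; this is an illustration of the idea, not the exact mechanism we shipped.

```python
import os
import threading
import tracemalloc


def start_bounded_tracing(seconds, nframes=25):
    """Start tracemalloc and schedule it to stop after `seconds`."""
    if tracemalloc.is_tracing():
        return None  # someone else already enabled it; leave it alone
    tracemalloc.start(nframes)
    timer = threading.Timer(seconds, tracemalloc.stop)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer


def maybe_start_tracing(environ=os.environ):
    """Enable tracing only when the (hypothetical) MEMTRACE_SECONDS flag is set."""
    raw = environ.get("MEMTRACE_SECONDS", "")
    if not raw.isdigit() or int(raw) == 0:
        return None
    return start_bounded_tracing(int(raw))
```

Calling `maybe_start_tracing()` once at startup with `MEMTRACE_SECONDS=600` would approximate the 10-minute window. Note that `tracemalloc.stop()` discards the collected traces, so any snapshot has to be taken before the timer fires.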

Step 2: Visualize the Leak with memray

tracemalloc gave us line numbers, but we wanted a visual timeline. We installed memray (a C extension by Bloomberg) and ran the Flask process with memray tracking:

Run the Flask app with memray tracking. We used 1 worker to isolate the leak.
memray run --follow-fork -o output.memray -m gunicorn -w 1 app:app

After an hour, we hit Ctrl+C and generated a flame graph: `memray flamegraph output.memray`. The flame graph showed a massive tower under `normalize_vectors` that grew over time. Specifically, the NumPy arrays added by `list.append` were accumulating in a module-level list that was supposed to be flushed every 1000 items, but a bug set the flush threshold to 10000 and the flush condition never triggered.

Tip: memray's `--live` option (passed to `memray run`) shows a real-time terminal UI. Use it for interactive debugging. You can sort by total memory, own memory, or number of allocations.

Step 3: Find the References with objgraph

Now we knew where allocations were happening, but we still needed to confirm that these objects were actually retained (i.e., not freed). We used objgraph to find back-references to the leaked objects.

Use objgraph to visualize what keeps the leaked list alive. The image will show the reference chain.
import gc

import objgraph

# Print the 20 most common object types currently alive
objgraph.show_most_common_types(limit=20)

# Collect lists with more than 100 items as leak candidates
big_list = [x for x in gc.get_objects() if isinstance(x, list) and len(x) > 100]
if big_list:
    # Rendering the PNG requires Graphviz to be installed
    objgraph.show_backrefs(big_list[0], max_depth=5, filename='backrefs.png')

The generated PNG showed that the list was referenced by a module-level variable `buffer` in `data_pipeline.py`. That variable was never reset after a flush, and the flush condition was incorrect. The fix was a one-liner: reset `buffer = []` after flush.
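If objgraph or Graphviz is not available on the box, a rougher version of the same question can be asked with the standard library alone. The sketch below is an assumption-laden stand-in for `show_backrefs`, not what objgraph does internally: it only looks one level up and only at dict owners such as module globals.

```python
import gc


def find_large_lists(min_len=100):
    """Return gc-tracked lists with at least `min_len` items."""
    return [o for o in gc.get_objects() if type(o) is list and len(o) >= min_len]


def describe_referrers(target):
    """Name the dict entries (e.g. module globals) that point at `target`."""
    names = []
    for ref in gc.get_referrers(target):
        if isinstance(ref, dict):
            # Module globals carry a "__name__" key; other dicts get a placeholder.
            owner = ref.get("__name__", "<dict>")
            names.extend(f"{owner}.{k}" for k, v in ref.items() if v is target)
    return names


if __name__ == "__main__":
    # Stand-in for a module's globals; in our case the owner was data_pipeline.py
    fake_module = {"__name__": "data_pipeline", "buffer": list(range(500))}
    for candidate in find_large_lists(min_len=500):
        print(len(candidate), describe_referrers(candidate))
```

For the leak described here, output along the lines of `data_pipeline.buffer` would be the signal that a module-level name is what keeps the list alive.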

The Real Fix

The missing line that caused the leak. After fixing, memory stabilized at ~150 MB.
# Before (leaky)
if len(buffer) >= FLUSH_THRESHOLD:
    flush(buffer)
    # missing: buffer = []

# After
if len(buffer) >= FLUSH_THRESHOLD:
    flush(buffer)
    # Inside a function, rebinding needs a `global buffer` declaration
    buffer = []

Memory leaked before fix: 200 MB/hour

Bug causing the leak: 1 line
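The snippet above omits its surrounding function. A self-contained sketch of the fixed flush logic follows; `add_item`, `flushed_batches`, and the body of `flush` are hypothetical stand-ins, and it uses `buffer.clear()` in place of rebinding, which avoids the `global` declaration but assumes `flush` does not hold on to the list afterwards.

```python
FLUSH_THRESHOLD = 1000  # the intended threshold
buffer = []             # module-level buffer, as in data_pipeline.py
flushed_batches = []    # hypothetical sink that records each flushed batch size


def flush(items):
    """Hypothetical flush: record how many items were written out."""
    flushed_batches.append(len(items))


def add_item(item):
    """Append to the buffer and empty it once the threshold is reached."""
    buffer.append(item)
    if len(buffer) >= FLUSH_THRESHOLD:
        flush(buffer)
        buffer.clear()  # empties the list in place; no `global` needed


if __name__ == "__main__":
    for i in range(2500):
        add_item(i)
    print(flushed_batches, len(buffer))
```

With the reset in place, the buffer never holds more than `FLUSH_THRESHOLD` items, which is what bounds the memory.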

What I Wish I Knew Earlier

  1. Enable tracemalloc from the start in development; retroactive profiling is much harder.
  2. Use memray's '--live' mode for interactive filtering by module or function.
  3. objgraph's `show_backrefs` is more useful than `show_chain` for finding unexpected references.
  4. Don't assume the garbage collector has handled circular references; run gc.collect() and check gc.garbage for objects it could not free.
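To make the last point concrete, here is a small sketch of what checking the collector can look like. The `Node` class and the function name are made up for illustration.

```python
import gc


class Node:
    """Tiny object that can point at another Node."""

    def __init__(self):
        self.other = None


def count_collected_cycles():
    """Create an unreachable cycle and report how many objects gc.collect() finds."""
    gc.collect()              # clear out any pre-existing garbage first
    a, b = Node(), Node()
    a.other, b.other = b, a   # reference cycle: refcounts never drop to zero
    del a, b                  # unreachable now, but only the cycle collector can free them
    return gc.collect()       # number of unreachable objects found


if __name__ == "__main__":
    print("unreachable objects found:", count_collected_cycles())
    print("uncollectable objects:", len(gc.garbage))
```

A non-zero return from `gc.collect()` means cycles existed and were found. Entries in `gc.garbage` are the objects the collector found but could not free. Neither check helps with a list that is still reachable from a module global, which is why the leak in this article needed reference tracing instead.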

The hardest part of memory profiling isn't finding the allocation—it's finding the reference that keeps it alive.

Memory profiling is a skill you build by encountering real leaks. The tools are straightforward, but the debug mindset—asking 'why is this object still alive'—is what matters. Next time your Python process balloons, try this trio: tracemalloc for initial line-level hints, memray for a temporal view, and objgraph to close the case.

Frequently asked questions

What's the difference between memray and tracemalloc?

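memray is a third-party profiler from Bloomberg that hooks the underlying allocator calls, so it also sees allocations made by native extensions, and it produces flame graphs and timeline reports with relatively low overhead. tracemalloc is built into Python's standard library and records a Python traceback for each allocation made through Python's memory allocators; it has higher overhead and cannot see memory that native code obtains directly from malloc.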

How do I profile a long-running production process without restarting it?

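Use memray's attach mode (memray attach <pid>) to connect to a running process; it relies on gdb or lldb being available on the host. Alternatively, if tracemalloc is already enabled, install a signal handler that dumps snapshots on demand. py-spy can also attach to a live process, but it samples CPU stacks and does not track memory.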
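A minimal sketch of the signal-handler approach is below. The function names and the file naming scheme are assumptions, and SIGUSR1 exists only on Unix-like systems. tracemalloc must already be tracing, because a snapshot only covers allocations made while tracing was on.

```python
import os
import signal
import tempfile
import time
import tracemalloc


def dump_snapshot(directory=None):
    """Write the current tracemalloc snapshot to disk and return its path."""
    if not tracemalloc.is_tracing():
        return None  # nothing to dump; tracing was never started
    directory = directory or tempfile.gettempdir()
    name = f"tracemalloc-{os.getpid()}-{int(time.time())}.snap"
    path = os.path.join(directory, name)
    tracemalloc.take_snapshot().dump(path)
    return path


def install_dump_handler(signum=None):
    """Dump a snapshot whenever the process receives `signum` (Unix only)."""
    signum = signal.SIGUSR1 if signum is None else signum
    signal.signal(signum, lambda _sig, _frame: dump_snapshot())
```

After calling `install_dump_handler()` from the main thread at startup, `kill -USR1 <pid>` would write a file that can be loaded elsewhere with `tracemalloc.Snapshot.load(path)` and compared against an earlier dump.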

Can memory profilers cause performance degradation?

Yes—especially tracemalloc when tracking many allocations. Use it sparingly in production. memray has lower overhead but still adds ~10-20% CPU cost. Prefer short profiling sessions.

What should I do if the profiler shows large allocations but no obvious leaks?

Check for objects that are still alive because something outside your immediate code path references them, such as caches or singletons. Use gc.get_objects() to list live objects and objgraph.show_backrefs() to visualize reference chains.
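As a rough illustration of the `gc.get_objects()` half of that advice, the sketch below counts live objects by type at two points in time and reports which types grew. It is a simplified take on what objgraph's `show_growth` offers; the helper names are our own.

```python
import gc
from collections import Counter


def type_counts():
    """Count live gc-tracked objects by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after, limit=10):
    """Return (type_name, delta) pairs for types whose count increased."""
    delta = after - before  # Counter subtraction keeps only positive counts
    return delta.most_common(limit)


if __name__ == "__main__":
    baseline = type_counts()
    cache = [{"id": i} for i in range(1000)]  # stand-in for a growing cache
    for name, delta in growth(baseline, type_counts()):
        print(f"{name:20s} +{delta}")
```

Run it around a batch of requests: a type whose count rises after every batch and never falls back is the one to feed into `show_backrefs`.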