A few months ago, one of our Flask APIs started gradually consuming more and more memory. After about 12 hours, the process was sitting at 2.5 GB and the OOM killer would step in. We'd restart the pod, and the cycle repeated. Standard metrics (RSS, heap size) told us something was leaking, but they didn't say what or where.
This is the point where most engineers reach for a generic 'memory profiler' and run it for a few minutes. That rarely works. The leak was slow—hundreds of MB over hours—so we needed tools that could capture allocation traces over time and then narrow down the retained objects. Here's exactly how we did it.
Step 1: Get a Baseline with tracemalloc
We turned on Python's built-in tracemalloc at application startup. Once started, it records a traceback for every memory block allocated through Python's allocators. In a Flask app, you can enable it in the main block:
```python
import tracemalloc
from flask import Flask

app = Flask(__name__)

if __name__ == '__main__':
    tracemalloc.start(25)  # store up to 25 frames per allocation traceback
    app.run()
```

We then used periodic snapshot comparison: every 30 seconds, we took a snapshot and diffed it against the previous one. The top entries showed that 95% of new allocations came from a single module, our data processing pipeline, and specifically from a function called `normalize_vectors`.
Interpreting the Snapshot
```python
import time
import tracemalloc

# Assumes tracemalloc.start() was already called (Step 1).
snapshot1 = tracemalloc.take_snapshot()
time.sleep(300)
snapshot2 = tracemalloc.take_snapshot()

# Group the differences by source line; the largest changes come first.
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
for stat in top_stats[:10]:
    print(stat)
```

The output pointed to a line that built a list of NumPy arrays: `arrays = [np.array(...) for ...]`. That list was being appended to a global cache and never cleared, but we still didn't know why the garbage collector wasn't reclaiming it.
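Grouping by `'lineno'` shows the allocating line but not who called it. A sketch of a possible next step, assuming tracing was started with enough frames (as in Step 1): filter out tracemalloc's own bookkeeping and group the diff by `'traceback'`, so each top entry carries its full call chain. The helper name `top_growth_tracebacks` is hypothetical, not part of the standard library.

```python
import tracemalloc


def top_growth_tracebacks(old_snapshot, new_snapshot, limit=3):
    """Return one formatted report per allocation site that changed the most."""
    # Hide allocations made by tracemalloc itself and by the import machinery.
    noise = [
        tracemalloc.Filter(False, tracemalloc.__file__),
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    ]
    old = old_snapshot.filter_traces(noise)
    new = new_snapshot.filter_traces(noise)
    # 'traceback' groups by the full call chain instead of by a single line.
    stats = new.compare_to(old, "traceback")
    reports = []
    for stat in stats[:limit]:
        header = f"{stat.size_diff / 1024:+.1f} KiB, {stat.count_diff:+d} blocks"
        reports.append("\n".join([header, *stat.traceback.format()]))
    return reports


if __name__ == "__main__":
    tracemalloc.start(25)
    before = tracemalloc.take_snapshot()
    retained = [bytearray(10_000) for _ in range(100)]  # simulated leak
    after = tracemalloc.take_snapshot()
    for report in top_growth_tracebacks(before, after):
        print(report, end="\n\n")
    tracemalloc.stop()
```

Each report starts with the size and block-count change, followed by the stack from the outermost frame down to the allocating line.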
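tracemalloc's overhead is significant: it slows down every allocation and stores a traceback for each traced memory block, and both costs grow with the number of frames you keep. In production, only run it for short bursts or enable it conditionally (e.g., via a config flag). We used a 10-minute window and then disabled it.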
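One way to get a time-boxed window like this is a small background thread. This is a sketch under assumptions: the environment variable `MEMPROFILE_WINDOW_S` and the helper `start_profiling_window` are hypothetical, and the check runs at import time rather than in the `__main__` block, because that block never executes when gunicorn imports `app:app`. Since `tracemalloc.stop()` discards all traces, it also assumes nothing else in the process relies on tracemalloc.

```python
import os
import threading
import time
import tracemalloc


def start_profiling_window(window_s, interval_s, top_n=5, log=print):
    """Trace allocations for about window_s seconds, logging the top changes every interval_s."""
    tracemalloc.start(25)

    def worker():
        previous = tracemalloc.take_snapshot()
        deadline = time.monotonic() + window_s
        while time.monotonic() < deadline:
            time.sleep(interval_s)
            current = tracemalloc.take_snapshot()
            for stat in current.compare_to(previous, "lineno")[:top_n]:
                log(str(stat))
            previous = current
        # Close the window: stop() also discards all collected traces.
        tracemalloc.stop()

    thread = threading.Thread(target=worker, name="tracemalloc-window", daemon=True)
    thread.start()
    return thread


# Hypothetical config flag, checked at import time so it also works under gunicorn.
_window = os.environ.get("MEMPROFILE_WINDOW_S")
if _window:
    start_profiling_window(float(_window), interval_s=30)
```

Passing a logger method such as `logging.getLogger(__name__).info` as `log` sends the diffs to the application's normal logs instead of stdout.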
Step 2: Visualize the Leak with memray
tracemalloc gave us line numbers, but we wanted a visual timeline. We installed memray (a C extension by Bloomberg) and ran the Flask process with memray tracking:
```shell
memray run --follow-fork -o output.memray -m gunicorn -w 1 app:app
```

After an hour, we hit Ctrl+C and generated a flame graph with `memray flamegraph output.memray`. The flame graph showed a massive tower under `normalize_vectors` that grew over time. The `list.append` calls for NumPy arrays were accumulating in a module-level list that was supposed to be flushed every 1000 items, but a bug had set the flush threshold to 10000, so the flush condition never triggered.
memray's `--live` option (`memray run --live ...`) shows a real-time terminal UI, which is useful for interactive debugging. You can sort by total memory, number of allocations, or allocation rate.
Step 3: Find the References with objgraph
Now we knew where allocations were happening, but we still needed to confirm that these objects were actually retained (i.e., not freed). We used objgraph to find back-references to the leaked objects.
```python
import gc

import objgraph

# Print the 20 most common object types currently alive
objgraph.show_most_common_types(limit=20)

# Find lists with more than 100 items and draw what refers to the first one
big_lists = [x for x in gc.get_objects() if isinstance(x, list) and len(x) > 100]
if big_lists:
    # Wrap the target in a list: show_backrefs treats a bare list as a list of start objects.
    # Rendering the PNG requires Graphviz.
    objgraph.show_backrefs([big_lists[0]], max_depth=5, filename='backrefs.png')
```

The generated PNG showed that the list was referenced by a module-level variable `buffer` in `data_pipeline.py`. That variable was never reset after a flush, and the flush condition was incorrect. The fix was a one-liner: reset `buffer = []` after flushing.
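If objgraph or Graphviz isn't available, the standard library can answer the narrower question "which module-level global holds this object?". This is a swapped-in technique, not what we ran: `gc.get_referrers()` returns the containers that point at an object, and a module's globals are its `__dict__`. The helper `find_module_holders` is hypothetical.

```python
import gc
import sys


def find_module_holders(obj):
    """Return 'module.name' strings for module-level globals that point at obj."""
    # Map id(module.__dict__) -> module name so each referrer is checked in O(1).
    globals_by_id = {
        id(vars(module)): name
        for name, module in list(sys.modules.items())
        if hasattr(module, "__dict__")
    }
    holders = []
    for referrer in gc.get_referrers(obj):
        module_name = globals_by_id.get(id(referrer))
        if module_name is None:
            continue  # not a module's globals (e.g. a frame or another container)
        holders.extend(
            f"{module_name}.{var}" for var, value in referrer.items() if value is obj
        )
    return holders
```

Called on the suspicious list, this should report a name like `data_pipeline.buffer`. It only finds direct module-level references; an object held through a chain (a cache dict inside a class attribute, say) still needs a back-reference graph.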
The Real Fix
```python
# Before (leaky)
if len(buffer) >= FLUSH_THRESHOLD:
    flush(buffer)
    # missing: buffer = []

# After
if len(buffer) >= FLUSH_THRESHOLD:
    flush(buffer)
    buffer = []  # inside a function, rebinding the module-level list also needs `global buffer`
```
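To see why a missing reset leaks, here is a self-contained reproduction of the pattern. All names (`add_item_leaky`, `add_item_fixed`, `flushed_batches`, `run`) are hypothetical, `flush` only records batch sizes instead of writing anywhere, and the threshold of 1000 is illustrative.

```python
FLUSH_THRESHOLD = 1000
buffer = []
flushed_batches = []


def flush(items):
    # Stand-in for the real sink (database write, queue publish, ...).
    flushed_batches.append(len(items))


def add_item_leaky(item):
    buffer.append(item)
    if len(buffer) >= FLUSH_THRESHOLD:
        flush(buffer)
        # buffer is never reset: it keeps growing, and every later call flushes it again


def add_item_fixed(item):
    global buffer  # required to rebind the module-level name inside a function
    buffer.append(item)
    if len(buffer) >= FLUSH_THRESHOLD:
        flush(buffer)
        buffer = []  # the old list is no longer referenced here and can be freed


def run(add_item, n):
    """Feed n items through add_item; return (items left in buffer, number of flushes)."""
    global buffer
    buffer = []
    flushed_batches.clear()
    for i in range(n):
        add_item(i)
    return len(buffer), len(flushed_batches)
```

`run(add_item_leaky, 5000)` returns `(5000, 4001)`: the list never shrinks, and every call after the 1000th re-flushes the whole backlog. `run(add_item_fixed, 5000)` returns `(0, 5)`. Rebinding with `buffer = []` rather than calling `buffer.clear()` also leaves the flushed list intact if `flush` keeps a reference to it (for example, to send it asynchronously).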
What I Wish I Knew Earlier
1. Enable tracemalloc from the start in development: retroactive profiling is much harder.
2. Use memray's `--live` mode for interactive filtering by module or function.
3. objgraph's `show_backrefs` is more useful than `show_chain` for finding unexpected references.
4. Don't assume every reference cycle gets freed; check `gc.garbage` for objects the collector found unreachable but could not free.
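On modern CPython `gc.garbage` is usually empty, because the collector can free most cycles. To see which cycles it is collecting, you can temporarily set `gc.DEBUG_SAVEALL`, which makes the collector append unreachable objects to `gc.garbage` instead of freeing them. A minimal sketch; `Node` and `collect_and_inspect` are hypothetical names.

```python
import gc


class Node:
    """Hypothetical object type used to build a reference cycle."""

    def __init__(self):
        self.peer = None


def collect_and_inspect(type_filter):
    """Run a full collection and return the unreachable objects of type_filter it found.

    With DEBUG_SAVEALL set, the collector moves unreachable objects into gc.garbage
    instead of freeing them, so they can be examined before being released.
    """
    old_flags = gc.get_debug()
    start = len(gc.garbage)  # leave any pre-existing entries untouched
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        gc.collect()
        found = [obj for obj in gc.garbage[start:] if isinstance(obj, type_filter)]
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[start:]  # drop the saved objects so they can be freed normally
    return found
```

Once the caller drops the returned list, the objects are unreachable again and a later collection frees them as usual.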
The hardest part of memory profiling isn't finding the allocation—it's finding the reference that keeps it alive.
Memory profiling is a skill you build by encountering real leaks. The tools are straightforward, but the debugging mindset of asking "why is this object still alive?" is what matters. Next time your Python process balloons, try this trio: tracemalloc for initial line-level hints, memray for a temporal view, and objgraph to close the case.
Frequently asked questions
What's the difference between memray and tracemalloc?
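memray is a native-extension profiler that tracks every allocation with low overhead, including allocations made by C extensions, and produces flame graphs and timeline reports. tracemalloc is built into Python's standard library and records where allocations come from, but it has higher overhead and only sees memory allocated through Python's own allocators, so allocations that native code makes directly with `malloc` are invisible to it unless the extension reports them.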
How do I profile a long-running production process without restarting it?
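Use memray's attach mode (`memray attach <pid>`) to connect to a running process; it relies on a debugger (gdb or lldb) being available on the host. Alternatively, install a signal handler that dumps tracemalloc snapshots on demand; tracing must already be running, because tracemalloc only sees allocations made after it starts. py-spy can show what a live process is doing without restarting it, but it samples call stacks rather than measuring memory.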
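A sketch of the signal-handler approach for POSIX systems, assuming `tracemalloc.start()` runs early (for example at import time). The helper `install_snapshot_dump` and the default path template are hypothetical. Under gunicorn, check that the signal isn't already taken: gunicorn uses `SIGUSR1` to reopen its log files, so pass a different `signum` there.

```python
import os
import signal
import tracemalloc


def install_snapshot_dump(path_template="/tmp/tracemalloc-{pid}-{n}.snap", signum=signal.SIGUSR1):
    """Dump a tracemalloc snapshot to disk each time the process receives signum.

    tracemalloc only sees allocations made after tracing starts, so call
    tracemalloc.start() well before the first signal arrives.
    """
    count = 0

    def handler(received_signum, frame):
        nonlocal count
        if not tracemalloc.is_tracing():
            return  # nothing has been traced, so there is nothing to dump
        count += 1
        path = path_template.format(pid=os.getpid(), n=count)
        tracemalloc.take_snapshot().dump(path)

    # Must be called from the main thread; Python runs signal handlers there.
    signal.signal(signum, handler)
```

Send the signal twice, some minutes apart (`kill -USR1 <pid>`), then load both files with `tracemalloc.Snapshot.load()` and diff them offline with `compare_to()`.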
Can memory profilers cause performance degradation?
Yes—especially tracemalloc when tracking many allocations. Use it sparingly in production. memray has lower overhead but still adds ~10-20% CPU cost. Prefer short profiling sessions.
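Overhead depends heavily on the workload and Python version, so it is worth measuring on your own allocation pattern. A rough sketch (single runs, so expect noise), assuming tracemalloc is not already running in the process; `allocate` and `measure_tracemalloc_overhead` are hypothetical names. `tracemalloc.get_tracemalloc_memory()` reports the memory tracemalloc itself uses to store traces.

```python
import time
import tracemalloc


def allocate(n):
    # Many small objects: the pattern where per-allocation tracing cost shows up most.
    return [str(i) for i in range(n)]


def measure_tracemalloc_overhead(n=200_000, frames=25):
    """Return (slowdown ratio, bytes tracemalloc uses for its own trace storage)."""
    start = time.perf_counter()
    data = allocate(n)
    baseline = time.perf_counter() - start
    del data

    tracemalloc.start(frames)
    try:
        start = time.perf_counter()
        data = allocate(n)
        traced = time.perf_counter() - start
        # Measured while `data` is still alive, so its traces are included.
        bookkeeping = tracemalloc.get_tracemalloc_memory()
        del data
    finally:
        tracemalloc.stop()
    return traced / baseline, bookkeeping


if __name__ == "__main__":
    ratio, overhead_bytes = measure_tracemalloc_overhead()
    print(f"{ratio:.1f}x slower, {overhead_bytes / 2**20:.1f} MiB of trace storage")
```

Rerunning with `frames=1` versus `frames=25` shows how much of the cost comes from the traceback depth chosen in Step 1.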
What should I do if the profiler shows large allocations but no obvious leaks?
Check for objects that are still alive because something outside your immediate code path holds a reference to them, such as caches, registries, or singletons. Use `gc.get_objects()` to list live container objects and `objgraph.show_backrefs()` to visualize reference chains.