LEARN · DEBUGGING GUIDE

Go pprof CPU & Memory Profiling: Real-World Debugging Tactics

Stop guessing why your Go app is slow or leaking. Use pprof to find the exact function burning CPU or the allocation piling memory.

Advanced · Performance · 7 min read

What this usually means

The symptoms point to a code path that is either CPU-inefficient (tight loops, excessive allocations, repeated string concatenation) or memory-inefficient (goroutine leaks, unbounded caches, large strings or byte slices kept alive). In Go, pprof's CPU profile samples the call stacks of running goroutines (100 times per second by default), and its heap profile samples allocations and reports which of them are still live. When one function dominates the CPU samples, it is executing far more than expected, often because of an O(n²) algorithm or unnecessary allocations. Mutex contention hidden inside a hot path can produce similar symptoms, but it is easier to confirm in the mutex profile, because a goroutine waiting on a lock is mostly not on the CPU. Memory profiles reveal which functions allocate the most; a heap that keeps growing usually means heap objects the GC cannot free because something still references them.

( 01 )Fast diagnosis

The first ten minutes — establish facts before touching code.

  1. Run `curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.pprof` (requires `net/http/pprof` to be enabled in your binary; quote the URL so the shell does not interpret the `?`).
  2. Run `curl http://localhost:6060/debug/pprof/heap > heap.pprof` to capture a live heap snapshot.
  3. Open the CPU profile with `go tool pprof -http=:8081 cpu.pprof` and inspect the flame graph.
  4. Open the heap profile with `go tool pprof -http=:8082 heap.pprof` and sort by `inuse_space`.
  5. Check goroutine count: `curl -s "http://localhost:6060/debug/pprof/goroutine?debug=2" | grep -c '^goroutine '` (anchoring the pattern counts one line per goroutine, not every line that mentions the word).
  6. If memory grows, set `GODEBUG=gctrace=1` and watch the GC log: a live heap size that climbs after every cycle points to a leak, while very frequent cycles or long STW pauses point to allocation churn.
( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • /debug/pprof/profile (CPU profile endpoint)
  • /debug/pprof/heap (heap profile endpoint)
  • /debug/pprof/goroutine (goroutine stack dumps)
  • Flame graph output from `go tool pprof -http`
  • GC trace output when running with `GODEBUG=gctrace=1`
  • Vendor or internal packages that allocate large byte slices or strings in a loop.
( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Unbounded string concatenation with `+` inside a hot loop (each step copies the string built so far, giving O(n²) time).
  • Goroutine leak: a goroutine blocked sending on a channel nobody reads, or receiving from a channel that is never written to or closed, while holding references to large data.
  • Missing `defer` for mutex unlock, so an early return skips `Unlock` and goroutines pile up waiting for the lock.
  • Storing large objects in a global cache with no eviction strategy.
  • Excessive `[]byte` to `string` conversion in high-throughput HTTP handlers.
  • Unbounded goroutine spawning from incoming requests without rate limiting or a worker pool.
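
The first cause in the list above is easy to reproduce in isolation. The following is a minimal sketch, not code from any particular service: `concatNaive` and `concatBuilder` are hypothetical names, and the second function shows the `strings.Builder` replacement with the buffer sized up front.

```go
package main

import (
	"fmt"
	"strings"
)

// concatNaive builds the result with +=. Each iteration may copy the whole
// string built so far, so total work grows roughly quadratically with input size.
func concatNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder computes the final length once, reserves it, and appends in place.
func concatBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n) // one allocation instead of one per iteration
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"user=", "42", ";", "room=", "lobby"}
	// Both produce the same string; only the allocation pattern differs.
	fmt.Println(concatNaive(parts) == concatBuilder(parts)) // prints "true"
}
```

In a CPU profile the naive version tends to show up as time under `runtime.concatstrings` and `runtime.mallocgc` once the input gets large.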
( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Replace string concatenation with `strings.Builder`, and pre-allocate slices when the capacity is known.
  • Add a context cancellation or timeout to goroutines that wait on channels.
  • Use `sync.Pool` for frequently allocated temporary objects like `bytes.Buffer`.
  • Implement a bounded cache with TTL or LRU eviction (e.g., with an LRU library such as `hashicorp/golang-lru`).
  • Avoid `[]byte` to `string` conversions by using `[]byte` throughout; reach for `unsafe` conversions only if you are certain the bytes are never modified afterwards.
  • Adopt a worker pool, or cap concurrency with a buffered channel used as a semaphore, to limit concurrent goroutines.
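
The last pattern in the list above, a channel used as a semaphore, can be sketched as follows. `runBounded` is a hypothetical helper written for this illustration; it also records the peak concurrency so the limit can be checked.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runBounded runs fn for every job but never more than limit at once.
// It returns the highest number of jobs observed running concurrently.
func runBounded(jobs []int, limit int, fn func(int)) int64 {
	sem := make(chan struct{}, limit) // buffered channel acting as a counting semaphore
	var wg sync.WaitGroup
	var running, peak int64

	for _, j := range jobs {
		sem <- struct{}{} // blocks while limit goroutines already hold a slot
		wg.Add(1)
		go func(j int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done

			n := atomic.AddInt64(&running, 1)
			for { // record the peak with a compare-and-swap loop
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			fn(j)
			atomic.AddInt64(&running, -1)
		}(j)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	jobs := make([]int, 20)
	peak := runBounded(jobs, 4, func(int) { time.Sleep(5 * time.Millisecond) })
	fmt.Println("peak within limit:", peak <= 4)
}
```

Acquiring the slot before starting the goroutine means the producer itself slows down under load, which is what keeps the goroutine count flat.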
( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

  • Re-run the CPU profile after the fix: the hot function should drop sharply, e.g. from >50% to <10% of samples.
  • Re-run the heap profile after the fix: the top allocator should no longer be the previously problematic function.
  • Monitor memory usage with `runtime.ReadMemStats` every 10 seconds; `HeapInuse` should plateau.
  • Load test with the same request rate as the incident; latency should be stable, not climbing.
  • Check goroutine count before and after the fix: it should stay flat, not increase over time.
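
The memory and goroutine checks above can be automated with a small sampler. This is a sketch under assumptions: the `stats` type and the `readStats` and `watch` helpers are hypothetical, and a real service would more likely export these numbers as metrics than print them.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// stats is the small subset of runtime numbers worth tracking after a fix.
type stats struct {
	HeapInuseMiB float64
	Goroutines   int
}

// readStats takes one sample. runtime.ReadMemStats briefly stops the world,
// so call it on a timer rather than on every request.
func readStats() stats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return stats{
		HeapInuseMiB: float64(m.HeapInuse) / (1 << 20),
		Goroutines:   runtime.NumGoroutine(),
	}
}

// watch prints n samples, one per interval.
func watch(interval time.Duration, n int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < n; i++ {
		<-ticker.C
		s := readStats()
		fmt.Printf("heap_inuse=%.1fMiB goroutines=%d\n", s.HeapInuseMiB, s.Goroutines)
	}
}

func main() {
	// The checklist suggests 10 * time.Second; shortened here so the sketch finishes quickly.
	watch(200*time.Millisecond, 3)
}
```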
( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

  • Profiling in development with unrealistic load; always profile under production traffic.
  • Binding `go tool pprof -http` to a port another service already uses (8080 is a common collision); pick a port you know is free.
  • Ignoring goroutine profiles when memory grows; goroutine leaks often hide behind memory leaks.
  • Fixing the symptom (e.g., increasing memory) instead of the root cause (e.g., removing the leak).
  • Running pprof for only 1 second; 30+ seconds gives representative samples.
( 07 )War story

Memory Leak in a Go Chat Service – Unbounded User Sessions

Senior Backend Engineer · Go 1.18, Redis, gRPC, Kubernetes (1.22), Prometheus

Timeline

  1. 14:00 · PagerDuty alert: memory usage on chat-service pods exceeds 90% of 2GB limit.
  2. 14:05 · `kubectl exec` into pod, `curl localhost:6060/debug/pprof/heap > heap.pprof`
  3. 14:08 · `go tool pprof -http=:8081 heap.pprof`: top shows `sessionManager.cleanup` as top allocator (40% inuse_space)
  4. 14:12 · Inspect source: `sessionManager.cleanup` runs every 5 minutes but only marks sessions as expired, doesn't free them.
  5. 14:15 · Check goroutine count: 15000 goroutines, most stuck in 'channel receive' on `sessionManager.expireChan`.
  6. 14:18 · Notice that expired sessions are not removed from the internal map; cleanup goroutine re-reads the map but never deletes entries.
  7. 14:25 · Deploy fix: delete the entry from the map in the cleanup loop. Also add TTL-based eviction using a priority queue.
  8. 14:30 · Monitor memory: after 10 minutes memory plateaus at 500MB, goroutine count drops to 200.

I got paged at 2pm on a Tuesday. The chat service had been running for a week without issues, but suddenly memory was climbing every minute. We had 150 pods and they were all hitting the 2GB limit, causing OOM kills and restarts. My first instinct was to grab a heap profile from a live pod. I used `kubectl exec` to get into a pod and curl the pprof endpoint. The heap profile showed that `sessionManager.cleanup` was responsible for 40% of all in-use memory. That function ran every 5 minutes to expire idle sessions, but looking at the code, I saw it only set an `expired` flag on the session struct. It never deleted the session from the map. So the map kept growing unboundedly.

Then I checked the goroutine count using the goroutine pprof endpoint. There were 15,000 goroutines, most waiting on `sessionManager.expireChan`. That channel was used to notify the cleanup goroutine, but because the map was never shrinking, the cleanup goroutine was spending more and more time iterating over millions of entries, blocking other goroutines that were trying to send sessions to the channel. This created a feedback loop: more sessions meant slower iteration, which meant more goroutines piling up. The fix was straightforward: after marking a session expired, I added `delete(sm.sessions, sessionID)` to remove it from the map.

I also added a bounded priority queue based on expiry time, so the cleanup goroutine only iterates over sessions that are actually expiring soon, not the entire map. After deploying the fix, memory stabilized at around 500MB and goroutines dropped to 200. The lesson: never assume maps will shrink themselves. Always pair an expiry flag with actual deletion from the data structure. And always check goroutine profiles when you see a memory leak—they often point to the same root cause.

Root cause

sessionManager.cleanup only marked sessions as expired but never deleted them from the internal map, causing unbounded growth of the map and goroutine pile-up on the expiration channel.

The fix

Added `delete(sm.sessions, sessionID)` in the cleanup loop and introduced a priority queue to limit iteration to only sessions that are close to expiry.

The lesson

Always delete entries from maps when they are considered expired, and consider using a bounded data structure (e.g., priority queue with TTL) instead of scanning the entire map periodically.
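
The root cause and the one-line fix from this incident can be sketched as follows. The `session` and `sessionManager` types, their fields, and the `cleanup` signature are hypothetical reconstructions for illustration, not the original service code, and the priority queue is left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// session and sessionManager are illustrative stand-ins for the real types.
type session struct {
	lastSeen time.Time
	expired  bool
}

type sessionManager struct {
	mu       sync.Mutex
	sessions map[string]*session
}

// cleanup expires idle sessions and returns how many were removed.
func (sm *sessionManager) cleanup(now time.Time, ttl time.Duration) int {
	sm.mu.Lock()
	defer sm.mu.Unlock()

	removed := 0
	for id, s := range sm.sessions {
		if now.Sub(s.lastSeen) > ttl {
			s.expired = true        // the buggy version stopped here
			delete(sm.sessions, id) // the fix: drop the entry (safe during range)
			removed++
		}
	}
	return removed
}

// demo runs cleanup over one idle and one active session.
func demo() (removed, remaining int) {
	now := time.Now()
	sm := &sessionManager{sessions: map[string]*session{
		"idle":   {lastSeen: now.Add(-time.Hour)},
		"active": {lastSeen: now},
	}}
	removed = sm.cleanup(now, 30*time.Minute)
	return removed, len(sm.sessions)
}

func main() {
	removed, remaining := demo()
	fmt.Println(removed, remaining) // prints "1 1"
}
```

Without the `delete` call the map keeps a pointer to every session ever created, so the GC can never reclaim them, which matches the steadily growing `inuse_space` seen in the heap profile.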

( 08 )Reading the Flame Graph: What to Look For

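When you run `go tool pprof -http=:8081 cpu.pprof`, the flame graph view is the fastest way to orient yourself. Each rectangle is a function in a sampled call stack, and its width is proportional to the CPU time spent in that function and everything it calls. pprof draws the root at the top, so the frames where time is actually spent sit toward the bottom. Look for wide frames in your own code rather than in `runtime.*`; a single function covering more than about 20% of the width deserves a closer look. Click a frame to focus on it and inspect its callers and callees.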

Pay special attention to functions that allocate memory inside a loop. Allocation hotspots live in the heap profile, not the CPU profile: open `heap.pprof` and switch the sample type to `alloc_space` (from the SAMPLE menu in the web UI, or with the `-sample_index=alloc_space` flag). If a function like `json.Unmarshal` or `fmt.Sprintf` is wide in either profile, it is usually being called too often or with large inputs.

( 09 )Goroutine Leaks: The Hidden Memory Leak

A goroutine leak is when goroutines are created but never exit. Each goroutine holds at least a few KB of stack space, plus references to objects on the heap. Over time, these accumulate. To detect, use `curl http://localhost:6060/debug/pprof/goroutine?debug=2` and look for goroutines stuck in `chan receive`, `chan send`, `select`, or `IO wait`. If you see thousands of goroutines all in the same state, you likely have a leak.

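Common patterns: a goroutine sending on a channel that nobody reads, receiving from a channel that is never written to or closed, or blocked on a mutex that is never unlocked (for example, an early return that skips `Unlock` because there was no `defer`). The fix is usually to give the goroutine a way out, such as a `context` with a timeout or cancellation, or to make sure the other side of the channel always sends, receives, or closes. Also export `runtime.NumGoroutine()` as a metric and alert when the count exceeds a threshold.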
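
As an illustration of the context-based fix, here is a minimal sketch. `waitForResult` and `demo` are hypothetical names; the point is only that the receive sits in a `select` next to `ctx.Done()`, so the goroutine exits even if nothing is ever sent.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForResult blocks until a value arrives or ctx is done, whichever is first.
// A bare `<-results` here is the leak: it would block forever if nobody sends.
func waitForResult(ctx context.Context, results <-chan int) (int, error) {
	select {
	case v := <-results:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// demo starts a waiter on a channel that never receives a value and shows
// that the timeout lets the goroutine finish instead of leaking.
func demo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	never := make(chan int)     // nothing is ever sent on this channel
	done := make(chan error, 1) // buffered so the goroutine can always exit
	go func() {
		_, err := waitForResult(ctx, never)
		done <- err
	}()
	return <-done
}

func main() {
	fmt.Println(demo()) // prints "context deadline exceeded"
}
```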

( 10 )Allocation Profiling with `-alloc_space`

The default heap profile view is `inuse_space` (memory currently live). Sometimes you instead want to know which functions allocate the most over time, even if the memory is collected right away. Capture the profile with `curl "http://localhost:6060/debug/pprof/heap?gc=1" > heap.pprof` (`gc=1` forces a GC before the snapshot), then open it with `go tool pprof -sample_index=alloc_space heap.pprof` (the older `-alloc_space` shorthand selects the same view). This shows cumulative allocations since program start.

This is useful for identifying functions that cause high GC pressure. For example, a function that allocates a large `[]byte` per request, even if it's freed after the request, causes GC to run more often. Look for functions that allocate many small objects (like `struct` pointers) inside hot loops. Consider using `sync.Pool` to reuse them.
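
A minimal sketch of the `sync.Pool` suggestion, assuming a hypothetical `render` function that needs a scratch `bytes.Buffer` on every call:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so each call does not allocate a new one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render builds a greeting using a pooled scratch buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old contents
	defer bufPool.Put(buf) // return it for the next caller

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so returning it after Put is safe
}

func main() {
	fmt.Println(render("gopher")) // prints "hello, gopher"
}
```

The pool may drop its contents at any GC, so it reduces allocation pressure but is not a cache. Compare the `alloc_space` view before and after to confirm it helps on your workload.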

( 11 )Profiling in Production: Safety and Minimal Overhead

The pprof endpoints have minimal overhead—CPU profiling is enabled only during the sample period. The default sample rate is 100 Hz, which is safe for production. However, avoid running CPU profiles for longer than 30 seconds unless necessary. For memory, the heap profile is a snapshot and has negligible cost.

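Expose pprof on a separate port (e.g., 6060) that is not publicly accessible. Use Kubernetes network policies or internal load balancers to restrict access. Alternatively, enable profiling only when needed, for example from a signal handler (e.g., on `SIGUSR1`) that starts and stops a profile with `pprof.StartCPUProfile` and `pprof.StopCPUProfile` from `runtime/pprof`. The lower-level `runtime.SetCPUProfileRate` also exists and is not specific to Go 1.18+, but most programs should use the `runtime/pprof` functions instead of calling it directly.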

( 12 )Interpreting Mutex Profiles

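Mutex contention can masquerade as CPU or memory issues. Mutex profiling is off by default, so first enable it in code with `runtime.SetMutexProfileFraction(n)`, which samples on average one in `n` contention events (`1` records every event; a larger value lowers the overhead). Then capture a profile with `curl http://localhost:6060/debug/pprof/mutex > mutex.pprof` and open it with `go tool pprof -http=:8081 mutex.pprof`. Adding `?debug=1` to the URL returns a human-readable text summary instead of the binary profile.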

A mutex that is held for a long time will cause goroutines to pile up waiting for it, increasing goroutine count and potentially memory if each goroutine holds large objects. The fix is to reduce the critical section: move I/O outside the lock, use `sync.RWMutex` for read-heavy workloads, or replace with atomic operations.
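
For programs without the HTTP endpoints, the same profile can be collected in-process. The sketch below is illustrative only: `contend` and `captureMutexProfile` are hypothetical helpers, and the sleep stands in for slow work done while holding the lock.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// contend makes several goroutines fight over one mutex.
func contend(workers int) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			time.Sleep(time.Millisecond) // stand-in for slow work inside the critical section
			mu.Unlock()
		}()
	}
	wg.Wait()
}

// captureMutexProfile enables mutex profiling, generates contention, and
// returns the size in bytes of the resulting profile.
func captureMutexProfile() (int, error) {
	prev := runtime.SetMutexProfileFraction(1) // record every contention event
	defer runtime.SetMutexProfileFraction(prev)

	contend(8)

	var buf bytes.Buffer
	// debug=0 writes the binary format that `go tool pprof` reads.
	if err := pprof.Lookup("mutex").WriteTo(&buf, 0); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	n, err := captureMutexProfile()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("mutex profile bytes:", n)
}
```

In a real program you would write the buffer to a file and open it with `go tool pprof`; moving the slow work outside `Lock`/`Unlock` is what shrinks the reported contention.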

Frequently asked questions

How do I enable pprof in my Go service?

Import `net/http/pprof` in your main package, usually as a blank import: `import _ "net/http/pprof"`. The import registers the pprof handlers on `http.DefaultServeMux`. Then serve that mux on a separate port, e.g., `go http.ListenAndServe("localhost:6060", nil)`. For production, register the handlers on a dedicated mux and restrict access.
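
A dedicated mux might look like the sketch below. `newPprofMux`, `servePprof`, and `heapEndpointStatus` are hypothetical helper names, and the loopback address is an assumption; the handler functions themselves are the ones exported by `net/http/pprof`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux registers the pprof handlers on a private mux, so they are
// served only where this mux is mounted and not on the public API server.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, mutex, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// servePprof blocks serving the profiling endpoints on addr.
func servePprof(addr string) error {
	return http.ListenAndServe(addr, newPprofMux())
}

// heapEndpointStatus exercises the mux in-process, without opening a port.
func heapEndpointStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/heap?debug=1", nil)
	newPprofMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("heap endpoint status:", heapEndpointStatus())
	// In a real service, start it in the background on a loopback address:
	//   go servePprof("127.0.0.1:6060")
}
```

Note that importing `net/http/pprof` still registers the handlers on `http.DefaultServeMux` as a side effect, so make sure the public server does not serve the default mux.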

What's the difference between `inuse_space` and `alloc_space`?

`inuse_space` shows memory that is currently live (not GC'd). `alloc_space` shows total memory allocated over the lifetime of the program, including what was freed. Use `inuse_space` to find memory leaks (objects that stay alive). Use `alloc_space` to find allocation-heavy code paths that increase GC overhead.

Can I profile a running binary without modifying its code?

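If the binary already imports `net/http/pprof` and exposes an HTTP endpoint, yes: `go tool pprof` can fetch directly from the URL, e.g. `go tool pprof http://localhost:6060/debug/pprof/heap`. If not, your options are limited. Sending `SIGQUIT` prints a dump of all goroutine stacks (not a CPU profile), but by default it also terminates the process. An OS-level profiler such as Linux `perf` can sample an unmodified Go binary, at the cost of Go-specific detail. Otherwise, recompile with pprof enabled.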

Why does my CPU profile show `runtime.mallocgc` as the top function?

`runtime.mallocgc` is the Go runtime's allocation routine, not C `malloc`. Seeing it at the top means a significant share of CPU time goes to allocating heap memory, including the GC work that allocating goroutines are made to assist with. It indicates that you have too many allocations. Reduce them by reusing objects, using `sync.Pool`, or pre-allocating slices with capacity. The `alloc_space` view of the heap profile shows which functions are doing the allocating.
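
The effect of pre-allocating a slice can be measured directly. This is a small sketch, not a benchmark: `buildGrowing`, `buildPreallocated`, and `allocsPerCall` are hypothetical names, and `testing.AllocsPerRun` is used here outside a test purely as a convenient allocation counter.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the compiler cannot optimize the slices away.
var sink []int

// buildGrowing appends into a nil slice, reallocating each time capacity runs out.
func buildGrowing(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// buildPreallocated reserves the full capacity once, then appends without growing.
func buildPreallocated(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// allocsPerCall reports the average number of heap allocations per call.
func allocsPerCall(build func(int) []int, n int) float64 {
	return testing.AllocsPerRun(100, func() { sink = build(n) })
}

func main() {
	fmt.Println("growing:     ", allocsPerCall(buildGrowing, 10000))
	fmt.Println("preallocated:", allocsPerCall(buildPreallocated, 10000))
}
```

Exact counts depend on the Go version's growth policy, but the pre-allocated version should report noticeably fewer allocations per call.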

How do I profile a short-lived CLI tool?

For CLI tools, use `runtime/pprof` directly. In `main`, start a CPU profile with `pprof.StartCPUProfile(f)` and defer `pprof.StopCPUProfile()`. Run the tool, then analyze the resulting file with `go tool pprof`. For memory, call `runtime.GC()` and then `pprof.WriteHeapProfile(f)` at the point of interest. For microbenchmarks, `go test -bench=. -cpuprofile=cpu.pprof -memprofile=mem.pprof` collects the same profiles without code changes.
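
Put together, the CLI approach might look like this sketch. The `work` function is a placeholder for the tool's real logic, and `profiledRun` and `demo` are hypothetical helpers; a real tool would usually take the output paths from flags instead of a temporary directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

// work is a placeholder for whatever the CLI tool actually does.
func work() int {
	total := 0
	for i := 0; i < 5000000; i++ {
		total += i % 7
	}
	return total
}

// profiledRun runs work under a CPU profile, then writes a heap profile.
func profiledRun(cpuPath, heapPath string) error {
	cpuFile, err := os.Create(cpuPath)
	if err != nil {
		return err
	}
	defer cpuFile.Close()
	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		return err
	}
	_ = work()
	pprof.StopCPUProfile() // flushes the profile to cpuFile

	heapFile, err := os.Create(heapPath)
	if err != nil {
		return err
	}
	defer heapFile.Close()
	runtime.GC() // bring heap statistics up to date before the snapshot
	return pprof.WriteHeapProfile(heapFile)
}

// demo writes both profiles to a temporary directory and checks they are non-empty.
func demo() error {
	dir, err := os.MkdirTemp("", "pprof-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	cpu := filepath.Join(dir, "cpu.pprof")
	heap := filepath.Join(dir, "heap.pprof")
	if err := profiledRun(cpu, heap); err != nil {
		return err
	}
	for _, p := range []string{cpu, heap} {
		info, err := os.Stat(p)
		if err != nil {
			return err
		}
		if info.Size() == 0 {
			return fmt.Errorf("%s is empty", p)
		}
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	fmt.Println("profiles written")
}
```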