Node.js CPU Profiling Debug Guide

What this usually means

A synchronous JavaScript operation is hogging the event loop. Unlike I/O-bound delays (waiting for DB, files, network), CPU-bound Node.js issues are caused by heavy computation, tight loops, or excessive JSON serialization/deserialization, string manipulation, or regex backtracking. The single-threaded event loop cannot yield to handle other requests until that synchronous work completes. Profiling reveals exactly which function is consuming the CPU — not your Express route middleware, but the specific `JSON.stringify` or `Array.sort` inside it.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

1. Run `top -H -p <pid>` to confirm that a single thread is pinned near 100% CPU. Node.js runs your JavaScript on one main thread, so a CPU-bound hot path shows up as one maxed thread while the libuv and V8 helper threads stay mostly idle.
2. Restart with `node --prof <app.js>` to enable V8's sampling profiler, then reproduce the issue for 30-60 seconds.
3. Process the log with `node --prof-process isolate-*.log > processed.txt` and read the `[Bottom up (heavy) profile]` section: the hottest function is listed first.
4. If you can't restart the process, attach `perf` to it: `perf record -F 99 -p <pid> -g -- sleep 30`, then `perf script > perf.script`. JavaScript frames resolve to names only if the process was started with `--perf-basic-prof`.
5. Generate a flame graph with `node --prof-process --preprocess -j isolate-*.log | npx flamebearer`, or load `perf.script` into speedscope.app, and look for a wide, flat plateau at the top of a stack.
6. For async-heavy apps, remember that a sampled stack ends at the event loop callback that ran the hot code. To find what scheduled it, capture an `Error` stack inside the hot path (async stack traces are on by default since Node.js 12) or tag the work with a request ID.

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

V8 profiling output: `isolate-*.log` generated by `--prof`, then `processed.txt` after `--prof-process`.
`perf.data` and `perf script` output when using Linux perf.
Flame graph HTML from `0x` or `flamebearer`, or the profile loaded into speedscope.
Application code: the function names at the top of the profile, most likely a route handler, middleware, or utility function.
Event loop lag metrics: a `process.hrtime()`-based check or `perf_hooks.monitorEventLoopDelay()` in your monitoring (e.g., the `event-loop-lag` npm package).
CPU/memory dashboards: Datadog, New Relic, or Grafana showing process CPU and event loop delay.
Source maps: if you run transpiled code (TypeScript, Babel), keep the source maps at hand so you can map hot functions in the emitted JavaScript back to the original `.ts` lines.
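
If no event loop lag metric exists yet, one can be sketched with Node's built-in `perf_hooks.monitorEventLoopDelay()`. The helper name `createLagMonitor` and the 10 ms resolution are assumptions made for illustration, not part of any particular monitoring setup.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Hypothetical helper: samples event loop delay and reports it in milliseconds.
function createLagMonitor(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();

  return {
    // Snapshot of the delay distribution; the histogram stores nanoseconds.
    snapshot() {
      return {
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      };
    },
    // Clear recorded samples, e.g. after each metrics flush.
    reset() {
      histogram.reset();
    },
    // Stop sampling.
    stop() {
      histogram.disable();
    },
  };
}
```

A typical use would be to call `snapshot()` from the existing metrics flush, send the values to the dashboard, and then call `reset()`. A `maxMs` in the hundreds of milliseconds while CPU is pinned points at synchronous work rather than slow I/O.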

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

Synchronous JSON serialization of large objects (e.g., `JSON.stringify` on a 50MB response) inside a request handler.
Inefficient regex with catastrophic backtracking (e.g., `/(a+)+b/.test(longString)`).
Deep recursion in a user-facing endpoint (e.g., tree traversal that recomputes shared subtrees without memoization).
Heavy `Array.sort` with a complex comparator on a large array (e.g., sorting 100k objects by multiple fields).
Synchronous crypto operations (e.g., `pbkdf2Sync` or `scryptSync`) called in a request handler.
Third-party library doing CPU work on the main thread (e.g., `moment` parsing many dates in a loop).
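
To see why the regex case is so damaging, here is a minimal sketch that times the pattern from the list against inputs that cannot match. The helper name `timeBacktracking` and the input lengths are illustrative assumptions, and absolute timings will vary by machine and Node.js version.

```javascript
const { performance } = require('node:perf_hooks');

// Times /(a+)+b/ against a run of "a" characters with no trailing "b".
// The match must fail, so the engine tries every way of splitting the run.
function timeBacktracking(length) {
  const input = 'a'.repeat(length);
  const start = performance.now();
  const matched = /(a+)+b/.test(input);
  return { matched, ms: performance.now() - start };
}

// Each extra character roughly doubles the work.
for (const length of [14, 18, 22]) {
  const { ms } = timeBacktracking(length);
  console.log(`length=${length} took ${ms.toFixed(2)} ms`);
}
```

The whole test runs synchronously on the main thread, so a user-supplied string of only a few dozen characters can stall every other request.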

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

Move CPU-intensive work to a separate thread with `worker_threads`, or to a child process.
Break large synchronous operations into chunks and schedule each chunk with `setImmediate` so the event loop can run between them. `queueMicrotask` and `process.nextTick` do not yield: their callbacks run before the loop moves on to I/O.
Use streaming JSON serialization (e.g., `JSONStream.stringify`) instead of `JSON.stringify` for large payloads.
Replace the regex with string methods (e.g., `indexOf`, `startsWith`) or use a non-backtracking engine such as `re2`.
Cache the result of expensive computations with memoization or a TTL cache (e.g., `lru-cache`).
Switch crypto calls to Node.js's asynchronous variants (e.g., `crypto.pbkdf2` instead of `crypto.pbkdf2Sync`), which run on the libuv thread pool, or move them to worker threads.
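
The chunking pattern can be sketched as follows. `processInChunks`, its parameters, and the default chunk size of 500 are hypothetical. The right chunk size depends on how expensive each item is, so measure it instead of copying the number.

```javascript
// Hypothetical helper: applies `handleItem` to every element of `items`
// in slices of `chunkSize`, yielding to the event loop between slices.
function processInChunks(items, handleItem, chunkSize = 500) {
  return new Promise((resolve, reject) => {
    const results = [];
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          results.push(handleItem(items[index]));
        }
      } catch (err) {
        reject(err);
        return;
      }

      if (index < items.length) {
        // setImmediate lets pending I/O callbacks and timers run
        // before the next slice starts.
        setImmediate(runChunk);
      } else {
        resolve(results);
      }
    }

    runChunk();
  });
}
```

For example, `await processInChunks(transactions, formatRow)` would replace a plain `transactions.map(formatRow)`, where both names stand in for your own data and function. Total CPU time does not shrink. The gain is that other requests get served between slices.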

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

Re-run the profiling command (e.g., `node --prof` or `perf`) under the same load and confirm the hot function no longer appears.
Check event loop lag metrics: should return to normal (<100ms) under load.
Run a load test (e.g., `autocannon -c 100 -d 30 http://localhost:3000/api`) and verify p99 latency drops below 500ms.
Monitor CPU usage: should no longer pin at 100% on a single core.
Verify no new regressions: run existing unit/integration tests that cover the changed code.
Deploy to a canary and monitor for 10 minutes with production traffic before full rollout.

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

Jumping to 'add more servers' without profiling: horizontal scaling doesn't fix a synchronous CPU hog, it just multiplies the problem.
Adding `setImmediate` calls without sizing the chunks: each chunk still runs synchronously, so a chunk that is too large still starves the event loop.
Assuming the problem is database queries when CPU is high: check event loop lag first; DB queries are async and yield.
Restarting the process before capturing a profile: you lose the evidence. Always profile while the issue is happening.
Using `--prof` on a production server without understanding the overhead (1-2% CPU): it's safe for short captures, but don't leave it on permanently.
Ignoring the async origin of the work: a CPU profile shows only the synchronous stack, so when the hot function runs from a timer or promise callback, use an async stack trace from an `Error` captured in that path to trace back to the origin.

( 07 ) War story

The JSON.stringify That Killed Our API

Senior Backend Engineer
Node.js 18, Express 4, MongoDB via Mongoose, deployed on AWS ECS (2 vCPU, 4GB).

Timeline

14:02 PagerDuty alerts: p99 latency for `/api/reports/daily` jumps from 50ms to 12s. CPU on all 4 ECS tasks spikes to 100%.
14:05 Checked Datadog dashboard: event loop lag > 8s, heap flat at 200MB, no error rate increase.
14:08 SSH into one ECS instance, run `top -H -p $(pgrep -f 'node')`: one thread at 99% CPU.
14:12 Attach `perf` to the process: `perf record -F 99 -p <pid> -g -- sleep 30`.
14:15 Generate flame graph: `perf script > perf.script` and load it into speedscope.
14:18 Flame graph shows a wide plateau: `ReportService.generateDailyReport` > `FormatUtils.formatCurrency` > `JSON.stringify`.
14:25 Check code: `formatCurrency` iterates over an array of 10k transactions and calls `JSON.stringify` on each object individually inside a loop.
14:30 Hotfix: replace individual `JSON.stringify` with a single `JSON.stringify` of the whole array after aggregation. Also add streaming response.
14:35 Deploy fix to one task, confirm event loop lag drops to 20ms. Roll out to all tasks.
14:45 p99 latency back to 45ms. Incident resolved.

I got paged for a p99 latency jump from 50ms to 12 seconds on the daily reports endpoint. CPU on all four ECS tasks was pegged at 100%. My first instinct was a database problem — maybe a missing index or a slow aggregation pipeline. But Datadog showed event loop lag over 8 seconds and zero increase in database query time. That's classic event loop starvation: the CPU is busy doing JavaScript, not waiting for I/O.

I SSH'd into one box and ran `top -H` to confirm a single thread was maxed. Then I used `perf record` for 30 seconds, the go-to when you can't restart the process with `--prof`. I loaded the output into speedscope and saw a massive plateau for `JSON.stringify` inside a function called `formatCurrency`. The code was iterating over 10,000 transaction objects and calling `JSON.stringify` on each one individually. That's 10,000 synchronous serializations. Each one is brief on its own, but they run back to back without ever returning to the event loop, and together they added up to seconds.

The fix was straightforward: build the entire response object in memory and call `JSON.stringify` once. I also switched the endpoint to a streaming response to avoid buffering the whole payload. I deployed the fix to one task, confirmed the event loop lag dropped to 20ms, then rolled out to all tasks. Within 15 minutes, p99 latency was back to 45ms. The lesson: always profile before you assume the bottleneck is I/O. A flame graph tells you exactly where the CPU is burning, and in Node.js a synchronous hot loop is one of the most common causes of production latency spikes.

Root cause

A loop calling `JSON.stringify` on each of 10,000 objects individually, causing synchronous CPU work that starved the event loop.

The fix

Aggregated all objects into a single array and called `JSON.stringify` once. Also switched to a streaming response.

The lesson

Always profile CPU contention with a flame graph before scaling horizontally. In Node.js, a synchronous hot loop is a single-thread bottleneck that more instances can't fix.

( 08 ) Reading a V8 Profiling Log

When you run `node --prof app.js`, V8 samples the call stack at a fixed interval (1ms by default). After processing with `node --prof-process`, you get a text report with ticks per function. The key sections are `[Summary]`, the per-language lists such as `[JavaScript]` and `[C++]`, and `[Bottom up (heavy) profile]`. Focus on the bottom-up profile: it lists the functions with the most self ticks first and indents each function's callers beneath it, so you can see both what burned the CPU and who called it.

Example output: a row under the header `ticks  total  nonlib  name` that reads `1125  56.3%  65.4%  Function: JSONSerialize` means `JSONSerialize` accounted for 56.3% of all samples and 65.4% of the samples outside shared libraries. Look for functions from your application code (not Node.js internals). `LazyCompile:` marks an ordinary JavaScript function; a `*` before the name means V8 optimized it and a `~` means it ran unoptimized, and both are normal. The real culprit is usually a function you wrote that ranks high here and shows up as a wide plateau in the flame graph.

( 09 ) Async Hooks for CPU Profiling

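Standard CPU profiling shows synchronous stacks, but in an async-heavy Node.js app, the CPU work may be triggered by an async operation (e.g., a `setTimeout` callback or a Promise `.then`). The sampled stack then starts at the event loop callback, so you see the hot function but not what scheduled it. V8's async stack traces (`--async-stack-traces`, enabled by default since Node.js 12) close part of that gap, but they apply to `Error.stack`, not to the samples that `--prof` records. To find the initiator, profile with `node --prof` as usual, then capture an `Error` stack inside the hot function or tag the work with a request ID via `AsyncLocalStorage`.

In a captured stack, frames reached across an `await` appear with an `async` prefix. This helps trace back to the original request or timer that spawned the CPU work. For example, with illustrative function names, you might see `at myHotFunction` followed by `at async loadReport` and `at async handleRequest`, which tells you the hot function was reached through that request handler rather than from a timer or stream callback.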
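
A minimal sketch of what an async stack trace looks like when it is captured inside a hot path. The three function names are hypothetical stand-ins for your own code, and the async frames appear only because each caller awaits its callee.

```javascript
// Stand-in for the function the profiler flagged as hot.
async function hotFunction() {
  await Promise.resolve(); // resume from the microtask queue, like real async code
  // Capture the stack where the CPU work would happen.
  return new Error('who scheduled me?').stack;
}

async function loadReport() {
  return await hotFunction();
}

async function handleRequest() {
  return await loadReport();
}

// Prints the synchronous frame for hotFunction followed by
// "at async loadReport" and "at async handleRequest" frames.
handleRequest().then((stack) => console.log(stack));
```

In a real service you would log such a stack only occasionally, or only while investigating, since building stack strings on every call adds CPU work of its own.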

( 10 ) Using perf on Linux for Production Profiling

When you can't restart the Node.js process with `--prof`, use Linux `perf` to sample the running process. Install perf (on Debian/Ubuntu it typically comes from the `linux-tools-common` package plus the `linux-tools` package matching your kernel) and run: `perf record -F 99 -p $(pgrep -f 'node app') -g -- sleep 30`. This captures 99 samples per second for 30 seconds with call graphs. Then convert the data: `perf script --no-inline > perf.script`.

`node --prof-process` reads V8 `isolate-*.log` files, not perf output, so visualize `perf.script` with a tool that understands it: load it into speedscope.app, or fold it with `stackcollapse-perf.pl` and render it with `flamegraph.pl` from Brendan Gregg's FlameGraph scripts. Note: `perf` can only resolve JavaScript function names if Node.js was started with `--perf-basic-prof` or `--perf-prof`. If it wasn't, JIT-compiled frames show up as raw addresses, which can still help identify hot code paths by address range.

( 11 ) Flame Graph Interpretation

A flame graph shows stack depth on the y-axis and sample count on the x-axis. The top of each stack is the function that was running when sampled. The width of a function's bar indicates its total sample count (self + descendants). A wide plateau at the top means that function is consuming CPU directly. A tall, narrow spike means deep call stacks but less time.

The key is to find the widest bar near the top of the graph that is not a system or library function. Hover over bars to see the function name. If you see `JSON.stringify` wide at the top, that's your culprit. If you see `RegExp.test` wide, suspect regex backtracking. Loops need some care: `for` and `while` loops do not create frames, so a hot loop appears as self time in the enclosing function, while callback-based iteration such as `forEach` or `map` does appear as frames. Also note that V8 may inline functions: an inlined callee can vanish from the graph, with its time attributed to the caller, so read the caller's source when a wide bar has no obvious hot callee.

( 12 ) Common Misconceptions: CPU vs I/O Bound

Many engineers assume high CPU means the database is slow (because waiting for DB causes high CPU in a busy loop). In Node.js, that's wrong: asynchronous I/O waits do not consume CPU — they release the event loop. If CPU is at 100%, the event loop is actively running JavaScript. So if you see high CPU and slow requests, the bottleneck is CPU, not I/O. Check event loop lag — if it's high, you have synchronous CPU work.

Another misconception: 'add more instances' fixes CPU issues. But if the CPU work is synchronous, each instance will still max out its single thread. The correct fix is to either move the work off the main thread or yield during it (workers, chunking), or to reduce the amount of work (caching, algorithm improvement). Horizontal scaling adds total capacity, but every request that hits the hot path still blocks its instance's event loop for the full duration of the work. Profile first, then decide.
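
As one way to take work off the main thread, here is a minimal `worker_threads` sketch. The helper `sortInWorker`, the row shape (`group`, `value`), and the inlined worker source are assumptions made to keep the example in one file. A real service would more likely use a separate worker file and a pool of workers.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source, inlined so the sketch is self-contained.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // Stand-in for expensive work: sort rows by two fields.
  const sorted = workerData.rows.sort(
    (a, b) => a.group - b.group || a.value - b.value
  );
  parentPort.postMessage(sorted);
`;

// Runs the sort on a worker thread and resolves with the sorted rows.
function sortInWorker(rows) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { rows },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Data passed through `workerData` and `postMessage` is copied with the structured clone algorithm, and that copy runs on the main thread. For very large payloads, measure whether the copy costs more than the work it offloads.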

Frequently asked questions

What's the difference between `--prof` and Linux `perf`?

`--prof` is V8's built-in sampling profiler. It's easy to use but requires restarting the process with the flag. Linux `perf` samples the CPU directly and can attach to a running process, but Node.js must have been started with `--perf-basic-prof` (or similar) for JavaScript symbols to resolve. In production where you can't restart, use `perf` if your Node.js was started with the right flags; otherwise, activate the inspector on the running process (send it `SIGUSR1`) and record a CPU profile from Chrome DevTools.

How do I profile a Node.js process that is already running?

On Linux or macOS, send `SIGUSR1` to the process (`kill -USR1 <pid>`): Node.js activates the inspector on `127.0.0.1:9229`, and you can connect Chrome DevTools (through an SSH tunnel if needed) and record a CPU profile. Alternatively, use `perf` on Linux if the process was started with `--perf-basic-prof`. `node --prof`, `node --cpu-prof`, and `0x` all launch the process themselves, so they require a restart.

Why do I see `[unknown]` frames in my flame graph?

`[unknown]` frames appear when the profiler cannot resolve function names. This often happens with JIT-compiled code (generated by V8 compilers such as TurboFan) or when using `perf` without the correct symbol flags. For `--prof` results, unresolved entries can occur if the process was killed before the log was flushed. For `perf`, ensure you started Node.js with `--perf-basic-prof` and that the `node` binary has not been stripped of symbols. Also, try `perf script --no-inline`, which skips inline-frame resolution and keeps the raw entries.

Can CPU profiling catch async/await bottlenecks?

Yes. The profile attributes CPU time to whatever JavaScript is running, including the synchronous portions of async functions, but each sample contains only the synchronous stack, so the context of how the function was called is lost. To recover the chain (e.g., `request handler -> async function -> await -> synchronous hot loop`), combine the profile with an async stack trace from an `Error` captured in the hot path; async stack traces are enabled by default since Node.js 12. This is critical for debugging CPU spikes inside async functions, such as a `for` loop whose `await` never yields to the event loop because the awaited promise is already resolved and the continuation runs as a microtask.
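
The `await`-in-a-loop case can be demonstrated with a small sketch. The helper names and the iteration count are illustrative assumptions. The point is only that a 0 ms timer cannot fire until the loop of already-resolved awaits has finished.

```javascript
const { performance } = require('node:perf_hooks');

// Awaits an already-resolved promise on every iteration. Each continuation
// is a microtask, so the loop never hands control back to the event loop.
async function awaitLoop(iterations) {
  let sum = 0;
  for (let i = 0; i < iterations; i++) {
    sum += await Promise.resolve(i);
  }
  return sum;
}

// Schedules a 0 ms timer, runs the loop, and reports when the timer fired.
async function measureTimerDelay(iterations) {
  const start = performance.now();
  let timerFiredMs = null;
  setTimeout(() => {
    timerFiredMs = performance.now() - start;
  }, 0);

  await awaitLoop(iterations);
  const loopFinishedMs = performance.now() - start;

  // Yield for real so the pending timer gets a chance to run.
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { loopFinishedMs, timerFiredMs };
}

measureTimerDelay(500000).then(({ loopFinishedMs, timerFiredMs }) => {
  console.log(`loop finished after ${loopFinishedMs.toFixed(0)} ms`);
  console.log(`0 ms timer fired after ${timerFiredMs.toFixed(0)} ms`);
});
```

The timer always fires after the loop finishes, no matter how long the loop runs. In a server, incoming requests would wait in the same way.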

What if my CPU spike is intermittent?

For intermittent spikes, you need a capture that is either always on with low overhead or triggered on demand. `--prof` has no rotating buffer: it appends to its log for the life of the process (`--logfile` sets the path and `--prof-sampling-interval` sets the interval in microseconds), so it is a poor fit for long runs. A more practical approach is to watch CPU or event loop delay inside the process and trigger a short capture through the built-in `node:inspector` API when a threshold is exceeded. `clinic` (Node.js clinic) is useful for reproducing the spike under load in staging. For production, consider a continuous profiling agent like `pyroscope` or `gprofiler` that samples periodically and uploads profiles.
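
An on-demand capture can be sketched with Node's built-in `node:inspector` module. The helper name `captureCpuProfile`, the duration, and the output path are assumptions. Deciding when to call it (for example, when event loop delay crosses a threshold) is left to your monitoring code.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');

// Hypothetical helper: records a CPU profile of the current process for
// `durationMs` and writes it to `outFile` in .cpuprofile (JSON) format.
function captureCpuProfile(durationMs, outFile) {
  return new Promise((resolve, reject) => {
    const session = new inspector.Session();
    session.connect();

    function fail(err) {
      session.disconnect();
      reject(err);
    }

    session.post('Profiler.enable', (enableErr) => {
      if (enableErr) return fail(enableErr);
      session.post('Profiler.start', (startErr) => {
        if (startErr) return fail(startErr);
        setTimeout(() => {
          session.post('Profiler.stop', (stopErr, result) => {
            if (stopErr) return fail(stopErr);
            try {
              fs.writeFileSync(outFile, JSON.stringify(result.profile));
            } catch (writeErr) {
              return fail(writeErr);
            }
            session.disconnect();
            resolve(outFile);
          });
        }, durationMs);
      });
    });
  });
}
```

The resulting file can be opened in Chrome DevTools or speedscope. Guard the trigger with a cooldown so a sustained spike does not start a new capture every few seconds.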
