LEARN · DEBUGGING GUIDE

Node.js Graceful Shutdown Not Completing: Debugging Dirty Handles and Stuck Timers

When a Node.js process ignores SIGTERM or never finishes its shutdown sequence, it's almost always an unclosed handle, an active timer, or a persistent connection. Here's how to find and fix them.

Intermediate · Node.js · 7 min read

What this usually means

Node.js's event loop keeps running as long as there are active (ref'd) handles: TCP sockets, open files, timers, child processes, or worker threads. A graceful shutdown implementation should close all of them, but it commonly misses one through oversight, a race condition, or a third-party library that registers handles internally. The shutdown logic itself can also stall: if it awaits a promise that never settles, or a rejection aborts the chain partway, the steps that would close the remaining handles never run. The root cause is almost always a handle that wasn't closed or a cleanup step that never completed, so the loop never drains.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  1. Send SIGTERM to the process and note whether it exits within 5 seconds
  2. Run `strace -f -p <PID>` to see which syscall the process is blocked in; a process sitting in `epoll_wait` (or `epoll_pwait`) is idle but held open by a handle, not stuck on I/O
  3. Use `lsof -P -n -p <PID>` to list the process's open sockets and file descriptors
  4. Check for active handles and timers with `process._getActiveHandles()` and `process._getActiveRequests()`, from a debugger or a SIGUSR2 handler
  5. Add a timeout to your shutdown handler: an unref'd 5-second timer that forces `process.exit(1)` if cleanup doesn't complete
  6. Run the app with `--trace-warnings` and check for unhandled promise rejections during shutdown
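
Step 5 can be sketched as follows. `armShutdownWatchdog` is a hypothetical helper name, not a Node.js API. The detail that matters is `unref()`: without it, the watchdog itself becomes one more handle that holds the process open for the full timeout even when cleanup succeeds.

```javascript
// Hypothetical helper (not a Node.js API): force-exit if graceful cleanup
// has not finished within `ms` milliseconds.
function armShutdownWatchdog(ms = 5000) {
  const timer = setTimeout(() => {
    console.error(`[shutdown] cleanup still running after ${ms} ms, forcing exit`);
    process.exit(1);
  }, ms);
  // unref() lets the process exit naturally if cleanup finishes first;
  // otherwise this timer would itself keep the event loop alive for `ms`.
  timer.unref();
  return timer;
}

process.once("SIGTERM", () => {
  armShutdownWatchdog(5000);
  // ...close servers, pools, and timers here...
});
```
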
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Your shutdown handler code (where you listen for SIGTERM/SIGINT and call server.close())
  • Third-party library initialization (e.g., database clients that keep connection pools alive)
  • Timers and intervals that are never cleared (a setInterval or setTimeout with no matching clearInterval/clearTimeout)
  • Long-lived HTTP/WebSocket connections that are not explicitly destroyed
  • Child processes spawned with `spawn` or `fork` that are never killed
  • Streams and servers (`fs.createReadStream`, `net.createServer`) that are opened but never closed
  • A Dockerfile CMD/ENTRYPOINT written in shell form, which can keep SIGTERM from ever reaching Node
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • The server's `close()` callback never fires because a keep-alive connection is still open
  • A database or cache client (Redis, PostgreSQL) keeps persistent connections that are not closed on shutdown
  • A stream's file handle stays open because the stream is never destroyed
  • A long-running timer or interval is never cleared (e.g., a monitoring heartbeat)
  • A promise in the shutdown sequence never resolves or rejects (e.g., awaiting a response that never comes)
  • A third-party library registers its own `process.on('SIGTERM')` listener (which disables Node's default exit-on-SIGTERM) or creates internal handles you never close
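
For the library-related cause, it helps to check who is listening for shutdown-related process events before blaming your own code. The helper below (`auditShutdownListeners` is a hypothetical name) uses only `process.listeners()`. Per the Node.js docs, installing any `SIGTERM` or `SIGINT` listener removes Node's default exit behavior for that signal, so an unexpected listener from a dependency changes how shutdown works.

```javascript
// Hypothetical helper: report the listeners registered for shutdown-related
// process events, including ones added by third-party libraries.
function auditShutdownListeners(events = ["SIGTERM", "SIGINT", "beforeExit", "exit"]) {
  const report = {};
  for (const event of events) {
    // Anonymous listeners have an empty name; label them so they still show up.
    report[event] = process.listeners(event).map((fn) => fn.name || "<anonymous>");
  }
  return report;
}

console.log(auditShutdownListeners());
```
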
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Wrap the shutdown logic in a promise raced against a timeout that calls `process.exit(1)` if exceeded
  • Explicitly end every connection: call `server.close(cb)`, then destroy the sockets you tracked from the server's `'connection'` event (on Node 18.2+, `server.closeAllConnections()` does this for HTTP servers). `server.connections` (deprecated) and `server.getConnections()` only report a count, so neither gives you the sockets
  • For database clients, call `client.quit()` or `pool.end()` with a timeout
  • Track every timer you create and clear the list with `clearInterval`/`clearTimeout` at shutdown
  • Call `process.exit(0)` only after all cleanup promises have settled, and always keep a fallback timeout
  • Use the `why-is-node-running` module in development to list active handles at shutdown
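
The timer-tracking pattern can be as small as this. `TimerRegistry` is a hypothetical class, a sketch that assumes every long-lived timer in the service is created through it rather than through bare `setInterval`/`setTimeout` calls.

```javascript
// Hypothetical registry: create long-lived timers through it so shutdown
// can clear every one of them in a single call.
class TimerRegistry {
  constructor() {
    this.intervals = new Set();
    this.timeouts = new Set();
  }

  interval(fn, ms) {
    const timer = setInterval(fn, ms);
    this.intervals.add(timer);
    return timer;
  }

  timeout(fn, ms) {
    const timer = setTimeout(() => {
      this.timeouts.delete(timer); // a fired one-shot timer no longer needs clearing
      fn();
    }, ms);
    this.timeouts.add(timer);
    return timer;
  }

  // Clears everything still registered; returns how many timers were cleared.
  clearAll() {
    for (const timer of this.intervals) clearInterval(timer);
    for (const timer of this.timeouts) clearTimeout(timer);
    const cleared = this.intervals.size + this.timeouts.size;
    this.intervals.clear();
    this.timeouts.clear();
    return cleared;
  }
}
```

In a service, route heartbeats and polling loops through one shared instance (for example `timers.interval(sendHeartbeat, 10_000)`, where `sendHeartbeat` is hypothetical) and call `timers.clearAll()` from the shutdown handler.
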
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Send SIGTERM and confirm the process exits within 2 seconds (e.g., with GNU coreutils: `kill -TERM <PID> && timeout 2 tail --pid=<PID> -f /dev/null`; exit status 124 means it was still running)
  • Check the logs: every cleanup step should log its completion before the process exits
  • Run with `NODE_ENV=production` and verify that `docker stop` returns in under 5 seconds with container exit code 0 (137 means it was SIGKILLed)
  • Use `strace` to confirm there are no lingering read/write syscalls after the shutdown signal
  • Compare `process._getActiveHandles()` before and after shutdown: apart from stdio streams, nothing should remain
  • Test with multiple concurrent connections and ensure all of them are cleaned up
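
The first check can be automated in a test suite. The sketch below is a hypothetical harness (`measureShutdown` is not a library API) built only on `node:child_process`; it assumes the app under test prints a line to stdout once it is ready to receive signals.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical harness (not a library API): run `source` in a child Node.js
// process, wait for its first stdout output (its "ready" signal), send SIGTERM,
// and time the exit. Like an orchestrator, it escalates to SIGKILL when the
// child overstays `budgetMs`.
function measureShutdown(source, budgetMs = 2000) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", source], {
      stdio: ["ignore", "pipe", "inherit"],
    });
    let sentAt = 0;
    let killer = null;
    // Guard against a child that never reports ready.
    const startup = setTimeout(() => child.kill("SIGKILL"), 10_000);

    child.stdout.once("data", () => {
      clearTimeout(startup);
      sentAt = Date.now();
      child.kill("SIGTERM");
      killer = setTimeout(() => child.kill("SIGKILL"), budgetMs);
    });

    child.once("error", reject);
    child.once("exit", (code, signal) => {
      clearTimeout(startup);
      clearTimeout(killer);
      const elapsedMs = sentAt ? Date.now() - sentAt : NaN;
      // Exiting because of a signal (default SIGTERM action or our SIGKILL)
      // means the shutdown was not graceful.
      const graceful = signal === null && code === 0 && elapsedMs <= budgetMs;
      resolve({ code, signal, elapsedMs, graceful });
    });
  });
}
```
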
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Calling `process.exit()` in the shutdown handler before closing handles: it skips cleanup entirely
  • Using `server.close()` without a callback and assuming it's synchronous
  • Ignoring promise rejections during shutdown: a failed step stops the rest of the chain, and on Node 15+ an unhandled rejection crashes the process before cleanup finishes
  • Setting a shutdown timeout longer than the orchestrator's grace period (`docker stop` waits 10 seconds by default before SIGKILL; Kubernetes waits 30)
  • Assuming SIGTERM is always sent: some tools send SIGKILL directly, and SIGKILL cannot be caught
  • Not handling `EADDRINUSE` when the app restarts quickly after a non-graceful shutdown
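
For the `server.close()` mistake, wrapping the callback in a promise makes the asynchronous completion explicit. `closeServer` is a hypothetical helper name; it relies only on the documented `server.close(callback)` behavior, where the callback receives an error if the server was not running.

```javascript
// server.close() returns immediately; its callback runs only once the server
// has stopped listening and every existing connection has ended (or with an
// error if the server was not running). Wrapping it lets shutdown code
// await that moment instead of assuming close() is synchronous.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
}

// Usage inside a shutdown handler (sketch):
//   await closeServer(server);
//   process.exit(0);
```
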
( 07 ) War story

The 30-Second Kubernetes Pod That Wouldn't Die

Senior Backend Engineer · Node.js 18, Express 4.18, Redis (ioredis), PostgreSQL (pg), Kubernetes on GKE, Docker

Timeline

  1. 09:15 PagerDuty alert: pod 'api-v2-85f7b' stuck in Terminating state for >2 minutes
  2. 09:17 `kubectl describe pod` shows the last state was Running, with SIGTERM sent 90s ago
  3. 09:20 SSH into the node, run `docker inspect <container>`: Status is 'running', no exit
  4. 09:22 Attach to the container with `docker exec -it <container> bash`; `ps aux` shows the node process as PID 1
  5. 09:25 Attach the debugger to the running process (`node inspect -p 1`) and evaluate `process._getActiveHandles().map(h => h.constructor.name)`: Socket, Timer, Socket
  6. 09:28 Check the codebase: the shutdown handler calls server.close() and redis.quit(), but never destroys active HTTP connections
  7. 09:30 Fix: track sockets from the server's 'connection' event; on shutdown, call server.close() and destroy each tracked socket
  8. 09:35 Deploy the fix and send SIGTERM manually: the process exits in <1 second

We had just rolled out a new microservice for user preferences. Everything looked fine until we tried to scale down the deployment. Pods would get stuck in Terminating state for over a minute, eventually killed by Kubernetes with a SIGKILL. The application logs showed the shutdown handler was invoked, but it never completed. I started by confirming the signal was received: added a log right after the server.close() call. That log appeared, but then nothing—no exit, no error.

I attached to the container and used Node's internal diagnostic tools. `process._getActiveHandles()` revealed two Sockets and a Timer. One socket was the PostgreSQL pool, which we closed properly. The other was an open HTTP connection from a keep-alive client that hadn't sent a request in 30 seconds. The timer was a setInterval for a health check that we forgot to clear. The shutdown handler was stuck waiting for server.close() to finish, but that couldn't happen because the open connection prevented the server from draining.

The fix was straightforward: we track every socket from the server's 'connection' event, and in the shutdown handler we call server.close() to stop accepting new connections and then `socket.destroy()` on each tracked socket so the close callback can fire. We also cleared the health check interval in the shutdown sequence. After deploying, we verified by sending SIGTERM and watching the process exit within milliseconds. We also added a timeout wrapper that forces `process.exit(1)` after 5 seconds as a safety net. The lesson: always track handles explicitly and use Node's debugging APIs to find the ones that linger.

Root cause

An active keep-alive HTTP connection kept server.close() from ever invoking its callback, and an uncleared setInterval timer would have kept the event loop from draining even after that.

The fix

Track open connections, call server.close() and destroy every remaining connection so its callback can fire, and clear all timers/intervals in the shutdown handler. A forced exit timeout of 5 seconds was added as a safety net.

The lesson

Always use `process._getActiveHandles()` during shutdown debugging. Never assume server.close() will complete if there are persistent connections. Implement a forced exit timeout as a safety net.

( 08 ) How the Node.js Event Loop Keeps the Process Alive

Node's event loop runs as long as there is at least one active, ref'd handle: a timer, socket, server, child process, and so on (handles marked with `unref()` do not count). When none remain, the process exits on its own. `process.exit()` works differently: it terminates the process immediately, with or without an exit code, running only synchronous `'exit'` listeners and abandoning any pending I/O, timers, or promises. The common mistake is to assume that closing the HTTP server automatically closes all connections. In reality, `server.close()` stops accepting new connections but leaves existing ones open (since Node 19 it also closes idle keep-alive connections), so active connections must be ended explicitly.
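
The difference between a natural exit and `process.exit()` can be observed with three one-line child processes. `exitsOnItsOwn` is a hypothetical helper built on `child_process.spawnSync`; it treats "exited with status 0 before the timeout" as exiting on its own.

```javascript
const { spawnSync } = require("node:child_process");

// Run a snippet in a fresh Node.js process and report whether it exited with
// status 0 within `ms`. On timeout, spawnSync kills the child (SIGTERM by
// default), so its status is null.
function exitsOnItsOwn(source, ms = 2000) {
  const result = spawnSync(process.execPath, ["-e", source], { timeout: ms });
  return result.status === 0;
}

// A ref'd interval keeps the event loop alive, so the child never exits by itself.
console.log(exitsOnItsOwn("setInterval(() => {}, 1000)")); // false
// unref() removes the interval from what keeps the loop alive.
console.log(exitsOnItsOwn("setInterval(() => {}, 1000).unref()")); // true
// process.exit() terminates immediately, whatever handles are still open.
console.log(exitsOnItsOwn("setInterval(() => {}, 1000); process.exit(0)")); // true
```
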

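You can inspect active handles at any time using `process._getActiveHandles()` and `process._getActiveRequests()`. These are undocumented internal APIs, so their output may change between Node versions, but they list the objects keeping the loop alive; newer releases also provide the documented `process.getActiveResourcesInfo()`, which returns the resource type names. In production, I've seen developers add a SIGUSR2 handler (SIGUSR1 is reserved for starting the inspector) that dumps these lists to a file for post-mortem analysis. This is the single most effective debugging technique for stuck shutdowns.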
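
A SIGUSR2 dump of that kind might look like the sketch below. `dumpActiveHandles` and the temp-file naming are hypothetical; it records constructor names only, since the handle objects themselves are internal and not serializable.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical post-mortem helper: write what is keeping the event loop alive
// to a JSON file. _getActiveHandles()/_getActiveRequests() are undocumented
// internals; getActiveResourcesInfo() is recorded too when it exists.
function dumpActiveHandles(file = path.join(os.tmpdir(), `handles-${process.pid}.json`)) {
  const names = (list) => list.map((h) => (h && h.constructor ? h.constructor.name : typeof h));
  const snapshot = {
    pid: process.pid,
    at: new Date().toISOString(),
    handles: names(process._getActiveHandles()),
    requests: names(process._getActiveRequests()),
    resources:
      typeof process.getActiveResourcesInfo === "function"
        ? process.getActiveResourcesInfo()
        : null,
  };
  // Synchronous write: the process may be killed right after the dump.
  fs.writeFileSync(file, JSON.stringify(snapshot, null, 2));
  return file;
}

// SIGUSR1 is reserved for the inspector, so trigger the dump with `kill -USR2 <PID>`.
process.on("SIGUSR2", () => {
  console.error(`[diag] active handles written to ${dumpActiveHandles()}`);
});
```
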

( 09 ) The server.close() Misconception

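Many developers expect `server.close(callback)` to close all connections immediately. In reality, it stops accepting new connections and waits for existing ones to close on their own (when the client disconnects or the socket times out); the callback fires only after the last one is gone. On Node 18 and earlier, `close()` does not touch idle keep-alive connections, so a client that keeps its connection open can hold the server open long after shutdown starts. The server's `keepAliveTimeout` (5 seconds by default) eventually drops idle sockets, but deployments behind load balancers often raise it to outlive the balancer's own idle timeout, which is why this is especially common there.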

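The fix is to track all active sockets and destroy them: add each socket to a Set in the server's 'connection' event (removing it on 'close'), and in the shutdown handler iterate over the Set and call `socket.destroy()`. Alternatively, on Node 18.2+ use `server.closeIdleConnections()`, which closes idle keep-alive connections, or `server.closeAllConnections()`, which closes every connection including those mid-request; since Node 19, `server.close()` calls `closeIdleConnections()` for you. Connections in the middle of a request still need handling, usually by letting them finish within a grace period and destroying whatever remains.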
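
The tracking approach with a grace period can be sketched like this. `trackConnections`, `shutdownServer`, and the 5-second default are hypothetical choices; `closeIdleConnections()` is only called when the running Node version provides it.

```javascript
// Hypothetical helper: remember every socket the server accepts.
function trackConnections(server) {
  const sockets = new Set();
  server.on("connection", (socket) => {
    sockets.add(socket);
    socket.once("close", () => sockets.delete(socket));
  });
  return sockets;
}

// Stop accepting new connections, close idle keep-alive sockets right away,
// give in-flight requests `graceMs` to finish, then destroy whatever is left.
function shutdownServer(server, sockets, graceMs = 5000) {
  return new Promise((resolve) => {
    const force = setTimeout(() => {
      for (const socket of sockets) socket.destroy();
    }, graceMs);
    server.close(() => {
      clearTimeout(force);
      resolve();
    });
    // Available on Node 18.2+; Node 19+ already does this inside close().
    if (typeof server.closeIdleConnections === "function") {
      server.closeIdleConnections();
    }
  });
}

// Usage sketch:
//   const server = http.createServer(app);
//   const sockets = trackConnections(server);
//   process.once("SIGTERM", () => shutdownServer(server, sockets).then(() => process.exit(0)));
```
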

( 10 ) Third-Party Libraries and Hidden Handles

Database clients, message brokers, and monitoring agents often create their own handles that you might not be aware of. For example, the Redis client ioredis holds a TCP socket (plus reconnect timers while it is retrying). If you call `redis.quit()`, it sends a QUIT command and waits for the reply, which may never come if the server is unreachable; `redis.disconnect()` closes the connection without waiting. Similarly, a `pg` pool holds multiple reusable sockets; `pool.end()` closes them, but it waits for checked-out clients to be released, so a client that is never released (or a query that never finishes) makes it hang.

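The safest approach is to wrap every third-party cleanup call in a promise with a timeout, for example `Promise.race([redis.quit(), timeout(2000)])`, where `timeout(ms)` is a helper that rejects after `ms` milliseconds (clear its timer afterwards, or it becomes one more handle keeping the loop alive). Also check the library's documentation for dedicated shutdown methods: some offer a forced variant that closes immediately without waiting.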
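
A timeout wrapper that cleans up after itself might look like this. `withTimeout` and `closeDataStores` are hypothetical names; the sketch assumes an ioredis client (`quit()` and `disconnect()`) and a `pg` `Pool` (`end()` returning a promise) are passed in.

```javascript
// Hypothetical helper: settle like `promise`, or reject once `ms` elapses.
// The timer is cleared either way, so the helper never becomes a stray handle.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Sketch for an ioredis client and a pg Pool (both passed in, not created here).
async function closeDataStores(redis, pool) {
  // Polite QUIT first; if it stalls, drop the socket without waiting.
  await withTimeout(redis.quit(), 2000, "redis.quit").catch(() => redis.disconnect());
  // Rejects if the pool does not drain in time; the caller decides what to do.
  await withTimeout(pool.end(), 2000, "pool.end");
}
```
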

( 11 ) Docker and Kubernetes Signal Handling Pitfalls

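In Docker, the container's main process runs as PID 1 and receives signals directly. If your Node app is started through a shell, for example with the shell form `CMD node app.js` (which runs `/bin/sh -c "node app.js"`), the shell may be PID 1 and may not forward signals to Node. Always use the exec form in the Dockerfile: `CMD ["node", "app.js"]`. Running Node itself as PID 1 has its own catch: the kernel applies no default signal actions to PID 1, so without your own SIGTERM handler the signal is ignored. A lightweight init such as `tini` (or `docker run --init`) forwards signals and reaps zombie processes.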
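
A cheap startup check can catch the PID 1 case before it bites in production. `warnIfUnsafePid1` is a hypothetical helper; it only inspects `process.pid` and the number of registered SIGTERM listeners.

```javascript
// Hypothetical startup check. On Linux, a signal whose disposition is the
// default is not delivered to PID 1 (init), so a Node.js process running as
// PID 1 with no SIGTERM listener silently ignores `docker stop`'s SIGTERM.
function warnIfUnsafePid1() {
  if (process.pid === 1 && process.listenerCount("SIGTERM") === 0) {
    console.warn(
      "Running as PID 1 without a SIGTERM handler: SIGTERM will be ignored. " +
        "Register a handler or run under an init such as tini (docker run --init)."
    );
    return true;
  }
  return false;
}

// Call it after your signal handlers are registered.
warnIfUnsafePid1();
```
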

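Kubernetes sends SIGTERM and waits for `terminationGracePeriodSeconds` (default 30s). If the process hasn't exited by then, it sends SIGKILL, so your shutdown handler must complete within that window. Add a forced exit after about 25 seconds so cleanup ends on your terms instead of being cut off mid-write by SIGKILL. Also expect some traffic to keep arriving after SIGTERM: removing the pod from Service endpoints happens asynchronously, so it is common to keep serving for a few seconds after SIGTERM (often via a `preStop` sleep) while failing the readiness probe, and only then close the server.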
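
The "fail readiness, wait, then close" sequence can be sketched as below. `createLifecycle`, the `/readyz` path, and the 5-second drain delay are hypothetical; the probe path must match whatever your readinessProbe actually requests.

```javascript
const http = require("node:http");

// Hypothetical shutdown pattern for Kubernetes: after SIGTERM, report "not
// ready" and keep serving for `drainDelayMs` while endpoint updates propagate,
// then stop accepting connections. Keep the total well under
// terminationGracePeriodSeconds (30 s by default).
function createLifecycle({ drainDelayMs = 5000 } = {}) {
  let shuttingDown = false;

  const server = http.createServer((req, res) => {
    if (req.url === "/readyz") {
      // Hypothetical readiness endpoint; point the readinessProbe at it.
      res.statusCode = shuttingDown ? 503 : 200;
      res.end(shuttingDown ? "draining" : "ready");
      return;
    }
    res.end("hello");
  });

  async function shutdown() {
    if (shuttingDown) return;
    shuttingDown = true; // readiness flips to 503 immediately
    await new Promise((resolve) => setTimeout(resolve, drainDelayMs));
    // Resolves once existing connections end; pair with socket tracking and
    // a forced-exit watchdog in a real service.
    await new Promise((resolve) => server.close(() => resolve()));
  }

  return { server, shutdown, isShuttingDown: () => shuttingDown };
}

// Usage sketch:
//   const { server, shutdown } = createLifecycle();
//   server.listen(8080);
//   process.once("SIGTERM", () => shutdown().then(() => process.exit(0)));
```
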

Frequently asked questions

How do I find what's keeping the event loop alive during shutdown?

Run `process._getActiveHandles()` inside your shutdown handler (or via a signal handler). It returns an array of handle objects; check each constructor name to identify sockets, timers, etc. For more detail, use the `why-is-node-running` npm module, which prints a stack trace showing where each active handle was created.

Should I call process.exit() after cleanup?

Yes, as the final step: once all cleanup promises have settled, or once a timeout has fired. Calling `process.exit(0)` before handles are closed can lose data (e.g., database writes that were never sent), but a forced exit after a timeout is still better than hanging forever. Use a pattern like `Promise.race([cleanup(), timeout(5000)]).then(() => process.exit(0), () => process.exit(1))`, where `timeout(ms)` rejects after `ms` milliseconds, so a failed or timed-out cleanup exits with a non-zero code.
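
That pattern can grow into a small orchestrator that runs independent cleanup steps together. `runShutdown` and its options are hypothetical; `Promise.allSettled` ensures one failing step does not stop the others, and `exit` is injectable so the logic can be tested.

```javascript
// Hypothetical orchestrator: run every cleanup step concurrently, let one
// failure not stop the others, and choose the exit code from the outcome.
// `exit` is injectable so the logic can be tested without ending the process.
async function runShutdown(steps, { budgetMs = 5000, exit = (code) => process.exit(code) } = {}) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), budgetMs);
  });

  const settled = Promise.allSettled(
    Object.entries(steps).map(([name, step]) =>
      // Promise.resolve().then(step) also catches synchronous throws.
      Promise.resolve()
        .then(step)
        .catch((err) => {
          console.error(`[shutdown] ${name} failed:`, err);
          throw err; // keep the step "rejected" for allSettled
        })
    )
  );

  const outcome = await Promise.race([settled, timedOut]);
  clearTimeout(timer);

  if (outcome === "timeout") {
    console.error(`[shutdown] cleanup exceeded ${budgetMs} ms`);
    return exit(1);
  }
  return exit(outcome.some((r) => r.status === "rejected") ? 1 : 0);
}
```
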

Why does my server.close() never call its callback?

The callback does not fire until every connection is closed, so an open keep-alive connection can delay it for as long as the connection stays open. Either destroy the remaining connections after calling close, or add a timeout. On Node 18.2+, `server.closeIdleConnections()` closes idle connections first (Node 19+ does this automatically inside `server.close()`), and `server.closeAllConnections()` closes everything.

How do I handle WebSocket connections during shutdown?

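For WebSocket servers built with `ws`, the server already tracks connected clients in `wss.clients` (unless `clientTracking` is disabled); you can also store each socket yourself from the 'connection' event. In shutdown, call `terminate()` on each to drop it immediately, or `close()` to start the closing handshake (which waits for the peer), then close the WebSocket server; since `ws` 8, `wss.close()` no longer closes existing connections for you. For Socket.IO, call `io.close()`, which handles cleanup internally but may still hang if there are active transports, so use a timeout.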

Can unhandled promise rejections prevent graceful shutdown?

Yes, in two ways. First, on Node 15 and later an unhandled rejection is treated as an uncaught exception by default, so the process crashes with exit code 1 before the remaining cleanup runs (older versions, or `--unhandled-rejections=warn`, only log a warning). Second, a rejection inside your shutdown promise chain stops the steps after it from running, which can leave handles open. Add `.catch()` to every promise in your shutdown sequence (or use `Promise.allSettled`), or use `process.on('unhandledRejection', (err) => { /* log and force exit */ })`.
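
A minimal sketch of that last-resort handler follows. `installRejectionGuard` is a hypothetical name. Once a listener is installed, Node emits `'unhandledRejection'` instead of applying its default crash behavior, so the handler decides the exit code.

```javascript
// Hypothetical last-resort guard: log an unhandled rejection and force a
// non-zero exit instead of leaving the shutdown chain half-finished.
// `exit` and `log` are injectable for testing; returns an uninstall function.
function installRejectionGuard({ exit = (code) => process.exit(code), log = console.error } = {}) {
  const handler = (reason) => {
    log("[shutdown] unhandled rejection:", reason);
    exit(1);
  };
  process.on("unhandledRejection", handler);
  return () => process.removeListener("unhandledRejection", handler);
}
```
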