WebSocket debugging is a different beast from HTTP. With HTTP, you get request-response pairs, status codes, and clear error messages. With WebSocket, you get a persistent stream of frames, silent disconnects, and close codes that most application code never inspects.
I've spent the last three years building real-time features — chat, live notifications, collaborative editing — and I've collected a set of debugging patterns that go beyond console.log. This post covers packet capture, reconnect storms, and the silent failures that don't raise exceptions but break user experience.
War Story: The Silent Disconnect That Killed Notifications
We had a production incident: users stopped getting real-time notifications. The WebSocket connection appeared healthy — no error events, no close events in the browser. The status showed "connected". But no messages arrived.
I started by checking the server logs: no errors, and the server thought the client was still connected. Then I looked at the network tab. The WebSocket frames showed outgoing pings, but no pongs coming back. The browser's WebSocket API doesn't expose a timeout for pings — it just sits there waiting forever. The connection was essentially dead but not closed.
The WebSocket protocol (RFC 6455) defines Ping and Pong frames, but the browser's JavaScript API exposes neither a method to send a ping nor an event for received pings or pongs; the browser simply answers server pings on its own. If the path to the server dies silently, client code will never know unless you implement your own heartbeat layer.
The root cause: a load balancer had a 60-second idle timeout for WebSocket connections. The server was sending pings every 55 seconds, which left only 5 seconds of margin. A little network jitter made a ping arrive at about 61 seconds, and the load balancer dropped the connection without either side receiving a TCP FIN. The connection became a zombie.
We fixed it by reducing the server ping interval to 25 seconds and adding a client-side library that monitors message gaps. Now we log a warning if no message is received within 35 seconds.
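The message-gap check can be sketched roughly as follows. This is a minimal illustration, not the library we run in production: `createGapMonitor`, its option names, and the injectable `now` clock are hypothetical, and the 35-second default simply mirrors the threshold mentioned above.

```javascript
// Hypothetical helper: reports when no message has arrived for longer than maxGapMs.
function createGapMonitor({
  maxGapMs = 35000,                     // warn after 35 s of silence
  now = Date.now,                       // injectable clock, handy for tests
  onStale = (msg) => console.warn(msg), // what to do when the gap is too long
} = {}) {
  let lastSeen = now();
  return {
    // Call from ws.onopen and ws.onmessage to record activity.
    touch() {
      lastSeen = now();
    },
    // Call periodically (e.g. from setInterval); returns true if the gap is too long.
    check() {
      const gap = now() - lastSeen;
      if (gap > maxGapMs) {
        onStale(`No WebSocket message for ${gap} ms`);
        return true;
      }
      return false;
    },
  };
}
```

In a client you would call `touch()` from `ws.onmessage` and run `check()` on a timer, for example every 5 seconds. Because the server pings every 25 seconds, a 35-second gap means at least one expected message went missing.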
Packet Capture: Seeing What the Browser Hides
Browser DevTools show WebSocket frames nicely, but they hide low-level details like TCP retransmissions, fragmented frames, and the exact timing of close frames. When debugging network-level issues, I reach for tcpdump and Wireshark.
Capture on the server side to see exactly what the server sends and receives. For example, to see all WebSocket traffic on port 8080:
sudo tcpdump -i eth0 -w websocket.pcap port 8080
Open the pcap in Wireshark and apply the filter `websocket` to see only WebSocket frames. Each frame shows FIN bit, opcode (1 for text, 2 for binary, 8 for close, 9 for ping, 10 for pong), and payload length. Look for frames with FIN=0: these are fragmented frames, which can cause issues if not reassembled correctly.
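To make those header fields concrete, here is a small sketch that decodes the first bytes of a frame the way RFC 6455 lays them out. Wireshark already does this for you; `parseFrameHeader` and `OPCODES` are illustrative names of my own, and the function only reads the FIN bit, opcode, mask bit, and payload length, not the payload itself.

```javascript
// Names for the opcodes listed above (RFC 6455, section 5.2).
const OPCODES = { 0: 'continuation', 1: 'text', 2: 'binary', 8: 'close', 9: 'ping', 10: 'pong' };

// Decode the start of a WebSocket frame header.
// `bytes` is a Uint8Array beginning at the first byte of a frame.
function parseFrameHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const fin = (bytes[0] & 0x80) !== 0;    // top bit of byte 0
  const opcode = bytes[0] & 0x0f;         // low 4 bits of byte 0
  const masked = (bytes[1] & 0x80) !== 0; // top bit of byte 1
  let payloadLength = bytes[1] & 0x7f;    // low 7 bits of byte 1
  if (payloadLength === 126) {
    payloadLength = view.getUint16(2);            // next 2 bytes, big-endian
  } else if (payloadLength === 127) {
    payloadLength = Number(view.getBigUint64(2)); // next 8 bytes, big-endian
  }
  return { fin, opcode, type: OPCODES[opcode] || 'reserved', masked, payloadLength };
}

// A final, unmasked text frame carrying 5 bytes of payload.
console.log(parseFrameHeader(new Uint8Array([0x81, 0x05])));
```

A frame with `fin: false` followed by `continuation` frames is one fragmented message. Frames sent by a client are always masked, so `masked` also tells you the direction when reading a raw capture.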
Also check the TCP stream for retransmissions. If you see many duplicate ACKs or retransmitted segments, the network is dropping packets. WebSocket runs over TCP, so lost packets are retransmitted rather than lost for good, but every message queued behind them is delayed, and the application sees no error while it waits.
Using mitmproxy for HTTPS WebSocket Debugging
If your WebSocket runs over WSS (wss://), tcpdump captures encrypted traffic. Use mitmproxy to decrypt it. I run mitmproxy in transparent mode on a staging environment:
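mitmproxy --listen-port 8080 --mode transparent
In transparent mode the client is not configured with a proxy. Instead, route its traffic through the mitmproxy host (for example with an iptables redirect) and install the mitmproxy CA certificate on the client so that TLS can be intercepted. If that is impractical, drop `--mode transparent` and point the client at mitmproxy as an ordinary HTTP proxy. mitmproxy's WebSocket flow view shows each frame, its direction, and timing. You can also modify or replay frames, which is useful for testing edge cases like malformed messages.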
Reconnect Storms: A Cascading Failure Pattern
When a server restarts, all connected clients get a close event and immediately try to reconnect. If they all retry at the same fixed interval (e.g., 1 second), the server gets hammered with connection requests. This is a reconnect storm.
I've seen this take down a service: the server is trying to start up, but it's overwhelmed by 10,000 incoming connections in the first second. Each connection requires TLS handshake, memory allocation, and authentication. The server becomes unresponsive, clients time out and retry, making things worse.
Implement exponential backoff with random jitter. Start with a base delay of 1 second, multiply by 2 each attempt, and add a random offset up to 50% of the current delay. Cap the maximum delay at 30 seconds. This spreads reconnection attempts over time.
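function reconnectDelay(attempt) {
  const base = 1000; // 1 second
  const max = 30000; // 30 seconds
  // Exponential growth (attempt 0 -> 1 s, 1 -> 2 s, 2 -> 4 s, ...), capped at max.
  const exponential = Math.min(base * Math.pow(2, attempt), max);
  // Random jitter of up to 50% is added on top of the capped value,
  // so the returned delay can reach 1.5 * max (45 seconds).
  const jitter = exponential * Math.random() * 0.5;
  return exponential + jitter;
}
Also add a server-side rate limiter for WebSocket connections. Reject the HTTP upgrade request with a 429 status if the rate exceeds a threshold per IP. This protects the server during a storm.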
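One possible shape for that limiter is a fixed-window counter per IP. This is a sketch under assumptions rather than what we run in production: `createConnectionLimiter`, the default of 5 attempts per 10 seconds, and the injectable `now` clock are all made up for illustration.

```javascript
// Hypothetical fixed-window limiter: at most maxPerWindow upgrade attempts
// per IP within each windowMs window.
function createConnectionLimiter({ maxPerWindow = 5, windowMs = 10000, now = Date.now } = {}) {
  const windows = new Map(); // ip -> { start, count }
  return function allow(ip) {
    const t = now();
    const w = windows.get(ip);
    // No window yet, or the previous window has expired: start a new one.
    if (!w || t - w.start >= windowMs) {
      windows.set(ip, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= maxPerWindow;
  };
}
```

In the server's upgrade handler you would call `allow(ip)` before accepting the handshake and answer with `429 Too Many Requests` when it returns false. Two caveats: behind a load balancer the socket's address may be the proxy's rather than the client's, and this sketch never evicts old entries from the map, which a long-running server would need to do.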
Monitoring State Transitions
The WebSocket API has four states: CONNECTING, OPEN, CLOSING, CLOSED. Most developers only log open and close events. I log every state change along with the timestamp and reason. This helps identify patterns like rapid connect-disconnect cycles (a sign of connection thrashing) or long periods in CONNECTING (network issues).
Here's a quick example of logging state transitions in a client:
const ws = new WebSocket('wss://example.com');
ws.onopen = () => console.log('WS state:', ws.readyState, 'OPEN');
ws.onclose = (e) => console.log('WS state:', ws.readyState, 'CLOSED', 'code:', e.code, 'reason:', e.reason);
ws.onerror = (e) => console.log('WS state:', ws.readyState, 'ERROR', e);
// Monitor state changes manually
const originalClose = ws.close.bind(ws);
ws.close = function(code, reason) {
  console.log('WS state:', ws.readyState, 'CLOSING (manual)');
  return originalClose(code, reason);
};
Silent Failures: When No Error Is the Worst Error
Silent failures happen when the connection appears open but messages stop flowing. Common causes: load balancer timeouts (like my war story), server-side uncaught exceptions that close the connection without a close frame, or network partitions where both sides think the connection is alive.
The only reliable way to detect silent failures is an application-level heartbeat. Send a ping message at a fixed interval and expect a pong reply within a timeout. If the pong doesn't arrive, close the connection and reconnect. Do not rely on the WebSocket protocol's ping/pong, because browser JavaScript can neither send protocol-level pings nor observe the pongs.
Implement a heartbeat with sequence numbers. Include a counter in each ping message. The server echoes the counter in the pong. If the client receives a pong with an unexpected counter (e.g., if a pong arrives after a reconnection), it can detect stale responses.
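A heartbeat along these lines might look like the sketch below. The `{ type: 'ping', seq }` and `{ type: 'pong', seq }` JSON message shapes are an assumption (your server has to echo the counter in whatever format you agree on), `createHeartbeat` is a hypothetical name, and the 30-second interval and 5-second timeout are just example defaults.

```javascript
// Sketch of an application-level heartbeat with sequence numbers.
// `send` writes a string to the socket; `onTimeout` runs when a pong is missed.
function createHeartbeat({ send, onTimeout, intervalMs = 30000, timeoutMs = 5000 }) {
  let seq = 0;        // counter included in every ping
  let pending = null; // seq of the ping still waiting for its pong
  let pongTimer = null;
  let intervalTimer = null;

  function stop() {
    clearInterval(intervalTimer);
    clearTimeout(pongTimer);
    intervalTimer = null;
    pongTimer = null;
    pending = null;
  }

  function pingNow() {
    seq += 1;
    pending = seq;
    const expected = seq;
    send(JSON.stringify({ type: 'ping', seq }));
    clearTimeout(pongTimer);
    // If the matching pong does not arrive in time, give up on this connection.
    pongTimer = setTimeout(() => {
      stop();
      onTimeout(expected);
    }, timeoutMs);
  }

  // Pass every incoming message through this; returns how it was classified.
  function handleMessage(raw) {
    let msg;
    try { msg = JSON.parse(raw); } catch { return 'ignored'; }
    if (!msg || msg.type !== 'pong') return 'ignored';
    if (msg.seq !== pending) return 'stale'; // pong for an old ping or an old connection
    clearTimeout(pongTimer);
    pending = null;
    return 'ok';
  }

  function start() {
    stop();
    intervalTimer = setInterval(pingNow, intervalMs);
  }

  return { start, stop, pingNow, handleMessage };
}
```

Wire it up with `send: (m) => ws.send(m)`, call `start()` in `ws.onopen`, feed `ws.onmessage` data into `handleMessage`, and call `stop()` in `ws.onclose`. In `onTimeout`, close the socket and schedule a reconnect with your backoff function.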
Tools Summary
- Wireshark: packet-level analysis, especially for TCP retransmissions and close frame inspection.
- mitmproxy: decrypt and modify WSS traffic in staging.
- Browser DevTools: quick frame inspection and state logging.
- Custom heartbeat library: essential for production monitoring.
- Prometheus + Grafana: track WebSocket connection counts, message latency, and reconnection rates.
Debugging WebSocket issues requires a layered approach: from packet capture at the network level to application-level heartbeats. The most elusive bugs are the ones that don't throw errors — they just degrade the user experience silently. Build your monitoring to detect those early.
Next time you see a WebSocket that says "connected" but doesn't deliver messages, don't trust the green dot. Check the packet log.
Frequently asked questions
How do I capture WebSocket frames with tcpdump?
Run `sudo tcpdump -i eth0 -w websocket.pcap port 8080` to capture traffic on port 8080. Then open the .pcap in Wireshark and apply the filter `websocket` to see individual frames. Look for FIN and opcode to identify text/binary frames.
What causes WebSocket reconnection storms?
When a server restarts, all clients disconnect simultaneously and attempt to reconnect at the same time if using a fixed retry interval. This floods the server with connection requests, leading to resource exhaustion and cascading failures. Exponential backoff with random jitter spreads out the retries.
How can I detect silent WebSocket failures?
Implement an application-level heartbeat: the client sends a ping every 30 seconds, and the server must reply with a pong within 5 seconds. If a pong is missed, the client closes the connection and reconnects. Also, track sequence numbers on messages to detect gaps.
What tools can I use to debug WebSocket traffic?
Wireshark for packet-level analysis, mitmproxy for intercepting HTTPS WebSocket connections in development, and browser DevTools for inspecting frames and state changes. For programmatic debugging, add logging for every state transition (open, message, error, close).