Go HTTP Client Timeout Errors: Debugging Guide

What this usually means

Go's net/http client has a default timeout of zero, which means no timeout at all. If you see a timeout error, something set a deadline: a client-level `Timeout`, a context deadline on the request, or one of the transport-level timeouts. The underlying cause can be a slow server, network congestion, DNS resolution delays, TLS handshake latency, or connection pool exhaustion. Often the deadline is consumed by the cumulative time of several redirects or application-level retries, not by a single round trip.

( 01 ) Fast diagnosis

The first ten minutes: establish facts before touching code.

1. Capture the exact error message, then reproduce externally with `curl -v --connect-timeout 5 --max-time 10 $URL`
2. Log the time before and after `client.Do()` to confirm how long the call actually takes: `t := time.Now(); resp, err := client.Do(req); log.Printf("took %v", time.Since(t))`
3. Inspect `client.Timeout`, the dialer timeout behind `Transport.DialContext`, and any context deadline. `fmt.Printf("%+v\n", client)` shows the client's own fields such as `Timeout`; print `client.Transport` separately, and note that a timeout captured inside a `DialContext` function is not visible either way
4. Test whether stale pooled connections are involved: call `transport.CloseIdleConnections()` temporarily and see if it helps
5. Check DNS resolution time with `dig +stats $HOSTNAME` or `nslookup $HOSTNAME`
6. Monitor goroutine count: `pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)` to detect goroutine leaks in the transport

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- `/etc/hosts` and `/etc/resolv.conf` for DNS misconfiguration
- `net/http/httptrace` output to instrument each HTTP phase
- `GODEBUG=netdns=go+1` to force the pure Go DNS resolver and print resolver debugging information
- `pprof` goroutine profile for goroutines blocked in `net/http.(*Transport).getConn` (waiting for a connection) or sitting in `net/http.(*persistConn).readLoop` and `writeLoop` (one pair per open HTTP/1 connection)
- `ss -tnp` or `netstat -tnp` to inspect TCP connections in TIME_WAIT or CLOSE_WAIT
- Application logs around the timeout timestamp, especially upstream service logs
- `transport.MaxIdleConnsPerHost` and `transport.MaxConnsPerHost` settings

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

- No deadline anywhere: `http.Client{}` with no `Timeout` and no context deadline, so a stalled request hangs until some outer layer times out
- Context deadline shorter than the actual request round-trip time
- DNS resolver slow or unresponsive (e.g., `/etc/resolv.conf` points to an unreachable DNS server)
- TLS handshake delay on new connections (Go's TLS client does not perform OCSP or CRL revocation lookups itself, so look at the server or proxy side)
- Connection pool problems: `MaxIdleConnsPerHost` too low, so most requests open a new connection and pay DNS, TCP, and TLS setup, or `MaxConnsPerHost` set low enough that requests queue for a connection
- Server-side slow response or hanging after receiving the request body

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- Always set a client-level Timeout, e.g., `&http.Client{Timeout: 30 * time.Second}`
- Use `context.WithTimeout` for granular per-request deadlines, not just the client timeout
- Tune the Transport: `&http.Transport{DialContext: (&net.Dialer{Timeout: 5 * time.Second}).DialContext, MaxIdleConns: 100, MaxIdleConnsPerHost: 10, MaxConnsPerHost: 0}`
- Disable keep-alives for short-lived clients: `DisableKeepAlives: true` on the Transport
- Implement retry logic with exponential backoff and jitter, but respect context cancellation
- Set `ResponseHeaderTimeout` and `ExpectContinueTimeout` in the Transport for finer control

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

- Run `curl -w "@curl-format.txt" -o /dev/null -s $URL` to measure the timing breakdown (`curl-format.txt` is your own `-w` format file)
- Use an `httptrace.ClientTrace` to log each event: `GotConn`, `GotFirstResponseByte`, etc.
- Load test with `wrk -t2 -c50 -d30s $URL` and monitor p99 latency and error rate
- Check the `net/http/pprof` goroutine profile for goroutines waiting on connections; net/http does not export connection pool counters directly
- Deploy a canary with the fix and compare p99 latency and timeout rate in dashboards

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

- Setting a context deadline on some requests but leaving other call paths with neither a deadline nor a client timeout, so those requests can hang indefinitely on slow connections
- Creating a new http.Client with its own Transport on every request, bypassing connection reuse entirely
- Leaving `Transport.ResponseHeaderTimeout` unset when the server can be slow to send headers
- Calling `defer resp.Body.Close()` without reading the body to EOF, which prevents the connection from being reused
- Setting `Timeout` at the client level but not accounting for time spent in redirects or retries
- Forgetting to cancel the context after the response is processed, leading to resource leaks

( 07 ) War story

The Phantom 30-Second Timeout in a High-Throughput API Gateway

Backend Platform Engineer | Go 1.18, net/http, Kubernetes, Envoy sidecar, PostgreSQL

Timeline

09:00 PagerDuty alert: p99 latency spikes to 30s for POST /api/orders
09:05 Check Grafana: error rate 5%, timeout errors 'Client.Timeout exceeded'
09:10 Review code: http.Client{Timeout: 30 * time.Second}, seems fine
09:20 SSH into pod, curl endpoint: always succeeds under 200ms
09:35 Check upstream service logs: no slow queries, response times normal
09:50 Examine goroutine dump: 500 goroutines parked in 'writeLoop'
10:00 Found it: Transport.MaxIdleConnsPerHost defaulted to 2, causing constant connection churn
10:05 Hotfix: set MaxIdleConnsPerHost=100, rolled out canary
10:15 p99 drops to 200ms, errors vanish

We were running a Go API gateway proxying requests to a dozen internal services. Around 9 AM, alerts fired: p99 latency for one endpoint jumped from 200ms to 30s. The error message was 'net/http: request canceled (Client.Timeout exceeded)'. Our client had a 30-second timeout, so that made sense. But why were requests taking that long?

I checked the upstream service: it was responding in under 100ms. The network was fine. I tried curling the endpoint from within the pod and got instant success. So the problem wasn't the network or the upstream. I grabbed a goroutine dump and saw hundreds of goroutines parked in `net/http.(*persistConn).writeLoop`. That is the per-connection goroutine that writes requests to the TCP connection, so hundreds of them meant hundreds of open connections instead of a small, reused pool.

The default `http.Transport` sets `MaxIdleConnsPerHost` to 2. With 50 concurrent requests to the same host, only 2 connections were kept for reuse; every other request had to open a new connection, and connection creation was bottlenecked by the DNS resolver and the TLS handshake. The fix was to increase `MaxIdleConnsPerHost` to a reasonable value (100) and to keep `MaxConnsPerHost` at 0 (unlimited, which is also the default) so that concurrent connections are not capped. After the change, latency dropped back to normal.

Root cause

Default `MaxIdleConnsPerHost=2` in http.Transport kept too few connections for reuse under high concurrency, forcing constant connection creation, and requests stalled in connection setup.

The fix

Set `MaxIdleConnsPerHost=100` and `MaxConnsPerHost=0` on the Transport used by the http.Client.

The lesson

Always tune the transport defaults for production workloads. The Go net/http defaults are safe for single-user tools but not for high-concurrency servers.

( 08 ) Go HTTP Client Timeout Layering

Go's net/http client has multiple timeout knobs that interact in non-obvious ways. At the top level is `Client.Timeout`, which sets a cumulative limit for the entire exchange: connection time (DNS, dial, TLS), any redirects, response headers, and reading the response body. Under the hood, the Transport has its own dial timeout (set through the `net.Dialer` behind `DialContext`), `TLSHandshakeTimeout`, `ResponseHeaderTimeout`, and `ExpectContinueTimeout`. Additionally, a `context.Context` on the request can impose its own deadline. These limits run side by side, and whichever expires first ends the request.

The critical nuance: `Client.Timeout` is one coarse budget. Its timer keeps running after `Do` returns and will interrupt a read of `Response.Body` that is still in progress, so a slow body can fail with a timeout long after the headers arrived. It also says nothing about which phase was slow. Goroutines pile up mainly when some requests have no deadline at all or when response bodies are never closed, because each open connection keeps its `readLoop` and `writeLoop` goroutines alive.

To get deterministic behavior, set explicit timeouts at every layer: dial timeout (e.g., 5s), TLS handshake (e.g., 5s), response header timeout (e.g., 10s), and overall client timeout (e.g., 30s). Also use `context.WithTimeout` for per-request deadlines, and always call `cancel()` to release resources.

( 09 ) Connection Pool Exhaustion: The Silent Killer

`http.DefaultTransport` allows up to 100 idle connections in total, but only 2 idle connections per host (the default for `MaxIdleConnsPerHost`). When you have many concurrent requests to the same host, only 2 connections are kept for reuse after a burst; the rest are closed, and the next burst must open new ones. Creating a new connection involves DNS resolution, a TCP handshake, and a TLS handshake, which is slow and puts load on the resolver and the server.

`MaxConnsPerHost` (default 0, meaning unlimited) limits the total number of connections per host. Setting it can prevent resource exhaustion, but requests beyond the limit block until a connection frees up. The key is to set `MaxIdleConnsPerHost` high enough to match your concurrency, and optionally set `MaxConnsPerHost` to a reasonable limit like 100.

net/http does not export pool counters, so infer pool state from a goroutine dump via `net/http/pprof`: `curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -E "(getConn|readLoop|writeLoop)"`. Also instrument `httptrace` with the `GotConn` and `PutIdleConn` hooks.
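The effect of `MaxIdleConnsPerHost` can be measured rather than guessed. The following sketch, built on a local test server, warms the pool with a burst of simultaneous requests and then counts how many requests in a second burst were served on a reused connection. The helper names `burst` and `reusedAfterWarmup` are hypothetical, and the server-side gate exists only to force the warm-up requests to overlap.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
	"sync/atomic"
)

// burst sends n concurrent GETs and returns how many of them were handed
// a reused connection. Request errors are ignored to keep the sketch short.
func burst(client *http.Client, url string, n int) int {
	var reused int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			trace := &httptrace.ClientTrace{
				GotConn: func(info httptrace.GotConnInfo) {
					if info.Reused {
						atomic.AddInt32(&reused, 1)
					}
				},
			}
			req, err := http.NewRequest(http.MethodGet, url, nil)
			if err != nil {
				return
			}
			req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
			resp, err := client.Do(req)
			if err != nil {
				return
			}
			io.Copy(io.Discard, resp.Body) // drain so the connection can be pooled
			resp.Body.Close()
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt32(&reused))
}

// reusedAfterWarmup opens n connections at once, then sends n more
// requests and reports how many of those reused a pooled connection.
func reusedAfterWarmup(maxIdlePerHost, n int) int {
	var arrived int32
	var gate sync.WaitGroup
	gate.Add(n)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Hold the first n requests until all have arrived, so the
		// warm-up really needs n connections at the same time.
		if atomic.AddInt32(&arrived, 1) <= int32(n) {
			gate.Done()
			gate.Wait()
		}
		io.WriteString(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: maxIdlePerHost}}
	defer client.CloseIdleConnections()

	burst(client, srv.URL, n)        // warm-up: every request dials
	return burst(client, srv.URL, n) // measured batch
}

func main() {
	for _, limit := range []int{2, 10} {
		fmt.Printf("MaxIdleConnsPerHost=%d: %d of 10 requests reused a connection\n",
			limit, reusedAfterWarmup(limit, 10))
	}
}
```

With the idle limit at or above the burst size, the second batch should be served entirely from the pool; with a limit of 2, the count depends on timing, since connections freed mid-burst can still be picked up by later requests.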

( 10 ) DNS Resolution Delays and Caching

On most Unix systems Go uses its pure Go resolver by default, which reads `/etc/resolv.conf` directly; it falls back to the cgo-based system resolver in some configurations, and you can force either one with `GODEBUG=netdns=go` or `GODEBUG=netdns=cgo`. Either way, a lookup can block on a slow DNS server. The pure Go resolver does not cache results, so every new connection triggers a fresh lookup unless something outside the process caches it.

A slow DNS server adds latency to every new connection. If your connections are not reused (e.g., due to low idle conns), each request pays the DNS penalty. Use `dig +stats` to measure resolver performance. Consider using a local DNS cache like `dnsmasq` or a dedicated resolver.

Also check for DNS timeouts: if the resolver times out, Go may retry with a different server, doubling the delay. Set `DialContext` with a custom dialer that has a short timeout to fail fast.

( 11 ) Goroutine Leaks from Timeout Errors

When a request times out, Go cancels it and normally closes the connection it was using, which lets that connection's `readLoop` and `writeLoop` goroutines exit. The leak appears when nothing ever triggers that cleanup: a request with no deadline stays blocked on I/O, or a response body is never closed, so its connection is neither reused nor released. If the server never closes the connection either, those goroutines can live forever. Over hours, the number of goroutines grows, consuming memory and eventually causing an OOM kill.

To prevent this, give every request a deadline and set the dialer and transport timeouts. Also make sure `resp.Body` is closed on every path where `err` is nil, and read it to EOF first if you want the connection reused: `io.Copy(io.Discard, resp.Body)` (`ioutil.Discard` is deprecated since Go 1.16).

Monitor goroutine count with `runtime.NumGoroutine()` or via pprof. A steady increase over time indicates a leak. The stuck goroutines typically appear as `readLoop` or `writeLoop` in the stack trace.

( 12 ) Debugging with httptrace and Custom RoundTripper

`net/http/httptrace` lets you hook into each phase of an HTTP request: DNS lookup, TCP dial, TLS handshake, connection reuse, request write, response read. Wrapping the transport with a custom `RoundTripper` that logs timing can pinpoint where time is spent.

Example: create a `loggingTransport` that wraps the default transport and logs the duration of each `RoundTrip` call. Then use `httptrace.ClientTrace` to get granular timings. This is especially useful when the error is intermittent.

Another trick: set `GODEBUG=http2debug=1` to get verbose HTTP/2 logging, which can reveal stream-level timeouts.

Frequently asked questions

What is the difference between http.Client.Timeout and context.WithTimeout?

`Client.Timeout` is a cumulative limit that covers the entire exchange, including redirects and reading the response body. Internally it sets a deadline that cancels the request. `context.WithTimeout` gives you per-request control and can be canceled explicitly. If both are set, the shorter one wins. Both cancel the in-flight request when they expire; the practical difference is scope, since `Client.Timeout` applies to every request made with that client while a context applies to one call chain.

Why does my http.Client timeout even when the server responds quickly?

This often happens when the time is spent before the request reaches the server. If `MaxIdleConnsPerHost` is low, most requests cannot reuse a pooled connection and must pay DNS, TCP, and TLS setup; if `MaxConnsPerHost` is set, requests beyond the limit wait for a free connection. The timeout includes all of that waiting. Check how many connections are being reused and increase the pool size.

How do I set a timeout for just the connection establishment?

Set a custom `DialContext` on the Transport: `DialContext: (&net.Dialer{Timeout: 5 * time.Second}).DialContext`. This limits connection establishment (name resolution and TCP connect). For TLS, set `TLSHandshakeTimeout` on the Transport. For response headers, use `ResponseHeaderTimeout`. These run alongside `Client.Timeout`; whichever expires first fails the request.

Can ioutil.ReadAll cause a timeout?

Yes. If the server sends a large response slowly, `ReadAll` blocks until all data is received, and that time counts toward `Client.Timeout`. `io.LimitReader` caps how many bytes you read but does not bound time. To bound time, create the request with a context that has a deadline (`http.NewRequestWithContext`); the deadline also interrupts the body read. In current Go, prefer `io.ReadAll`, since `ioutil.ReadAll` is deprecated.

Why do I get 'connection refused' instead of a timeout?

'Connection refused' means the server is not listening on the port, so the TCP handshake fails immediately (RST packet). This is different from a timeout, which occurs when the server doesn't respond at all. Make sure the host and port are correct and the server is running.
