What this usually means
Go's net/http client has a default timeout of zero, meaning no timeout at all. When you see a timeout error, something in your stack set a limit: a client-level `Client.Timeout`, a context deadline on the request, or a Transport-level timeout (dial, TLS handshake, response header). The underlying cause can be a slow server, network congestion, DNS resolution delays, TLS handshake latency, or connection pool exhaustion. Because `Client.Timeout` covers the whole exchange, the limit is often hit by the cumulative time of redirects, application-level retries sharing one deadline, and reading the body, not just a single round trip.
The first ten minutes — establish facts before touching code.
1. Record the exact error string from your logs, then reproduce outside the process: `curl -v --connect-timeout 5 --max-time 10 "$URL"`
2. Log the time before and after `client.Do()` to confirm how long the call really takes: `t := time.Now(); resp, err := client.Do(req); log.Printf("took %v", time.Since(t))`
3. Inspect `client.Timeout`, the dialer timeout behind `Transport.DialContext`, and any context deadline: `fmt.Printf("timeout=%v transport=%+v\n", client.Timeout, client.Transport)` plus `req.Context().Deadline()`
4. Rule out stale pooled connections: call `transport.CloseIdleConnections()` temporarily to see if it helps
5. Check DNS resolution time with `dig +stats "$HOSTNAME"` or `nslookup "$HOSTNAME"`
6. Monitor goroutine count: `pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)` (from `runtime/pprof`) to detect goroutines piling up in the transport
The specific files, logs, configs, and dashboards that usually own this bug.
- /etc/hosts and /etc/resolv.conf for DNS misconfiguration
- `net/http/httptrace` output to instrument each HTTP phase
- `GODEBUG=netdns=go` to force the pure Go DNS resolver (use `netdns=go+1` to also print the resolver's decisions)
- `pprof` goroutine profile for goroutines piled up in `net/http.(*persistConn).writeLoop` or `readLoop`
- `ss -tnp` or `netstat -tnp` to inspect TCP connections in TIME_WAIT or CLOSE_WAIT
- Application logs around the timeout timestamp, especially upstream service logs
- `transport.MaxIdleConnsPerHost` and `transport.MaxConnsPerHost` settings
Practical causes, not theory. These are the things you will actually find.
- Default zero timeout: `http.Client{}` with no Timeout or Transport fields, so a stalled request hangs until some outer deadline (a caller's context, a load balancer) cuts it off
- Context deadline shorter than the actual request round-trip time
- DNS resolver slow or unresponsive (e.g., /etc/resolv.conf points to an unreachable DNS server)
- TLS handshake delay, for example an overloaded server or TLS-terminating proxy (Go's `crypto/tls` does not itself perform OCSP or CRL lookups during the handshake)
- Connection pool too small: `MaxIdleConnsPerHost` too low, so connections are closed instead of reused and requests keep paying for new ones
- Server-side slow response or hanging after receiving request body
Concrete fix directions. Pick the one that matches your root cause.
- Always set a client-level Timeout, e.g., `&http.Client{Timeout: 30 * time.Second}`
- Use `context.WithTimeout` for granular per-request deadlines, not just the client timeout
- Tune Transport: `&http.Transport{DialContext: (&net.Dialer{Timeout: 5 * time.Second}).DialContext, MaxIdleConns: 100, MaxIdleConnsPerHost: 10, MaxConnsPerHost: 0}`
- Disable keep-alives for short-lived clients: set `DisableKeepAlives: true` on the Transport
- Implement retry logic with exponential backoff and jitter, but respect context cancellation
- Set `ResponseHeaderTimeout` and `ExpectContinueTimeout` in Transport for finer control
A fix you cannot prove is a guess. Close the loop.
- Run `curl -w "@curl-format.txt" -o /dev/null -s "$URL"` to measure timing breakdown
- Use an `httptrace` client trace to log each event: `GotConn`, `GotFirstResponseByte`, etc.
- Load test with `wrk -t2 -c50 -d30s "$URL"` and monitor p99 latency and error rate
- Check the `net/http/pprof` goroutine profile: the number of `persistConn` goroutines approximates the number of open connections
- Deploy a canary with the fix and compare p99 latency and timeout rate in dashboards
Things that make this bug worse or harder to find.
- Setting a context deadline on some requests but leaving `Client.Timeout` at zero, so any call site that forgets the context can block forever on a stalled connection
- Creating a new http.Client with its own Transport on every request, bypassing connection reuse entirely
- Leaving `Transport.ResponseHeaderTimeout` unset (zero means no limit) when the server is slow to send headers
- Using `defer resp.Body.Close()` without reading the body, which prevents connection reuse
- Setting `Timeout` at client level but not accounting for time spent in redirects or retries
- Forgetting to call the cancel function returned by `context.WithTimeout` after the response is processed, leading to resource leaks
The Phantom 30-Second Timeout in a High-Throughput API Gateway
Timeline
- 09:00 PagerDuty alert: p99 latency spikes to 30s for POST /api/orders
- 09:05 Check Grafana: error rate 5%, timeout errors 'Client.Timeout exceeded'
- 09:10 Review code: http.Client{Timeout: 30 * time.Second}, which seems fine
- 09:20 SSH into pod, curl endpoint: always succeeds under 200ms
- 09:35 Check upstream service logs: no slow queries, response times normal
- 09:50 Examine goroutine dump: 500 goroutines stuck in 'writeLoop'
- 10:00 Found it: Transport.MaxIdleConnsPerHost defaulted to 2, causing constant connection churn
- 10:05 Hotfix: set MaxIdleConnsPerHost=100, rolled out canary
- 10:15 p99 drops to 200ms, errors vanish
We were running a Go API gateway proxying requests to a dozen internal services. Around 9 AM, alerts fired: p99 latency for one endpoint jumped from 200ms to 30s. The error message was 'net/http: request canceled (Client.Timeout exceeded)'. Our client had a 30-second timeout, so that made sense. But why were requests taking that long?
I checked the upstream service; it was responding in under 100ms. The network was fine. I tried curling the endpoint from within the pod: instant success. So the problem wasn't the network or the upstream. I grabbed a goroutine dump and saw hundreds of goroutines in `net/http.(*persistConn).writeLoop`. That's the per-connection goroutine that writes requests to the TCP connection, so the gateway was holding hundreds of connections and kept opening more instead of reusing a few.
The default `http.Transport` sets `MaxIdleConnsPerHost` to 2. With 50 concurrent requests to the same host, only 2 connections were kept for reuse; the rest were closed as soon as their request finished, so nearly every request had to open a new connection, and connection setup was bottlenecked by the DNS resolver and TLS handshake. The fix was to increase `MaxIdleConnsPerHost` to a reasonable value (100) and to leave `MaxConnsPerHost` at 0 (unlimited) so concurrent connections were not capped. After the change, latency dropped back to normal.
Root cause
Default `MaxIdleConnsPerHost=2` in http.Transport left the connection pool too small for the request concurrency, forcing a new connection (DNS, TCP, and TLS setup) for almost every request and causing the latency pile-up.
The fix
Set MaxIdleConnsPerHost=100 and MaxConnsPerHost=0 on the http.Transport used by the http.Client.
The lesson
Always tune the transport defaults for production workloads. The Go net/http defaults are safe for single-user tools but not for high-concurrency servers.
Go's net/http client has multiple timeout knobs that interact in non-obvious ways. At the top level is `Client.Timeout`, which sets a cumulative timeout for the entire request (including DNS, dial, TLS, headers, and body). Under the hood, the Transport has its own dial timeout (set on the `net.Dialer` behind `DialContext`), `TLSHandshakeTimeout`, `ResponseHeaderTimeout`, and `ExpectContinueTimeout`. Additionally, a `context.Context` on the request can impose its own deadline; whichever limit expires first ends the request.
The critical nuance: the `Client.Timeout` timer keeps running after `Do` returns and will interrupt a read of `resp.Body` that is still in progress. When it fires, the client cancels the request in the Transport as if the request's context had ended, and the in-flight connection or stream is torn down. What it cannot do is clean up after your own code: a response body that is never closed, or a request made with no deadline at all, keeps its connection and goroutines alive, and many of those look like a goroutine leak.
To get deterministic behavior, set explicit timeouts at every layer: dial timeout (e.g., 5s), TLS handshake (e.g., 5s), response header timeout (e.g., 10s), and overall client timeout (e.g., 30s). Also use `context.WithTimeout` for per-request deadlines, and always call `cancel()` to release resources.
The default `http.Transport` allows up to 100 idle connections total, but only 2 idle connections per host (`MaxIdleConnsPerHost`). When you have many concurrent requests to the same host, only 2 connections are kept for reuse once they go idle; the others are closed when their request finishes, so later requests must open new ones. Creating a new connection involves DNS resolution, TCP handshake, and TLS handshake, which are slow and add up under load.
`MaxConnsPerHost` (default 0, meaning unlimited) caps the total number of connections per host, including those in use. Once the cap is reached, further requests block until a connection frees up, which prevents resource exhaustion but introduces queueing that counts against the timeout. The key is to set `MaxIdleConnsPerHost` high enough to match your concurrency, and optionally set `MaxConnsPerHost` to a reasonable limit like 100.
Check the connection pool indirectly via `net/http/pprof`: `curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -E "(persistConn|writeLoop)"` lists the per-connection goroutines. Also instrument `httptrace` for the `GotConn` and `PutIdleConn` events.
On most Unix systems Go resolves names with its pure Go resolver by default, which reads `/etc/resolv.conf` and `/etc/hosts`; in some configurations it falls back to the cgo (system) resolver, and you can force either one with `GODEBUG=netdns=go` or `GODEBUG=netdns=cgo`. Either way, Go keeps no in-process DNS cache, so every new connection triggers a fresh lookup and a slow DNS server delays the dial. This can cause repeated lookups within a short window.
A slow DNS server adds latency to every new connection. If your connections are not reused (e.g., due to low idle conns), each request pays the DNS penalty. Use `dig +stats` to measure resolver performance. Consider using a local DNS cache like `dnsmasq` or a dedicated resolver.
Also check for DNS timeouts: if the resolver times out, Go may retry with a different server, doubling the delay. Set `DialContext` with a custom dialer that has a short timeout to fail fast.
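When `Client.Timeout` or a context deadline fires, the Transport abandons the request and tears down the connection, so its `readLoop` and `writeLoop` goroutines exit. The slow leak comes from requests that have no deadline at all, or from response bodies that are never closed: the connection stays checked out and its goroutines stay blocked on I/O. If the server never closes the connection, those goroutines can live forever. Over hours, the number of goroutines grows, consuming memory and eventually causing OOM.
To prevent this, set a dialer timeout and the transport-level timeouts, and give every request a deadline. Also ensure that `resp.Body` is closed whenever `Do` returns without an error (on error the response can be ignored). Use `io.Copy(io.Discard, resp.Body)` (`ioutil.Discard` before Go 1.16) before closing if you don't need the body, so the connection can be reused.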
Monitor goroutine count with `runtime.NumGoroutine()` or via pprof. A steady increase over time indicates a leak. The stuck goroutines typically appear as `readLoop` or `writeLoop` in the stack trace.
`net/http/httptrace` lets you hook into each phase of an HTTP request: DNS lookup, TCP dial, TLS handshake, connection reuse, request write, response read. Wrapping the transport with a custom `RoundTripper` that logs timing can pinpoint where time is spent.
Example: create a `loggingTransport` that wraps the default transport and logs the duration of each `RoundTrip` call. Then use `httptrace.ClientTrace` to get granular timings. This is especially useful when the error is intermittent.
Another trick: set `GODEBUG=http2debug=1` to get verbose HTTP/2 logging, which can reveal stream-level timeouts.
Frequently asked questions
What is the difference between http.Client.Timeout and context.WithTimeout?
`Client.Timeout` is a cumulative limit on the whole exchange: connection time, any redirects, and reading the response body. `context.WithTimeout` gives you per-request control and can be canceled explicitly. If both are set, whichever expires first wins. Both cancel the request in the Transport in the same way; the context is more flexible because each call can carry its own deadline and inherit cancellation from the caller.
Why does my http.Client timeout even when the server responds quickly?
This often happens when the time is spent before the request ever reaches the server. If `MaxConnsPerHost` is set, requests queue for a free connection and the queueing time counts toward the timeout. If `MaxIdleConnsPerHost` is low, connections are closed instead of pooled, so requests keep paying for slow DNS and TLS handshakes on new connections. Check how often connections are reused and increase the pool size.
How do I set a timeout for just the connection establishment?
Set a custom `DialContext` on the Transport: `DialContext: (&net.Dialer{Timeout: 5 * time.Second}).DialContext`. This limits TCP dial time. For TLS, set `TLSHandshakeTimeout` on the Transport. For response headers, use `ResponseHeaderTimeout`. These are independent of `Client.Timeout`.
Can io.ReadAll (formerly ioutil.ReadAll) cause a timeout?
Yes. If the server sends a large response slowly, `ReadAll` blocks until all data is received, and that time counts toward `Client.Timeout` and toward the request's context deadline. The body read is already tied to the request's context, so set a deadline with `context.WithTimeout` and keep that context alive until you have finished reading. `io.LimitReader` caps how many bytes you are willing to read; it does not limit time.
Why do I get 'connection refused' instead of a timeout?
'Connection refused' means the server is not listening on the port, so the TCP handshake fails immediately (RST packet). This is different from a timeout, which occurs when the server doesn't respond at all. Make sure the host and port are correct and the server is running.