What this usually means
Retry logic assumes the first request failed. But in distributed systems, 'no response' does not mean 'failed'. The server may have processed the request successfully while the response was lost to a network blip, cut off by a load balancer timeout, or delayed past the client timeout by a slow database. When your client retries, the server sees a second, identical request and processes it as a new one. The fix is not to remove retries; it is to make the operation idempotent so that a duplicate is harmless.
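To make the failure mode concrete, here is a toy in-memory sketch in Python. The `Server` class, its `charge` method, and the key `"order-42"` are hypothetical stand-ins for a real API; the only point is to show the same retry producing two side effects without a key and one with it.

```python
class Server:
    """Toy in-memory server; a stand-in for a real payment or order API."""

    def __init__(self):
        self.charges = []  # side effects that actually happened
        self.seen = {}     # idempotency key -> cached response

    def charge(self, amount, idempotency_key=None):
        if idempotency_key is not None and idempotency_key in self.seen:
            # Duplicate request: replay the stored response, do no new work.
            return self.seen[idempotency_key]
        self.charges.append(amount)
        response = {"charge_id": len(self.charges), "amount": amount}
        if idempotency_key is not None:
            self.seen[idempotency_key] = response
        return response


def call_with_lost_first_response(server, amount, idempotency_key=None):
    """Simulate a client whose first response is lost, so it retries once."""
    server.charge(amount, idempotency_key)         # processed, response lost
    return server.charge(amount, idempotency_key)  # the retry


plain = Server()
call_with_lost_first_response(plain, 100)

safe = Server()
call_with_lost_first_response(safe, 100, idempotency_key="order-42")

print(len(plain.charges), len(safe.charges))  # prints: 2 1
```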
The first ten minutes: establish facts before touching code.
1. Check if the server has an idempotency mechanism. Does it accept an idempotency key? If so, the client must generate and send one.
2. Check the retry condition. Are you retrying on any error, or only on specific transient errors (408, 429, 5xx)? A 400 or 409 will not succeed on an unchanged retry, so retrying it only adds load and obscures the real error.
3. Look at the timing. If the retry fires while the first request is still in flight, both requests can succeed. The client timeout plus the retry delay should exceed the expected request duration.
4. Search server logs for two identical requests within a short window (same payload, same user, less than a second apart).
5. Check if the retry library uses exponential backoff with jitter. Without jitter, many clients retry simultaneously after a throttling event, creating a thundering herd.
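Checks 2 and 5 can be expressed as a small retry policy. The following is a minimal Python sketch, not any particular library's API: `RETRYABLE_STATUSES`, `is_retryable`, and `backoff_delay` are illustrative names, and the base delay and cap are assumed values to tune for your service. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential ceiling.

```python
import random

# Transient statuses worth retrying; everything else is returned to the caller.
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


def is_retryable(status):
    """Return True only for statuses that may succeed on an unchanged retry."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    `attempt` is 0 for the first retry. The randomness spreads clients out so
    they do not all retry at the same instant after a throttling event.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


for attempt in range(4):
    print(f"retry {attempt}: sleep up to {min(30.0, 0.5 * 2 ** attempt)}s, "
          f"chose {backoff_delay(attempt):.2f}s")
```

Comparing your client's actual configuration against a policy like this usually shows quickly whether the retry condition is too broad or the delays are synchronised.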
The specific files, logs, configs, and dashboards that usually own this bug.
- Client retry configuration: max retries, retry delay, backoff strategy, retryable status codes
- Idempotency key implementation: is the client sending one? Is the server checking it?
- Server logs: search for duplicate payloads within the retry window
- Database: look for duplicate records with near-identical timestamps
- Load balancer or API gateway timeout settings: if shorter than the app's processing time, the LB returns a gateway error (typically 504, sometimes 502) while the app succeeds
- Network monitoring: packet loss between client and server
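For the server-log and database checks, a short script can flag likely duplicates. This is a sketch under assumptions: the entry fields `ts` (epoch seconds), `user`, and `payload` are hypothetical and will differ in your log or table schema, and the one-second window is only a starting point.

```python
import hashlib
import json
from collections import defaultdict


def find_duplicates(entries, window_seconds=1.0):
    """Return (user, payload digest, earlier ts, later ts) for suspect pairs.

    Two entries are suspect when they share a user and an identical payload
    and arrive within `window_seconds` of each other.
    """
    by_fingerprint = defaultdict(list)
    for entry in entries:
        # Canonical JSON so that key order does not change the fingerprint.
        body = json.dumps(entry["payload"], sort_keys=True).encode()
        digest = hashlib.sha256(body).hexdigest()
        by_fingerprint[(entry["user"], digest)].append(entry["ts"])

    suspects = []
    for (user, digest), stamps in by_fingerprint.items():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier <= window_seconds:
                suspects.append((user, digest[:12], earlier, later))
    return suspects


sample = [
    {"ts": 100.00, "user": "u1", "payload": {"sku": "A", "qty": 1}},
    {"ts": 100.30, "user": "u1", "payload": {"qty": 1, "sku": "A"}},  # retry
    {"ts": 250.00, "user": "u1", "payload": {"sku": "A", "qty": 1}},  # new order
    {"ts": 100.10, "user": "u2", "payload": {"sku": "B", "qty": 2}},
]
print(find_duplicates(sample))
```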
Practical causes, not theory. These are the things you will actually find.
- No idempotency key on the request: the server cannot detect duplicates
- Retry fires on timeout, but the server already processed the request
- Load balancer timeout is shorter than the application processing time
- Retry condition is too broad: retrying on 400-level errors that should not be retried
- A response-parsing failure on the client is treated as a failed request, even though the server succeeded
- Retry library does not use jitter, causing synchronised retry storms after rate limiting
Concrete fix directions. Pick the one that matches your root cause.
- Add an idempotency key to every mutating request. The client generates a unique key per operation. The server stores it and returns the cached response on duplicates.
- Set the retry condition to only retry on 408, 429, 500, 502, 503, 504, and not on 400, 401, 403, 404, 409.
- Increase the load balancer or proxy timeout to exceed the maximum expected request processing time.
- Use exponential backoff with jitter for retry delays.
- On the server, use a database unique constraint as a safety net: it catches duplicates even if the idempotency layer fails.
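The first and last fixes can be combined on the server. Below is a minimal sketch using Python's standard `sqlite3` module with an in-memory database; the `payments` table, its columns, and the `create_payment` function are hypothetical, and a production system would also need key expiry and request-body comparison, which are left out here.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory database with a UNIQUE idempotency key column."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE payments ("
        " id INTEGER PRIMARY KEY,"
        " idempotency_key TEXT NOT NULL UNIQUE,"
        " amount INTEGER NOT NULL,"
        " response TEXT NOT NULL)"
    )
    return db


def stored_response(db, idempotency_key):
    row = db.execute(
        "SELECT response FROM payments WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    return json.loads(row[0]) if row else None


def create_payment(db, idempotency_key, amount):
    # Idempotency layer: replay the stored response for a known key.
    cached = stored_response(db, idempotency_key)
    if cached is not None:
        return cached

    response = {"status": "created", "amount": amount}
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO payments (idempotency_key, amount, response)"
                " VALUES (?, ?, ?)",
                (idempotency_key, amount, json.dumps(response)),
            )
    except sqlite3.IntegrityError:
        # Safety net: a concurrent duplicate won the race past the check
        # above, so return the row that was actually written.
        return stored_response(db, idempotency_key)
    return response


db = make_db()
print(create_payment(db, "key-1", 100))
print(create_payment(db, "key-1", 100))  # same response, no second row
```

The lookup handles the common case cheaply, while the unique constraint is what actually guarantees a single row when two identical requests arrive at the same moment.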
A fix you cannot prove is a guess. Close the loop.
- Simulate a network timeout (e.g. add a 10-second sleep to the server but have the client time out at 5 seconds). Confirm only one operation is processed.
- Send two requests with the same idempotency key. Confirm the second request returns the cached response, not a duplicate operation.
- Check the database for duplicate records after a load test with simulated timeouts.
- Monitor idempotency cache hit rate: a non-zero rate means retries are happening and being handled correctly.
- Write an integration test that retries with the same idempotency key and asserts idempotent behaviour.
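The timeout simulation and the integration test can be scaled down to milliseconds and run in-process. The sketch below is pytest-style and entirely hypothetical in its names (`SlowServer`, `create_order_with_retry`): the server takes 0.2 seconds, the client gives up after 0.05 seconds and retries with the same key, and the test asserts that only one order exists afterwards.

```python
import concurrent.futures
import threading
import time
import uuid


class SlowServer:
    """In-memory stand-in for a server that is slower than the client timeout."""

    def __init__(self, processing_seconds):
        self.processing_seconds = processing_seconds
        self.lock = threading.Lock()
        self.orders = {}    # idempotency key -> stored result
        self.requests = 0   # every request received, including retries

    def create_order(self, key, item):
        with self.lock:
            self.requests += 1
            first_time = key not in self.orders
            if first_time:
                self.orders[key] = {"item": item}
        if first_time:
            time.sleep(self.processing_seconds)  # the client times out here
        return self.orders[key]


def create_order_with_retry(server, item, timeout, max_attempts=3):
    key = str(uuid.uuid4())  # generated once, reused on every attempt
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_attempts) as pool:
        for _ in range(max_attempts):
            future = pool.submit(server.create_order, key, item)
            try:
                return future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                continue  # bounded retry with the same key
    raise TimeoutError("all attempts timed out")


def test_retry_after_timeout_creates_one_order():
    server = SlowServer(processing_seconds=0.2)
    result = create_order_with_retry(server, "book", timeout=0.05)
    assert result == {"item": "book"}
    assert server.requests >= 2     # the retry really happened
    assert len(server.orders) == 1  # but only one order was created
```

Removing the key check from `SlowServer.create_order` should make the last assertion fail, which is a quick way to confirm the test is exercising the behaviour it claims to.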
Things that make this bug worse or harder to find.
- Removing retry logic entirely instead of adding idempotency
- Generating a fresh random UUID for each attempt of the same logical operation: the key must be created once per operation and reused on every retry
- Not storing idempotency results long enough: if the client retries after the key expires, duplicates happen
- Retrying without a maximum retry count: unbounded retries can overwhelm the server
- Assuming all HTTP clients or SDKs handle retries safely out of the box
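The idempotency key pitfall is easiest to see side by side. One way to keep a key stable across retries, and even across client restarts, is to derive it from the identity of the operation. This is a sketch only: the namespace, the `checkout:` prefix, and the choice of `user_id` and `cart_id` as inputs are assumptions, and generating a random key once and storing it alongside the pending operation works as well.

```python
import uuid

# Hypothetical application namespace for derived keys.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "payments.example.com")


def key_per_attempt():
    """Anti-pattern: a new key on every attempt, so the server cannot link retries."""
    return str(uuid.uuid4())


def key_per_operation(user_id, cart_id):
    """Derive the same key every time for the same logical operation."""
    return str(uuid.uuid5(APP_NAMESPACE, f"checkout:{user_id}:{cart_id}"))


print(key_per_attempt() == key_per_attempt())                      # False
print(key_per_operation("u1", "c9") == key_per_operation("u1", "c9"))  # True
```

The inputs need to identify one operation and nothing broader. If a user can legitimately repeat the same action, include something that distinguishes the repeats (such as an order or cart identifier), otherwise the second genuine request is treated as a duplicate.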