LEARN · DEBUGGING GUIDE

Debugging HTTP 429 Too Many Requests – API Rate Limit Guide

HTTP 429 isn't just 'slow down'—it's a signal to inspect retry logic, header parsing, and distributed rate counters. This guide covers the real-world traps.

Intermediate · HTTP / Networking · 6 min read

What this usually means

An upstream API or service has a configured rate limit that your client is exceeding. But it's rarely just 'too many requests.' Common causes include retry logic that ignores rate limit headers, poorly tuned exponential backoff, misconfigured burst limits, or a rate limit pool shared across multiple services. In distributed systems, you may also see cascading 429s when one service's retries amplify load on another. The Retry-After header is your primary guide, but many clients ignore it or parse it incorrectly (e.g., treating seconds as milliseconds).

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  1. Run `curl -v https://api.example.com/endpoint` and inspect response headers for `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `Retry-After`.
  2. Check application logs for 429 status codes and timestamp patterns; look for spikes every few seconds.
  3. Use `tcpdump -i any port 443 -w capture.pcap` to capture exact request timing. Traffic on port 443 is TLS-encrypted, so the 429 status itself is only visible if you decrypt the capture (for example with a TLS key log) or capture on a plaintext hop behind TLS termination.
  4. In APM (Datadog/New Relic), filter by `http.status_code:429` and group by client IP or API key.
  5. Review your retry library config: if using `requests` in Python, check the `Retry` object (from `urllib3`) for `status_forcelist` and `backoff_factor`.
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Application logs: grep for '429' or 'rate limit' across your service logs
  • API gateway logs (Kong, AWS API Gateway, NGINX): look for `upstream_status` 429
  • Client-side retry config: code files that define `max_retries`, `backoff_factor`, `status_forcelist`
  • Rate limiter configuration: Redis keys or middleware settings (e.g., NGINX `limit_req_zone`)
  • APM traces: waterfall view to see if 429s correlate with high latency upstream
  • Infrastructure metrics: CPU/memory on the server; sometimes a 429 is a proxy for overload, not a rate limit
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Retry logic doesn't respect the Retry-After header and uses a fixed delay instead
  • Exponential backoff with too small a multiplier (e.g., 0.1 instead of 1-2), causing rapid retries
  • Burst limit exceeded: client sends N requests in parallel right before a window reset
  • Shared API key across multiple services: one service's load triggers 429s for all
  • Rate limit configured per IP, but requests come from a load balancer with a single source IP
  • Clock skew between client and server causes inaccurate rate window calculation
  • Retry storm: failed requests spawn retries that also fail, amplifying load
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Implement exponential backoff with jitter: base_delay * (2^attempt) + random(0, base_delay)
  • Read the `Retry-After` header (HTTP-date or seconds) and wait at least that long before retrying
  • Use a distributed semaphore (Redis) to coordinate rate across service instances
  • Add a circuit breaker: after N consecutive 429s, stop sending for a cooldown period
  • Monitor rate limit headers (`X-RateLimit-Remaining`) and throttle before hitting zero
  • If burst is the issue, enable queuing or spread requests across multiple API keys
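As one way to "throttle before hitting zero", the sketch below spreads the remaining quota evenly over the rest of the window. It assumes `X-RateLimit-Reset` is a Unix timestamp, which is not true of every API; the function name `pacing_delay` is illustrative, not taken from any library.

```python
import time
from typing import Optional


def pacing_delay(remaining: int, reset_epoch: float,
                 now: Optional[float] = None) -> float:
    """Seconds to wait before the next request so that the remaining
    quota is spread evenly over the rest of the current window.

    Assumes reset_epoch is a Unix timestamp (an assumption about the API).
    """
    if now is None:
        now = time.time()
    window_left = max(0.0, reset_epoch - now)
    if remaining <= 0:
        # Quota exhausted: wait until the window resets.
        return window_left
    return window_left / remaining


# Example: 10 requests left, window resets in 60 seconds.
print(pacing_delay(remaining=10, reset_epoch=1060.0, now=1000.0))
```

Sleeping for this delay between calls trades a little latency for never reaching zero remaining requests within the window.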
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Send a burst of requests and confirm no 429 responses for 10 minutes
  • Check in logs that `X-RateLimit-Remaining` never reaches 0
  • Run a load test such as `wrk -c 20 -d 30s https://api.example.com/endpoint` and confirm the summary reports no non-2xx/3xx responses
  • Verify in APM that the 429 error count stays at zero during peak traffic
  • Test retry logic by temporarily reducing the rate limit on the server and ensuring the client backs off correctly
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Ignoring the Retry-After header and retrying immediately
  • Using linear backoff (e.g., wait 1s, 2s, 3s) without jitter, which causes a thundering herd
  • Not logging the rate limit headers, which makes debugging far harder
  • Setting the retry count too high (e.g., 10+), which can overwhelm the server
  • Assuming rate limits are per-endpoint when they are per-account or per-IP
  • Deploying rate limit changes without monitoring the error rate dashboard
( 07 ) War story

The Retry Storm That Took Down Our Payment Gateway

Senior Backend Engineer · Python 3.9, Flask, Redis, Stripe API, Kubernetes

Timeline

  1. 09:15 PagerDuty alert: payment service error rate > 5% (HTTP 429)
  2. 09:18 Checked logs: thousands of 429s from Stripe API with Retry-After: 2
  3. 09:22 Found retry library using a fixed 1-second delay instead of Retry-After
  4. 09:25 Discovered our Kubernetes replicas share a single Stripe API key, and the rate limit is per key
  5. 09:30 Deployed hotfix: read Retry-After header and wait that long
  6. 09:35 Errors dropped to 0%, then another spike appeared: a retry storm from other services
  7. 09:40 Identified that the inventory service also calls Stripe with the same key, with no backoff
  8. 09:50 Coordinated deployment of exponential backoff across all services
  9. 10:00 Incident resolved; added circuit breaker for Stripe calls

It started with a routine deploy—a minor change to the payment flow. Within minutes, PagerDuty lit up: payment service error rate above 5%. I jumped into the logs and saw a wall of HTTP 429 from Stripe. The Retry-After header said 2 seconds, but our retry library (a wrapper around `requests`) was using a fixed 1-second delay. So we were retrying too fast, getting another 429, and compounding the issue.

I dug deeper and realized we had 12 Kubernetes pods all using the same Stripe API key. Stripe's rate limit is per key, not per pod. Each pod's retries were independent, so the aggregate load far exceeded the limit. Our retry library also lacked jitter—every pod retried at the same second, creating a thundering herd.

The hotfix was simple: rewrite the retry logic to respect the Retry-After header and add exponential backoff with jitter. But the real lesson came when another service (inventory) also started getting 429s because it shared the same key. We had to coordinate a cross-team fix and add a circuit breaker to fail fast. In the end, we moved to per-pod API keys and centralized rate limit monitoring.

Root cause

Retry logic ignored the Retry-After header and used a fixed 1-second delay; combined with a shared API key across 12 pods, this caused a rate limit storm.

The fix

Updated retry logic to read Retry-After header and wait exactly that duration; added exponential backoff with jitter; moved to per-pod API keys and circuit breakers.

The lesson

Always respect the Retry-After header in rate limit responses. Never assume retries are isolated—distributed systems amplify failures. Monitor rate limit headers proactively.

( 08 ) Anatomy of a 429 Response

A 429 response may include a `Retry-After` header indicating how long to wait before retrying. The value is either a non-negative integer number of seconds or an HTTP-date. Many clients misread the seconds form as milliseconds or ignore the header entirely. Always log the raw header value for debugging.
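A minimal sketch of parsing both forms of `Retry-After` with the Python standard library. The function name `retry_after_seconds` is illustrative; returning `None` for a missing or unparseable value is a design assumption so the caller can fall back to computed backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds, or None if the header is missing
    or cannot be parsed."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        # Integer form: the unit is SECONDS, not milliseconds.
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds("2"))
print(retry_after_seconds("not-a-date"))
```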

Other common headers: `X-RateLimit-Limit` (max requests per window), `X-RateLimit-Remaining` (requests left), `X-RateLimit-Reset` (when the window resets, often a Unix timestamp, though some APIs send seconds until reset instead). These headers are a convention rather than a standard, so not all APIs include them, but when they do, you can throttle proactively rather than reacting to 429s.
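The following sketch collects these headers into one structure so they can be logged or acted on. The `RateLimitState` type and `read_rate_limit` function are illustrative names; the header names follow the `X-RateLimit-*` convention described above, and a given API may use different ones.

```python
from typing import Mapping, NamedTuple, Optional


class RateLimitState(NamedTuple):
    limit: Optional[int]      # max requests per window
    remaining: Optional[int]  # requests left in the current window
    reset: Optional[int]      # raw reset value; its meaning depends on the API


def _to_int(value: Optional[str]) -> Optional[int]:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # header missing or not an integer


def read_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset=_to_int(lowered.get("x-ratelimit-reset")),
    )


state = read_rate_limit({"X-RateLimit-Limit": "100", "x-ratelimit-remaining": "7"})
print(state)
```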

( 09 ) Backoff Strategies That Actually Work

Exponential backoff with jitter is the usual default. Formula: `sleep = min(cap, base * 2^attempt + random(0, base))`. The random jitter prevents synchronized retries (thundering herd), and the cap bounds the total wait. Common mistakes: setting base too low (e.g., 0.1 seconds) or cap too high (e.g., 300 seconds).
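A minimal sketch of capped exponential backoff with additive jitter, following the `base_delay * (2^attempt) + random(0, base_delay)` form listed under Fix patterns with a cap applied to the result. The defaults (`base=1.0`, `cap=60.0`) are illustrative assumptions, not recommendations from any particular API.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    exponential = base * (2 ** attempt)
    jitter = random.uniform(0, base)  # de-synchronizes clients
    return min(cap, exponential + jitter)


for attempt in range(5):
    print(attempt, round(backoff_delay(attempt), 2))
```

Other jitter schemes exist, such as drawing the whole delay at random between zero and the capped exponential value; the additive form is used here only because it matches the formula in this guide.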

If the server provides a `Retry-After` header, use that value directly instead of computed backoff. For multiple 429s, consider circuit breaking: after N consecutive 429s, enter a cooldown period with no retries. This protects the server and reduces wasted work.
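The two ideas in this paragraph can be sketched as follows: prefer the server's `Retry-After` value when present, and open a circuit after N consecutive 429s. `RateLimitBreaker` and `choose_delay` are hypothetical names, and the threshold and cooldown defaults are placeholders to tune for your service.

```python
import time
from typing import Callable, Optional


def choose_delay(retry_after: Optional[float], computed_backoff: float) -> float:
    """Use the server's Retry-After when present, else the computed backoff."""
    return retry_after if retry_after is not None else computed_backoff


class RateLimitBreaker:
    """Stops sending for `cooldown` seconds after `threshold` consecutive 429s."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._consecutive = 0
        self._open_until = 0.0

    def allow(self) -> bool:
        """True if a request may be sent right now."""
        return self._clock() >= self._open_until

    def record(self, status: int) -> None:
        """Feed every response status code into the breaker."""
        if status == 429:
            self._consecutive += 1
            if self._consecutive >= self.threshold:
                # Open the circuit and start counting afresh afterwards.
                self._open_until = self._clock() + self.cooldown
                self._consecutive = 0
        else:
            self._consecutive = 0  # any other status breaks the streak


breaker = RateLimitBreaker(threshold=2, cooldown=10.0)
breaker.record(429)
breaker.record(429)
print(breaker.allow())  # False while the cooldown is active
```

Callers check `allow()` before each request and fail fast while the circuit is open, which avoids queuing work that would only be rejected.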

( 10 ) Distributed Rate Limit Coordination

When multiple instances share a rate limit, you need a distributed token bucket or semaphore. Redis is a common choice: use `INCR` with an expiry to track usage across pods. Each pod increments and checks the shared count before sending a request. If the count is near the limit, the pod can queue the request or back off.
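A sketch of the `INCR`-with-expiry approach as a fixed-window counter. To keep the example runnable without a Redis server, it uses `InMemoryCounter`, a stand-in written for this illustration that mirrors the `incr(name)` and `expire(name, seconds)` calls of a redis-py client. The key format and the `try_acquire` name are assumptions.

```python
import time


class InMemoryCounter:
    """Demo stand-in for a Redis client (NOT shared across processes)."""

    def __init__(self):
        self._values = {}
        self._expires = {}

    def incr(self, name):
        deadline = self._expires.get(name)
        if deadline is not None and time.time() >= deadline:
            # Key has expired: drop it, as Redis would.
            self._values.pop(name, None)
            self._expires.pop(name, None)
        self._values[name] = self._values.get(name, 0) + 1
        return self._values[name]

    def expire(self, name, seconds):
        if name not in self._values:
            return False
        self._expires[name] = time.time() + seconds
        return True


def try_acquire(client, key_id: str, limit: int, window_seconds: int,
                now=None) -> bool:
    """Count this request against the shared window; True if under the limit."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"ratelimit:{key_id}:{window}"  # hypothetical key layout
    count = client.incr(key)
    if count == 1:
        # First hit in this window: set a TTL so old counters clean up.
        # INCR followed by EXPIRE is two commands, not one atomic step.
        client.expire(key, window_seconds)
    return count <= limit


client = InMemoryCounter()
print([try_acquire(client, "shared-key", 3, 60, now=120.0) for _ in range(5)])
```

With a real Redis client passed as `client`, every pod increments the same key, so the count reflects aggregate usage rather than per-pod usage.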

Another approach: assign each pod a unique API key if the provider supports it. This isolates failures and simplifies debugging. For shared keys, implement a local rate limiter that reserves tokens based on estimated capacity per pod.
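For the shared-key case, a local limiter can be sketched as a token bucket whose refill rate is the pod's share of the limit. Dividing the limit evenly by pod count is an assumption made for illustration; real traffic is often uneven across pods. `TokenBucket` and `per_pod_bucket` are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def try_consume(self, tokens: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Add tokens earned since the last call, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False


def per_pod_bucket(shared_limit_per_second: float, pod_count: int,
                   clock: Callable[[], float] = time.monotonic) -> TokenBucket:
    """Give each pod an equal share of the shared limit (an assumption)."""
    share = shared_limit_per_second / pod_count
    return TokenBucket(rate=share, capacity=max(1.0, share), clock=clock)


bucket = per_pod_bucket(shared_limit_per_second=24.0, pod_count=12)
print(bucket.try_consume(), bucket.try_consume(), bucket.try_consume())
```

This needs no shared state, at the cost of wasting quota when some pods are idle.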

( 11 ) Monitoring and Alerting for Rate Limits

Don't wait for 429s to become an incident. Monitor `X-RateLimit-Remaining` as a metric: alert when it drops below 20% of the limit. Use APM to track 429 response times and correlate with upstream latency spikes.
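The 20% threshold can be expressed as a small check that runs wherever you record the header values as metrics. The function names are illustrative, and treating a missing or zero limit as alert-worthy is an assumption.

```python
def remaining_fraction(remaining: int, limit: int) -> float:
    """Share of the window's quota that is still available."""
    if limit <= 0:
        return 0.0  # no usable limit reported; treat as exhausted
    return remaining / limit


def should_alert(remaining: int, limit: int, threshold: float = 0.20) -> bool:
    """True when remaining quota has dropped below the threshold."""
    return remaining_fraction(remaining, limit) < threshold


print(should_alert(remaining=19, limit=100))
print(should_alert(remaining=20, limit=100))
```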

Log every 429 with full context: request path, headers, Retry-After value, and client IP. This helps identify patterns—like a specific endpoint being hit too often or a misbehaving client. Set up a dashboard with rate limit error rate by service and time.
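One way to capture that context is a structured log line per 429. The sketch below uses the standard `logging` and `json` modules; the field names and the `rate_limit` logger name are assumptions, so adapt them to your logging schema.

```python
import json
import logging
from typing import Mapping

logger = logging.getLogger("rate_limit")  # hypothetical logger name

HEADERS_TO_LOG = (
    "retry-after",
    "x-ratelimit-limit",
    "x-ratelimit-remaining",
    "x-ratelimit-reset",
)


def build_429_record(method: str, path: str, client_ip: str,
                     headers: Mapping[str, str]) -> dict:
    """Collect the context needed to debug a 429 later."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "event": "rate_limited",
        "status": 429,
        "method": method,
        "path": path,
        "client_ip": client_ip,
        # Keep RAW header values so parsing bugs stay visible in the logs.
        "headers": {name: lowered.get(name) for name in HEADERS_TO_LOG},
    }


def log_429(method: str, path: str, client_ip: str,
            headers: Mapping[str, str]) -> dict:
    record = build_429_record(method, path, client_ip, headers)
    logger.warning(json.dumps(record, sort_keys=True))
    return record


logging.basicConfig(level=logging.WARNING)
log_429("GET", "/endpoint", "203.0.113.5", {"Retry-After": "2"})
```

Emitting JSON makes it straightforward to group by `path` or `client_ip` on a dashboard.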

Frequently asked questions

What is the difference between 429 and 503?

429 means you've sent too many requests in a given time window (rate limit). 503 means the server is temporarily unavailable (often due to overload). Both may include a Retry-After header, but the fix differs: 429 requires you to slow down, while 503 may require you to wait for the server to recover.

Should I always respect the Retry-After header?

Yes, absolutely. The Retry-After header is the server's explicit instruction on when to retry. Ignoring it will likely result in more 429s and potentially get your client blocked. If the header is missing, use exponential backoff with jitter.

Can rate limits be applied per endpoint?

Yes, many APIs have different limits per endpoint (e.g., GET /users might have a higher limit than POST /payments). Check the documentation or response headers like X-RateLimit-Limit per endpoint. Your retry logic should be endpoint-aware.

How do I test rate limit handling?

Use a tool like `wrk`, `ab`, or a simple script to send burst requests. Temporarily reduce your server's rate limit to a low value (e.g., 1 request per second) and verify that your client backs off correctly. Check logs for Retry-After usage and that no retry storm occurs.
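If you cannot lower the limit on a real server, a local stub can stand in for it. The sketch below uses only the standard library: `make_stub_server` returns 429 with `Retry-After` for the first few requests, and `get_with_retry` is a deliberately simple client to exercise. Both names are hypothetical, and the client handles only the integer-seconds form of the header.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_stub_server(reject_first: int, retry_after: int = 1) -> HTTPServer:
    """Local server that answers the first `reject_first` GETs with 429."""
    state = {"seen": 0}
    lock = threading.Lock()

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            with lock:
                state["seen"] += 1
                seen = state["seen"]
            if seen <= reject_first:
                self.send_response(429)
                self.send_header("Retry-After", str(retry_after))
                self.send_header("Content-Length", "0")
                self.end_headers()
            else:
                self.send_response(200)
                self.send_header("Content-Length", "2")
                self.end_headers()
                self.wfile.write(b"ok")

        def log_message(self, *args):
            pass  # keep test output quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_with_retry(url: str, max_attempts: int = 5, sleep=time.sleep):
    """Return (final_status, list_of_waits). Honors Retry-After in seconds."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    waits = []
    for _ in range(max_attempts):
        try:
            with opener.open(url, timeout=5) as resp:
                return resp.status, waits
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            delay = float(err.headers.get("Retry-After", "1"))
            waits.append(delay)
            sleep(delay)
    return 429, waits


server = make_stub_server(reject_first=2, retry_after=1)
url = f"http://127.0.0.1:{server.server_address[1]}/"
print(get_with_retry(url, sleep=lambda seconds: None))  # skip real sleeping
server.shutdown()
server.server_close()
```

Injecting `sleep` lets a test assert on the waits the client chose without actually pausing.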

What is a retry storm?

A retry storm happens when many clients retry simultaneously after a failure, overwhelming the server. This can cascade across services. Mitigation: use jitter in backoff, circuit breakers, and coordinated retry limits (e.g., per-client token bucket).