
LEARN · DEBUGGING GUIDE

Rate limit bug: how to debug unexpected rate limiting issues

A user makes three normal requests and gets a 429 Too Many Requests. Your rate limit is set to 100 per minute. Something is counting wrong.

Intermediate · Observability/performance debugging

What this usually means

Rate limiters track request counts per key (usually IP, user ID, or API key) over a time window. When the count exceeds the limit, further requests are rejected. Bugs happen when: the key is wrong (all users share one key), the counter is stored locally (each app instance counts separately), the window algorithm has an off-by-one error (request accepted when it should be rejected or vice versa), or the counter storage fails silently (Redis is down, fallback allows all requests).
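
To make the moving parts concrete, here is a minimal sketch of a fixed-window counter. It is not any particular library's implementation; the class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are assumptions made for illustration. Note how small the surface for an off-by-one is: writing `>` instead of `>=` in `allow` would accept one request too many.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical in-memory fixed-window limiter, for illustration only."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key):
        window = int(self.clock() // self.window_seconds)
        bucket = (key, window)
        if self.counts[bucket] >= self.limit:
            return False  # over the limit: the caller would respond with 429
        self.counts[bucket] += 1
        return True


# With a limit of 3, the fourth request in the same window is rejected.
limiter = FixedWindowLimiter(limit=3, window_seconds=60, clock=lambda: 0.0)
results = [limiter.allow("user-1") for _ in range(4)]
print(results)
```

If every request were passed the same `key` (for example a hostname), all users would drain the same bucket, which is the shared-key bug described above.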

( 01 ) Fast diagnosis

The first ten minutes: establish facts before touching code.

  1. Check the rate limit key. Log what key the limiter is using for the request. Is it the user's ID, or is it a shared value like the server hostname?
  2. Check the rate limit counter state. If using Redis, `GET <rate-limit-key>`. Does the count match expectations?
  3. Check if the counter is local (in-memory) or shared (Redis). If local and you have 4 app instances, each allows 100 requests, for a total of 400.
  4. Check the rate limit window. Is the window sliding or fixed? A fixed window resets at the top of the minute, which can allow 2x bursts at window boundaries.
  5. Check the rate limit headers in the response. `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `Retry-After`. Do they make sense?
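
The arithmetic in step 3 can be reproduced with a small simulation. This is a sketch under assumptions: the round-robin balancer, the 500-request client, and the names `local_counts` and `handle` are invented for illustration, and real load balancers may distribute traffic differently.

```python
from itertools import cycle

LIMIT = 100
INSTANCES = 4

# One private counter per app instance, as with an in-memory limiter.
local_counts = [0] * INSTANCES


def handle(instance):
    """Return True if this instance accepts the request."""
    if local_counts[instance] >= LIMIT:
        return False
    local_counts[instance] += 1
    return True


# A round-robin balancer spreads one client's 500 requests over the instances.
balancer = cycle(range(INSTANCES))
accepted = sum(handle(next(balancer)) for _ in range(500))
print(accepted)  # 400: each instance enforced its own limit of 100
```
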
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Rate limit library configuration: algorithm, window size, max requests
  • Rate limit key generator: which request property is used as the key?
  • Rate limit counter storage: in-memory (local) vs Redis (shared)
  • Response headers: `X-RateLimit-*`, `Retry-After`
  • Multi-instance deployment: how many app instances are running?
  • Load balancer or reverse proxy: is it adding its own rate limiting?
  • Client retry behaviour: are clients retrying on 429 and making the problem worse?
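
On the last item, a well-behaved client waits before retrying a 429 rather than retrying immediately. The sketch below shows one possible delay calculation; the function name `retry_delay` and the `base` and `cap` defaults are assumptions, and it only handles the delay-in-seconds form of `Retry-After` (the header may also carry an HTTP date). It also assumes `headers` is a plain dict with exact-case keys.

```python
def retry_delay(headers, attempt, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    value = headers.get("Retry-After", "")
    if value.isdigit():
        return float(value)  # the server said how long to wait
    # No usable header: fall back to capped exponential backoff.
    return min(cap, base * (2 ** attempt))


print(retry_delay({"Retry-After": "12"}, attempt=0))
print(retry_delay({}, attempt=3))
```

If client logs show retries arriving faster than this kind of schedule, the clients themselves may be keeping the counter pinned at the limit.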
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Rate limit key is the same for all users, e.g. using the server hostname instead of the user ID
  • Rate limit counter is stored in-memory, so each app instance has its own counter
  • Fixed window boundary allows double bursts: the counter resets at :00, so a full burst at :59 and another at :01 land in different windows and are both accepted
  • Redis connection for counter storage is failing, and the fallback allows all or blocks all
  • Rate limit is configured per-second but the implementation treats it as per-minute
  • Load balancer has its own rate limit that is stricter than the application limit
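
The boundary burst is easy to reproduce with a fake clock. This is a minimal sketch for a single key; the `allow` function and the timestamps are illustrative assumptions, not taken from a specific limiter.

```python
LIMIT = 100
WINDOW = 60  # seconds

counts = {}  # window index -> accepted requests for one key


def allow(now):
    """Fixed-window check for a single key at time `now` (in seconds)."""
    window = int(now // WINDOW)
    if counts.get(window, 0) >= LIMIT:
        return False
    counts[window] = counts.get(window, 0) + 1
    return True


# 100 requests at second 59, then 100 more at second 61.
burst_before = sum(allow(59.0) for _ in range(100))
burst_after = sum(allow(61.0) for _ in range(100))
print(burst_before + burst_after)  # 200 accepted within about 2 seconds
```
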
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Use a distributed counter store (Redis) for rate limits when running multiple app instances
  • Use a sliding window algorithm instead of a fixed window to prevent burst doubling at boundaries
  • Log rate limit decisions (key, count, limit, decision) for debugging without exposing to users
  • Return standard rate limit headers (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`) so clients can self-throttle
  • Add a circuit breaker: if the counter store is down, fail open (allow) or fail closed (block) based on the use case
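
Three of these patterns (sliding window, decision logging, headers) can be sketched together. The version below is an in-memory sliding-window log meant only to show the logic; a multi-instance deployment would keep the same state in a shared store. `SlidingWindowLimiter`, `Decision`, and `rate_limit_headers` are hypothetical names, and `X-RateLimit-Reset` is assumed here to mean seconds until reset (some APIs send an epoch timestamp instead).

```python
import logging
import math
import time
from collections import defaultdict, deque
from dataclasses import dataclass

log = logging.getLogger("ratelimit")


@dataclass
class Decision:
    allowed: bool
    limit: int
    remaining: int
    reset_after: float  # seconds until the oldest counted request expires


class SlidingWindowLimiter:
    """Sliding-window log: remembers the timestamp of each accepted request."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def check(self, key):
        now = self.clock()
        hits = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)
        reset_after = hits[0] + self.window_seconds - now if hits else 0.0
        # Server-side log only; the key is never echoed to the client.
        log.info("key=%s count=%d limit=%d allowed=%s",
                 key, len(hits), self.limit, allowed)
        return Decision(allowed, self.limit, self.limit - len(hits), reset_after)


def rate_limit_headers(decision):
    """Build response headers from a decision (Reset given in seconds)."""
    headers = {
        "X-RateLimit-Limit": str(decision.limit),
        "X-RateLimit-Remaining": str(decision.remaining),
        "X-RateLimit-Reset": str(math.ceil(decision.reset_after)),
    }
    if not decision.allowed:
        headers["Retry-After"] = str(max(1, math.ceil(decision.reset_after)))
    return headers
```

Because each accepted request expires individually, a burst just before a minute boundary still counts against requests made just after it. The trade-off is memory: the log stores one timestamp per accepted request per key.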
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • With the limit set to N, send N+1 requests within one window. The first N should succeed. Request N+1 should get 429.
  • Check the rate limit headers on each response. Remaining should decrease to 0.
  • Wait for the window to reset. The counter should reset and requests should succeed again.
  • Test with multiple app instances: the rate limit should be shared, not per-instance.
  • Simulate Redis failure and verify the fallback behaviour is what you intended.
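
The last check is the one most often skipped, so here is a sketch of how it might be tested without a real Redis outage. `BrokenStore` and `MemoryStore` are hypothetical test doubles, and `allow` with its `fail_open` flag is an assumed wrapper, not a library API. The point is that the fallback becomes an explicit, logged choice.

```python
import logging

log = logging.getLogger("ratelimit")


class BrokenStore:
    """Stand-in for an unreachable counter store (hypothetical test double)."""

    def incr(self, key):
        raise ConnectionError("counter store unavailable")


class MemoryStore:
    """Working stand-in that counts requests per key in a dict."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow(store, key, limit, fail_open):
    """Return True if the request may proceed."""
    try:
        count = store.incr(key)
    except ConnectionError:
        # Make the fallback a deliberate decision and leave a trace of it.
        log.warning("rate limit store down, fail_open=%s key=%s", fail_open, key)
        return fail_open
    return count <= limit


print(allow(BrokenStore(), "user-1", limit=100, fail_open=True))
print(allow(BrokenStore(), "user-1", limit=100, fail_open=False))
```
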
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Using in-memory rate limiting with multiple app instances
  • Not returning rate limit headers in responses
  • Setting a rate limit without testing what 'normal' usage looks like
  • Not having a fallback for when the counter storage is unavailable
  • Rate limiting by IP in a world of shared IPs (corporate networks, mobile carriers)
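
One way to soften the shared-IP problem is to prefer the most specific identity available and use the IP only as a last resort. The sketch below assumes the caller has already extracted these values from the request; the function name `rate_limit_key` and the key prefixes are illustrative.

```python
def rate_limit_key(api_key=None, user_id=None, client_ip=None):
    """Pick the most specific identity available for the limiter key."""
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    if client_ip:
        return f"ip:{client_ip}"  # last resort: may be shared by many users
    # Refuse to fall back to a constant, which would put everyone in one bucket.
    raise ValueError("no identity available for rate limiting")


print(rate_limit_key(user_id="42", client_ip="203.0.113.7"))  # user:42
```
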