What this usually means
Rate limiters track request counts per key (usually IP, user ID, or API key) over a time window. When the count exceeds the limit, further requests are rejected. Bugs happen when: the key is wrong (all users share one key), the counter is stored locally (each app instance counts separately), the window algorithm has an off-by-one error (request accepted when it should be rejected or vice versa), or the counter storage fails silently (Redis is down, fallback allows all requests).
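As a minimal sketch of the mechanism described above, the following in-memory fixed-window limiter keeps one counter per key and window. The class name, the `clock` parameter, and the example keys are illustrative assumptions, not taken from any particular library.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed-window limiter: one counter per (key, window)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.counts = defaultdict(int)  # old buckets are never evicted here

    def allow(self, key):
        # All timestamps inside the same window map to the same bucket index.
        window = int(self.clock() // self.window_seconds)
        bucket = (key, window)
        # ">=" is the correct comparison; ">" would admit limit + 1 requests.
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


limiter = FixedWindowLimiter(limit=3, window_seconds=60, clock=lambda: 120.0)

# Correct key: each user has an independent budget.
alice = [limiter.allow("user:alice") for _ in range(4)]  # [True, True, True, False]
bob = limiter.allow("user:bob")                          # True

# Wrong key: every user shares one budget (hypothetical hostname as the key).
shared = FixedWindowLimiter(limit=3, window_seconds=60, clock=lambda: 120.0)
shared_results = [shared.allow("app-server-1") for _ in range(4)]
```

With the shared key, the fourth request is rejected no matter which user sent it, which is how the "all users share one key" bug usually shows up.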
The first ten minutes: establish facts before touching code.
1. Check the rate limit key. Log the key the limiter uses for each request. Is it the user's ID, or a shared value such as the server hostname?
2. Check the rate limit counter state. If using Redis, run `GET <rate-limit-key>`. Does the count match expectations?
3. Check whether the counter is local (in-memory) or shared (Redis). If it is local and you have 4 app instances, each instance enforces the limit separately, so a limit of 100 becomes 400 in total.
4. Check the rate limit window. Is it sliding or fixed? A fixed window resets at the top of the minute, which can allow 2x bursts at window boundaries.
5. Check the rate limit headers in the response: `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `Retry-After`. Do they make sense?
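The local-versus-shared check above can be illustrated with a small simulation. This is a sketch under assumptions: the round-robin load balancer, the 500-request client, and the helper name `make_local_counter` are made up for the example.

```python
from itertools import cycle

LIMIT = 100      # per-key limit each instance enforces
INSTANCES = 4    # number of app instances behind the load balancer


def make_local_counter(limit):
    """Return an allow() function backed by a counter private to one instance."""
    count = 0

    def allow():
        nonlocal count
        if count >= limit:
            return False
        count += 1
        return True

    return allow


instances = [make_local_counter(LIMIT) for _ in range(INSTANCES)]

# Round-robin load balancing: one client sends 500 requests in a single window.
accepted = sum(1 for _, allow in zip(range(500), cycle(instances)) if allow())
print(accepted)  # 400
```

Each instance sees 125 of the 500 requests and accepts its own 100, so the client gets 400 through against a nominal limit of 100.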
The specific files, logs, configs, and dashboards that usually own this bug.
- Rate limit library configuration: algorithm, window size, max requests
- Rate limit key generator: what request property is used as the key?
- Rate limit counter storage: in-memory (local) vs Redis (shared)
- Response headers: `X-RateLimit-*`, `Retry-After`
- Multi-instance deployment: how many app instances are running?
- Load balancer or reverse proxy: is it adding its own rate limiting?
- Client retry behaviour: are clients retrying on 429 and making the problem worse?
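On the client-retry question, a well-behaved client waits for the period given in `Retry-After` instead of retrying immediately. The sketch below shows one way to turn that header into a bounded delay. The function name, the 1-second default, and the 300-second cap are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None, default=1.0, cap=300.0):
    """Turn a Retry-After header value into a bounded wait in seconds."""
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        delay = float(value)  # delta-seconds form, e.g. "30"
    else:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # unparseable header: fall back to the default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    # Never wait a negative time, and never longer than the cap.
    return max(0.0, min(delay, cap))
```

For example, `retry_delay_seconds("30")` gives a 30-second wait, while a missing or malformed header falls back to the default rather than triggering an immediate retry.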
Practical causes, not theory. These are the things you will actually find.
- Rate limit key is the same for all users, e.g. using the server hostname instead of the user ID
- Rate limit counter is stored in-memory, so each app instance has its own counter
- Fixed-window boundary allows double bursts: the counter resets at :00, so a full burst at :59 and another at :01 land in different windows and both pass
- Redis connection for counter storage is failing, and the fallback allows all or blocks all
- Rate limit is configured per-second but the implementation treats it as per-minute
- Load balancer has its own rate limit that is stricter than the application limit
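The window-boundary cause is easy to reproduce on paper. The following sketch replays two bursts through a fixed-window counter. The timestamps, the limit of 100, and the function name are assumptions for the example.

```python
def fixed_window_accepts(timestamps, limit, window_seconds):
    """Return the timestamps a fixed-window limiter would accept."""
    counts = {}
    accepted = []
    for t in timestamps:
        window = int(t // window_seconds)  # bucket index for this timestamp
        if counts.get(window, 0) < limit:
            counts[window] = counts.get(window, 0) + 1
            accepted.append(t)
    return accepted


# 100 requests at t=59s (end of window 0), then 100 more at t=61s (start of window 1).
burst = [59.0] * 100 + [61.0] * 100
accepted = fixed_window_accepts(burst, limit=100, window_seconds=60)
print(len(accepted))  # 200
```

All 200 requests pass within a 2-second span, even though the configured limit is 100 per 60 seconds.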
Concrete fix directions. Pick the one that matches your root cause.
- Use a distributed counter store (Redis) for rate limits when running multiple app instances
- Use a sliding window algorithm instead of a fixed window to prevent burst doubling at boundaries
- Log rate limit decisions (key, count, limit, decision) for debugging without exposing them to users
- Return standard rate limit headers (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`) so clients can self-throttle
- Add a circuit breaker: if the counter store is down, fail open (allow) or fail closed (block) based on the use case
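One possible shape for the sliding-window fix is a sliding window log, which stores the timestamp of each accepted request and counts only those inside the trailing window. This is an in-memory sketch for illustration only. A multi-instance deployment would need the same logic on a shared store, and the class and parameter names are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding window log: counts accepted requests in the trailing window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        hits = self.hits[key]
        # Drop timestamps that have aged out of the trailing window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


now = [59.0]  # mutable fake clock
limiter = SlidingWindowLimiter(limit=100, window_seconds=60, clock=lambda: now[0])
first = sum(limiter.allow("user:alice") for _ in range(100))   # 100 accepted
now[0] = 61.0
second = sum(limiter.allow("user:alice") for _ in range(100))  # 0 accepted
```

The second burst is rejected in full because the first 100 requests are still inside the trailing 60 seconds. The trade-off is memory: the log holds up to `limit` timestamps per key.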
A fix you cannot prove is a guess. Close the loop.
- Send N+1 requests within the window, where N is the configured limit. The first N should succeed. Request N+1 should get 429.
- Check the rate limit headers on each response. Remaining should decrease to 0.
- Wait for the window to reset. The counter should reset and requests should succeed again.
- Test with multiple app instances. The rate limit should be shared, not per-instance.
- Simulate Redis failure and verify the fallback behaviour is what you intended.
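The store-failure check can be rehearsed without a real Redis outage by swapping in a store that always raises. In this sketch, `BrokenStore`, `MemoryStore`, and the `allow` function are hypothetical stand-ins, not a real client API. A real integration would catch whatever connection error its client library raises.

```python
import logging

log = logging.getLogger("ratelimit")


class BrokenStore:
    """Hypothetical stand-in for a counter store whose connection is down."""

    def increment(self, key):
        raise ConnectionError("counter store unreachable")


class MemoryStore:
    """Hypothetical healthy store: increments and returns the new count."""

    def __init__(self):
        self.counts = {}

    def increment(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow(store, key, limit, fail_open):
    """Decide one request; on store failure apply the chosen fallback policy."""
    try:
        count = store.increment(key)
    except ConnectionError:
        # Log the failure so the fallback is never silent.
        log.warning("counter store unavailable key=%s fail_open=%s", key, fail_open)
        return fail_open
    return count <= limit


healthy = MemoryStore()
healthy_results = [allow(healthy, "user:alice", 2, fail_open=True) for _ in range(3)]
open_result = allow(BrokenStore(), "user:alice", 2, fail_open=True)     # allowed
closed_result = allow(BrokenStore(), "user:alice", 2, fail_open=False)  # blocked
```

Making `fail_open` an explicit argument forces the choice to be made per use case: fail open may suit a public read endpoint, while fail closed may suit login attempts.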
Things that make this bug worse or harder to find.
- Using in-memory rate limiting with multiple app instances
- Not returning rate limit headers in responses
- Setting a rate limit without testing what 'normal' usage looks like
- Not having a fallback for when the counter storage is unavailable
- Rate limiting by IP in a world of shared IPs (corporate networks, mobile carriers)