What this usually means
Nginx's rate limiting module (ngx_http_limit_req_module) enforces a maximum request rate per key (e.g., client IP, URI). Requests that exceed the rate are held in a burst queue; once the burst allowance is used up, further requests are rejected immediately, whether or not nodelay is set. The rejection status defaults to 503 and can be changed with limit_req_status to 429 or another code. The usual root cause is that the configured rate is too low for the incoming request pattern, or the burst size is too small, so legitimate traffic is dropped.
The first ten minutes — establish facts before touching code.
1. Check /var/log/nginx/error.log for lines containing 'limiting requests' to confirm rate limiting is the cause
2. Run 'curl -I http://your-site.com' and look for the HTTP status code; if 503, check if limit_req is active on that path
3. Inspect the nginx configuration for limit_req_zone and limit_req directives: 'grep -r limit_req /etc/nginx/'
4. Test by temporarily commenting out limit_req lines in a test environment to see if 503s stop
5. Use 'tail -f /var/log/nginx/access.log' and watch for 503 responses correlated with request timestamps
The specific files, logs, configs, and dashboards that usually own this bug.
- /var/log/nginx/error.log (default error log, look for 'limiting requests')
- /etc/nginx/nginx.conf or /etc/nginx/conf.d/*.conf (limit_req_zone definitions)
- /etc/nginx/sites-enabled/* (limit_req directives in server/location blocks)
- Access log: /var/log/nginx/access.log (correlate 503 with request rate)
- Nginx status page (if enabled) at /nginx_status (check Active connections etc.)
- Application-level logs (to rule out upstream 503s)
- System metrics like 'netstat -an | grep :80 | wc -l' to see connection count
Practical causes, not theory. These are the things you will actually find.
- Rate limit set too low for legitimate traffic (e.g., 1r/s on a popular API endpoint)
- Burst parameter omitted or set too small, causing immediate rejection on slight overshoot
- nodelay not used, so burst requests are delayed to the configured rate and add latency (rejection still happens once the burst is exhausted)
- Limit key too broad (e.g., $binary_remote_addr for all traffic behind a NAT)
- Several limit_req directives in effect for the same request, with the most restrictive one rejecting traffic unexpectedly
- Misconfigured limit_req_zone with the wrong rate unit (e.g., '5r/m' instead of '5r/s')
- Rate limiting applied to static assets or CDN traffic unintentionally
Concrete fix directions. Pick the one that matches your root cause.
- Increase the rate in limit_req_zone (e.g., from '10r/s' to '50r/s') after analyzing max throughput
- Add a burst parameter to allow short spikes: 'limit_req zone=one burst=20 nodelay;'
- Use nodelay to serve requests at full speed up to burst limit, then return 503
- Refine the key to be more specific: limit by URI + IP instead of just IP
- Set a separate zone for different endpoints with appropriate rates
- Use limit_req_status 429 to align with HTTP semantics (or 503 if preferred)
- Implement exponential backoff on client side to reduce retry storms
A fix you cannot prove is a guess. Close the loop.
- After a fix, simulate traffic at your expected legitimate load with 'ab -n 1000 -c 10 http://your-site.com/' and check that no responses are 503
- Monitor error log for 'limiting requests' messages; they should disappear or reduce drastically
- Check access log for 503 count (assuming the default combined log format, where the status is field 9): awk '$9 == 503' /var/log/nginx/access.log | wc -l
- Gradually increase load and confirm that 503s only appear at expected thresholds
- Test with a single client sending rapid requests: 'for i in {1..100}; do curl -w "%{http_code}\n" -o /dev/null -s http://your-site.com/; done' and verify no 503s
Things that make this bug worse or harder to find.
- Setting rate limit too high and negating its purpose (DDoS protection fails)
- Forgetting to validate and reload Nginx after a config change: 'nginx -t' followed by 'nginx -s reload'
- Applying rate limit globally without excluding internal or health check endpoints
- Using nodelay without understanding it: burst requests are forwarded immediately, but burst slots free up only at the configured rate, so sustained traffic above the rate is still rejected once the burst is used up
- Not logging the reason for 503: rejections are logged at limit_req_log_level (default 'error'), so the error_log level must be 'error' or more verbose; delayed requests are logged one level lower ('warn' by default)
- Relying solely on client IP behind a proxy without setting set_real_ip_from and real_ip_header
- Copying configs from examples without adjusting rates to your traffic pattern
Midnight 503 Spike After Feature Launch
Timeline
- 01:15 Alert: 503 error rate exceeds 5% on /api/checkout endpoint
- 01:17 Checked Nginx error log: 'limiting requests, excess: 0.500' per second
- 01:20 Found limit_req_zone with rate=5r/s and burst=5 in conf.d/api.conf
- 01:25 Noticed new feature launched at midnight increased traffic 3x
- 01:30 Changed rate to 20r/s, burst to 20, added nodelay, reloaded Nginx
- 01:32 503 rate dropped to 0% within 2 minutes
- 01:40 Confirmed via access log that 503s stopped
- 02:00 Updated rate limit documentation and added monitoring for zone capacity
At 1 AM, our pager went off: the /api/checkout endpoint was returning 503s for 5% of requests. We had just launched a flash sale feature at midnight, and traffic to checkout had tripled. I pulled up Nginx error log and immediately saw 'limiting requests' messages with excess values around 0.5. The limit_req_zone was set to 5 requests per second with a burst of 5, which was fine for normal loads but not for the sale.
I quickly edited /etc/nginx/conf.d/api.conf, changing rate to 20r/s, burst to 20, and added nodelay. The nodelay was critical to avoid queuing. After nginx -s reload, the 503 rate dropped to zero within minutes. I also checked that the real client IP was being picked up correctly since we're behind an ELB — the set_real_ip_from directive was already there, so no issue there.
Post-incident, we added a dashboard tracking the current rate limit zone usage via the Nginx stub_status module. We also set up alerts when the zone utilization exceeds 80%. The root cause was simply underestimating peak traffic. Now we regularly review rate limits before feature launches.
Root cause
Rate limit configured too low for a traffic spike from a new feature launch, with insufficient burst and no nodelay causing immediate 503.
The fix
Increased rate from 5r/s to 20r/s, burst from 5 to 20, and added nodelay. Reloaded Nginx.
The lesson
Always estimate peak traffic with a safety margin, and use nodelay for APIs to avoid queuing. Monitor zone usage proactively.
Nginx rate limiting is implemented via the ngx_http_limit_req_module. It uses a leaky bucket algorithm: requests are processed at a configured rate (e.g., 10r/s, which Nginx tracks as one request per 100 ms), and excess requests are queued up to the burst size. Without nodelay, queued requests are delayed to match the rate. With nodelay, requests within the burst allowance are served immediately, but each one still occupies a burst slot that frees up only at the configured rate. When the burst allowance is exhausted, new requests are rejected immediately with the configured status code (default 503).
The zone (shared memory) stores state per key. The key is typically $binary_remote_addr but can be any variable. The rate is defined per second (r/s) or per minute (r/m). The burst parameter defines the maximum number of excess requests to queue. Setting burst=0 means no queuing: any request beyond the rate is immediately rejected. The nodelay flag disables the delay for queued requests, effectively allowing a quick burst up to burst size at full speed.
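The pieces described above fit together roughly as follows. This is a minimal sketch, not a recommended setting: the zone name `api`, the 10m size, the rate, and the upstream address are all illustrative assumptions.

```nginx
# Inside the http {} context.
# Key: client address in binary form. Zone "api" uses 10 MB of shared memory.
# rate=10r/s is tracked as one request per 100 ms for each key.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 80;

    location /api/ {
        # Up to 20 requests above the rate are queued and released at 10r/s.
        # Requests beyond that are rejected with limit_req_status (503 by default).
        limit_req zone=api burst=20;

        proxy_pass http://127.0.0.1:8080;  # hypothetical upstream
    }
}
```

Omitting `burst` is equivalent to `burst=0`, so with this zone two requests from the same key arriving less than 100 ms apart would already cause a rejection.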
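One common mistake is misunderstanding how limit_req directives combine across contexts. A limit_req in the server context is inherited by a location only if that location defines no limit_req of its own; a location-level limit_req replaces the server-level one rather than adding to it. When several limit_req directives appear at the same level, all of them are applied and the most restrictive one decides whether the request is delayed or rejected. The fix is to structure rules carefully: define limit_req_zone at the http level, place limit_req only in the contexts that need it, and check which directives are actually in effect for each location.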
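According to the module documentation, limit_req directives are inherited from the enclosing level only when the current level defines none, and several directives at the same level are all applied. The sketch below illustrates both behaviors; the zone names, rates, and paths are assumptions chosen for the example.

```nginx
# Inside the http {} context.
limit_req_zone $binary_remote_addr zone=perip:10m     rate=10r/s;
limit_req_zone $server_name        zone=perserver:10m rate=100r/s;

server {
    listen 80;

    # Default for every location that has no limit_req of its own.
    limit_req zone=perip burst=20;

    location /search/ {
        # This location defines limit_req, so the server-level directive
        # is NOT inherited here. Both limits below are checked, and the
        # most restrictive one decides the outcome.
        limit_req zone=perip     burst=5 nodelay;
        limit_req zone=perserver burst=50;
    }

    location /static/ {
        # No limit_req here: "zone=perip burst=20" is inherited from the server block.
    }
}
```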
Another issue is misidentifying the client IP behind proxies. If your Nginx sits behind a reverse proxy (like AWS ELB), $remote_addr is the proxy IP. You must configure set_real_ip_from and real_ip_header to get the true client IP. Otherwise, rate limiting may apply to the proxy IP, causing all traffic to be lumped together. Example: set_real_ip_from 10.0.0.0/8; real_ip_header X-Forwarded-For;
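Start by analyzing your access logs to estimate the peak request rate per key. A command like awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -20 shows the top IPs by total request count over the whole log, which is a volume, not a rate. To see the busiest single seconds per client (assuming the default combined log format, where field 4 holds the timestamp), count by IP and timestamp instead: awk '{print $1, $4}' access.log | sort | uniq -c | sort -nr | head -20. Then set the rate to the observed peak plus a buffer (e.g., 2x). The burst should accommodate short spikes; 2-5 times the rate is a common starting point. Always validate with load testing tools like 'ab' or 'wrk'.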
Consider using different zones for different endpoints. For example, a login endpoint may have a lower rate than a search endpoint. Use separate limit_req_zone directives with different names. Also, use limit_req_status 429 to return a proper HTTP status for rate limiting, which is more semantic and signals to clients that they should back off and retry later. Note that Nginx does not add a Retry-After header on its own; if clients rely on it, it has to be added explicitly.
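A per-endpoint setup along these lines might look like the sketch below. The zone names, rates, burst values, and paths are assumptions for illustration and should be derived from your own traffic.

```nginx
# Inside the http {} context: one zone per endpoint class.
limit_req_zone $binary_remote_addr zone=login:10m  rate=1r/s;
limit_req_zone $binary_remote_addr zone=search:10m rate=20r/s;

server {
    listen 80;

    # Applies to every limit_req in this server block.
    limit_req_status 429;

    location /login {
        # Strict: credential endpoints rarely need bursts.
        limit_req zone=login burst=5;
        proxy_pass http://127.0.0.1:8080;  # hypothetical upstream
    }

    location /search {
        # Looser: interactive traffic, served without added delay.
        limit_req zone=search burst=40 nodelay;
        proxy_pass http://127.0.0.1:8080;  # hypothetical upstream
    }
}
```

Because each zone keeps its own state per key, a client that exhausts its login allowance is not affected on the search endpoint, and vice versa.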
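Enable Nginx stub_status for connection-level metrics, or use a commercial offering like NGINX Plus to expose zone usage metrics. For stub_status, add a location: location /nginx_status { stub_status; allow 127.0.0.1; deny all; } Then monitor 'curl http://127.0.0.1/nginx_status' for 'Active connections' and the cumulative accepts, handled, and requests counters; a requests-per-second figure has to be derived from the change in the requests counter between two samples. stub_status reports nothing about rate limiting itself, so for that you can parse error log counts of 'limiting requests' over time.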
Set up alerts when the number of 503s due to rate limiting exceeds a threshold. Use log monitoring tools like ELK or Grafana with Loki. For example, a Prometheus metric can be derived from Nginx logs using an exporter. Alert when rate limit rejections exceed 1% of total requests over 5 minutes.
Frequently asked questions
How do I change the HTTP status code from 503 to 429 for rate limiting?
Add 'limit_req_status 429;' inside the http, server, or location block where the limit_req is defined. This changes the response code to 429 (Too Many Requests), which is more semantically correct. You can also add a custom error page with 'error_page 429 /custom_429.html;'.
Why am I seeing 503 even when the rate is not exceeded?
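Check whether several limit_req directives are in effect for the same request: all directives at the same level apply, and the most restrictive one wins. Check the burst setting too, because Nginx tracks the rate at millisecond granularity; with 10r/s and no burst, two requests arriving within the same 100 ms trigger a rejection even if the per-second average is under the limit. Also verify that the key identifies unique clients. If many clients share one IP (e.g., behind a NAT, a VPN, or a proxy whose address is not handled by set_real_ip_from), the limit will be hit quickly. Restore the real client address with set_real_ip_from and real_ip_header, or use a more specific key such as a combination of IP and URI.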
What is the difference between burst and nodelay?
Burst defines the maximum number of excess requests that can be queued. Without nodelay, queued requests are delayed to the configured rate (e.g., with rate=10r/s and burst=20, requests arriving faster than one per 100 ms are queued and released one every 100 ms). With nodelay, requests within the burst limit are served immediately, while the burst slots they occupy free up at the configured rate. In both cases, once the burst is exhausted, new requests are rejected. nodelay is useful for APIs where delay is unacceptable.
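The difference is easiest to see side by side. In this sketch the zone name `one`, the paths, and the upstream are assumptions; the comments describe what would happen if 30 requests from one client arrived at the same moment.

```nginx
# Inside the http {} context.
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    listen 80;

    location /queued/ {
        # 1 request is forwarded at once, 20 are queued and released
        # one every 100 ms (the last after about 2 s), 9 are rejected.
        limit_req zone=one burst=20;
        proxy_pass http://127.0.0.1:8080;  # hypothetical upstream
    }

    location /immediate/ {
        # 21 requests are forwarded at once, 9 are rejected.
        # Burst slots then free up at one per 100 ms.
        limit_req zone=one burst=20 nodelay;
        proxy_pass http://127.0.0.1:8080;  # hypothetical upstream
    }
}
```

Both locations reject the same number of requests; nodelay only changes how long the accepted ones wait.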
How can I test rate limiting without affecting production?
Create a separate test server with the same rate limits. Use tools like 'ab', 'wrk', or 'siege' to simulate traffic. For example, 'ab -n 100 -c 10 http://test-server/api/' and check the count of 503s. Alternatively, use a staging environment with production traffic mirroring.
Can I exclude certain IPs from rate limiting?
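Yes, use the 'geo' module together with 'map' to define a whitelist. limit_req is only valid in the http, server, and location contexts, so it cannot be wrapped in an 'if' block. The usual approach instead relies on the fact that requests with an empty key value are not accounted. Define the whitelist with 'geo $limit { default 1; 192.168.1.0/24 0; }', turn it into a key with 'map $limit $limit_key { 0 ""; 1 $binary_remote_addr; }', and use $limit_key as the key in limit_req_zone. Whitelisted addresses then bypass the limit while everyone else is limited per IP. Alternatively, route trusted traffic through a separate location or server block that has no limit_req.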
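A minimal whitelist sketch using geo and map is shown below. It relies on the documented behavior that requests with an empty key value are not accounted, and it avoids 'if' entirely, since limit_req is not a valid directive inside an if block. The variable names `$limit` and `$limit_key` and the rate are assumptions.

```nginx
# Inside the http {} context.

# 0 = whitelisted, 1 = subject to rate limiting.
geo $limit {
    default        1;
    192.168.1.0/24 0;
}

# Whitelisted clients get an empty key, which limit_req ignores.
map $limit $limit_key {
    0 "";
    1 $binary_remote_addr;
}

limit_req_zone $limit_key zone=one:10m rate=10r/s;

server {
    listen 80;

    location / {
        limit_req zone=one burst=5 nodelay;
    }
}
```

If Nginx sits behind a proxy, geo evaluates the address after the realip module has rewritten it, so set_real_ip_from and real_ip_header need to be configured for the whitelist to match real clients.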