What this usually means
Lambda timeouts fall into three buckets: 1) code runtime exceeds the configured timeout (default 3s, max 900s), 2) cold start latency pushes total time over the limit, especially for Java/.NET runtimes or large deployment packages, and 3) downstream dependency latency, typically from VPC functions hitting NAT gateways or RDS proxy cold connections. The non-obvious part: many apparent timeouts are really *concurrency throttles*. Throttled requests are retried by the caller or held in Lambda's async queue, and that wait pushes the end-to-end time past the deadline even though the handler itself ran quickly.
The first ten minutes — establish facts before touching code.
1. Run 'aws lambda invoke --function-name myFunc --payload fileb://test-event.json out.txt' and check 'StatusCode' and 'ExecutedVersion' in the JSON the CLI prints to stdout; the function's response lands in out.txt.
2. Search the function's log group for 'Task timed out', either in the CloudWatch console or with 'aws logs filter-log-events --log-group-name /aws/lambda/myFunc --filter-pattern "Task timed out"'. Lambda logs live in CloudWatch Logs, not on a local filesystem.
3. Check Lambda concurrency: 'aws lambda get-function-concurrency --function-name myFunc' shows the function's reserved concurrency; compare it to the account limit from 'aws lambda get-account-settings'.
4. Enable X-Ray active tracing: 'aws lambda update-function-configuration --function-name myFunc --tracing-config Mode=Active', then examine segments.
5. If in VPC, check subnet route tables and NAT gateway metrics (CloudWatch 'BytesOutToSource', 'ErrorPortAllocation').
6. Measure cold start: invoke the function twice with a 10-minute pause between, then compare the Init phase in X-Ray or 'Init Duration' on the CloudWatch Logs 'REPORT' line.
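Once the log events from step 2 are exported to a file or fetched as text, the counting can be scripted. The following is a minimal sketch, assuming each log line contains the message in the form 'Task timed out after 10.00 seconds'; the function name `summarize_timeouts` is made up for illustration. Grouping by the reported timeout value helps spot an old function version that is still running with a shorter timeout.

```python
import re
from collections import Counter

# Matches the timeout message in a Lambda log line,
# e.g. "... Task timed out after 10.00 seconds".
TIMEOUT_RE = re.compile(r"Task timed out after (\d+(?:\.\d+)?) seconds")


def summarize_timeouts(log_lines):
    """Count timeout messages, grouped by the timeout value they report."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[float(match.group(1))] += 1
    return dict(counts)
```

Feed it any iterable of strings, for example the lines of a file saved from the CloudWatch console.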
The specific files, logs, configs, and dashboards that usually own this bug.
- CloudWatch Logs: /aws/lambda/<function-name>: search for 'Task timed out', 'REPORT', 'Duration:', 'Init Duration:'
- X-Ray traces: Service map and segments showing 'Overhead', 'Init', 'Invocation', 'Response' phases
- Lambda console monitoring tab: 'Duration', 'Throttles', 'ConcurrentExecutions' metrics
- VPC flow logs (if function is VPC-enabled): check for rejected connections or high latency to NAT gateway
- Lambda function configuration: timeout value, memory size, VPC config, reserved concurrency
- AWS Trusted Advisor Lambda limits check: account-level concurrency and burst concurrency limits
Practical causes, not theory. These are the things you will actually find.
- Cold start too long due to large deployment package (>50MB zipped) or heavy initialization code
- VPC function hitting NAT gateway connection limits or routing to private subnet without NAT
- Downstream service (RDS, HTTP API) has high latency under load, causing Lambda to wait on network I/O
- Function timeout set too low (default 3s) for actual execution duration, especially during cold starts
- Concurrency limit reached: throttled invocations queue and exceed timeout waiting for execution slot
- Recursive or re-entrant calls (e.g., S3 trigger -> Lambda -> S3 -> Lambda again) causing exponential backoff and timeout
Concrete fix directions. Pick the one that matches your root cause.
- Increase function timeout to a realistic value based on CloudWatch p99 Duration + 5s buffer
- Reduce cold start: use Python/Node.js, keep deployment package under 10MB, use provisioned concurrency for critical functions
- For VPC functions: use VPC endpoints for AWS services (S3, DynamoDB) to bypass NAT, or move to non-VPC if not needed
- Implement idempotent retry with exponential backoff in the client, or a DLQ for asynchronous invocations
- Increase memory (which also increases CPU) to reduce execution time; test with different memory settings (128MB to 10GB)
- Set reserved concurrency so the function always has execution slots available and cannot be starved by other functions sharing the account limit; remember that it also caps the function at that value
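The "p99 Duration + 5s buffer" rule from the first item can be sketched as a small calculation. This is an illustration only: it assumes you have exported a list of Duration samples in milliseconds, uses a simple nearest-rank percentile, and the helper names are hypothetical.

```python
import math

LAMBDA_MAX_TIMEOUT_S = 900  # maximum configurable Lambda timeout (15 minutes)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one duration sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommended_timeout_s(durations_ms, buffer_s=5):
    """p99 duration plus a fixed buffer, rounded up to whole seconds, capped at 900."""
    p99_s = percentile(durations_ms, 99) / 1000
    return min(LAMBDA_MAX_TIMEOUT_S, math.ceil(p99_s + buffer_s))
```

The result is a starting point for the '--timeout' value, not a substitute for finding out why the p99 is high in the first place.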
A fix you cannot prove is a guess. Close the loop.
- Invoke function with same test event and confirm new Duration is below p95 threshold
- Run a load test: fire 100 concurrent 'aws lambda invoke --function-name myFunc --invocation-type Event out.txt' calls and check that the Throttles metric stays at 0
- Check CloudWatch Logs for absence of 'Task timed out' over 24-hour period
- X-Ray trace shows Init phase under 100ms and total duration under configured timeout
- For VPC: verify NAT gateway 'ActiveConnectionCount' stays within limits and flow logs show no RST packets
- Use Lambda Insights to confirm the function is not running out of memory or CPU time
Things that make this bug worse or harder to find.
- Blindly increasing timeout to max (15 min) without addressing root cause: you'll mask issues and increase cost
- Adding provisioned concurrency before checking if cold start is actually the problem (costly mistake)
- Removing VPC without validating that function doesn't access internal resources (security hole)
- Forgetting to update client-side timeouts (API Gateway, ALB) after increasing Lambda timeout
- Not setting a DLQ for async invocations: retries will keep timing out with no trace
- Assuming all timeouts are the same: the fix for a cold start is different from the fix for a concurrency throttle
Payment processing Lambda times out under peak traffic
Timeline
- 09:15: 1% of payment API calls return 504 after timeout
- 09:18: CloudWatch shows 'Task timed out' errors on payment-processing Lambda
- 09:22: Checked concurrency: 1000 concurrent invocations, account limit 1000, reserved concurrency 500
- 09:25: X-Ray shows Init durations spiking to 5s for cold functions
- 09:30: Discovered deployment package is 180MB (includes full SDK + Sharp library)
- 09:35: Temporarily increased timeout from 10s to 30s to stop errors
- 09:40: Reduced package size to 5MB by removing unused dependencies and using Lambda layers
- 09:50: Set provisioned concurrency to 200 for steady-state traffic
- 10:00: Monitored CloudWatch: p99 Duration dropped from 12s to 800ms, zero timeouts
We were processing credit card payments through a Lambda function behind API Gateway. At 9:15 AM, dashboards showed a spike in 504 errors. The function had a 10s timeout, but typical execution was under 2s. The error rate was low (<1%) but growing. I first checked CloudWatch logs and found 'Task timed out after 10.00 seconds' on a handful of invocations. The pattern: they were all cold starts with multi-second Init durations.
I checked X-Ray and saw that cold starts were taking 5s due to a massive deployment package we'd accidentally bloated with a full AWS SDK and image processing lib. The problem compounded because the function's reserved concurrency was 500, but account limit was 1000, so during traffic spikes, cold starts were frequent. The queue wait + cold start pushed total time over 10s.
I temporarily increased the timeout to 30s to stop the bleeding, then optimized the package: removed unused modules, moved to Lambda layers for shared libs, and reduced the package to 5MB. I also set provisioned concurrency to 200 to eliminate cold starts for peak traffic. After deploying, p99 latency dropped to 800ms. We kept the timeout at 15s as a safety buffer. Lesson: cold start is often the hidden culprit behind timeouts, especially in Java/Node with large packages.
Root cause
Cold start duration of 5s combined with concurrency throttling queue wait pushed total execution time over the 10s timeout.
The fix
Reduced deployment package from 180MB to 5MB, increased timeout to 15s, set provisioned concurrency to 200.
The lesson
Always check Init duration in X-Ray first when debugging timeouts. Package size and concurrency limits are the two most common non-obvious causes.
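Cold starts happen when Lambda needs to spin up a new execution environment. For Node.js/Python, it's typically 100-500ms. For Java/.NET, it can be 5-10s. But the non-obvious part: cold starts can be amplified when combined with VPC networking. Before AWS moved Lambda to shared Hyperplane ENIs in 2019, a function in a VPC had to create an ENI during the cold start, which added 5-15s to Init duration; today the ENI is set up when the function is created or its VPC configuration changes, so check how old the setup is before blaming ENI creation. I've seen production incidents where a VPC Lambda with Java runtime consistently timed out on first invocation after 10 minutes of inactivity.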
To diagnose, look at the 'REPORT' line in CloudWatch Logs: 'Init Duration: 5432.34 ms'. If Init exceeds 1s and your timeout is under 10s, cold start is your problem. Fixes: use provisioned concurrency, switch to a lighter runtime, minimize deployment package (use Lambda layers for SDK), and consider moving out of VPC if not strictly necessary. For VPC functions, use VPC endpoints for AWS services to avoid NAT gateway latency.
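The REPORT-line check can be automated. Below is a minimal sketch, assuming the line carries 'Duration: … ms' and, on cold starts only, 'Init Duration: … ms' as in the example above; the function names are hypothetical and the 1s/10s thresholds are simply the rule of thumb from the text.

```python
import re

# "Duration:" also appears inside "Billed Duration:" and "Init Duration:",
# so the lookbehinds exclude those two fields.
DURATION_RE = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def parse_report(line):
    """Return (duration_ms, init_ms) for a REPORT line, or None for other lines.

    init_ms is None when the line has no Init Duration (a warm invocation).
    """
    if not line.lstrip().startswith("REPORT"):
        return None
    duration = DURATION_RE.search(line)
    if duration is None:
        return None
    init = INIT_RE.search(line)
    return float(duration.group(1)), (float(init.group(1)) if init else None)


def cold_start_suspected(init_ms, timeout_s):
    """Rule of thumb from the text: Init over 1s while the timeout is under 10s."""
    return init_ms is not None and init_ms > 1000 and timeout_s < 10
```

Running `parse_report` over a day of REPORT lines and counting how many timeouts coincide with a non-empty Init Duration gives a quick read on whether cold starts are the real cause.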
When Lambda receives more invocations than available concurrency, it throttles them. For synchronous invocations (API Gateway), Lambda rejects the request immediately with a 429 (TooManyRequestsException), and any retry is up to the caller. For asynchronous invocations, Lambda queues the event and retries for up to 6 hours. Does the queue wait count toward the function's timeout? Not toward the configured timeout, which only covers the handler's execution, but the total wall clock time from invocation to response does include queue wait and retries. I've seen cases where a call with a 30s end-to-end budget timed out because it sat in a queue for 25s waiting for an execution slot.
Check the 'Throttles' metric in CloudWatch. If you see throttles alongside timeouts, the fix is to increase reserved concurrency or account limit. Also check 'ConcurrentExecutions' vs account limit. A common mistake: setting reserved concurrency too low, causing throttled invocations to queue and eventually timeout. Best practice: set reserved concurrency to expected peak concurrent invocations + 20% buffer.
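The sizing advice can be turned into arithmetic. This is a rough sketch, assuming the common estimate that required concurrency is about request rate times average duration; the 20% buffer comes from the text, and the function names are hypothetical.

```python
import math


def required_concurrency(requests_per_second, avg_duration_s):
    """Estimated concurrent executions: request rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_s)


def reserved_concurrency_target(peak_concurrency, buffer_percent=20):
    """Peak concurrent invocations plus a percentage buffer, rounded up."""
    return math.ceil(peak_concurrency * (100 + buffer_percent) / 100)
```

For example, 200 requests per second at an average of 0.5s each needs about 100 concurrent executions, so the target with a 20% buffer is 120. Durations grow under load, so it is safer to size from the peak 'ConcurrentExecutions' metric when it is available.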
Often the Lambda function itself doesn't time out; the *client* times out waiting for a response. API Gateway has a default 29-second integration timeout (long a hard limit; Regional and private REST APIs can now request a higher value). ALB defaults to a 60s idle timeout. If your Lambda takes 30s behind API Gateway's default, you'll get a 504 even if Lambda completes successfully. I've debugged cases where 'Task timed out' was absent from Lambda logs, but API Gateway returned 504. The fix: make sure each upstream timeout is greater than the Lambda timeout plus network latency, or cap the Lambda timeout below the upstream limit.
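A deployment check for this rule can be a few lines. The sketch below assumes the default upstream values quoted above and a hypothetical one-second allowance for network latency; the constant and function names are made up for illustration.

```python
# Default upstream limits quoted in the text; both are defaults, not guarantees.
API_GATEWAY_DEFAULT_TIMEOUT_S = 29
ALB_DEFAULT_IDLE_TIMEOUT_S = 60


def upstream_outlasts_lambda(lambda_timeout_s, upstream_timeout_s, network_margin_s=1.0):
    """True when the upstream waits longer than the Lambda timeout plus a margin."""
    return upstream_timeout_s > lambda_timeout_s + network_margin_s
```

A function with a 30s timeout behind API Gateway fails this check, which is exactly the case where callers see a 504 while the Lambda logs show a successful run.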
Also check retries: the AWS SDK retries failed calls by default (3 for Node.js). If the first attempt times out, the SDK may retry, compounding the issue. Use idempotency keys and exponential backoff. For async invocations, configure a DLQ to capture events that still fail after Lambda's built-in retries instead of letting them disappear without a trace. In one incident, an SQS-triggered Lambda had 5 retries with 30s timeout each, causing a 2.5-minute processing delay per message.
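Exponential backoff on the caller's side can be sketched as follows. This is a generic illustration rather than the AWS SDK's own retry logic: the helper name, the defaults, and the choice of "full jitter" are assumptions, and the wrapped operation must be idempotent for retries to be safe.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=3, base_delay_s=0.2,
                       max_delay_s=5.0, sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or the attempts run out.

    Waits use full jitter: a random fraction of the capped exponential
    delay for the current attempt. Only TimeoutError is retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(rng() * cap)
```

The `sleep` and `rng` parameters exist only so the behaviour can be tested without real waiting. Keep the worst case in mind when choosing `max_attempts`: attempts multiplied by the per-attempt timeout is how 5 tries at 30s turn into a 2.5-minute delay.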
Common code mistakes: synchronous HTTP calls without timeout, infinite loops, or waiting on a promise that never resolves. In Node.js, if you forget to call 'callback' or 'context.succeed', the function will run until timeout even after the response is sent. Use 'context.getRemainingTimeInMillis()' to log or short-circuit before timeout. In Python, database connections without 'pool_timeout' or 'connect_timeout' can hang indefinitely.
I've seen a function that connected to RDS using a connection pool with no timeout—when RDS hit max connections, the pool waited forever, and the function timed out. The fix: set 'connectionTimeout' and 'query_timeout' in the database driver. Also, use AWS SDK's 'maxRetries' and 'httpOptions.timeout'. For external APIs, always set a timeout on the HTTP client (e.g., 'axios' timeout option).
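In Python, the same idea is to derive every downstream timeout from the time the invocation has left. The sketch below uses the Lambda context method `get_remaining_time_in_millis()` and the standard library's `urllib`; the helper name, the 500ms safety margin, the 5s cap, and the `event["url"]` field are assumptions made for illustration.

```python
import urllib.request


def call_timeout_s(context, safety_margin_ms=500, cap_s=5.0):
    """Seconds a downstream call may take, leaving a margin before Lambda's timeout."""
    remaining_ms = context.get_remaining_time_in_millis()
    budget_s = (remaining_ms - safety_margin_ms) / 1000
    if budget_s <= 0:
        raise TimeoutError("not enough time left to start a downstream call")
    return min(cap_s, budget_s)


def handler(event, context):
    timeout_s = call_timeout_s(context)
    # event["url"] is a hypothetical input field used only in this sketch.
    with urllib.request.urlopen(event["url"], timeout=timeout_s) as response:
        return {"statusCode": response.status}
```

Failing fast with an explicit error leaves a clear log line, whereas letting Lambda kill the invocation leaves only 'Task timed out'.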
Frequently asked questions
What is the maximum timeout for AWS Lambda?
The maximum timeout is 900 seconds (15 minutes). To increase it, update the function configuration via CLI: 'aws lambda update-function-configuration --function-name myFunc --timeout 900'. Note that API Gateway's integration timeout defaults to 29 seconds (it can be raised only for Regional and private REST APIs, via a quota request), so if you need longer, use asynchronous invocation or ALB.
Does increasing memory fix timeout errors?
Increasing memory also increases CPU allocation proportionally, which can reduce execution time for compute-bound tasks. However, it won't fix cold start issues or network latency. Use the 'Duration' metric to see if execution time drops linearly with memory. A rule of thumb: double memory, halve duration, but only if the function is CPU-bound.
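Whether more memory pays off can be checked by comparing GB-seconds, the unit Lambda bills compute in, across a few measured settings. This is a minimal sketch with hypothetical helper names; it ignores the per-request charge and uses no real prices.

```python
def gb_seconds(memory_mb, duration_ms):
    """Billable compute for one invocation: memory in GB times duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cheapest_setting(measurements):
    """Given {memory_mb: avg_duration_ms}, return the memory size with the fewest GB-seconds."""
    return min(measurements, key=lambda mb: gb_seconds(mb, measurements[mb]))
```

If doubling memory from 512MB to 1024MB cuts the average duration from 2000ms to 900ms, the larger setting is both faster and cheaper per invocation; if the duration barely moves, the function is probably waiting on I/O and more memory only adds cost.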
How do I check if my Lambda is being throttled?
In CloudWatch, look at the 'Throttles' metric for the function. Also check 'ConcurrentExecutions' vs. account concurrency limit. The CLI command 'aws lambda get-account-settings' shows your account-level concurrency limit. If throttles are high, increase reserved concurrency or request a limit increase from AWS Support.
Can a VPC Lambda cause timeouts even if it doesn't access the internet?
Yes, if the function is in a private subnet without a NAT gateway, and it tries to access any external service (e.g., an AWS service without a VPC endpoint), the connection will hang until timeout. Also, if the function needs to access an internal resource (like RDS) and the security group blocks traffic, you'll see connection timeouts. Always check VPC flow logs.
What's the difference between 'Task timed out' and 'Endpoint request timed out'?
'Task timed out' comes from Lambda itself—the function's execution exceeded its configured timeout. 'Endpoint request timed out' comes from API Gateway or ALB—the client's timeout expired before Lambda responded. Both can happen simultaneously, but fixing one may not fix the other. Always check both Lambda logs and client-side logs.