What this usually means
Most boto3 errors fall into four buckets: credential and permission problems (missing or expired credentials, or an IAM policy that does not allow the action), API throttling (exceeding service rate limits), network issues (wrong endpoint, proxy, or timeout), and parameter validation (wrong data types or missing required fields). The botocore error message is usually explicit, but the root cause can be buried in IAM policies, region mismatches, or retry configuration. Read the full traceback and the error code: 'AccessDenied' (or EC2's 'UnauthorizedOperation') points at IAM policy, while 'ThrottlingException' points at rate limits and retry settings.
The first ten minutes — establish facts before touching code.
1. Run `aws sts get-caller-identity` with the same credentials to confirm they are valid.
2. Check the full error traceback for the operation name and the error code.
3. Enable botocore debug logging: `import logging, boto3; boto3.set_stream_logger('botocore', logging.DEBUG)`. Setting only `logging.getLogger('botocore').setLevel(logging.DEBUG)` prints nothing unless a handler is already configured (for example via `logging.basicConfig`).
4. Inspect the session's region and credentials: `print(session.region_name, session.get_credentials().access_key)`. `session.get_credentials().method` also tells you which provider supplied them.
5. Test the call with the AWS CLI using the same parameters to separate client-side problems from permission problems.
6. Check service quotas and recent API usage in CloudWatch metrics for the affected resource.
The specific files, logs, configs, and dashboards that usually own this bug.
- `~/.aws/credentials` and `~/.aws/config` for profile settings
- IAM policy simulator: https://policysim.aws.amazon.com/
- CloudTrail event history for the specific API call (or CloudTrail logs delivered to CloudWatch Logs)
- boto3 retry configuration: `botocore.config.Config(retries={'max_attempts': 10, 'mode': 'adaptive'})`
- Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_DEFAULT_REGION`, `AWS_PROFILE`
- Service Quotas console for throttling limits
- Network connectivity: `curl -v https://<service>.<region>.amazonaws.com`
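When checking environment variables, it helps to see at a glance which AWS variables are set without leaking secrets. A minimal sketch: `aws_env_report` is a hypothetical helper, and the variable list is a commonly checked subset rather than everything boto3 reads.

```python
import os
from typing import Mapping, Optional

# Commonly checked AWS variables (not an exhaustive list).
AWS_ENV_VARS = (
    "AWS_PROFILE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_DEFAULT_REGION",
    "AWS_CONFIG_FILE",
    "AWS_SHARED_CREDENTIALS_FILE",
)
SECRET_VARS = {"AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"}


def aws_env_report(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Show which AWS variables are set, masking secrets so the output is safe to share."""
    env = os.environ if environ is None else environ
    report = {}
    for name in AWS_ENV_VARS:
        if name not in env:
            continue
        value = env[name]
        if value == "":
            report[name] = "<set but empty>"  # empty values still shadow other sources
        elif name in SECRET_VARS:
            report[name] = "<set>"
        elif name == "AWS_ACCESS_KEY_ID":
            report[name] = "..." + value[-4:]
        else:
            report[name] = value
    return report
```

Calling `aws_env_report()` with no argument inspects the current process environment.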
Practical causes, not theory. These are the things you will actually find.
- IAM policy missing a required action, or a typo in the resource ARN
- Wrong region or endpoint (e.g., calling us-east-1 for a resource or service that only exists in eu-west-1)
- Missing or expired temporary credentials (STS session tokens)
- Default retry configuration insufficient for rate-limited APIs
- Parameter validation: passing a string instead of a list, or omitting required fields
- Network issues: VPC endpoints, proxy settings, or DNS resolution failures
- Client-side timeout too short for long-running operations
Concrete fix directions. Pick the one that matches your root cause.
- Add the missing IAM action to the policy and verify it with `aws iam simulate-principal-policy`
- Use `botocore.config.Config(retries={'max_attempts': 5, 'mode': 'adaptive'})` for throttling
- Set the region explicitly on the client: `boto3.client('s3', region_name='us-west-2')`
- For temporary credentials, include the session token: `session = boto3.Session(aws_access_key_id=..., aws_secret_access_key=..., aws_session_token=...)`
- Catch `botocore.exceptions.ClientError` and branch on specific error codes
- Implement exponential backoff with jitter, for example with the `tenacity` or `backoff` library
- Set the `AWS_DEFAULT_REGION` environment variable to the region the resource lives in
A fix you cannot prove is a guess. Close the loop.
- Run the same boto3 call after the fix and confirm no exception is raised
- Check `response['ResponseMetadata']['HTTPStatusCode']` (any 2xx value means success) and the rest of the response metadata
- Run `aws <service> <operation> --region <region>` with the same parameters
- Monitor the service's CloudWatch throttling metrics (where available) and confirm your logs no longer show ThrottlingException
- Write a unit test that mocks the boto3 call and asserts the expected behavior
- Check the IAM policy simulator output for the specific action and resource
Things that make this bug worse or harder to find.
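- Hardcoding credentials in source code (use IAM roles or environment variables instead)
- Catching all exceptions with a bare `except:` (handle specific botocore exceptions instead)
- Ignoring the retry count in the error message: "reached max retries" means the retry policy was exhausted and the underlying error persisted across every attempt
- Treating every AccessDenied as a credential problem: it usually means the credentials are valid but the policy does not allow the action (invalid credentials typically produce codes such as InvalidClientTokenId, InvalidAccessKeyId, or SignatureDoesNotMatch)
- Forgetting to refresh STS tokens obtained with assume_role before they expire
- Assuming every AWS service is available in every region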
Incident: S3 ListObjectsV2 Suddenly Failing with AccessDenied
Timeline
- 14:03 PagerDuty alert: S3 list operation failing for multiple customers
- 14:05 Check Lambda logs: botocore.exceptions.ClientError: AccessDenied on ListObjectsV2
- 14:08 Run aws sts get-caller-identity → returns correct role ARN
- 14:12 Check IAM policy for the Lambda role → policy looks correct
- 14:18 Use policy simulator → shows Allow, but with condition: 'aws:SourceIp' mismatch
- 14:22 Discover that the team had added a condition key 'aws:SourceIp' for a new VPC endpoint
- 14:25 Update the policy to remove the condition or add the Lambda's VPC CIDR
- 14:30 Redeploy Lambda, test passes, alert resolves
At 14:03, we got a PagerDuty alert that our S3 list operation was failing for multiple customers. I jumped into the Lambda logs and saw botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation. My first thought was that someone rotated the access keys, but we were using an IAM role, not keys.
I ran aws sts get-caller-identity from the Lambda and it returned the correct role ARN, so credentials were fine. Then I checked the IAM policy attached to the role. It had the s3:ListBucket action on the bucket ARN. The policy simulator said 'Allow', but when I clicked 'Show details', it showed a condition was failing.
The condition was aws:SourceIp. A teammate had added it yesterday to restrict access to a new VPC endpoint, but the Lambda runs in a different VPC with a different IP range. The condition blocked all requests. I removed the condition and redeployed. The fix took 27 minutes because we didn't check the policy simulator early enough.
Root cause
IAM policy had an aws:SourceIp condition that blocked the Lambda's VPC CIDR.
The fix
Removed the aws:SourceIp condition from the IAM policy, or updated it to include the Lambda's VPC CIDR range.
The lesson
Always use the IAM policy simulator with the actual request context (principal, action, resource, conditions). Don't assume a policy that says 'Allow' actually works — conditions can silently deny.
Most botocore exceptions inherit from `botocore.exceptions.BotoCoreError`, with one important exception: `ClientError` (the service returned an error) inherits directly from `Exception`, so `except BotoCoreError` does not catch it. The other common ones are `EndpointConnectionError` (network), `ParamValidationError` (client-side validation), and `NoCredentialsError`. Each has a specific cause. `ClientError` carries a `response` dict whose 'Error' key contains 'Code' and 'Message'; for example, 'Code': 'ThrottlingException' means you hit a rate limit.
Always catch specific exceptions rather than a generic Exception. Use `except botocore.exceptions.ClientError as e:` and inspect `e.response['Error']['Code']`. This avoids masking unrelated issues like network timeouts or import errors.
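boto3 searches for credentials in a fixed order and stops at the first match. Simplified: 1) explicit parameters to `boto3.client()` or `boto3.Session()`, 2) environment variables (`AWS_ACCESS_KEY_ID`, etc.), 3) the shared credentials file (`~/.aws/credentials`), 4) the config file (`~/.aws/config`), 5) container credentials (ECS) or instance metadata (IAM role on EC2). The full chain also includes assume-role, web identity, and IAM Identity Center providers configured in a profile. In Lambda, the execution role's credentials are delivered as environment variables, so they are found at step 2. Unless you set `AWS_PROFILE` or pass `profile_name`, the profile used is 'default'. Use `boto3.Session(profile_name='myprofile')` to choose one explicitly.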
A common mistake is having an expired token in environment variables that overrides a valid IAM role. Debug by printing the session's credentials method: `session.get_credentials().method` — it will say 'env', 'assume-role', 'iam-role', etc. If you see 'env' and expect 'iam-role', unset the env vars.
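The default retry mode is 'legacy', which makes up to 5 attempts in total (the initial call plus 4 retries). Note that in the `retries` dict, `max_attempts` counts retries only, not the initial call. For throttling-sensitive APIs (DynamoDB, EC2), switch to 'standard' or 'adaptive' mode; adaptive adds client-side rate limiting on top of standard retries, and AWS describes it as experimental. Example: `config = botocore.config.Config(retries={'max_attempts': 10, 'mode': 'adaptive'})`. This is especially important for Lambda functions, which may themselves be retried on failure.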
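An error message containing '(reached max retries: 4)' means the legacy retry policy was exhausted after 5 attempts in total. `e.response['ResponseMetadata']['RetryAttempts']` tells you how many retries were made. Many AWS services do not send a Retry-After header; if one is present, it appears in `e.response['ResponseMetadata']['HTTPHeaders']` (header names are lowercased). To add your own backoff on top, the `backoff` library offers `@backoff.on_exception(backoff.expo, botocore.exceptions.ClientError, max_time=60)`; pass a `giveup` callable as well so that non-retryable errors such as AccessDenied are not retried.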
EndpointConnectionError usually means a DNS resolution failure or a firewall blocking the request. First, verify the region: most regional endpoints follow the pattern `https://<service>.<region>.amazonaws.com`, while some services use global endpoints (e.g., `iam.amazonaws.com`). Use `curl -v https://s3.us-east-1.amazonaws.com` to test connectivity.
If you're in a VPC, make sure there is a VPC endpoint for the service, or that the subnets your code (for example, the Lambda function) runs in route outbound traffic through a NAT gateway. Check the security group and network ACL rules. Also, set the `AWS_STS_REGIONAL_ENDPOINTS` environment variable to 'regional' so STS calls go to the regional endpoint instead of the global one.
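When `curl` is not available in the environment (a minimal container image, for instance), the same distinction between "DNS fails" and "DNS works but the connection is blocked" can be made with the standard library. A sketch: `endpoint_host` and `tcp_check` are hypothetical helpers, and the host pattern covers standard commercial regions only.

```python
import socket


def endpoint_host(service: str, region: str) -> str:
    # Commercial-partition pattern; global endpoints, China, and GovCloud differ.
    return f"{service}.{region}.amazonaws.com"


def tcp_check(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Report whether DNS resolution or the TCP connection is the failing step."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as e:
        return f"DNS failure: {e}"
    addr = infos[0][4]  # (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
    try:
        with socket.create_connection(addr[:2], timeout=timeout):
            return f"connected to {addr[0]}:{addr[1]}"
    except OSError as e:
        return f"DNS ok ({addr[0]}) but connect failed: {e}"
```

Run `print(tcp_check(endpoint_host("s3", "us-east-1")))` from the same host, container, or function that raises EndpointConnectionError; a DNS failure points at resolver or VPC DNS settings, a connect failure at security groups, NACLs, routes, or proxies.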
ParamValidationError is raised before the API call is made. It catches type mismatches, missing required fields, and invalid enum values. For example, `s3.put_object_tagging(Bucket=..., Key=..., Tagging={'TagSet': ...})` fails if `TagSet` is a string instead of a list of `{'Key': ..., 'Value': ...}` dicts. Use the AWS CLI to test the exact same parameters: `aws s3api put-object-tagging --bucket ... --key ... --tagging 'TagSet=[{Key=...,Value=...}]'`.
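Timestamps are another common source of confusion. boto3 accepts `datetime` objects (as well as ISO 8601 strings and epoch seconds) for timestamp parameters and serializes them itself, but a naive `datetime` is treated as UTC, so a naive local time is silently shifted. Pass timezone-aware values such as `datetime.now(timezone.utc)`. Always check the API documentation for parameter types.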
Frequently asked questions
What is the difference between AccessDenied and UnauthorizedOperation?
Both surface as `ClientError`; which code you get depends on the service. Most services, including S3 and IAM, return 'AccessDenied' (some, such as DynamoDB, use 'AccessDeniedException'). EC2 returns 'UnauthorizedOperation' when the caller is not allowed to perform an action, and its message often includes an encoded authorization failure message that `aws sts decode-authorization-message` can decode. The fix is the same in both cases: adjust the IAM policy.
How do I debug a 'NoCredentialsError' when running in Lambda?
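In Lambda, the execution role is used automatically: the runtime injects its temporary credentials as the reserved environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`. Reserved variables cannot be overridden through the function's environment configuration, so a NoCredentialsError there often means your code removed or overwrote them at runtime, or the failing code is not actually running in Lambda (for example, a local test). Check the function's execution role in the console. The function itself does not need STS permissions to receive its own role credentials.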
Why does boto3 raise 'ThrottlingException' even with retries?
In the default legacy mode, boto3 retries throttled requests with exponential backoff and jitter, but gives up after 5 total attempts (4 retries); standard mode defaults to 3 total attempts. If the API is still throttling after the last retry, you see the error. Increase `max_attempts` or use adaptive mode. Also check whether you are hitting a service quota, and request an increase if needed.
What should I do when I get 'EndpointConnectionError'?
First, verify the region name is correct. Some services like IAM and Route53 have global endpoints (us-east-1). Check if your network allows outbound HTTPS to AWS endpoints. If using VPC, ensure there's a VPC endpoint for the service, or a NAT gateway. Use `curl -v` to test connectivity from the same environment.
How can I test boto3 code without making real API calls?
Use the `moto` library to mock AWS services. For example, `pip install moto` and decorate tests with `@mock_aws` (moto 5 and later; older releases use service-specific decorators such as `@mock_s3`) to simulate S3. This is useful for unit tests. However, moto does not support every service or edge case, so also test against real AWS in a sandbox account.