Python boto3 AWS API Error Debugging Guide

What this usually means

Most boto3 errors fall into four buckets: credential and permission problems (missing, expired, or wrong credentials, or IAM policies that don't allow the action), API throttling (exceeding service rate limits), network issues (wrong endpoint, proxy, or timeout), and parameter validation (wrong data types or missing required fields). The error message from botocore is usually explicit, but the root cause can be buried in IAM policies, region mismatches, or retry logic. Read the full traceback and the error code: 'AccessDenied' or 'UnauthorizedOperation' points at permissions, while 'ThrottlingException' points at rate limits.

( 01 )Fast diagnosis

The first ten minutes — establish facts before touching code.

1. Run `aws sts get-caller-identity` with the same credentials (same profile and environment) to confirm they are valid.
2. Read the full error traceback for the operation name and the error code.
3. Enable botocore debug logging: `boto3.set_stream_logger('botocore', logging.DEBUG)`. Setting only the level (`logging.getLogger('botocore').setLevel(logging.DEBUG)`) prints nothing unless a handler is configured, for example with `logging.basicConfig()`.
4. Inspect the session's region and credentials: `print(session.region_name, session.get_credentials().access_key)`
5. Repeat the call with the AWS CLI using the same parameters, profile, and region. This separates client-code bugs from permission problems.
6. Check service quotas and recent API usage in CloudWatch metrics for the affected resource.

( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- ~/.aws/credentials and ~/.aws/config for profile settings
- IAM policy simulator: https://policysim.aws.amazon.com/
- CloudTrail event history for the specific API call (or CloudTrail logs delivered to CloudWatch Logs)
- boto3 retry configuration: `botocore.config.Config(retries={'max_attempts': 10, 'mode': 'adaptive'})`
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_PROFILE, AWS_DEFAULT_REGION
- Service Quotas console for throttle limits
- Network connectivity: `curl -v https://<service>.<region>.amazonaws.com`

( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

- IAM policy missing a required action, or a typo in the resource ARN
- Wrong region or endpoint (e.g., the client targets us-east-1 but the resource lives in eu-west-1, or the service isn't offered in the configured region)
- Missing or expired temporary credentials (STS session tokens)
- Default retry configuration insufficient for rate-limited APIs
- Parameter validation: passing a string instead of a list, or omitting required fields
- Network issues: VPC endpoints, proxy settings, or DNS resolution failures
- Client-side timeout too short for long-running operations

( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- Add the missing IAM action to the policy, then verify with `aws iam simulate-principal-policy`
- Use `botocore.config.Config(retries={'max_attempts': 5, 'mode': 'adaptive'})` for throttling
- Set the region explicitly on the client: `boto3.client('s3', region_name='us-west-2')`
- For temporary credentials, pass the session token along with the keys: `session = boto3.Session(aws_access_key_id=..., aws_secret_access_key=..., aws_session_token=...)`
- Catch `botocore.exceptions.ClientError` and branch on specific error codes
- Implement exponential backoff with jitter using the `tenacity` or `backoff` library
- Set the `AWS_DEFAULT_REGION` environment variable to the region where the resource lives

( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

- Run the same boto3 call after the fix and confirm no exception is raised
- Check `response['ResponseMetadata']['HTTPStatusCode']` and the rest of the response metadata. Expect a 2xx code: usually 200, though some operations such as S3 DeleteObject return 204.
- Run `aws <service> <operation> --region <region>` with the same parameters
- Monitor the operation's CloudWatch metrics and confirm ThrottlingException errors have stopped
- Write a unit test that mocks the boto3 call (for example with `botocore.stub.Stubber`) and asserts the expected behavior
- Check the IAM policy simulator output for the specific action and resource

( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

- Hardcoding credentials in source code: use IAM roles or environment variables instead
- Catching everything with a bare `except:`: handle specific botocore exceptions
- Ignoring the retry count in the error message (e.g., 'reached max retries: 4'): it means the retry policy was exhausted, not that a single request failed
- Treating AccessDenied as a credential problem: it usually means the credentials were accepted, but a policy (identity, resource, or organization SCP) does not allow the action
- Forgetting to refresh STS credentials obtained with `assume_role` before they expire
- Assuming every AWS service is available in every region

( 07 )War story

Incident: S3 ListObjectsV2 Suddenly Failing with AccessDenied

Senior Backend Engineer | Python 3.9, boto3 1.26, AWS Lambda, S3, CloudTrail

Timeline

14:03  PagerDuty alert: S3 list operation failing for multiple customers
14:05  Check Lambda logs: botocore.exceptions.ClientError: AccessDenied on ListObjectsV2
14:08  Call STS GetCallerIdentity from the Lambda → returns the correct role ARN
14:12  Check IAM policy for the Lambda role → policy looks correct
14:18  Use policy simulator → the Allow statement carries an 'aws:SourceIp' condition that doesn't match
14:22  Discover that the team had added the 'aws:SourceIp' condition for a new VPC endpoint
14:25  Update the policy to remove the condition
14:30  Redeploy Lambda, test passes, alert resolves

At 14:03, we got a PagerDuty alert that our S3 list operation was failing for multiple customers. I jumped into the Lambda logs and saw botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation. My first thought was that someone rotated the access keys, but we were using an IAM role, not keys.

I called STS GetCallerIdentity from inside the Lambda, and it returned the correct role ARN, so the credentials were fine. Then I checked the IAM policy attached to the role. It had the s3:ListBucket action on the bucket ARN. The policy simulator said 'Allow', but when I clicked 'Show details', it showed that a condition was failing.

The condition was aws:SourceIp. A teammate had added it the day before to restrict access to a new VPC endpoint, but the Lambda runs in a different VPC with a different IP range. The choice of key was also wrong: aws:SourceIp only carries public IP addresses and is absent for requests that arrive through a VPC endpoint. As a result, the condition blocked all requests. I removed the condition and redeployed. The fix took 27 minutes because we didn't check the policy simulator early enough.

Root cause

The IAM policy had an aws:SourceIp condition that the Lambda's requests could never satisfy.

The fix

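Removed the aws:SourceIp condition from the IAM policy. To restrict access to a VPC endpoint, the appropriate condition keys are aws:SourceVpce or aws:SourceVpc (or aws:VpcSourceIp for private addresses), not aws:SourceIp.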

The lesson

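Always run the IAM policy simulator with the actual request context: principal, action, resource, and condition key values. Don't assume that a policy containing an Allow statement actually allows the request. If the statement's conditions don't match, the statement doesn't apply, and the request is implicitly denied.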

( 08 )Understanding botocore Error Hierarchies

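botocore's client-side exceptions inherit from botocore.exceptions.BotoCoreError. EndpointConnectionError (network), ParamValidationError (client-side validation), and NoCredentialsError are all BotoCoreError subclasses. ClientError, raised when the service itself returns an error, is not: it subclasses Exception directly, so `except BotoCoreError` will not catch it. ClientError carries a response dict whose 'Error' key contains 'Code' and 'Message'. For example, 'Code': 'ThrottlingException' means you hit a rate limit. Other services signal throttling with codes such as 'Throttling', 'RequestLimitExceeded', or 'SlowDown'.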

Always catch specific exceptions rather than a generic Exception. Use `except botocore.exceptions.ClientError as e:` and inspect `e.response['Error']['Code']`. This avoids masking unrelated issues like network timeouts or import errors.

( 09 )Credentials Resolution Order

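boto3 walks a chain of credential providers and uses the first one that returns credentials. The main steps, in order:

1) credentials passed explicitly to `boto3.client()`/`resource()`
2) credentials passed to `boto3.Session()`
3) environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN)
4) assume-role, web-identity, and IAM Identity Center (SSO) configuration
5) the shared credentials file (~/.aws/credentials)
6) the config file (~/.aws/config)
7) container credentials (ECS tasks)
8) the EC2 instance metadata service (instance profile role)

Lambda is different: the runtime injects the execution role's credentials as environment variables, so boto3 picks them up at step 3. The profile defaults to 'default' unless AWS_PROFILE is set. Use `boto3.Session(profile_name='myprofile')` to choose one explicitly.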

A common mistake is an expired token in environment variables that overrides a valid role. To debug, print how the session's credentials were resolved: `session.get_credentials().method` returns values such as 'env', 'shared-credentials-file', 'assume-role', or 'iam-role'. If you see 'env' on an EC2 instance where you expect 'iam-role', unset the environment variables. In Lambda, 'env' is the normal result.

( 10 )Throttling and Retry Configuration

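boto3 clients use the 'legacy' retry mode by default, which allows 5 total attempts: the initial call plus 4 retries (DynamoDB clients allow more). For throttling-sensitive APIs (DynamoDB, EC2), switch to 'adaptive' mode, which adds client-side rate limiting on top of the standard retry behavior. Example: `config = botocore.config.Config(retries={'max_attempts': 10, 'mode': 'adaptive'})`. Here `max_attempts` counts retries after the initial call. This is especially important for Lambda functions, because a failed invocation may itself be retried.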

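The error message 'reached max retries: 4' means the retry policy was exhausted. `e.response['ResponseMetadata']['RetryAttempts']` shows how many retries botocore made. The response headers are in `e.response['ResponseMetadata']['HTTPHeaders']`, though AWS responses rarely include a Retry-After header. To add your own backoff on top, use the `backoff` library with a `giveup` predicate so that only throttling codes are retried: `@backoff.on_exception(backoff.expo, botocore.exceptions.ClientError, max_time=60, giveup=lambda e: e.response['Error']['Code'] not in ('Throttling', 'ThrottlingException'))`. Without `giveup`, an AccessDenied would also be retried for a full minute.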

( 11 )Network Debugging for Endpoint Errors

EndpointConnectionError usually means DNS resolution failure or a firewall blocking the request. First, verify the region: the endpoint is `https://<service>.<region>.amazonaws.com`. Some services have global endpoints (e.g., `iam.amazonaws.com`). Use `curl -v https://s3.us-east-1.amazonaws.com` to test connectivity.

If you're in a VPC, make sure there is a VPC endpoint for the service, or that the Lambda function's subnets route to a NAT gateway. Check the security group and NACL rules. For STS specifically, set the `AWS_STS_REGIONAL_ENDPOINTS` environment variable to 'regional' so that calls go to the regional STS endpoint instead of the global one.

( 12 )Parameter Validation Gotchas

ParamValidationError is raised before any request is sent. It catches type mismatches, missing required fields, and unknown parameter names. For example, passing a string instead of a list for TagSet in `s3.put_object_tagging(Bucket=..., Key=..., Tagging={'TagSet': [...]})` will fail. Use the AWS CLI to test the exact same parameters: `aws s3api put-object-tagging --bucket ... --key ... --tagging 'TagSet=[{Key=...,Value=...}]'`.

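Timestamps are another common trap. boto3 accepts `datetime` objects for timestamp parameters and serializes them itself, but botocore treats a naive datetime (such as `datetime.now()`) as UTC, so local times silently shift. Pass timezone-aware values such as `datetime.now(timezone.utc)`, and use `datetime.isoformat()` only where a parameter is documented as a string. Always check the API documentation for parameter types.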

Frequently asked questions

What is the difference between AccessDenied and UnauthorizedOperation?

Both arrive as ClientError; only the code differs. Most services return 'AccessDenied' (or 'AccessDeniedException') when no policy allows the action or a policy explicitly denies it. EC2 uses 'UnauthorizedOperation' for the same situation on its own API calls. Its message often includes an encoded authorization failure message, which you can decode with `aws sts decode-authorization-message`. The fix is the same in both cases: adjust the relevant policy.

How do I debug a 'NoCredentialsError' when running in Lambda?

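In Lambda, the runtime exposes the execution role's temporary credentials through the reserved environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, and boto3 picks them up automatically. A NoCredentialsError there usually means one of two things. Either something in your code or a dependency deleted or blanked those variables at runtime, or the code is actually running outside Lambda (for example, in a local test). Confirm in the console that the function has an execution role, and check whether `boto3.Session().get_credentials()` returns None at startup. The function itself does not need STS permissions for this, because the Lambda service assumes the role on its behalf.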

Why does boto3 raise 'ThrottlingException' even with retries?

In the default legacy mode, retries use exponential backoff with jitter but stop after 5 total attempts (the initial call plus 4 retries), which is why the message says 'reached max retries: 4'. Those retries typically span only a few seconds in total, so a sustained throttle outlasts them. Increase max_attempts or use adaptive mode. Also check whether you're hitting a service quota, and request a limit increase if needed.

What should I do when I get 'EndpointConnectionError'?

First, verify the region name is correct. Some services like IAM and Route53 have global endpoints (us-east-1). Check if your network allows outbound HTTPS to AWS endpoints. If using VPC, ensure there's a VPC endpoint for the service, or a NAT gateway. Use `curl -v` to test connectivity from the same environment.

How can I test boto3 code without making real API calls?

Use the `moto` library to mock AWS services. Run `pip install moto`, then wrap tests with the `@mock_aws` decorator (moto 5 and later; older releases used per-service decorators such as `@mock_s3`). This is useful for unit tests. However, moto may not support every service or edge case, so also test against real AWS in a sandbox account.
