LEARN · DEBUGGING GUIDE

AWS Secrets Manager Secret Retrieval Failing: Debug Guide

If your application can't fetch a secret from AWS Secrets Manager, it's almost always an IAM policy, KMS key, or VPC endpoint misconfiguration. Here's how to find out which one in under five minutes.

Intermediate · Cloud · 9 min read

What this usually means

The root cause is almost always one of three things: the IAM principal (user/role) lacks the `secretsmanager:GetSecretValue` permission on that specific secret; the KMS key used to encrypt the secret is not accessible to the caller; or the request is blocked by a VPC endpoint policy or network ACL. A less common but real cause is that the secret was deleted and recreated, and the application is still referencing the old ARN or name. I've also seen cases where the secret's resource policy explicitly denies the principal, overriding an allow in IAM.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  • 1. Run `aws secretsmanager get-secret-value --secret-id <secret-name-or-arn> --region <region>` as the same principal and from the same environment as the application. Check the error code: is it `AccessDeniedException` or `ResourceNotFoundException`?
  • 2. If it is `AccessDeniedException`, run `aws sts get-caller-identity` to confirm you're using the IAM role or user you expect.
  • 3. Check the secret's resource policy with `aws secretsmanager get-resource-policy --secret-id <secret-arn>` and look for explicit Deny statements.
  • 4. Find the KMS key: `aws secretsmanager describe-secret --secret-id <secret-arn>` shows `KmsKeyId` for a customer-managed key (the field is omitted when the secret uses the AWS managed key `aws/secretsmanager`). Then read the key policy with `aws kms get-key-policy --key-id <key-id-or-arn> --policy-name default` and verify it lets the caller use `kms:Decrypt`. `aws kms describe-key` returns key metadata only, not the policy.
  • 5. If the caller is in a VPC, verify the VPC endpoint for Secrets Manager exists and has a policy that allows the action: `aws ec2 describe-vpc-endpoints --filters Name=service-name,Values=com.amazonaws.<region>.secretsmanager`.
  • 6. Check CloudTrail for the exact error: `aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=GetSecretValue --start-time <5-min-ago>`. Recent events can take a few minutes to appear.
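The first triage step above, reading the error code, can be sketched as a small helper. This is a minimal illustration, not an exhaustive mapping: the function name `classify_failure` and the hint strings are hypothetical, and the substring checks on the message are assumptions about typical wording rather than a documented contract.

```python
# Map a failed GetSecretValue call to the layer most worth inspecting next.
# Hint strings and message heuristics are illustrative assumptions.

def classify_failure(error_code: str, message: str = "") -> str:
    """Return a short hint naming where to look next."""
    text = message.lower()
    if error_code == "ResourceNotFoundException":
        return "secret: wrong name, ARN, region, or account"
    if error_code in ("AccessDeniedException", "AccessDenied"):
        if "kms" in text:
            return "kms: check the key policy for kms:Decrypt"
        if "explicit deny" in text:
            return "policy: find the explicit Deny statement"
        return "iam: check identity, resource, and endpoint policies"
    # botocore raises these as exceptions rather than service error codes;
    # passing the exception class name here is this sketch's own convention.
    if error_code in ("EndpointConnectionError", "ConnectTimeoutError"):
        return "network: check VPC endpoint, DNS, and security groups"
    return "unknown: read the full error and the CloudTrail event"


print(classify_failure("AccessDeniedException", "Access to KMS is not allowed"))
```

With boto3, the two arguments would typically come from `err.response["Error"]["Code"]` and `err.response["Error"]["Message"]` on a caught `botocore.exceptions.ClientError`.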

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • IAM policy attached to the principal (user/role): look for `secretsmanager:GetSecretValue` with the correct secret ARN in the Resource element.
  • KMS key policy: ensure the principal has `kms:Decrypt` permission on the key that encrypts the secret.
  • Secrets Manager resource policy: `aws secretsmanager get-resource-policy --secret-id <secret-arn>`.
  • VPC endpoint policy (if using PrivateLink): check the policy document on the Secrets Manager VPC endpoint.
  • CloudTrail logs: filter by `GetSecretValue` and look for `errorCode` or `errorMessage`.
  • Application logs: sometimes the SDK retries and the error is swallowed; grep for 'Secrets Manager' or 'AccessDeniedException'.
  • Organization-level guardrails, if any: service control policies and permissions boundaries can deny secret access even when the IAM policy allows it. AWS Config rules only report on compliance, but an attached remediation action may have changed a policy.

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • IAM policy has `secretsmanager:GetSecretValue` on `*` but the secret's resource policy explicitly denies the principal.
  • KMS key is in a different account and the key policy does not grant `kms:Decrypt` to the calling account.
  • VPC endpoint policy is set to `Deny` all actions except a specific principal or condition that doesn't match.
  • The secret uses a custom KMS key and the caller has `kms:Decrypt` but not `kms:GenerateDataKey`. Reads via GetSecretValue still work, but writes such as PutSecretValue or rotation fail, which can look like a retrieval problem when the application also updates the secret.
  • The secret ARN or name changed after deletion/recreation, but the application uses a cached ARN.
  • IAM role's trust policy does not allow the service (e.g., Lambda) to assume the role, so the code never runs as the role you granted access to.
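The first cause in the list, an explicit Deny beating an Allow, is easier to reason about with a toy evaluator. The sketch below is a deliberately simplified model: it ignores `Principal`, `Condition`, `NotAction`, service control policies, permissions boundaries, and cross-account rules, and the function name `evaluate` is hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    # fnmatch wildcards approximate IAM's "*" and "?" matching.
    return any(fnmatchcase(value, p) for p in _as_list(patterns))


def evaluate(statements, action, resource):
    """Return 'explicitDeny', 'allowed', or 'implicitDeny'."""
    decision = "implicitDeny"
    for stmt in statements:
        actions = [a.lower() for a in _as_list(stmt["Action"])]
        if not _matches(actions, action.lower()):
            continue
        if not _matches(stmt["Resource"], resource):
            continue
        if stmt["Effect"] == "Deny":
            return "explicitDeny"  # a matching Deny always wins
        decision = "allowed"
    return decision


identity_policy = [{"Effect": "Allow",
                    "Action": "secretsmanager:GetSecretValue",
                    "Resource": "*"}]
resource_policy = [{"Effect": "Deny",
                    "Action": "secretsmanager:*",
                    "Resource": "*"}]
print(evaluate(identity_policy + resource_policy,
               "secretsmanager:GetSecretValue",
               "arn:aws:secretsmanager:us-east-1:123456789012:secret:example"))
```

The point of the model is the early return: once any matching statement says Deny, no Allow elsewhere can rescue the request.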

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Add `secretsmanager:GetSecretValue` to the IAM policy with the specific secret ARN as Resource, or use `arn:aws:secretsmanager:<region>:<account-id>:secret:*` cautiously.
  • Modify the KMS key policy to include the calling principal with `kms:Decrypt` permission.
  • Update the VPC endpoint policy to allow the action from the principal or condition (e.g., source VPC, account).
  • Remove any explicit Deny in the secret's resource policy or adjust the condition keys.
  • If the secret was deleted and recreated, update any hardcoded ARN in application configuration: a recreated secret gets a new ARN suffix even when the name is unchanged. A secret still inside its recovery window can be restored with `aws secretsmanager restore-secret` instead.
  • Use `aws secretsmanager list-secrets` to confirm the secret exists and note the correct ARN.
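As a sketch of the first two fix patterns, the helper below builds a least-privilege identity policy document for one secret and, optionally, its KMS key. The function name `build_read_policy`, the `Sid` values, and the example ARNs (including the `-AbCdEf` suffix and the key ID) are hypothetical placeholders. Note the assumption that an IAM-side `kms:Decrypt` grant only takes effect when the key policy delegates access to IAM.

```python
import json


def build_read_policy(secret_arn, kms_key_arn=None):
    """Return an identity policy allowing reads of a single secret."""
    statements = [{
        "Sid": "ReadOneSecret",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": secret_arn,
    }]
    if kms_key_arn:
        # Only needed when the secret uses a customer-managed key.
        statements.append({
            "Sid": "DecryptWithSecretKey",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_read_policy(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db/creds-AbCdEf",
    "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
)
print(json.dumps(policy, indent=2))
```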

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Run `aws secretsmanager get-secret-value --secret-id <secret-arn>` from the same environment (EC2, Lambda, etc.) and confirm it returns the secret.
  • Check CloudTrail for a successful `GetSecretValue` event with the principal's ARN.
  • Verify the application no longer logs AccessDeniedException and can read the secret value.
  • Use `aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names secretsmanager:GetSecretValue --resource-arns <secret-arn>` to test the role's IAM policies without changing anything.
  • If using a VPC endpoint, run `curl` or `wget` from the EC2 instance against the Secrets Manager endpoint (e.g., `https://secretsmanager.<region>.amazonaws.com`) to confirm network connectivity. Any HTTP response, even an error status, shows the network path works; a timeout or DNS failure shows it does not.
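Where `curl` is not available (inside a Lambda function, for instance), the connectivity check from the last item can be approximated with a plain TCP connection. This is a minimal sketch assuming the standard `secretsmanager.<region>.amazonaws.com` hostname shown above; the helper names are hypothetical, and a successful connection only proves the network path, not authorization.

```python
import socket


def endpoint_host(region):
    """Regional Secrets Manager hostname (standard partition assumed)."""
    return f"secretsmanager.{region}.amazonaws.com"


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection opens before the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (performs a real network connection when run):
# can_connect(endpoint_host("us-east-1"))
```

A `False` result from inside the VPC combined with `True` from a laptop points at the endpoint, DNS, or security groups rather than at IAM.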

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Don't add broad `secretsmanager:*` permissions without resource constraints: it's a security risk and can mask other issues.
  • Don't assume the error is in the IAM policy when the secret is encrypted with a customer-managed KMS key; check KMS first.
  • Don't forget that Secrets Manager calls KMS on the caller's behalf: the calling principal is the one that needs `kms:Decrypt` on the key.
  • Don't rely on the secret name alone; use the full ARN in policies for cross-account scenarios.
  • Don't confuse `secretsmanager:GetSecretValue` with `secretsmanager:ListSecrets`: listing secrets may work while retrieval fails.
  • Don't skip CloudTrail: it gives you the exact reason for the denial, often with details like missing permissions.

( 07 ) War story

Lambda Can't Fetch Secret After VPC Migration

Senior Platform Engineer · Python 3.9, AWS Lambda, ECS Fargate, Terraform, CloudTrail

Timeline

  1. 09:15 PagerDuty alert: 'Failed to retrieve database credentials' from production Lambda function.
  2. 09:18 Check Lambda logs: 'AccessDeniedException' from boto3 client when calling get_secret_value.
  3. 09:22 Run `aws secretsmanager get-secret-value --secret-id prod/db/creds` from my local machine: works fine.
  4. 09:27 Review IAM role for Lambda: has 'secretsmanager:GetSecretValue' on 'arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/*'.
  5. 09:32 Check CloudTrail: event shows AccessDenied with 'User: arn:aws:sts::123456789012:assumed-role/my-lambda-role/my-function is not authorized to perform: kms:Decrypt'.
  6. 09:38 Identify the KMS key: the secret uses a custom key 'alias/prod-secrets-key'.
  7. 09:40 Check KMS key policy: no grant for the Lambda role.
  8. 09:45 Add Lambda role to KMS key policy with 'kms:Decrypt' permission.
  9. 09:47 Redeploy Lambda and test: secret retrieval succeeds.

The incident started with a PagerDuty alert at 9:15 AM. Our production Lambda function, which fetches database credentials from Secrets Manager, started failing. The error logs showed 'AccessDeniedException' when calling get_secret_value. My first instinct was to check IAM — I assumed the role policy had a typo or the secret name changed. But when I ran the CLI command from my laptop, it worked perfectly. That told me the secret itself was fine and the issue was specific to the Lambda execution context.

I then reviewed the Lambda's IAM role policy. It had a broad permission on secrets matching 'prod/*' — that looked correct. I checked CloudTrail next, and that's where the real cause surfaced: the error message mentioned 'kms:Decrypt'. The secret was encrypted with a customer-managed KMS key, and the Lambda role didn't have permission to decrypt it. I had completely forgotten that Secrets Manager uses KMS for encryption, and the caller needs both GetSecretValue and Decrypt permissions.

The KMS key policy was the culprit. It only allowed the root user and a few admins to use the key. I added the Lambda role's ARN to the key policy with 'kms:Decrypt' action, and within two minutes the function was working again. The lesson: always check KMS key policies when a secret retrieval fails, especially after migrating to a VPC or changing roles. The CloudTrail logs are your best friend — they explicitly tell you which permission is missing.

Root cause

The Lambda role had permission to access the secret but not to use the KMS key that encrypts it, so the decryption step failed with an AccessDeniedException.

The fix

Added the Lambda execution role ARN to the KMS key policy with 'kms:Decrypt' permission.

The lesson

Always verify KMS key permissions when troubleshooting Secrets Manager access denied errors — the IAM policy alone is insufficient if the secret uses a customer-managed key.

( 08 ) Understanding the Auth Chain: IAM, KMS, and Resource Policies

When you call GetSecretValue, two services have to authorize the request. First, Secrets Manager checks that the caller is allowed `secretsmanager:GetSecretValue` on that secret; identity-based policies and the secret's resource-based policy are evaluated together here, and an explicit Deny in either one wins. Second, because every secret value is encrypted with a KMS key, Secrets Manager asks KMS to decrypt it on the caller's behalf. With the AWS managed key `aws/secretsmanager`, principals in the same account normally need no extra KMS permissions; with a customer-managed key, the caller needs `kms:Decrypt` on that key. If either check fails, you get an AccessDeniedException.

The most overlooked part is the KMS decryption. Many engineers assume that because the secret is in Secrets Manager, only the Secrets Manager API permissions matter. But the service acts as a proxy to KMS. You can verify this by checking CloudTrail: a failed GetSecretValue due to KMS typically carries a message such as 'Access to KMS is not allowed' or 'not authorized to perform: kms:Decrypt' in the event details. To fix it, grant the principal `kms:Decrypt`, either directly in the KMS key policy or, if the key policy delegates access to IAM, in the principal's IAM policy.
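Granting a principal `kms:Decrypt` in a key policy amounts to appending one statement to the policy document. The sketch below edits the document in memory only; the function name `grant_decrypt` and the default `Sid` are hypothetical, and the idempotence check is keyed on the `Sid` alone, which is a simplifying assumption.

```python
import copy


def grant_decrypt(key_policy, principal_arn, sid="AllowSecretDecrypt"):
    """Return a copy of key_policy that lets principal_arn decrypt."""
    updated = copy.deepcopy(key_policy)  # never mutate the caller's document
    statements = updated.setdefault("Statement", [])
    if any(stmt.get("Sid") == sid for stmt in statements):
        return updated  # statement already present, nothing to add
    statements.append({
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "kms:Decrypt",
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    })
    return updated


existing = {"Version": "2012-10-17", "Statement": [{
    "Sid": "EnableRootAccess",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
    "Action": "kms:*",
    "Resource": "*",
}]}
new_policy = grant_decrypt(existing, "arn:aws:iam::123456789012:role/my-lambda-role")
print(len(new_policy["Statement"]))
```

The resulting document would then be written back with `aws kms put-key-policy`; reviewing the full diff before applying it is worthwhile, since that call replaces the whole policy.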

( 09 ) Cross-Account Secret Retrieval: ARN vs. Name

Retrieving a secret from another account requires more than just an IAM policy. The secret's resource policy must explicitly grant `secretsmanager:GetSecretValue` to the cross-account principal (or account). The secret must also be encrypted with a customer-managed KMS key, because the AWS managed key `aws/secretsmanager` cannot be used from another account, and that key's policy must allow `kms:Decrypt` for the calling account's principal. A common mistake is using the secret name instead of the full ARN in the IAM policy or in the request: names are not unique across accounts, and a bare name is resolved in the caller's own account. Always use the ARN in cross-account scenarios.

To test cross-account access, use the ARN format: `arn:aws:secretsmanager:<region>:<source-account-id>:secret:<secret-name>-<random-suffix>`. Also, ensure the VPC endpoint (if any) is shared or accessible from the calling account. CloudTrail is indispensable here: it shows whether the failure is at the Secrets Manager layer (AccessDenied) or the KMS layer (kms:Decrypt).
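A quick way to catch name-versus-ARN mistakes is to parse the identifier the application is configured with. The sketch below follows the ARN format given above; the function names are hypothetical, and the suffix check is a heuristic (a secret whose own name happens to end in a hyphen plus six alphanumerics would also pass it).

```python
import re

_SECRET_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):secretsmanager:(?P<region>[^:]+):"
    r"(?P<account>\d{12}):secret:(?P<name>.+)$"
)


def parse_secret_arn(arn):
    """Split a Secrets Manager ARN into its parts, or raise ValueError."""
    match = _SECRET_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a Secrets Manager ARN: {arn!r}")
    info = match.groupdict()
    # Complete ARNs end in a hyphen followed by six generated characters.
    info["has_suffix"] = re.search(r"-[A-Za-z0-9]{6}$", info["name"]) is not None
    return info


def is_cross_account(secret_arn, caller_account_id):
    """True when the secret lives in a different account than the caller."""
    return parse_secret_arn(secret_arn)["account"] != caller_account_id


example = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db/creds-AbCdEf"
print(parse_secret_arn(example)["region"], is_cross_account(example, "111122223333"))
```

The caller's account ID for the comparison is what `aws sts get-caller-identity` reports in its `Account` field.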

( 10 ) VPC Endpoints and Network Path Issues

When your application runs in a VPC (EC2, Lambda with VPC config, ECS), calls to Secrets Manager from private subnets must go through a VPC endpoint or a NAT gateway. If you use a VPC endpoint (PrivateLink), the endpoint's policy can restrict actions. A common issue is a policy that allows `secretsmanager:GetSecretValue` only for specific principals or conditions, but the calling principal doesn't match. A related one is a policy on the secret or the role that requires a particular `aws:SourceVpce`, while the request arrives through a different endpoint.

Symptoms of a VPC endpoint issue include timeouts or 'Unable to connect to endpoint' errors, not just AccessDenied. To diagnose, check the VPC endpoint policy via `aws ec2 describe-vpc-endpoints` and look at the `PolicyDocument`. Also verify that the security group associated with the endpoint allows inbound HTTPS from the application's security group. If the application is a Lambda function in a VPC, ensure its subnets can reach either the endpoint's network interfaces (with private DNS enabled, so the default hostname resolves to them) or a NAT gateway.
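The checks in this section can be run against the JSON that `aws ec2 describe-vpc-endpoints` prints. The sketch below assumes that output has already been parsed into a dict; the function name `summarize_endpoints` and the keys of the returned summary are hypothetical, and flagging any `Deny` statement is only a prompt to read the policy, not a verdict.

```python
import json


def summarize_endpoints(response, region):
    """Summarize Secrets Manager endpoints from describe-vpc-endpoints output."""
    service = f"com.amazonaws.{region}.secretsmanager"
    findings = []
    for endpoint in response.get("VpcEndpoints", []):
        if endpoint.get("ServiceName") != service:
            continue
        # PolicyDocument arrives as a JSON string, not a nested object.
        policy = json.loads(endpoint.get("PolicyDocument") or "{}")
        statements = policy.get("Statement", [])
        if isinstance(statements, dict):
            statements = [statements]
        findings.append({
            "id": endpoint.get("VpcEndpointId"),
            "available": str(endpoint.get("State", "")).lower() == "available",
            "private_dns": bool(endpoint.get("PrivateDnsEnabled")),
            "has_deny": any(s.get("Effect") == "Deny" for s in statements),
            "security_groups": [g.get("GroupId") for g in endpoint.get("Groups", [])],
        })
    return findings
```

An empty result for the application's region means there is no Secrets Manager endpoint in the queried VPCs, so traffic must be leaving through a NAT gateway or not leaving at all.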

( 11 ) Using CloudTrail to Pinpoint the Exact Denial Reason

CloudTrail is the single most powerful tool for debugging AccessDenied errors. Every Secrets Manager API call is logged, including the error details. Run: `aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=GetSecretValue --start-time $(date -u -d '10 minutes ago' '+%Y-%m-%dT%H:%M:%SZ')` (the `-d` form is GNU `date`; on macOS use `date -u -v-10M '+%Y-%m-%dT%H:%M:%SZ'`). Each result carries the full record as a JSON string in `CloudTrailEvent`; look for records whose `errorCode` is `AccessDenied` or `AccessDeniedException`. The `errorMessage` field often names the missing permission, e.g., 'not authorized to perform: kms:Decrypt'. `lookup-events` accepts only one lookup attribute per call, so to narrow by principal, filter the returned records on `userIdentity.arn`.

CloudTrail also logs the request parameters, including the secret ARN, which helps confirm you're accessing the correct secret. If no errors appear, but the application fails, check for network errors (timeouts) — those won't show in CloudTrail because the request never reached Secrets Manager. In that case, look at VPC flow logs or the application's DNS resolution.
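Reading `lookup-events` output by eye gets tedious because each record is a JSON string nested inside the response. The sketch below assumes the CLI output was saved with `--output json` and loaded into a dict; the function name `denied_events` and the keys of the returned rows are hypothetical.

```python
import json


def denied_events(lookup_output):
    """Return one summary row per failed event in lookup-events output."""
    rows = []
    for event in lookup_output.get("Events", []):
        # CloudTrailEvent holds the full record as a JSON-encoded string.
        detail = json.loads(event["CloudTrailEvent"])
        if "errorCode" not in detail:
            continue  # successful call, nothing to report
        params = detail.get("requestParameters") or {}
        rows.append({
            "time": detail.get("eventTime"),
            "principal": detail.get("userIdentity", {}).get("arn"),
            "error_code": detail.get("errorCode"),
            "error_message": detail.get("errorMessage"),
            "secret_id": params.get("secretId"),
        })
    return rows
```

Filtering the rows on `principal` afterwards gives the per-role view that a single `lookup-events` call cannot.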

( 12 ) IAM Policy Simulation: Test Without Changing Permissions

Before modifying any policies, use the IAM policy simulator to test what actions are allowed. For an existing role, the command is: `aws iam simulate-principal-policy --policy-source-arn arn:aws:iam::<account-id>:role/my-role --action-names secretsmanager:GetSecretValue --resource-arns arn:aws:secretsmanager:<region>:<account-id>:secret:mysecret`. (`simulate-custom-policy` is for policy documents you pass in with `--policy-input-list`, not for a role's attached policies.) Each result has an `EvalDecision` of `allowed`, `explicitDeny`, or `implicitDeny`, plus the `MatchedStatements` that produced it. To include the secret's resource-based policy in the evaluation, pass it with `--resource-policy`.

The simulator can also test KMS permissions: use `kms:Decrypt` as an action. However, it doesn't simulate KMS key policies directly — you must check those separately. But for IAM and resource policies, it's fast and safe. I always run this before making changes to confirm my diagnosis.
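The same simulation can be scripted. The sketch below calls boto3's `simulate_principal_policy`, which evaluates the policies attached to an existing role; the wrapper name `simulate_get_secret` is hypothetical, and the client is passed in so the function can be exercised with a stub when no AWS credentials are at hand.

```python
def simulate_get_secret(iam_client, role_arn, secret_arn, actions=None):
    """Return {action: decision} for a role against one secret ARN."""
    actions = actions or ["secretsmanager:GetSecretValue"]
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=actions,
        ResourceArns=[secret_arn],
    )
    # EvalDecision is one of: allowed, explicitDeny, implicitDeny.
    return {
        result["EvalActionName"]: result["EvalDecision"]
        for result in response["EvaluationResults"]
    }


# Real usage (requires boto3 and credentials allowed to call the simulator):
# import boto3
# simulate_get_secret(boto3.client("iam"),
#                     "arn:aws:iam::123456789012:role/my-role",
#                     "arn:aws:secretsmanager:us-east-1:123456789012:secret:mysecret")
```

An `implicitDeny` here means no attached policy matched; an `explicitDeny` means some statement actively blocks the call and loosening other policies will not help.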

Frequently asked questions

Why can I list secrets but not get their values?

The `secretsmanager:ListSecrets` permission is separate from `secretsmanager:GetSecretValue`. Your IAM policy likely allows ListSecrets but GetSecretValue is missing or restricted to specific secrets. Check the Resource element in your policy: ListSecrets does not support resource-level permissions and is granted on Resource='*', while GetSecretValue is usually scoped to specific secret ARNs. Also, if the secret has a resource policy that denies GetSecretValue, it overrides an IAM allow.

Does the secret's KMS key need to be in the same region?

Yes, secrets in Secrets Manager are always encrypted with a KMS key in the same region. You cannot use a cross-region KMS key. If you need cross-region access, you must replicate the secret using Secrets Manager's multi-region replication feature, which creates a copy encrypted with a local KMS key.

My Lambda function is in a VPC and gets timeouts when accessing Secrets Manager. What's wrong?

Lambda functions in a VPC don't have internet access by default and need a VPC endpoint (AWS PrivateLink) for Secrets Manager or a NAT gateway. If you're using a VPC endpoint, ensure the endpoint policy allows the Lambda role to call GetSecretValue. The Lambda function's security group must allow outbound HTTPS (port 443), and the endpoint's security group must allow inbound HTTPS from it. Interface endpoints are reached through network interfaces in your subnets rather than route table entries, so also check that private DNS is enabled on the endpoint so the default Secrets Manager hostname resolves to it.

How do I grant cross-account access to a secret?

First, modify the secret's resource policy to allow the external account: `aws secretsmanager put-resource-policy --secret-id <arn> --resource-policy file://policy.json` where the policy has an Allow for `secretsmanager:GetSecretValue` with Principal set to the external account's root or a specific role ARN. Then, in the external account, attach an IAM policy allowing `secretsmanager:GetSecretValue` on the source secret's ARN and `kms:Decrypt` on the key's ARN. The secret must use a customer-managed KMS key for this to work, and that key's policy must allow `kms:Decrypt` from the external principal.
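The `policy.json` referenced in this answer could be generated rather than hand-written. The sketch below builds a minimal resource policy for one external reader; the function name `build_cross_account_policy`, the `Sid`, and the example account ID and role name are hypothetical placeholders.

```python
import json


def build_cross_account_policy(reader_principal_arn):
    """Return a secret resource policy allowing one external principal to read."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCrossAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": reader_principal_arn},
            "Action": "secretsmanager:GetSecretValue",
            # In a secret's resource policy, "*" means the secret itself.
            "Resource": "*",
        }],
    }


resource_policy_doc = build_cross_account_policy(
    "arn:aws:iam::111122223333:role/reader-role"
)
print(json.dumps(resource_policy_doc, indent=2))
```

Naming a specific role as the principal, rather than the external account's root, keeps the grant narrow; the external account still has to allow the action in that role's own IAM policy.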

Is there a way to rotate secrets without downtime if the secret retrieval fails?

Rotation creates a new secret version, stages it as AWSPENDING, and promotes it to AWSCURRENT once it has been set and tested; the previous version keeps the AWSPREVIOUS label. If a GetSecretValue call fails transiently during rotation, the application may still hold the old value, so implement retries with exponential backoff. You can also pin a read to a specific version with `--version-id` or `--version-stage`. Retries will not help with AccessDenied errors, so make sure your IAM and KMS policies are correct before starting rotation.
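A minimal sketch of the retry advice follows. It is an application-level illustration, independent of the retries botocore already performs internally; `SecretFetchError`, `fetch_with_backoff`, and the set of non-retryable codes are hypothetical names and choices. The key design point is that permission and not-found errors are raised immediately, since waiting cannot fix a policy.

```python
import random
import time

# Retrying cannot fix these: correct the policy or the secret ID instead.
NON_RETRYABLE = {"AccessDeniedException", "ResourceNotFoundException"}


class SecretFetchError(Exception):
    """Hypothetical wrapper carrying the service error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def fetch_with_backoff(fetch, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially with jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except SecretFetchError as err:
            is_last = attempt == attempts - 1
            if err.code in NON_RETRYABLE or is_last:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent callers.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

In a boto3 application, `fetch` would wrap the `get_secret_value` call and translate a caught `ClientError` into `SecretFetchError` using the error code from the response.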