What this usually means
Cognito User Pool authentication errors typically fall into three categories: configuration drift (e.g., mismatched app client settings, callback URLs), token lifecycle issues (expired/revoked tokens, incorrect audience/issuer), or custom authentication flow bugs (incorrect Lambda trigger responses, missing challenge answers). Cognito's default flows hide many details in the SDK layer, which makes it easy to confuse client-side errors with server misconfigurations. The most insidious bugs come from partial deployments: updating a trigger without exercising the flows that depend on it, or changing a client secret without updating the client app.
The first ten minutes — establish facts before touching code.
1. Run `aws cognito-idp describe-user-pool --user-pool-id <id>` and check `LambdaConfig` for misconfigured triggers, especially PreSignUp and DefineAuthChallenge.
2. Use `aws cognito-idp list-users --user-pool-id <id> --filter 'email="user@example.com"'` to confirm the user exists and to check their status.
3. Decode the JWT (for example via jwt.io) and verify `iss`, `token_use`, and `exp`, plus `aud` on an ID token or `client_id` on an access token (Cognito access tokens carry no `aud` claim).
4. Check CloudWatch Logs for the Lambda trigger functions (look for exceptions, timeouts, or missing response keys).
5. Test the refresh path explicitly: `aws cognito-idp initiate-auth --client-id <client-id> --auth-flow REFRESH_TOKEN_AUTH --auth-parameters REFRESH_TOKEN=<token>`
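If pasting a production token into a website is not acceptable, the claim check from the steps above can be done locally. The following is a minimal sketch using only the Python standard library. The function names `peek_jwt_claims` and `summarize_claims` are hypothetical, and the code deliberately does not verify the signature, so treat it as an inspection aid rather than a validator.

```python
import base64
import json
import time
from typing import Optional


def peek_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload_b64 = parts[1]
    # base64url encoding drops the '=' padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def summarize_claims(claims: dict, now: Optional[float] = None) -> dict:
    """Pull out the claims worth eyeballing during the first ten minutes."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    return {
        "iss": claims.get("iss"),
        "token_use": claims.get("token_use"),
        # ID tokens carry the app client ID in 'aud', access tokens in 'client_id'.
        "client": claims.get("aud") or claims.get("client_id"),
        "seconds_until_expiry": None if exp is None else int(exp - now),
    }
```

A negative `seconds_until_expiry` means the token has already expired, which narrows the problem to the token lifecycle before any trigger or configuration is touched.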
The specific files, logs, configs, and dashboards that usually own this bug.
- CloudWatch Log Groups: /aws/lambda/<user-pool-trigger-function-name>
- Cognito User Pool console: App client settings, triggers, and domain configuration
- AWS CloudTrail events: cognito-idp:InitiateAuth, cognito-idp:RespondToAuthChallenge
- Application logs: token decoding results, SDK error messages (e.g., 'expired token' vs 'invalid signature')
- API Gateway settings: authorizer Lambda ARN, token validation method, TTL
- IAM roles: check if the Lambda trigger has permission to call Cognito APIs (e.g., AdminUpdateUserAttributes)
Practical causes, not theory. These are the things you will actually find.
- Lambda trigger returns incorrect challenge response (e.g., missing `answerCorrect` boolean)
- App client secret mismatch between server and client SDK configuration
- Callback URL mismatch: allowed callback URL in app client doesn't match the request's redirect_uri
- Token expiry: the access/ID token expires before the client uses it (default 1 hour)
- Issuer mismatch: the validator is configured for a different region or user pool ID than the one that issued the token (`iss` is built from the region and pool ID, not from the user pool domain)
- Custom authentication flow not completing all required challenge steps
Concrete fix directions. Pick the one that matches your root cause.
- For trigger misconfiguration: validate the Lambda response format against the Cognito docs (e.g., PreSignUp must return the event object with `response.autoConfirmUser` set to boolean `true` if auto-confirm is needed)
- For token expiry: implement token refresh logic that uses the refresh token before the access token expires, or increase token validity in the user pool settings
- For client secret mismatch: update the client configuration with the secret currently set on the app client (issuing a new secret or app client if it has leaked), or use a public client (no secret) for mobile apps
- For callback mismatch: list the allowed callback URLs in the app client settings and ensure redirect_uri matches exactly (including trailing slashes)
- For failed custom auth: add logging to the Lambda triggers to trace challenge steps, and verify that the challenge answer format matches what the trigger expects
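For the trigger-misconfiguration case in the list above, a PreSignUp handler has to return the whole event with real booleans set under `response`. The sketch below assumes a Python Lambda and an auto-confirm policy that applies to every sign-up; that policy is an illustration only and should be replaced with whatever rule your pool actually needs.

```python
def lambda_handler(event, context):
    """PreSignUp trigger sketch: auto-confirm the user and return the event."""
    response = event.setdefault("response", {})

    # Must be the boolean True, not the string "true".
    response["autoConfirmUser"] = True

    # Only mark the email as verified when the sign-up actually included one.
    attributes = event.get("request", {}).get("userAttributes", {})
    if "email" in attributes:
        response["autoVerifyEmail"] = True

    # Cognito expects the (modified) event object back, not a bare dict.
    return event
```

Returning only `{"autoConfirmUser": True}` instead of the full event is a common variant of the same mistake.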
A fix you cannot prove is a guess. Close the loop.
- Run a full authentication flow via AWS CLI: `aws cognito-idp admin-initiate-auth --user-pool-id <id> --client-id <client-id> --auth-flow ADMIN_USER_PASSWORD_AUTH --auth-parameters USERNAME=test,PASSWORD=test`
- Decode ID and access tokens after each step and validate claims
- Check CloudWatch Logs for trigger invocations and confirm the expected input/output
- Use `aws cognito-idp describe-user-pool-client --user-pool-id <id> --client-id <client-id>` to verify client settings
- Perform a token refresh cycle and verify new tokens are issued
Things that make this bug worse or harder to find.
- Hardcoding token expiry checks that differ from Cognito's actual expiry time
- Ignoring the difference between ID token and access token when validating on API Gateway
- Assuming triggers can take their time: Cognito invokes them synchronously and expects a response within 5 seconds
- Forgetting to update the client secret on the app side after it changes in Cognito
- Using the wrong user pool ID (e.g., staging vs production) in SDK configuration
Production Outage: PreSignUp Lambda Returns a String Instead of a Boolean
Timeline
- 09:15 PagerDuty alert: 401 errors spike to 80% of API requests from mobile app
- 09:20 Check CloudWatch: API Gateway logs show 'Unauthorized' with token present
- 09:25 Decode token via jwt.io: claims look valid, not expired
- 09:30 Check Cognito triggers: PreSignUp Lambda was updated 2 hours ago
- 09:35 Look at PreSignUp logs: function returns `{ "autoConfirmUser": "true" }` (string, not boolean)
- 09:40 Rollback Lambda to previous version
- 09:45 Errors drop to 0%; confirm user can authenticate
- 09:50 Root cause: code change introduced string 'true' instead of boolean true in trigger response
Monday morning, 9:15 AM. Our mobile app's authentication started failing for all users. The React Native client was getting 401s on every API call, even though users had just logged in. The first thing I did was check the API Gateway logs—tokens were being passed, but the Lambda authorizer was rejecting them. I decoded a token on jwt.io: all claims looked fine, not expired, correct audience. So it wasn't a token issue.
I moved to Cognito's triggers. The PreSignUp Lambda had been updated two hours prior to add a new field. I checked CloudWatch Logs for that Lambda and saw that every invocation returned `{ "autoConfirmUser": "true" }`—with quotes around true! Cognito expects a boolean, not a string. Because the response was invalid, Cognito silently failed to confirm the user, leaving the user status as 'UNCONFIRMED'. The authentication flow then failed because the user wasn't confirmed.
We rolled back the Lambda to the previous version immediately. Errors dropped to zero within minutes. The lesson: always validate the exact response schema Cognito expects for Lambda triggers. A simple type mismatch can break the entire auth flow. Now we have integration tests that mock Cognito trigger responses and verify the format.
Root cause
PreSignUp Lambda returned `autoConfirmUser` as a string `"true"` instead of boolean `true`, causing Cognito to silently fail to confirm the user.
The fix
Rolled back the Lambda to the previous version. Fixed the code to return boolean `true` and re-deployed after thorough testing.
The lesson
Always validate the exact data types in Cognito Lambda trigger responses. Use strict TypeScript or Zod schemas to enforce types. Add integration tests that simulate Cognito's expected response.
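The type check described in this lesson can also be enforced in plain Python, as a rough counterpart to a TypeScript or Zod schema. The sketch below validates the boolean fields of a PreSignUp response. The helper name `check_pre_signup_response` is hypothetical, and it is meant to be called from a unit or integration test against whatever your handler returns.

```python
# Response fields of the PreSignUp trigger that Cognito expects as booleans.
BOOLEAN_FIELDS = ("autoConfirmUser", "autoVerifyEmail", "autoVerifyPhone")


def check_pre_signup_response(event: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    response = event.get("response")
    if not isinstance(response, dict):
        return ["event has no 'response' object"]

    problems = []
    for field in BOOLEAN_FIELDS:
        if field in response and not isinstance(response[field], bool):
            value = response[field]
            problems.append(
                f"{field} must be a boolean, got {type(value).__name__}: {value!r}"
            )
    return problems
```

Run against the response from the incident, `check_pre_signup_response({"response": {"autoConfirmUser": "true"}})` reports one problem for `autoConfirmUser`, which is the failure a pre-deploy test would have caught.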
Cognito User Pool authentication can use built-in flows (USER_PASSWORD_AUTH, REFRESH_TOKEN_AUTH) or custom authentication with Lambda triggers. The custom flow involves a challenge-response cycle where DefineAuthChallenge, CreateAuthChallenge, and VerifyAuthChallengeResponse Lambdas coordinate. Each trigger has a strict response schema—deviating from it causes silent failures.
Common pitfalls: returning a string instead of boolean, missing required keys (e.g., `issueTokens`, `failAuthentication`), or returning extra keys that Cognito ignores. Always log the exact response your Lambda returns and compare it to the AWS documentation.
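To make the DefineAuthChallenge contract concrete, here is a minimal sketch of a handler that always sets `issueTokens` and `failAuthentication` as booleans and only sets `challengeName` when another challenge is needed. The three-attempt limit (`MAX_ATTEMPTS`) is an assumption for illustration, not something Cognito imposes.

```python
MAX_ATTEMPTS = 3  # assumption: give up after three failed custom challenges


def lambda_handler(event, context):
    """DefineAuthChallenge sketch: decide the next step from the session history."""
    session = event.get("request", {}).get("session") or []
    response = event.setdefault("response", {})

    # Always set both flags explicitly so the response never lacks a key.
    response["issueTokens"] = False
    response["failAuthentication"] = False

    last = session[-1] if session else None
    if (
        last is not None
        and last.get("challengeName") == "CUSTOM_CHALLENGE"
        and last.get("challengeResult") is True
    ):
        # The previous challenge was answered correctly: finish the flow.
        response["issueTokens"] = True
    elif len(session) >= MAX_ATTEMPTS:
        # Too many attempts without a correct answer: stop the flow.
        response["failAuthentication"] = True
    else:
        # Ask Cognito to invoke CreateAuthChallenge for another round.
        response["challengeName"] = "CUSTOM_CHALLENGE"

    return event
```

Every branch leaves exactly one outcome selected, so the flow cannot end up with a response that neither issues tokens, fails, nor names the next challenge.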
Cognito issues three tokens: ID token, access token, and refresh token. The ID token contains user claims and is meant for the client. The access token is for resource servers (e.g., API Gateway). The refresh token is long-lived and used to get new tokens. When validating tokens, check `iss` (must match your user pool's issuer URL, `https://cognito-idp.<region>.amazonaws.com/<userPoolId>`), the client claim (`aud` must match your app client ID on an ID token; access tokens carry `client_id` instead of `aud`), `token_use` (`access` or `id`), and `exp`.
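A frequent issue is sending the wrong token type for authorization. Which token an API Gateway authorizer accepts depends on how it is configured: a REST API Cognito authorizer accepts the ID token by default and expects an access token once OAuth scopes are set on the method, so confirm the setup before assuming one or the other. Also note that the `iss` claim is built from the region and user pool ID, not from the user pool domain, so an issuer mismatch usually means the validator is configured for a different pool or region than the one that issued the token.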
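The claim checks described in this section can be sketched as a single function over already-decoded claims. This is a minimal illustration under stated assumptions: the name `validate_cognito_claims` is hypothetical, and signature verification against the pool's JWKS is out of scope here and still has to happen before the claims can be trusted.

```python
import time
from typing import Optional


def validate_cognito_claims(
    claims: dict,
    *,
    region: str,
    user_pool_id: str,
    app_client_id: str,
    expected_use: str,  # "id" or "access"
    now: Optional[float] = None,
) -> list:
    """Return a list of claim problems; an empty list means the claims look right."""
    now = time.time() if now is None else now
    errors = []

    issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    if claims.get("iss") != issuer:
        errors.append(f"iss is {claims.get('iss')!r}, expected {issuer!r}")

    if claims.get("token_use") != expected_use:
        errors.append(f"token_use is {claims.get('token_use')!r}, expected {expected_use!r}")

    # ID tokens name the app client in 'aud'; access tokens use 'client_id'.
    client_claim = "aud" if expected_use == "id" else "client_id"
    if claims.get(client_claim) != app_client_id:
        errors.append(f"{client_claim} does not match the app client ID")

    if claims.get("exp", 0) <= now:
        errors.append("token is expired")

    return errors
```

Passing an access token where `expected_use="id"` produces both a `token_use` and an `aud` error, which is usually the quickest way to spot a wrong-token-type bug.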
Custom authentication flows are powerful but opaque. Use CloudWatch Logs extensively: log every challenge step, the session object, and the response. The `session` parameter contains challenge metadata—ensure you're handling `CUSTOM_CHALLENGE` events correctly.
A common bug: the VerifyAuthChallengeResponse Lambda must set `answerCorrect` to a boolean, but its comparison fails because the answer the client sends is encoded differently from the value the CreateAuthChallenge Lambda stored (e.g., base64 vs plain text). Validate the challenge and answer formats at every step.
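A VerifyAuthChallengeResponse handler that normalizes both sides before comparing might look like the sketch below. It assumes the CreateAuthChallenge Lambda stored the expected value under a key named `answer` in `privateChallengeParameters`; that key name is a convention chosen for this example, not something Cognito defines.

```python
import hmac


def lambda_handler(event, context):
    """VerifyAuthChallengeResponse sketch: compare answers, always return a bool."""
    request = event.get("request", {})

    # Assumption: CreateAuthChallenge stored the expected value under 'answer'.
    private_params = request.get("privateChallengeParameters") or {}
    expected = str(private_params.get("answer", "")).strip()
    supplied = str(request.get("challengeAnswer") or "").strip()

    # An empty expected answer must never count as correct.
    # compare_digest avoids leaking the answer through timing differences.
    is_correct = bool(expected) and hmac.compare_digest(
        expected.encode("utf-8"), supplied.encode("utf-8")
    )

    event.setdefault("response", {})["answerCorrect"] = is_correct
    return event
```

Logging `expected` and `supplied` lengths (not the values) at this point is often enough to reveal an encoding mismatch between the two triggers.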
Default token expiry: access/ID tokens live 1 hour, refresh tokens live 30 days. If your client experiences 401s after an hour, it's likely token expiry. Implement refresh logic: use the `REFRESH_TOKEN_AUTH` flow to get new tokens. Note that refresh tokens can be revoked, for example through `RevokeToken` or a global sign-out (`GlobalSignOut`, `AdminUserGlobalSignOut`), after which the refresh call fails and the user has to sign in again.
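The refresh logic can be reduced to two small pieces: deciding when to refresh and building the request. The sketch below assumes a five-minute safety margin (`REFRESH_MARGIN_SECONDS`) and hypothetical helper names. The dictionary it builds uses the `ClientId`, `AuthFlow`, and `AuthParameters` arguments of the `InitiateAuth` API, so it can be passed to boto3 as `client.initiate_auth(**request)`.

```python
import time
from typing import Optional

REFRESH_MARGIN_SECONDS = 300  # assumption: refresh five minutes before expiry


def needs_refresh(exp: int, now: Optional[float] = None,
                  margin: int = REFRESH_MARGIN_SECONDS) -> bool:
    """True when the token's 'exp' claim is within the safety margin (or past)."""
    now = time.time() if now is None else now
    return exp - now <= margin


def build_refresh_request(client_id: str, refresh_token: str,
                          secret_hash: Optional[str] = None) -> dict:
    """Build keyword arguments for an InitiateAuth call using REFRESH_TOKEN_AUTH."""
    auth_parameters = {"REFRESH_TOKEN": refresh_token}
    if secret_hash is not None:
        # Only needed when the app client was created with a secret.
        auth_parameters["SECRET_HASH"] = secret_hash
    return {
        "ClientId": client_id,
        "AuthFlow": "REFRESH_TOKEN_AUTH",
        "AuthParameters": auth_parameters,
    }
```

Reading `exp` from the token itself, rather than assuming one hour, keeps the client correct if the token validity is later changed in the user pool settings.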
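Another edge case: if you have multiple app clients, ensure you're using the correct client ID and secret when refreshing. A mismatch typically surfaces as `NotAuthorizedException`, with a message pointing at the refresh token or the secret hash. Refresh tokens are bound to the app client that issued them, and for a client with a secret the refresh request must also carry a valid `SECRET_HASH`, so a changed secret breaks refresh until the app is updated.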
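For app clients that have a secret, the `SECRET_HASH` value is a Base64-encoded HMAC-SHA256 of the username concatenated with the client ID, keyed by the client secret. A minimal sketch of that computation follows; the function name `compute_secret_hash` is hypothetical.

```python
import base64
import hashlib
import hmac


def compute_secret_hash(username: str, client_id: str, client_secret: str) -> str:
    """Base64(HMAC-SHA256(key=client_secret, message=username + client_id))."""
    message = (username + client_id).encode("utf-8")
    digest = hmac.new(client_secret.encode("utf-8"), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")
```

Because the client ID and secret are both inputs, a hash computed for one app client never validates against another, which is why mixing up clients shows up as an authorization failure rather than a clearer configuration error.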
Frequently asked questions
Why does my Lambda trigger work in the test console but fail in production?
The test console sends a static event, but production triggers include additional fields like `userAttributes`, `clientMetadata`, and `validationData`. Your Lambda might be missing handling for these fields, causing it to crash or return unexpected errors. Check CloudWatch Logs for the actual event payload.
Can I use the same access token across multiple API Gateway endpoints?
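Yes, as long as each endpoint's authorizer trusts the same user pool and app client and the token carries the scopes that endpoint requires. Cognito access tokens identify the app client in `client_id` rather than in an `aud` claim. If you have multiple resource servers with different scopes, you might need separate access tokens. For a single user pool, you can usually use one access token for all endpoints that trust that user pool.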
What causes 'NotAuthorizedException' during sign-in with correct credentials?
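Start with the error message: `NotAuthorizedException` covers a disabled user and a missing or wrong client secret (`SECRET_HASH`) as well as wrong credentials or a user that exists in a different pool. Two related failures usually surface under different names: an unconfirmed user typically produces `UserNotConfirmedException` (check user status via `admin-get-user`), and an authentication flow that is not enabled on the app client (e.g., USER_PASSWORD_AUTH) typically produces `InvalidParameterException`.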
How do I invalidate all tokens for a user immediately?
Call `AdminUserGlobalSignOut` to revoke all of the user's refresh tokens and end their sessions. To revoke a single refresh token, along with the access tokens issued from it, use `RevokeToken`. Keep in mind that resource servers which only check the signature and `exp` locally may keep accepting already-issued access and ID tokens until they expire.
Why does my custom auth flow hang indefinitely?
Check the DefineAuthChallenge Lambda: it must return `issueTokens`, `failAuthentication`, or `challengeName` in the response. If it returns an unexpected value, the flow stalls. Also ensure the client is sending the correct session and challenge answers.