What this usually means
A 500 usually means an unhandled exception: code threw, nothing caught it, and the framework answered with a generic error. If it only happens in production, something in the production environment triggers a code path your local environment never reaches. It could be production-only data (a user record with a null field your local data never has), a production-only configuration (a feature flag that changes behaviour), a production-only integration (a third-party API that behaves differently), or a production-only load pattern (a race condition that only appears under concurrent requests).
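A minimal Python sketch of the first case (production-only data). `display_name` is a hypothetical handler and `call_handler` is a hypothetical stand-in for a web framework that turns any uncaught exception into a 500. The local seed record always has a profile, so the handler works locally. A production record whose `profile` is `None` reaches a code path the seed data never exercised.

```python
from typing import Any, Callable

def display_name(user: dict[str, Any]) -> str:
    # Hypothetical handler: assumes every user has a non-null profile,
    # which holds for the local seed data but not for every production row.
    return user["profile"]["display_name"].strip()

def call_handler(handler: Callable[[dict[str, Any]], str],
                 user: dict[str, Any]) -> tuple[int, str]:
    # Stand-in for a web framework: any exception that escapes the handler
    # becomes a generic 500 response.
    try:
        return 200, handler(user)
    except Exception:
        return 500, "Internal Server Error"

seed_user = {"id": 1, "profile": {"display_name": " Ada "}}
prod_user = {"id": 2, "profile": None}  # e.g. a row created before profiles existed

print(call_handler(display_name, seed_user))  # (200, 'Ada')
print(call_handler(display_name, prod_user))  # (500, 'Internal Server Error')
```

The production call fails with a `TypeError` (`None` is not subscriptable). The stack trace from step 1 below would point straight at the subscript line.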
The first ten minutes: establish facts before touching code.
1. Get the full stack trace from the production logs. Do not guess: the stack trace tells you which line threw and how execution got there.
2. Check what is different about the failing request. Is it a specific user, a specific input value, a specific time of day?
3. Add error tracking (Sentry, Datadog, or a simple try/catch that logs the full request context) if production errors are not logged in enough detail.
4. Check whether the error correlates with a recent deployment. Did it work before the last release and break after it?
5. Look for null/undefined access along the failing code path. Production data often has missing or unexpected values that local seed data does not.
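Steps 1 and 3 only work if each log entry carries both the stack trace and the request that caused it. The sketch below uses only Python's standard `logging` module. It defines a JSON formatter and a hypothetical `log_failures` decorator that records the request context and then re-raises, so the error still surfaces as a 500 instead of being swallowed. The field names (`user_id`, `path`, `payload`) are assumptions; use whatever your requests carry, and redact sensitive fields before logging them.

```python
import functools
import json
import logging
from datetime import datetime, timezone
from typing import Any, Callable

logger = logging.getLogger("app.errors")

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line, traceback included."""

    def format(self, record: logging.LogRecord) -> str:
        entry: dict[str, Any] = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            # Set via `extra={"context": ...}`; None on records logged without it.
            "context": getattr(record, "context", None),
        }
        if record.exc_info:
            entry["traceback"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (dates, Decimals) from breaking logging.
        return json.dumps(entry, default=str)

def log_failures(handler: Callable[[dict[str, Any]], Any]) -> Callable[[dict[str, Any]], Any]:
    """Log the request context alongside the traceback, then re-raise."""

    @functools.wraps(handler)
    def wrapper(request: dict[str, Any]) -> Any:
        try:
            return handler(request)
        except Exception:
            logger.exception(
                "unhandled exception in %s",
                handler.__name__,
                extra={"context": {
                    "user_id": request.get("user_id"),
                    "path": request.get("path"),
                    "payload": request.get("payload"),
                }},
            )
            raise  # still surfaces as a 500; logging must not swallow the error

    return wrapper

# Usage: send JSON lines to stderr (swap in a file or log shipper in production).
stream = logging.StreamHandler()
stream.setFormatter(JsonFormatter())
logger.addHandler(stream)

@log_failures
def update_profile(request: dict[str, Any]) -> str:
    return request["payload"]["name"].strip()  # AttributeError when name is None

try:
    update_profile({"user_id": 42, "path": "/profile", "payload": {"name": None}})
except AttributeError:
    pass  # the logged line carries user 42, the path, the payload and the traceback
```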
The files, logs, configs, and dashboards where the cause of this bug usually shows up.
- Production error logs: the full stack trace with line numbers
- Error tracking tool (Sentry, Datadog, Bugsnag): request context, user data, breadcrumbs
- The specific request payload that triggered the 500
- Recent deployment diff: what changed?
- Database: the specific record being accessed when the error occurred
- External API integrations: are they reachable from production?
- Server resource usage: memory, CPU, and disk at the time of the error
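When errors are logged as structured lines, you can answer step 2 (what do the failing requests have in common?) by counting. The sketch below assumes a hypothetical log format in which each line is a JSON object with `status`, `path`, `user_id` and `error` fields. `summarize_500s` is an illustrative helper; adjust the keys to match your own logs.

```python
import json
from collections import Counter
from typing import Iterable

def summarize_500s(lines: Iterable[str],
                   keys: tuple[str, ...] = ("path", "error", "user_id")) -> dict[str, Counter]:
    """Count the values of selected fields across requests that returned 500."""
    counts: dict[str, Counter] = {key: Counter() for key in keys}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if entry.get("status") != 500:
            continue
        for key in keys:
            counts[key][entry.get(key)] += 1
    return counts

sample = [
    '{"status": 200, "path": "/orders", "user_id": 1, "error": null}',
    '{"status": 500, "path": "/orders", "user_id": 7, "error": "TypeError"}',
    '{"status": 500, "path": "/orders", "user_id": 7, "error": "TypeError"}',
    '{"status": 500, "path": "/cart", "user_id": 3, "error": "TimeoutError"}',
]
for key, counter in summarize_500s(sample).items():
    print(key, counter.most_common(3))
```

In this sample, one user hitting one endpoint with one exception type dominates. That points at a specific record (check it in the database) rather than at load or infrastructure.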
Practical causes, not theory. These are the things you will actually find.
- A null or undefined value in production data where local test data always has a value
- The production database has a different schema, constraints, or data types
- A third-party API returns an unexpected response format in production
- The production environment has stricter security settings that block a request
- A race condition that only appears under concurrent production traffic
- A memory limit is reached in production but not locally
- A dependency behaves differently in production mode (minification, tree-shaking, environment-specific code)
Concrete fix directions. Pick the one that matches your root cause.
- Add structured error logging that captures the full request context (input, user ID, timestamp) with every 500
- Add input validation that fails fast with a 400 instead of letting bad data cause a 500 downstream
- Use an error tracking service to aggregate production errors and see patterns across requests
- Create a staging environment with production-like data to reproduce the error before deploying fixes
- Add a global error handler that catches unhandled exceptions and returns a safe error response with a correlation ID
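The validation and global-handler directions fit together. A validation failure becomes a 400 with a useful message. Anything else becomes a 500 with a correlation ID that appears in both the response and the log. The sketch below is plain WSGI middleware (PEP 3333) built from the standard library. `BadRequest`, `with_error_handling` and the JSON body shape are assumptions. Most frameworks provide their own hook for this (an exception handler or error middleware), and you would normally use that instead.

```python
import json
import logging
import sys
import uuid
from typing import Any, Callable, Iterable

logger = logging.getLogger("app.errors")

class BadRequest(Exception):
    """Raised by application code when client input fails validation."""

def _json_response(start_response: Callable[..., Any], status: str,
                   payload: dict[str, Any]) -> list[bytes]:
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(body)))]
    # Passing exc_info lets the server replace headers if the app had already
    # called start_response before raising (PEP 3333).
    start_response(status, headers, sys.exc_info())
    return [body]

def with_error_handling(app: Callable[..., Iterable[bytes]]) -> Callable[..., Iterable[bytes]]:
    def middleware(environ: dict[str, Any], start_response: Callable[..., Any]) -> Iterable[bytes]:
        # Only catches errors raised before `app` returns its body iterable;
        # exceptions raised while a streaming body is iterated are not caught here.
        try:
            return app(environ, start_response)
        except BadRequest as exc:
            return _json_response(start_response, "400 Bad Request", {"error": str(exc)})
        except Exception:
            correlation_id = uuid.uuid4().hex
            logger.exception(
                "unhandled exception",
                extra={"correlation_id": correlation_id,
                       "method": environ.get("REQUEST_METHOD"),
                       "path": environ.get("PATH_INFO")},
            )
            # The client gets the ID to quote in a bug report, never the stack trace.
            return _json_response(start_response, "500 Internal Server Error",
                                  {"error": "internal server error",
                                   "correlation_id": correlation_id})
    return middleware

# Usage with the standard library's reference server (local experiments only);
# `my_app` stands for your own WSGI application:
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8000, with_error_handling(my_app)).serve_forever()
```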
A fix you cannot prove is a guess. Close the loop.
- Reproduce the error in staging with production-like data and the same request payload.
- Deploy the fix and monitor the error rate: it should drop to zero for that specific error.
- Add a regression test that covers the edge case (null value, missing field, unexpected response).
- Check that the error tracking tool shows the issue resolved across all instances.
- Run a load test against the endpoint to confirm the fix holds at production traffic levels.
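A regression test should pin the exact shape of the production record that failed. The sketch below uses the standard `unittest` module. It assumes the hypothetical bug was a `display_name` helper that crashed on a null profile and that the chosen fix falls back to a generated name; your fix and fallback will differ. Run it with `python -m unittest`.

```python
import unittest
from typing import Any

def display_name(user: dict[str, Any]) -> str:
    """Fixed helper: tolerates a missing or null profile and blank names."""
    profile = user.get("profile") or {}
    name = profile.get("display_name") or ""
    return name.strip() or f"user-{user['id']}"

class DisplayNameRegressionTest(unittest.TestCase):
    def test_seed_data_still_works(self) -> None:
        self.assertEqual(display_name({"id": 1, "profile": {"display_name": " Ada "}}), "Ada")

    def test_null_profile_from_production(self) -> None:
        # Shape copied from the failing production record, personal data removed.
        self.assertEqual(display_name({"id": 2, "profile": None}), "user-2")

    def test_missing_profile_key(self) -> None:
        self.assertEqual(display_name({"id": 3}), "user-3")

    def test_blank_display_name(self) -> None:
        self.assertEqual(display_name({"id": 4, "profile": {"display_name": "   "}}), "user-4")
```

Keeping the seed-data case alongside the production cases guards against a fix that handles the edge case but breaks the normal path.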
Things that make this bug worse or harder to find.
- Deploying a fix without understanding the root cause because 'it works locally'
- Catching all errors with a blanket try/catch and returning 200: this hides bugs
- Not logging enough context with production errors: a stack trace without request data is hard to debug
- Assuming production data looks like local seed data
- Not setting up error alerting: you should know about 500s before users report them
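Here is a minimal sketch contrasting a blanket catch with a targeted one. `PaymentDeclined`, `charge` and the `(status, body)` tuples are hypothetical. The blanket version reports 200 for every failure, including programming errors. The targeted version maps only the failure it expects to a 4xx. Everything else propagates to the global error handler, which logs it and returns a 500.

```python
from typing import Any

class PaymentDeclined(Exception):
    """An expected, user-facing failure (hypothetical)."""

def charge(order: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical business logic with a latent bug: KeyError if "amount" is missing.
    if order.get("card") == "declined":
        raise PaymentDeclined("card was declined")
    return {"charged": order["amount"]}

def checkout_blanket(order: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    # Anti-pattern: every exception, including the KeyError bug, becomes a 200.
    try:
        return 200, charge(order)
    except Exception:
        return 200, {"ok": False}

def checkout_targeted(order: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    # Catch only the failure you expect and can explain to the client.
    try:
        return 200, charge(order)
    except PaymentDeclined as exc:
        return 402, {"error": str(exc)}  # a 4xx the client can act on
    # Anything else (e.g. the KeyError) propagates to the global handler as a logged 500.

print(checkout_blanket({"card": "visa"}))  # (200, {'ok': False}): the bug is invisible
print(checkout_targeted({"card": "declined", "amount": 5}))  # (402, {'error': 'card was declined'})
```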