What this usually means
The app runs in a different environment in production than on your machine. The difference could be anything: missing environment variables, a different Node version, a filesystem permission gap, a missing build artefact, a database that behaves differently, or a network boundary your local setup never crosses. The code itself is usually fine; the environment around it is wrong.
The first ten minutes: establish facts before touching code.
1. Compare the runtime versions between local and production. Run `node --version`, `npm --version`, and check the language/runtime version in your deploy config or Dockerfile.
2. List every environment variable the app reads at startup. Print them (excluding secrets) in a preflight log line. Check that production has every one.
3. Read the first error in the production logs. Do not skip it. Do not guess. Copy the exact stack trace and trace backwards from the throw site.
4. Diff your local `.env` (or config file) against the production config. One missing key, one wrong value, or one extra quote can cause the crash.
5. Check if your local dev uses a different database, cache, or queue than production. A local SQLite vs production Postgres gap breaks more apps than any code change.
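The config diff in step 4 can be done mechanically. The following is a minimal sketch, assuming both configs are available as dotenv-style text; `parseEnv`, `diffEnvKeys`, and `isQuoted` are hypothetical helper names, and the parser is deliberately simplified (no multiline values, no `export` prefix). It compares keys and quoting only, so secret values never appear in the output.

```javascript
// Parse dotenv-style text into a Map of key -> raw value.
// Simplified on purpose: skips blank lines and # comments only.
function parseEnv(text) {
  const entries = new Map();
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    entries.set(trimmed.slice(0, eq).trim(), trimmed.slice(eq + 1).trim());
  }
  return entries;
}

// True when the raw value is wrapped in matching single or double quotes.
function isQuoted(value) {
  return /^(".*"|'.*')$/.test(value);
}

// Report key-level differences without ever printing a value.
function diffEnvKeys(localText, prodText) {
  const local = parseEnv(localText);
  const prod = parseEnv(prodText);
  return {
    missingInProd: [...local.keys()].filter((k) => !prod.has(k)).sort(),
    missingInLocal: [...prod.keys()].filter((k) => !local.has(k)).sort(),
    // Same key, but quoted on one side only: the "one extra quote" case.
    quotingDiffers: [...local.keys()]
      .filter((k) => prod.has(k) && isQuoted(local.get(k)) !== isQuoted(prod.get(k)))
      .sort(),
  };
}

// Example with made-up config text.
const report = diffEnvKeys(
  "DATABASE_URL=sqlite://dev.db\nPORT=3000\n",
  'PORT="3000"\n'
);
console.log(report);
```

Whether a quoting difference matters depends on how each side loads its config, so treat `quotingDiffers` as a list of things to inspect rather than a list of bugs.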
The specific files, logs, configs, and dashboards that usually own this bug.
- Runtime version logs (startup lines, Dockerfile `FROM`, `engines` in package.json)
- Environment variable injection path (`.env`, secrets manager, CI/CD vars, Kubernetes ConfigMap)
- Production logs: first error line and full stack trace
- Build output vs runtime output comparison
- Database connection strings and driver versions
- Filesystem paths: `/tmp` vs `/var/tmp`, case sensitivity, permission bits
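For the Dockerfile `FROM` entry in the list above, a rough check is to compare the Node major version pinned there with the one actually running. This is a sketch under assumptions: it only understands official-style `node:<major>` image tags, it takes the last such `FROM` line as the runtime stage, and the function names are hypothetical.

```javascript
// Extract the Node major version from the last `FROM ...node:<major>` line.
// Returns null when no such line exists (for example `FROM node:lts`
// or a non-Node base image).
function nodeMajorFromDockerfile(dockerfileText) {
  const pattern = /^FROM\s+(?:--platform=\S+\s+)?(?:\S+\/)?node:(\d+)/gim;
  const matches = [...dockerfileText.matchAll(pattern)];
  if (matches.length === 0) return null;
  return Number(matches[matches.length - 1][1]);
}

// Compare the pinned major with the running one (defaults to this process).
function runtimeMatchesDockerfile(dockerfileText, runningVersion = process.versions.node) {
  const pinned = nodeMajorFromDockerfile(dockerfileText);
  const running = Number(runningVersion.split(".")[0]);
  return { pinned, running, matches: pinned !== null && pinned === running };
}

// Example with an inline multi-stage Dockerfile.
console.log(
  runtimeMatchesDockerfile("FROM node:18 AS build\nRUN npm ci\nFROM node:20-alpine\n")
);
```

If the final stage of a multi-stage build is not a Node image, this sketch would report the build stage instead, so check the result against the Dockerfile by eye.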
Practical causes, not theory. These are the things you will actually find.
- Missing environment variable that defaults to a local value during dev
- Production runs a different Node/Go/Python version than local
- Production database is a different engine (Postgres vs SQLite) or version
- Build step produces artefacts that are not carried into the runtime container
- Filesystem path assumptions that work on macOS but fail on Linux
- Network egress blocked in production (firewall, security group, VPC)
- Memory or CPU limits in production that do not exist on a dev machine
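The macOS versus Linux path cause in the list above is often a casing mismatch: the default macOS filesystem is case-insensitive, so a path with the wrong case resolves locally and fails on Linux. A minimal sketch of a check that behaves the same on both, assuming a hypothetical helper name `existsWithExactCase` and relative paths without `..` segments:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns true only if every segment of relPath exists under rootDir with
// exactly the same casing. It compares against directory listings instead of
// asking the filesystem to resolve the path, so a case-insensitive
// filesystem cannot hide a mismatch.
function existsWithExactCase(rootDir, relPath) {
  let current = rootDir;
  const segments = relPath.split(/[\\/]+/).filter((s) => s !== "" && s !== ".");
  for (const segment of segments) {
    let names;
    try {
      names = fs.readdirSync(current);
    } catch {
      return false; // directory missing or unreadable
    }
    if (!names.includes(segment)) return false;
    current = path.join(current, segment);
  }
  return true;
}
```

Running something like `existsWithExactCase(process.cwd(), "src/Utils.js")` over the paths your code imports or reads is one way to catch the mismatch before deploy.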
Concrete fix directions. Pick the one that matches your root cause.
- Add a startup preflight that checks required env vars and logs runtime versions before serving traffic
- Pin the runtime version in your Dockerfile and in CI, and use the same version local dev runs
- Make the production database available in a staging environment that mirrors production
- Run the production build command locally to catch artefact gaps before deploy
- Add a health-check endpoint that verifies database, cache, and external dependency connectivity
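The startup preflight from the first fix above could look roughly like this. It is a sketch, not a definitive implementation: the variable names in `REQUIRED_ENV` are placeholders for whatever your app actually reads, and `preflight` and `runPreflightOrExit` are hypothetical function names. The report contains variable names only, never values, so it is safe to log.

```javascript
// Hypothetical list of variables this app needs; replace with your own.
const REQUIRED_ENV = ["DATABASE_URL", "REDIS_URL", "SESSION_SECRET"];

// Build a loggable report: runtime version plus the names of any required
// variables that are unset or blank.
function preflight(env = process.env, required = REQUIRED_ENV) {
  const missing = required.filter((name) => !env[name] || env[name].trim() === "");
  return {
    ok: missing.length === 0,
    node: process.version,
    platform: `${process.platform}/${process.arch}`,
    missing,
  };
}

// Log one preflight line and refuse to start if anything is missing.
function runPreflightOrExit() {
  const report = preflight();
  console.log("preflight", JSON.stringify(report));
  if (!report.ok) {
    console.error(`missing required env vars: ${report.missing.join(", ")}`);
    process.exit(1);
  }
}
```

Calling `runPreflightOrExit()` before the server starts listening means a missing variable fails the deploy immediately, with a clear message, instead of failing a request later.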
A fix you cannot prove is a guess. Close the loop.
- Deploy to a staging environment that matches production. Run the same request that failed.
- Check that the startup logs show the correct runtime version and that all required env vars are set.
- Run the health-check endpoint after deploy and confirm it returns 200 with all dependency checks passing.
- Reproduce the exact production error locally by matching the production environment (Docker, same DB, same config).
- Set up a smoke test in CI that hits the deployed app and asserts expected behaviour.
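A CI smoke test along the lines of the last two items might be sketched as follows. The assumptions are loud ones: Node 18 or later (for the global `fetch` and `AbortSignal.timeout`), a hypothetical `/healthz` path, and a hypothetical response shape of `{"checks": {"database": "ok", ...}}`. Adjust all three to match your own health-check endpoint.

```javascript
// Hit the health-check endpoint and throw unless it returns 200 and every
// dependency check reports "ok". fetchImpl is injectable so the function
// can be exercised without a network.
async function smokeTest(baseUrl, fetchImpl = fetch) {
  const response = await fetchImpl(new URL("/healthz", baseUrl), {
    signal: AbortSignal.timeout(5000), // give up after five seconds
  });
  if (response.status !== 200) {
    throw new Error(`health check returned HTTP ${response.status}`);
  }
  const body = await response.json();
  const failing = Object.entries(body.checks ?? {})
    .filter(([, status]) => status !== "ok")
    .map(([name]) => name);
  if (failing.length > 0) {
    throw new Error(`failing dependency checks: ${failing.join(", ")}`);
  }
  return body;
}
```

In a CI step this could be run as `smokeTest(process.env.APP_URL)` with a rejection handler that exits non-zero, where `APP_URL` is an assumed variable holding the deployed app's base URL.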
Things that make this bug worse or harder to find.
- Guessing the cause without reading the production error log first
- Assuming local dev and production are the same because 'the code is the same'
- Fixing the symptom (adding a try/catch) instead of fixing the environment gap
- Deploying a hotfix without testing it in a staging environment first
- Not adding a preflight check after the fix, so the same gap reappears in 3 months