What this usually means
Databases differ between local and production: version, available extensions, data volume, concurrent load, permissions, and configuration. A migration that adds a column with a default value might take milliseconds on an empty table but locks a production table for minutes while rewriting every row. A migration that uses a Postgres 16 feature will fail on Postgres 14. A migration that requires a table lock will block all writes if the table is in active use.
The first ten minutes \u2014 establish facts before touching code.
- 1Check the database version in production vs local. `SELECT version()`. Are they the same major version?
- 2Check the data volume. `SELECT count(*) FROM table_name` locally vs production. A migration that is instant on 1,000 rows might time out on 10,000,000.
- 3Check the exact error message from the production migration. Is it a syntax error (version mismatch)? A lock timeout? A permission error?
- 4Check if the production database is being accessed concurrently during the migration. A migration that requires an exclusive lock will wait (or fail) if other queries hold locks.
- 5Check database user permissions. The local dev user might have SUPERUSER privileges that the production migration user does not.
The specific files, logs, configs, and dashboards that usually own this bug.
- searchDatabase version — `SELECT version()` locally vs production
- searchTable sizes — `SELECT relname, n_live_tup FROM pg_stat_user_tables` locally vs production
- searchMigration framework logs — exact error, timing, which statement failed
- searchDatabase lock monitoring — `pg_locks` view, `pg_stat_activity` for blocking queries
- searchDatabase user permissions — `\du` or `SELECT * FROM pg_roles`
- searchMigration code — ALTER TABLE, CREATE INDEX, data backfill operations
- searchDatabase configuration — `statement_timeout`, `lock_timeout`, `maintenance_work_mem`
Practical causes, not theory. These are the things you will actually find.
- warningProduction database is an older version that does not support the SQL syntax used in the migration
- warningAdding a column with a DEFAULT value on a large table — this rewrites every row
- warningCreating an index on a large table without CONCURRENTLY — locks the table for writes
- warningData migration (backfill) processes millions of rows and exceeds the statement timeout
- warningMigration user lacks permission to run the DDL statement (CREATE INDEX, ALTER TABLE, etc.)
- warningAn uncommitted transaction holds a lock that the migration needs — migration waits until lock timeout
- warningDatabase replication lag — migration on primary succeeds but replicas fall behind
Concrete fix directions. Pick the one that matches your root cause.
- buildTest migrations on a production-like dataset — a staging database with comparable data volume
- buildUse `CREATE INDEX CONCURRENTLY` to avoid locking the table during index creation
- buildAdd columns without DEFAULT, then backfill in batches, then add the DEFAULT — avoids table rewrite
- buildSet a reasonable `statement_timeout` and `lock_timeout` for migrations — fail fast instead of blocking
- buildRun migrations during a maintenance window if they require exclusive locks
- buildVersion-control the database version used in all environments — use the same version everywhere
A fix you cannot prove is a guess. Close the loop.
- verifiedRun the migration on a staging database that mirrors production data volume and version.
- verifiedCheck the migration completes within the expected time window.
- verifiedVerify no queries are blocked during the migration by monitoring `pg_stat_activity`.
- verifiedAfter migration, verify data integrity: row counts, constraint validity, index health.
- verifiedTest rollback: run the down migration and confirm it works.
Things that make this bug worse or harder to find.
- warningRunning a migration in production without testing on a production-like dataset first
- warningAdding a column with DEFAULT on a large table without considering the table rewrite
- warningNot setting lock_timeout — a migration can wait indefinitely for a lock
- warningRunning migrations during peak traffic without understanding their locking behaviour
- warningNot having a rollback plan for every migration