All guides

LEARN \u00b7 DEBUGGING GUIDE

Database migration works locally but fails in production: how to debug it

You write a migration. It runs on your local Postgres 16. It fails on the production Postgres 14 with a syntax error. Or it passes locally with 100 rows but times out on production with 10 million rows.

AdvancedDatabase/debugging

What this usually means

Databases differ between local and production: version, available extensions, data volume, concurrent load, permissions, and configuration. A migration that adds a column with a default value might take milliseconds on an empty table but locks a production table for minutes while rewriting every row. A migration that uses a Postgres 16 feature will fail on Postgres 14. A migration that requires a table lock will block all writes if the table is in active use.

( 01 )Fast diagnosis

The first ten minutes \u2014 establish facts before touching code.

  • 1Check the database version in production vs local. `SELECT version()`. Are they the same major version?
  • 2Check the data volume. `SELECT count(*) FROM table_name` locally vs production. A migration that is instant on 1,000 rows might time out on 10,000,000.
  • 3Check the exact error message from the production migration. Is it a syntax error (version mismatch)? A lock timeout? A permission error?
  • 4Check if the production database is being accessed concurrently during the migration. A migration that requires an exclusive lock will wait (or fail) if other queries hold locks.
  • 5Check database user permissions. The local dev user might have SUPERUSER privileges that the production migration user does not.
( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • searchDatabase version — `SELECT version()` locally vs production
  • searchTable sizes — `SELECT relname, n_live_tup FROM pg_stat_user_tables` locally vs production
  • searchMigration framework logs — exact error, timing, which statement failed
  • searchDatabase lock monitoring — `pg_locks` view, `pg_stat_activity` for blocking queries
  • searchDatabase user permissions — `\du` or `SELECT * FROM pg_roles`
  • searchMigration code — ALTER TABLE, CREATE INDEX, data backfill operations
  • searchDatabase configuration — `statement_timeout`, `lock_timeout`, `maintenance_work_mem`
( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

  • warningProduction database is an older version that does not support the SQL syntax used in the migration
  • warningAdding a column with a DEFAULT value on a large table — this rewrites every row
  • warningCreating an index on a large table without CONCURRENTLY — locks the table for writes
  • warningData migration (backfill) processes millions of rows and exceeds the statement timeout
  • warningMigration user lacks permission to run the DDL statement (CREATE INDEX, ALTER TABLE, etc.)
  • warningAn uncommitted transaction holds a lock that the migration needs — migration waits until lock timeout
  • warningDatabase replication lag — migration on primary succeeds but replicas fall behind
( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • buildTest migrations on a production-like dataset — a staging database with comparable data volume
  • buildUse `CREATE INDEX CONCURRENTLY` to avoid locking the table during index creation
  • buildAdd columns without DEFAULT, then backfill in batches, then add the DEFAULT — avoids table rewrite
  • buildSet a reasonable `statement_timeout` and `lock_timeout` for migrations — fail fast instead of blocking
  • buildRun migrations during a maintenance window if they require exclusive locks
  • buildVersion-control the database version used in all environments — use the same version everywhere
( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

  • verifiedRun the migration on a staging database that mirrors production data volume and version.
  • verifiedCheck the migration completes within the expected time window.
  • verifiedVerify no queries are blocked during the migration by monitoring `pg_stat_activity`.
  • verifiedAfter migration, verify data integrity: row counts, constraint validity, index health.
  • verifiedTest rollback: run the down migration and confirm it works.
( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

  • warningRunning a migration in production without testing on a production-like dataset first
  • warningAdding a column with DEFAULT on a large table without considering the table rewrite
  • warningNot setting lock_timeout — a migration can wait indefinitely for a lock
  • warningRunning migrations during peak traffic without understanding their locking behaviour
  • warningNot having a rollback plan for every migration