What this usually means
A deadlock occurs when two or more transactions hold locks that the others need, creating a cycle. In PostgreSQL, the deadlock detector runs every 1 second (deadlock_timeout) and kills one of the transactions to break the cycle. The root cause is almost always concurrent transactions that acquire locks in different orders, often involving foreign keys, explicit locks (SELECT FOR UPDATE), or multiple updates. Non-obvious triggers include: different code paths locking tables A then B vs. B then A; implicit locks from foreign key checks; and lock escalation in partitioned tables.
The first ten minutes — establish facts before touching code.
- 1Check PostgreSQL logs for deadlock reports: grep 'deadlock detected' /var/log/postgresql/postgresql-*.log
- 2Extract the transaction IDs (xid) and backends from the log entry
- 3Query pg_stat_activity for those backends: SELECT * FROM pg_stat_activity WHERE pid IN (pid1, pid2);
- 4Query pg_locks to see current blocking locks: SELECT * FROM pg_locks WHERE NOT granted;
- 5Enable log_lock_waits and deadlock_timeout temporarily to capture more details: ALTER SYSTEM SET log_lock_waits = on; ALTER SYSTEM SET deadlock_timeout = '1s';
- 6Correlate the transaction IDs with application logs to find the exact code path
The specific files, logs, configs, and dashboards that usually own this bug.
- searchPostgreSQL log file: typically /var/log/postgresql/postgresql-<version>-main.log or /var/log/postgresql.log
- searchpg_stat_activity view: SELECT * FROM pg_stat_activity WHERE state != 'idle';
- searchpg_locks view: SELECT * FROM pg_locks WHERE NOT granted;
- searchpg_blocking_pids() function: SELECT pg_blocking_pids(pid) FROM pg_stat_activity;
- searchApplication server logs (e.g., Rails, Django, Spring) to correlate transaction boundaries
- searchPostgreSQL configuration file for deadlock_timeout and log_lock_waits settings
- searchQuery plan for suspicious queries: EXPLAIN (ANALYZE, BUFFERS) your_query;
Practical causes, not theory. These are the things you will actually find.
- warningTransactions acquire locks in inconsistent order across code paths (e.g., UPDATE table A then table B vs. B then A)
- warningForeign key constraints cause implicit locks on referenced tables that are not accounted for
- warningLong-running transactions holding locks while waiting for user input or slow network
- warningBulk updates or deletes that lock many rows and escalate to page or table locks
- warningMissing indexes causing full table scans that lock many rows
- warningUse of SELECT FOR UPDATE NOWAIT or SKIP LOCKED in high contention scenarios
Concrete fix directions. Pick the one that matches your root cause.
- buildEnforce a global lock order: always lock tables in the same sequence, documented and enforced in code
- buildReduce transaction duration: keep transactions short, avoid user interaction inside transactions
- buildUse retry logic with exponential backoff for deadlock victims (catch 40A01 and retry)
- buildAdd indexes to minimize lock scope: narrow down rows affected by UPDATE/DELETE
- buildConsider using advisory locks or optimistic locking instead of pessimistic locking
- buildFor bulk operations, batch in smaller chunks and commit frequently
A fix you cannot prove is a guess. Close the loop.
- verifiedRun the concurrent workload in a test environment and check pg_locks for lock waiting (no deadlock)
- verifiedSimulate the exact transaction interleaving using two psql sessions and verify no deadlock occurs
- verifiedMonitor pg_stat_activity for 'idle in transaction' sessions and ensure they do not accumulate
- verifiedEnable log_lock_waits and confirm no lock waits exceed deadlock_timeout
- verifiedPerform stress testing with increased concurrency to confirm fix under load
Things that make this bug worse or harder to find.
- warningIncreasing deadlock_timeout blindly—this just masks the problem and increases user-visible latency
- warningAssuming only explicit locks cause deadlocks; foreign keys and unique constraints can too
- warningRetrying the deadlocked transaction infinitely without backoff—this can cause livelock
- warningIgnoring lock order in stored procedures or triggers that run inside transactions
- warningUsing SELECT FOR UPDATE on rows that are read-only—consider snapshot isolation or MVCC
- warningNot logging deadlock details in production—always have log_lock_waits = on
Payment Processing Deadlock During Peak Load
Timeline
- 09:30PagerDuty alert: 5xx errors spike in payment API
- 09:32Check application logs: many 'deadlock detected' errors in payment_task
- 09:35SSH to RDS, query pg_stat_activity: two long-running transactions in idle-in-transaction state
- 09:38Extract deadlock log from PostgreSQL log file; see conflicting UPDATEs on orders and payments
- 09:42Identify the two code paths: task A updates order then payment; task B updates payment then order
- 09:45Kill the blocking backends: SELECT pg_terminate_backend(pid) for both
- 09:50Deploy fix: enforce consistent lock order across all payment tasks
- 10:00Monitor error rate drops to zero; confirm no further deadlocks
We were processing thousands of payment tasks concurrently via Celery. Each task updated the orders table, then the payments table, or vice versa, depending on the developer who wrote the function. Under normal load this worked fine, but during a flash sale, concurrency spiked and transactions started colliding. The PostgreSQL deadlock detector kicked in, killing one transaction per cycle, which caused retries and cascading timeouts.
I looked at the PostgreSQL log and saw two backends each holding a lock on one table and waiting for the other. The log entry included the query text, which let me trace back to two different Celery task functions. One function did UPDATE orders... then UPDATE payments..., the other did UPDATE payments... then UPDATE orders... The deadlock was textbook—a cycle in the lock graph.
I killed both backends to restore service, then we refactored the code to always update orders before payments. We also added retry logic with exponential backoff for any 40A01 errors as a safety net. After deploying, we ran the same flash sale scenario in staging and confirmed zero deadlocks. The lesson: enforce lock order globally, even if it means refactoring old code.
Root cause
Two concurrent code paths acquiring locks on 'orders' and 'payments' tables in opposite orders.
The fix
Standardized lock order: all tasks update orders first, then payments. Added retry logic for deadlock errors.
The lesson
Lock ordering must be a documented, enforced convention across the entire codebase—not left to individual developers.
When PostgreSQL detects a deadlock, it writes a detailed report to the server log. The report contains two sections under 'Process X waits for' and 'Process Y waits for'. Each section shows the transaction ID, the lock type (relation, tuple, page), and the query text. Look for the 'Relation' line to see which table is locked, and the 'Query' line to see the exact SQL.
Example: 'Process 12467 waits for ShareLock on transaction 123456; blocked by process 12468'. The report also shows the lock details: 'Lock: relation 16384 of database 16385' – you can map that to table name via pg_class. Use: SELECT relname FROM pg_class WHERE oid = 16384;. The victim transaction (the one killed) is marked with 'deadlock detected' and SQLSTATE 40A01.
If the deadlock is ongoing, query pg_stat_activity to see blocked backends. The pg_blocking_pids() function returns the PIDs blocking a given session. Combine with pg_locks to see which locks are held and waited on. Run: SELECT pid, wait_event_type, wait_event, state, query FROM pg_stat_activity WHERE wait_event IS NOT NULL;.
To find the exact lock conflict, join pg_locks with itself: SELECT a.pid AS blocked_pid, b.pid AS blocking_pid, a.relation FROM pg_locks a JOIN pg_locks b ON a.relation = b.relation AND a.locktype = b.locktype WHERE a.granted = false AND b.granted = true;. This shows the blocked and blocking PIDs per relation. Kill the blocking session only if necessary: SELECT pg_terminate_backend(pid);.
Foreign key constraints add implicit locks that developers often forget. When you UPDATE or DELETE a parent row, PostgreSQL may lock child rows to enforce the constraint. If two transactions update parent and child records in different orders, deadlocks occur. For example: Transaction 1 updates parent A, then child B; Transaction 2 updates child B, then parent A. Even though both update the same rows, the order of lock acquisition creates a cycle.
To detect this, look at the deadlock report for locks on 'pg_constraint' or system catalog tables. Also, check if the queries involve foreign key columns. The fix is to lock parent and child tables in a consistent order, or defer constraint checking using DEFERRABLE constraints if appropriate. Alternatively, avoid updating primary key columns that are referenced.
Even with perfect lock ordering, deadlocks can still happen in complex systems. Always implement retry logic for 40A01 errors. The retry should include exponential backoff and jitter to avoid thundering herd. Example in Python: for attempt in range(3): try: execute_transaction(); break except psycopg2.errors.DeadlockDetected: sleep(0.1 * (2 ** attempt)).
Consider lowering the isolation level to READ COMMITTED (default) if possible. SERIALIZABLE isolation adds overhead and increases deadlock probability due to predicate locks. Use advisory locks (pg_advisory_lock) for custom concurrency control, but be careful with ordering there too. Also, consider optimistic locking using version columns and retries instead of SELECT FOR UPDATE.
Frequently asked questions
How do I find which tables are involved in a deadlock from the PostgreSQL log?
In the deadlock report, look for lines like 'Relation: pg_class_oid' or 'Lock: relation 16384'. Get the OID and query: SELECT relname FROM pg_class WHERE oid = 16384;. The report also lists the query text, which usually includes the table name.
Can a deadlock occur without explicit SELECT FOR UPDATE?
Yes. UPDATE, DELETE, and even INSERT with unique indexes can acquire locks. Foreign key constraints add implicit locks on referenced tables. Also, row-level locks can escalate to page or table locks under certain conditions (e.g., during autovacuum or when updating many rows).
Should I increase deadlock_timeout to reduce deadlock frequency?
No. Increasing deadlock_timeout only makes PostgreSQL check for deadlocks less often, so transactions wait longer before a deadlock is detected. This increases latency and connection pool pressure. Fix the root cause instead.
How do I reproduce a deadlock in development?
Open two psql sessions, begin a transaction in each, and execute statements that lock resources in opposite orders. For example: Session 1: BEGIN; UPDATE orders SET ... WHERE id=1; (hold); Session 2: BEGIN; UPDATE payments SET ... WHERE id=1; (hold); Then Session 1: UPDATE payments ...; Session 2: UPDATE orders ...; One will receive a deadlock error.
What is the difference between a deadlock and a lock wait?
Lock wait is one transaction waiting for another to release a lock (no cycle). Deadlock is a cycle of two or more transactions each waiting for a lock held by another. Lock waits can be resolved by the blocking transaction completing; deadlocks require PostgreSQL to kill one transaction. Monitor lock waits with log_lock_waits.