What this usually means
Multiple concurrent transactions are trying to SELECT FOR UPDATE the same row(s) and blocking each other. PostgreSQL row locks are held until the transaction ends (commit or rollback). If the transaction does additional work while holding the lock—like slow I/O, API calls, or user input—the lock hold time extends, causing a queue. When the queue becomes deep, deadlocks become likely if transactions acquire locks in different orders. This is not a database bug; it's a concurrency design problem where lock granularity and transaction duration are mismatched to the workload.
The first ten minutes — establish facts before touching code.
- 1Run `SELECT pid, state, wait_event, query FROM pg_stat_activity WHERE wait_event IS NOT NULL;` to spot sessions waiting on locks.
- 2Query pg_locks with `SELECT * FROM pg_locks WHERE NOT granted;` to see which locks are being waited on.
- 3Use `pg_blocking_pids(pid)` on a waiting session to identify the blocker PID.
- 4Check `pg_stat_activity` for the blocker PID—often idle in transaction holding the lock.
- 5Enable log_lock_waits = on and deadlock_timeout = 1s in postgresql.conf to capture lock wait events in logs.
The specific files, logs, configs, and dashboards that usually own this bug.
- searchpg_stat_activity: view current queries, wait events, and transaction start times.
- searchpg_locks: detailed lock information including relation, tuple, and granted status.
- searchPostgreSQL server log (typically /var/log/postgresql/postgresql-*.log) for deadlock details and lock wait durations.
- searchpg_stat_database: check xact_commit, xact_rollback, and conflicts to gauge contention.
- searchApplication code: transaction boundaries and SELECT FOR UPDATE usage patterns.
- searchpg_stat_user_tables: n_tup_upd, n_tup_del for hot rows.
- searchpg_stat_statements: identify queries with high max_time or mean_time.
Practical causes, not theory. These are the things you will actually find.
- warningSELECT FOR UPDATE inside a long-running transaction that makes external API calls or waits for user input.
- warningMultiple transactions updating the same 'hot' row (e.g., a counter or status column).
- warningIndex missing on WHERE clause in SELECT FOR UPDATE causing full table lock escalation? No, but missing index leads to many rows locked.
- warningApplication retry logic that re-issues SELECT FOR UPDATE without releasing previous lock (nested transactions).
- warningConnection pooling with 'idle in transaction' sessions that forgot to commit/rollback.
- warningUsing SELECT FOR UPDATE with NOWAIT or SKIP LOCKED incorrectly, leading to starvation.
Concrete fix directions. Pick the one that matches your root cause.
- buildMinimize transaction duration: move all slow operations (API calls, file I/O) outside the transaction.
- buildUse SELECT FOR UPDATE NOWAIT or SKIP LOCKED if the business logic can tolerate skipping or immediate failure.
- buildAdd an index on the columns used in the WHERE clause to reduce the number of rows locked.
- buildRedesign application to use optimistic locking (e.g., version column) instead of pessimistic locking.
- buildImplement a queue or batch processor to serialize access to the contested row.
- buildSet statement_timeout and lock_timeout to prevent indefinite waits.
A fix you cannot prove is a guess. Close the loop.
- verifiedRun the locking query before and after fix: `SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';` should show zero waiting sessions.
- verifiedMonitor application response times: p99 latency should drop to baseline.
- verifiedCheck for deadlocks in logs after fix: deadlock error count should be zero.
- verifiedUse pgbench with a concurrent workload to simulate contention and confirm fix.
- verifiedVerify that no sessions are idle in transaction longer than a threshold (e.g., 1 second).
Things that make this bug worse or harder to find.
- warningAdding an index without analyzing query plan—might not help if WHERE clause is selective.
- warningUsing SKIP LOCKED without understanding that skipped rows may never be processed (starvation).
- warningKilling blocking sessions from the database without handling in application—can cause data corruption if mid-transaction.
- warningAssuming row locks are released on commit but forgetting that uncommitted transactions hold locks.
- warningBlindly increasing max_connections—more connections means more concurrent transactions competing for locks, making contention worse.
The Booking System Meltdown: SELECT FOR UPDATE Gone Wrong
Timeline
- 10:02PagerDuty alert: p99 API latency for /bookings/create spikes from 50ms to 15s.
- 10:05Check pg_stat_activity: 40 sessions on 'idle in transaction' waiting for 'transactionid' lock.
- 10:08Identify blocker PID 1234: 'SELECT ... FOR UPDATE' on seats table, idle for 5 minutes.
- 10:10Check application logs: PID 1234 is a booking request that triggered an external payment gateway call inside the transaction.
- 10:12Blocking session's payment gateway timeout is 120 seconds; transaction holds lock for entire duration.
- 10:15Kill blocking session (pg_terminate_backend). Queue clears; latency drops to 100ms.
- 10:20Deploy hotfix: move payment call outside transaction, use SELECT FOR UPDATE NOWAIT plus retry logic.
- 10:45Monitoring confirms no further lock contention; p99 latency stable at 55ms.
It was a typical Tuesday morning until our booking API started timing out. The alert showed p99 latency at 15 seconds, up from 50ms. I immediately checked pg_stat_activity and saw dozens of sessions waiting on 'transactionid' locks. The common thread: all were trying to SELECT FOR UPDATE on the same row in the 'seats' table. The blocker was a single session that had been idle for 5 minutes.
I traced the blocker's transaction to a code path that made a payment gateway call while holding the row lock. The gateway call had a 120-second timeout, so that one transaction blocked every subsequent booking request for the same seat. The application had no timeout on the SELECT FOR UPDATE itself, so requests piled up indefinitely.
After killing the blocker, I deployed a fix that moved the payment call outside the transaction and switched to SELECT FOR UPDATE NOWAIT with a retry loop. We also added a lock_timeout at the database level as a safety net. The fix cut p99 latency back to 55ms and eliminated the lock queue entirely. The lesson: never hold row locks across external I/O, and always have a timeout.
Root cause
SELECT FOR UPDATE inside a long transaction with an external API call (payment gateway) that blocked all subsequent requests for the same row.
The fix
Move payment call outside transaction; use SELECT FOR UPDATE NOWAIT with retry; add lock_timeout.
The lesson
Row locks are held until transaction end; never perform slow or external operations inside a transaction that holds locks.
When you issue SELECT FOR UPDATE, PostgreSQL locks the returned rows using a tuple-level lock. This lock is stored in the tuple header and is visible in pg_locks as a 'tuple' lock with locktype 'relation'? Actually, row locks appear as 'transactionid' locks when waiting, but the actual row lock is on the tuple. The key point: the lock is held until the transaction commits or rolls back. If the transaction is long, other transactions trying to SELECT FOR UPDATE the same row will wait.
The waiting mechanism: each waiting transaction creates a 'transactionid' lock on the blocking transaction's XID. The blocker's XID is visible in pg_locks as a granted lock. The wait queue is FIFO, but deadlocks can occur if there's a cycle. PostgreSQL's deadlock detection runs every deadlock_timeout (default 1s) and will abort one of the transactions.
The most effective query to find the blocking chain: SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query, wait_event FROM pg_stat_activity WHERE wait_event_type = 'Lock';. This gives you the PID of the blocked session and which PIDs are blocking it. You can then look up the blocker's query and transaction start time.
For deeper analysis, query pg_locks with joins: SELECT a.pid, a.usename, a.state, a.query, l.locktype, l.mode, l.granted FROM pg_stat_activity a JOIN pg_locks l ON a.pid = l.pid WHERE NOT l.granted;. This shows all ungraunted locks and the sessions waiting. The blocker will have granted locks on the same object.
PostgreSQL automatically detects deadlocks by building a wait-for graph. When it finds a cycle, it chooses a victim transaction based on the amount of work done (least recently started? Actually, it picks the transaction that is cheapest to rollback, but the algorithm is not documented precisely). The victim's transaction is aborted and a deadlock error is reported to the client.
To log deadlock details, set log_line_prefix to include %p (PID) and enable log_lock_waits and log_min_messages. The deadlock report in logs includes the query text of all involved transactions. This is invaluable for root cause analysis. Example log entry: 'process N detected deadlock while waiting for ShareLock on transaction M; details: Process N waits for ShareLock on transaction M; blocked by process K.'
Beyond the basics, consider using advisory locks for application-level coordination. They are lighter and can be used for custom concurrency control. For example, use pg_advisory_xact_lock() with a unique ID representing the business object. This decouples from row locks.
Another approach is to use a materialized queue: insert a 'booking request' row into a queue table, then have a background worker process requests sequentially. This avoids row locks altogether. For high contention, consider partitioning the hot table or using a separate 'counter' table with atomic updates (UPDATE ... RETURNING) to avoid SELECT FOR UPDATE.
Set up alerts on pg_stat_activity.count where wait_event_type = 'Lock' exceeding a threshold (e.g., >5). Also monitor pg_locks for number of ungraunted locks. Use a tool like pgwatch2 or Datadog to track these metrics.
Enable log_lock_waits with a low deadlock_timeout (e.g., 200ms) to capture lock waits quickly. This can cause log spam but is essential for debugging. Also, set log_min_duration_statement to log slow queries, which often correlate with lock contention.
Frequently asked questions
Does SELECT FOR UPDATE lock the entire table or just the rows returned?
It locks only the rows returned (matching the WHERE clause). However, if the WHERE clause doesn't use an index, PostgreSQL may lock many rows or even the entire table. Ensure the WHERE clause is index-backed to minimize locked rows.
What's the difference between NOWAIT and SKIP LOCKED?
NOWAIT: if the row is already locked, the query immediately returns an error (cannot acquire lock). SKIP LOCKED: skip over locked rows and return only unlocked rows (the locked rows are simply not returned). Use NOWAIT when you want to fail fast; use SKIP LOCKED when you can process a different subset.
Can I see which rows are locked?
PostgreSQL does not expose which specific rows (tuples) are locked in a simple way. You can use the pageinspect extension to examine tuple headers, but it's complex. For practical debugging, focus on which queries and transactions are blocked.
How do I set a timeout for SELECT FOR UPDATE?
Use the lock_timeout setting (SET lock_timeout = '5s';) to limit how long a statement waits for any lock. Alternatively, use statement_timeout for overall statement duration. These can be set per session or globally.
Why does idle in transaction hold locks?
Because the transaction hasn't committed or rolled back yet. The locks are released only at transaction end. An 'idle in transaction' session is still holding all its locks. Always ensure transactions are short and commit quickly.