PostgreSQL Table Bloat from Failed VACUUM

What this usually means

PostgreSQL's MVCC architecture creates dead tuples on every UPDATE or DELETE. VACUUM is supposed to reclaim that space for reuse. When bloat persists, it means VACUUM is either not running, running but skipping pages, or unable to mark dead tuples as reusable because something is holding back the global xmin horizon. The most common culprits are long-running idle-in-transaction queries, prepared transactions that never commit, or stale replication slots that prevent cleanup of tuples that might still be needed by a lagging replica. Autovacuum thresholds can also be too low or workers misconfigured, but the trick is to identify which xid horizon blocker is active.

( 01 )Fast diagnosis

The first ten minutes — establish facts before touching code.

1SELECT datname, usename, state, query, xact_start, backend_xmin FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY xact_start;
2SELECT slot_name, slot_type, database, xmin, catalog_xmin, active FROM pg_replication_slots WHERE xmin IS NOT NULL OR catalog_xmin IS NOT NULL;
3SELECT n_dead_tup, n_live_tup, n_tup_del, n_tup_upd, last_autovacuum, last_vacuum FROM pg_stat_user_tables WHERE relname = 'your_big_table';
4VACUUM VERBOSE your_big_table; -- look for 'skipped' or 'dead tuple' lines
5SELECT pg_size_pretty(pg_total_relation_size('your_big_table')); -- compare with estimated live size

( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

searchpg_stat_activity: Look for long-running transactions with backend_xmin set
searchpg_replication_slots: Check if xmin or catalog_xmin is far behind current xid
searchpg_stat_user_tables: Track n_dead_tup and last_autovacuum/vacuum timestamps
searchPostgreSQL logs: Search for 'autovacuum' and 'still running' or 'skipped' messages
searchpg_locks: Check for conflicting locks on the table that might block VACUUM
searchpg_stat_all_tables: Also check n_mod_since_analyze to see if autovacuum thresholds are hit

( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

warningLong-running idle-in-transaction query holds a backend_xmin that is older than the oldest dead tuple
warningStale replication slot with xmin set prevents cleanup of tuples that replica might need
warningPrepared transaction (PREPARE TRANSACTION) never committed or rolled back
warningAutovacuum naptime or scale factor set too high so VACUUM never triggers on the table
warningAutovacuum max_workers exhausted by other tables, leaving this table neglected
warningFillfactor set to 100% on the table, preventing HOT updates and causing more dead tuples

( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

buildKill the long-running transaction: pg_cancel_backend(pid) or pg_terminate_backend(pid) after verifying it's safe
buildRemove or disable stale replication slot: SELECT pg_drop_replication_slot('slot_name');
buildCommit or rollback prepared transactions: SELECT * FROM pg_prepared_xacts; then COMMIT PREPARED or ROLLBACK PREPARED
buildManually trigger VACUUM with aggressive settings: VACUUM (VERBOSE, INDEX_CLEANUP ON, TRUNCATE ON) your_table;
buildAdjust autovacuum settings: Increase autovacuum_vacuum_scale_factor, decrease autovacuum_vacuum_threshold, or set per-table storage parameters
buildIf bloat is severe, use pg_repack or VACUUM FULL (with downtime) to rebuild the table

( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

verifiedAfter fix, run VACUUM VERBOSE and confirm 'dead tuple' lines drop to near zero
verifiedCheck pg_stat_user_tables: n_dead_tup should decrease and last_vacuum update
verifiedMonitor disk space with df -h or pg_total_relation_size to see bloat reduce
verifiedRe-run the query from quick diagnosis to confirm no backend_xmin is stuck
verifiedTest performance: the same slow query should now scan fewer pages (EXPLAIN (ANALYZE, BUFFERS))

( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

warningDo not VACUUM FULL on a production table during peak hours—it locks the table exclusively
warningDo not kill all long-running transactions blindly; check if they are legitimate work
warningDo not drop a replication slot that is still in use by a replica—will break replication
warningDo not set autovacuum_vacuum_scale_factor to 0 without monitoring—can cause constant vacuuming
warningDo not ignore prepared transactions; they survive crashes and can block xmin indefinitely

( 07 )War story

The 500GB table that kept growing despite nightly VACUUM

Senior Site Reliability EngineerPostgreSQL 13 on Amazon RDS, db.r5.4xlarge, ~2TB database

Timeline

09:00On-call alert: disk usage on primary DB instance at 85% and growing. Table 'orders' is 500GB.
09:10Check pg_stat_user_tables: orders has 50 million dead tuples, last autovacuum was 2 days ago.
09:15Run VACUUM VERBOSE orders: 'skipped 0 pages, removed 1000 dead tuples' — clearly not matching 50 million.
09:20Query pg_stat_activity: find a connection with state 'idle in transaction', backend_xmin = 1234567890, running for 6 hours.
09:25Kill the idle transaction with pg_terminate_backend(pid).
09:30Re-run VACUUM VERBOSE orders: 'removed 48 million dead tuples, pages 0.8 million become empty'.
09:35Check disk space: 50GB freed immediately, additional space freed as VACUUM coalesces pages.
09:45Set statement_timeout to 30s and idle_in_transaction_session_timeout to 5min to prevent recurrence.

The on-call alert hit at 9am: disk usage on our primary PostgreSQL instance had climbed to 85% and was still rising. Our 'orders' table had ballooned to 500GB, and we knew from past experience that something was blocking VACUUM. I immediately checked pg_stat_user_tables and saw 50 million dead tuples, but the last autovacuum was two days ago. That was odd because autovacuum was enabled.

I ran VACUUM VERBOSE on the table and it only removed 1000 dead tuples—that confirmed the problem wasn't VACUUM itself, but something preventing it from seeing the dead tuples as reclaimable. I then queried pg_stat_activity for any backend_xmin, and found a single connection that had been 'idle in transaction' for 6 hours with a very old xmin. That transaction was holding back the global xmin horizon, so VACUUM couldn't mark any tuple older than that as reusable.

After confirming with the team that the transaction was abandoned (a developer left a query window open), I killed it with pg_terminate_backend. Immediately, the next VACUUM VERBOSE showed 48 million dead tuples removed. Within minutes, disk usage dropped by 50GB. We then hardened our connection pool to enforce idle_in_transaction_session_timeout and statement_timeout to prevent this from happening again. The lesson: always monitor backend_xmin, not just active queries.

Root cause

A single idle-in-transaction connection held an old backend_xmin, preventing VACUUM from reclaiming dead tuples.

The fix

Killed the idle transaction, then ran VACUUM to reclaim space. Set idle_in_transaction_session_timeout to 5 minutes.

The lesson

Monitor backend_xmin in pg_stat_activity as a critical health metric; set timeouts aggressively for idle transactions.

( 08 )Understanding the xmin Horizon and Bloat

PostgreSQL's MVCC works by keeping multiple versions of rows. When a row is updated or deleted, the old version becomes a dead tuple. VACUUM can only remove dead tuples that are older than the oldest active transaction's xmin. That xmin is the minimum of: all backend xmins (from pg_stat_activity.backend_xmin), replication slot xmins, and prepared transaction xmins. If any of those is very old, dead tuples pile up and bloat the table.

The key metric is the 'global xmin horizon'. You can find it with: SELECT datname, usename, state, query, xact_start, backend_xmin FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY xact_start;. Also check pg_replication_slots for xmin/catalog_xmin. If you see an xmin that is hours or days old, that's the blocker.

( 09 )Autovacuum vs. Manual VACUUM: When to Step In

Autovacuum is reactive: it triggers when dead tuples exceed threshold + scale_factor * live tuples. On large tables, scale_factor (default 0.2) means 20% of the table must be dead before autovacuum fires. That's often too late. For a 500GB table, 20% dead = 100GB of bloat before any vacuuming starts. Lower the scale_factor or set per-table storage parameters: ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);

When bloat is already high, manual VACUUM (without FULL) is safe and can run concurrently. But if the xmin horizon is stuck, even manual VACUUM will be ineffective. Always verify with VACUUM VERBOSE. If you see 'skipped' pages, the horizon is the issue. If you see many dead tuples removed but table size doesn't shrink, that's normal—VACUUM marks space reusable but doesn't return it to OS; only VACUUM FULL or pg_repack does that.

( 10 )Replication Slots: The Silent Bloat Accumulator

Replication slots are meant to prevent the primary from discarding WAL that a replica hasn't received. But in PostgreSQL 9.4+, slots also hold a catalog_xmin that prevents cleanup of dead tuples that might be needed by the replica's snapshots. If a replica is lagging or disconnected, that xmin can be very old, causing bloat on the primary even if no transactions are running.

Check pg_replication_slots for slots where xmin or catalog_xmin is far behind current xid. If a slot is no longer needed, drop it: SELECT pg_drop_replication_slot('slot_name');. If it's still needed, ensure the replica is up to date or consider using hot_standby_feedback instead (but that has its own trade-offs).

( 11 )Prepared Transactions: Zombie Blockers

Prepared transactions (PREPARE TRANSACTION) are designed for distributed transactions. They survive crashes and connection drops, and they hold an xmin until committed or rolled back. They are not visible in pg_stat_activity, only in pg_prepared_xacts. A forgotten prepared transaction can block VACUUM indefinitely.

To check: SELECT * FROM pg_prepared_xacts;. If you find one, either COMMIT PREPARED or ROLLBACK PREPARED. In emergency, you can recover by setting max_prepared_transactions to a positive value and restarting, but that's extreme. Better to monitor pg_prepared_xacts regularly.

( 12 )Advanced Diagnostics: Tracking Bloat with Queries

You can estimate bloat with a query from the PostgreSQL wiki: SELECT ... FROM pg_stats ... that calculates wasted space. But the most reliable method is to use the pgstattuple extension: CREATE EXTENSION pgstattuple; SELECT * FROM pgstattuple('orders'); This gives dead_tuple_count, dead_tuple_len, and free_space. Compare with live_tuple_count to see if bloat is significant.

For ongoing monitoring, set up alerts on pg_stat_user_tables.n_dead_tup > some threshold (e.g., 1 million) and last_autovacuum > 24 hours. Also monitor replication slot xmin age with: SELECT slot_name, age(xmin) FROM pg_replication_slots;. A high age (e.g., > 1 billion) indicates a problem.

Frequently asked questions

Why does VACUUM finish quickly but not reduce table size?

VACUUM marks space inside the table files as reusable for future tuples, but it does not return the space to the operating system. The table file size stays the same. To shrink the file, you need VACUUM FULL or pg_repack, but those require exclusive locks. If you see 'dead tuple' lines decreasing in VACUUM VERBOSE, VACUUM is working—just not releasing disk space.

How do I find which transaction is blocking VACUUM?

Run: SELECT pid, state, query, xact_start, backend_xmin FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY xact_start; The oldest xact_start (with backend_xmin) is the blocker. Also check pg_replication_slots for xmin and pg_prepared_xacts. The global xmin is the minimum of all these.

Can I prevent bloat from replication slots?

Yes. Monitor slot age with: SELECT slot_name, age(xmin) FROM pg_replication_slots;. Set a maximum age alert. If a replica is permanently down, drop the slot. Alternatively, use hot_standby_feedback on the replica to push its xmin to the primary, but that can cause bloat on the replica. Consider using replication slot with max_replication_slots and ensure replicas keep up.

Is autovacuum enough to prevent bloat?

Autovacuum handles normal workloads, but it cannot overcome blocked xmin horizons. It also has thresholds that may be too high for large tables. For critical tables, set per-table autovacuum tuning: ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 1000); Also ensure autovacuum_max_workers is adequate (default 3 may be too low for many databases).

What should I do if VACUUM FULL is needed but I can't take downtime?

Use pg_repack, which rebuilds tables online with minimal locking (only a brief ACCESS EXCLUSIVE lock at the end). It requires the pg_repack extension and some setup. Alternatively, you can use VACUUM (VERBOSE, INDEX_CLEANUP ON) to at least clean indexes, which reduces bloat impact. For extreme cases, schedule a maintenance window.

PostgreSQL Table Bloat: When VACUUM Stops Working

What this usually means

Frequently asked questions