What this usually means
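High replication lag means the standby cannot keep up with the write rate on the primary. The bottleneck is rarely just 'the network is slow'. It is usually one of three things: disk I/O contention on the standby (especially while replaying large transactions), query conflicts on the standby that hold up replay, or a primary that generates WAL faster than the standby can replay it. Insufficient WAL retention does not cause lag by itself, but it turns lag into an outage, because a standby that falls too far behind finds that the WAL it needs has already been removed. The pg_stat_replication view gives you write_lag, flush_lag, and replay_lag. Each one is measured from the moment the primary flushed the WAL, so the columns are cumulative and normally write_lag <= flush_lag <= replay_lag. The gaps between them tell you where the time goes. If all three are high and close together, the delay happens before the standby writes the WAL, which points to the network or the sending side. If flush_lag is much larger than write_lag, the standby's storage is slow to fsync. If replay_lag is much larger than flush_lag, replay is stuck or slow. This is often caused by long-running queries on the standby that conflict with WAL replay, or by saturated I/O or CPU on the standby.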
The first ten minutes — establish facts before touching code.
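1. Run: SELECT application_name, state, write_lag, flush_lag, replay_lag FROM pg_stat_replication; on the primary.
2. Check the last received and replayed WAL positions: SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp(); on the standby.
3. Measure replay delay in wall-clock time: SELECT now() - pg_last_xact_replay_timestamp() AS lag; on the standby. This value also grows while the primary is idle, because it measures the time since the last replayed commit.
4. Look at pg_stat_activity on the standby for long-running queries (state 'active' with an old query_start). Also check what the startup process (backend_type 'startup') is waiting on. I/O wait events such as 'DataFileRead' point to slow storage.
5. Check the primary's WAL generation rate. Run SELECT pg_current_wal_lsn(); twice, about 60 seconds apart, then run SELECT pg_wal_lsn_diff('<second LSN>', '<first LSN>') / 60 / (1024*1024) AS mb_per_sec;. Do not divide the total LSN by server uptime (pg_postmaster_start_time()). The LSN counts all WAL written since initdb, not since the last restart, so that calculation is misleading.
6. Verify wal_keep_size (PostgreSQL 13+) or wal_keep_segments (older versions), and max_wal_size, on the primary.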
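The WAL generation rate check above is just the LSN difference divided by the sampling interval. Here is a minimal sketch in Python. It assumes you copy two pg_current_wal_lsn() values by hand, and the helper names are hypothetical. It mirrors the arithmetic that pg_wal_lsn_diff() performs on the server: an LSN is two hexadecimal halves of a 64-bit byte position.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' (two hex halves) to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate_mb_per_sec(first_lsn: str, second_lsn: str, interval_seconds: float) -> float:
    """WAL written between two pg_current_wal_lsn() samples, in MB (MiB) per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Same arithmetic as pg_wal_lsn_diff(second_lsn, first_lsn) on the server.
    delta_bytes = lsn_to_bytes(second_lsn) - lsn_to_bytes(first_lsn)
    return delta_bytes / interval_seconds / (1024 * 1024)


# Two hypothetical samples taken 60 seconds apart on the primary.
print(wal_rate_mb_per_sec("0/10000000", "0/13C00000", 60))  # 1.0
```

Sample during peak load, not at a quiet moment. The peak rate is the one that drives both lag and WAL retention sizing.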
The specific files, logs, configs, and dashboards that usually own this bug.
- Primary: pg_stat_replication view
- Standby: pg_stat_wal_receiver, pg_stat_activity, pg_stat_database_conflicts, pg_stat_bgwriter
- Primary and standby: PostgreSQL logs for 'requested WAL segment ... has already been removed' errors
- Network: iperf3 test between primary and standby, tcpdump for retransmits
- OS: iostat -x 1 on the device backing the standby's data directory
- Standby: hot_standby_feedback setting in postgresql.conf
- Primary: max_wal_size, wal_keep_size (wal_keep_segments before PostgreSQL 13), wal_level, archive_mode
Practical causes, not theory. These are the things you will actually find.
- Insufficient wal_keep_size / wal_keep_segments, so the primary recycles WAL before a lagging or disconnected standby has received it (this breaks replication rather than merely slowing it)
- Long-running queries on the standby holding up WAL replay (conflicts with vacuum cleanup or DDL)
- Network bandwidth or latency between primary and standby
- Standby disk I/O bottleneck: received WAL cannot be flushed, or data pages cannot be read and written, fast enough for replay to keep up
- Primary generating WAL faster than the standby can replay it (e.g., bulk load, large DDL)
- hot_standby_feedback = off, leading to query conflicts and replay stalls
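The cause "primary generating WAL faster than the standby can replay it" is easier to reason about with numbers. The backlog only shrinks by the margin between replay speed and ongoing WAL generation. Here is a minimal sketch. It assumes steady rates that you measure yourself, for example from LSN samples on the primary and pg_last_wal_replay_lsn() on the standby. Real workloads are bursty, so treat the result as a rough estimate. The function name is hypothetical.

```python
import math


def catch_up_seconds(backlog_mb: float, replay_mb_per_sec: float,
                     generation_mb_per_sec: float) -> float:
    """Rough time for a standby to clear its replay backlog at steady rates.

    The backlog only shrinks by the margin between the standby's replay speed and
    the WAL the primary keeps generating; with no margin it never catches up.
    """
    margin = replay_mb_per_sec - generation_mb_per_sec
    if margin <= 0:
        return math.inf
    return backlog_mb / margin


# Hypothetical figures: 3 GB behind, replaying 60 MB/s while the primary writes 50 MB/s.
print(catch_up_seconds(3000, 60, 50))  # 300.0
```

A replay rate only slightly above the generation rate still means a long recovery after every burst. That is why bulk loads can leave a standby behind for much longer than the load itself runs.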
Concrete fix directions. Pick the one that matches your root cause.
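- Increase wal_keep_size (or wal_keep_segments on older versions) so a lagging standby can still catch up. This prevents a re-sync but does not reduce lag.
- Enable hot_standby_feedback on the standby to prevent snapshot conflicts with vacuum.
- Tune max_standby_streaming_delay. A lower value caps how long replay waits before canceling conflicting queries. A higher value or -1 protects queries at the cost of more (or unbounded) replay lag and staler reads.
- Add more standby servers to spread read load, so each one has I/O headroom for replay. Treat synchronous replication as a durability choice, not a lag fix.
- Upgrade the network link, or enable wal_compression = on, which compresses full-page images in WAL (less bandwidth, more CPU).
- Tune the standby: increase shared_buffers, adjust checkpoint settings, or use faster storage.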
A fix you cannot prove is a guess. Close the loop.
- After the fix, monitor the pg_stat_replication lag columns for 10+ minutes during peak load
- Query pg_stat_replication and confirm write_lag and replay_lag are within acceptable bounds (ideally under 1 second)
- Run a bulk insert on the primary and watch replay_lag catch up within seconds
- Check that the standby's pg_last_xact_replay_timestamp() is close to the current time
- Verify there are no 'requested WAL segment ... has already been removed' errors in the primary's logs
- Repeat the iperf3 test to confirm the network is not the bottleneck
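The first two checks can be automated. Here is a minimal sketch. It assumes you poll replay_lag from pg_stat_replication yourself (for example every 30 seconds) and convert it to seconds. The LagSample class and lag_stays_within function are hypothetical. The 1-second limit and 10-minute window come from the checklist above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LagSample:
    taken_at: float               # seconds, e.g. from time.time() when you polled
    replay_lag: Optional[float]   # seconds; None when pg_stat_replication shows NULL


def lag_stays_within(samples: Sequence[LagSample], limit_seconds: float = 1.0,
                     min_window_seconds: float = 600.0) -> bool:
    """True if the samples span at least min_window_seconds and every replay_lag is under limit."""
    if not samples:
        return False
    ordered = sorted(samples, key=lambda s: s.taken_at)
    if ordered[-1].taken_at - ordered[0].taken_at < min_window_seconds:
        return False  # not enough observation time to call the fix verified
    # NULL lag means the standby was caught up with no recent WAL activity.
    return all(s.replay_lag is None or s.replay_lag < limit_seconds for s in ordered)
```

Checking replay_lag rather than write_lag matters here, because replay_lag includes the write and flush delay and is the column that exposes stuck replay.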
Things that make this bug worse or harder to find.
- Setting wal_keep_size (or wal_keep_segments) too low because you think 'streaming replication doesn't need it'. It does, for standby catch-up after a disconnect, unless a replication slot or WAL archive covers the gap.
- Turning off hot_standby_feedback without understanding query conflict risk
- Using synchronous replication without ensuring the standby can keep up (commits will block)
- Ignoring standby disk I/O. WAL arrives sequentially, but replay reads and writes data pages all over the data directory, so it is IOPS-bound on HDDs.
- Not monitoring lag during maintenance windows or big batch jobs
- Assuming the network is fine without testing a single iperf3 stream. Each standby receives WAL over one TCP connection, whose throughput depends on the TCP window size.
The Black Friday Replication Meltdown
Timeline
- 09:00 Traffic spike begins; monitoring shows write_lag on standby rising to 30 seconds.
- 09:05 Alert: write_lag > 60 seconds. Check pg_stat_replication: write_lag=62s, flush_lag=61s, replay_lag=60s.
- 09:10 Check standby iostat: write IOPS at 80% of gp3 baseline. No immediate bottleneck.
- 09:15 Primary WAL rate: 50 MB/s. Network iperf shows 1 Gbps with 2ms latency; seems fine.
- 09:20 Check standby pg_stat_activity: two long-running analytical queries holding up WAL replay.
- 09:25 Kill those queries. Replay lag starts dropping immediately.
- 09:30 Replay lag down to 5 seconds. Enable hot_standby_feedback.
- 09:40 Lag stabilized at < 1 second. Root cause identified: long-running queries on standby with hot_standby_feedback=off.
We were running with hot_standby_feedback=off because the old DBA said it could cause bloat on the primary. I knew the risks but figured our analytical queries were short. Black Friday traffic proved me wrong. At 09:00, our monitoring dashboard showed replication lag climbing from 2 seconds to 30 in minutes. The on-call engineer called me frantically.
I SSHed into the primary and ran the standard checks. pg_stat_replication showed all three lag metrics climbing together, meaning the standby was both receiving and replaying slowly. I checked the primary's WAL generation rate (50 MB/s) and network throughput (1 Gbps), and both were within limits. Then I checked standby I/O with iostat: writes at 80% of the gp3 baseline, but not capped. That left replay conflicts.
I queried pg_stat_activity on the standby and found two analytical queries that had been running for over 15 minutes. When a query holds a snapshot that conflicts with a WAL record (e.g., vacuum removing rows the query can still see), PostgreSQL's WAL replay waits for it, for up to max_standby_streaming_delay, before canceling the query. I killed those queries with pg_terminate_backend. Replay lag dropped to 5 seconds within 5 minutes. I immediately enabled hot_standby_feedback and set max_standby_streaming_delay to -1 (no limit) as a temporary measure. We later moved the analytical queries to off-peak hours and tuned them.
Root cause
Long-running queries on the standby with hot_standby_feedback=off blocked WAL replay, causing cascading lag.
The fix
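Enable hot_standby_feedback on the standby so that vacuum on the primary stops removing rows that standby queries still need. Review max_standby_streaming_delay as well. A higher value spares long queries from cancellation, but it lets the remaining conflicts (such as lock conflicts from DDL) hold up replay for longer.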
The lesson
Never run a standby with hot_standby_feedback=off in a read-scaling setup. Also, monitor standby query conflicts proactively with pg_stat_database_conflicts.
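pg_stat_replication returns three lag columns, each measured from the moment the primary flushed the WAL locally. write_lag runs until the standby reports it has written the WAL (not yet flushed). flush_lag runs until the standby has flushed it to disk. replay_lag runs until the standby has applied it.
Because the columns are cumulative, read the gaps between them. If all three are high and close together, the time is spent before the standby writes the WAL, so the network or the sending side is the bottleneck. If flush_lag is well above write_lag, the standby's storage is slow to fsync. If replay_lag exceeds flush_lag significantly, WAL replay is stuck or slow. This is likely due to query conflicts, or to a replay process starved of I/O or CPU.
Use these to pinpoint the layer: `write_lag` approximates network and sending delay, `flush_lag - write_lag` gives the standby's fsync delay, and `replay_lag - flush_lag` gives the replay delay.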
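The subtraction above can be sketched in Python. This sketch assumes a driver such as psycopg, which returns the interval columns as datetime.timedelta and NULL as None. The stage names and helper functions are hypothetical labels for illustration. Because the columns are updated from separate reports, a difference can come out slightly negative, so the sketch clamps it at zero.

```python
from datetime import timedelta
from typing import Dict, Optional


def _seconds(lag: Optional[timedelta]) -> float:
    # pg_stat_replication shows NULL lag once a caught-up standby sees no new WAL.
    return 0.0 if lag is None else lag.total_seconds()


def split_lag(write_lag: Optional[timedelta], flush_lag: Optional[timedelta],
              replay_lag: Optional[timedelta]) -> Dict[str, float]:
    """Attribute cumulative lag (seconds) to the stage that adds it."""
    w, f, r = _seconds(write_lag), _seconds(flush_lag), _seconds(replay_lag)
    return {
        "send": w,                   # network and sending side, up to the standby's write
        "fsync": max(f - w, 0.0),    # standby flushing received WAL to disk
        "replay": max(r - f, 0.0),   # startup process applying flushed WAL
    }


def dominant_stage(stages: Dict[str, float]) -> str:
    """Name of the stage contributing the most lag."""
    return max(stages, key=stages.get)


stages = split_lag(timedelta(milliseconds=50), timedelta(milliseconds=300), timedelta(seconds=12))
print(dominant_stage(stages))  # replay
```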
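Many teams set wal_keep_segments too low, or rely solely on replication slots without watching them. A standby may disconnect when neither retained WAL nor a slot (or WAL archive) covers the gap. In that case the primary may recycle the WAL the standby needs for catch-up, forcing a full base backup.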
Replication slots prevent WAL removal, but they can cause the primary to run out of disk if the standby stays disconnected. Monitor pg_replication_slots for active/inactive status.
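One way to watch slots is to fetch SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes FROM pg_replication_slots; on the primary and flag anything that looks risky. Here is a minimal sketch. It assumes you run that query yourself and pass in the rows. SlotRow, slots_to_alert and the 50 GB default threshold are hypothetical. Size the threshold to your free disk space.

```python
from typing import Iterable, List, NamedTuple, Optional


class SlotRow(NamedTuple):
    slot_name: str
    active: bool
    retained_bytes: Optional[float]  # NULL when restart_lsn is NULL; drivers may return Decimal


def slots_to_alert(rows: Iterable[SlotRow], max_retained_gb: float = 50.0) -> List[str]:
    """Inactive slots, plus any slot holding back more WAL than max_retained_gb."""
    limit_bytes = max_retained_gb * 1024 ** 3
    flagged = []
    for row in rows:
        too_big = row.retained_bytes is not None and row.retained_bytes > limit_bytes
        if not row.active or too_big:
            flagged.append(row.slot_name)
    return flagged
```

An inactive slot is flagged even when it retains little WAL today, because its retention only grows for as long as the standby stays away.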
Set wal_keep_size to a value that covers expected downtime (e.g., 1 hour of WAL). Use `pg_wal_lsn_diff` to calculate WAL generation rate per hour.
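The sizing rule is rate × downtime, rounded up to whole WAL segments. Here is a minimal sketch. It assumes a measured peak WAL rate in MB/s and the default 16 MB segment size. The 1.5 headroom factor is an arbitrary safety margin, not a PostgreSQL recommendation.

```python
import math

WAL_SEGMENT_MB = 16  # default wal_segment_size; check with SHOW wal_segment_size


def wal_keep_for_downtime(peak_mb_per_sec: float, downtime_seconds: float,
                          headroom: float = 1.5):
    """Return (wal_keep_size in MB, equivalent wal_keep_segments) for a downtime budget."""
    needed_mb = peak_mb_per_sec * downtime_seconds * headroom
    segments = math.ceil(needed_mb / WAL_SEGMENT_MB)  # round up to whole segments
    return segments * WAL_SEGMENT_MB, segments


# Hypothetical: 2 MB/s at peak, one hour of tolerated standby downtime.
print(wal_keep_for_downtime(2, 3600))  # (10800, 675)
```

Use the first value for wal_keep_size on PostgreSQL 13+ and the second for wal_keep_segments on older versions. The primary's disk must have room for that much extra WAL.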
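When hot_standby_feedback is off, a long-running query on the standby can hold up WAL replay of vacuum cleanup records. DDL that takes conflicting locks can hold up replay even when it is on. The standby waits for the query for up to max_standby_streaming_delay (30 seconds by default; -1 means forever) and then cancels it, so replay lag spikes while it waits.
Check pg_stat_database_conflicts on the standby for the 'confl_snapshot' and 'confl_bufferpin' counters. These count queries canceled because of recovery conflicts. A rising confl_snapshot means standby queries keep colliding with vacuum cleanup from the primary, and each collision stalled replay until the query was canceled.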
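A rising counter only means something relative to an earlier sample. Here is a minimal sketch. It assumes you store one row of SELECT * FROM pg_stat_database_conflicts WHERE datname = current_database(); per poll as a dict. The helper is hypothetical. Newer PostgreSQL versions add further confl_* columns, which you can append to the tuple.

```python
from typing import Dict, Mapping

CONFLICT_COLUMNS = ("confl_tablespace", "confl_lock", "confl_snapshot",
                    "confl_bufferpin", "confl_deadlock")


def conflict_increases(before: Mapping[str, int], after: Mapping[str, int]) -> Dict[str, int]:
    """Counters from pg_stat_database_conflicts that rose between two samples of one database."""
    # Counters only go up until a stats reset, so a drop is ignored rather than reported.
    return {col: after[col] - before[col]
            for col in CONFLICT_COLUMNS if after[col] > before[col]}
```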
Enable hot_standby_feedback to avoid most conflicts, but be aware it may cause bloat on the primary because it prevents vacuum of rows visible to standby queries.
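Test TCP throughput with iperf3 using a single stream (its default) rather than parallel streams (-P). PostgreSQL streams WAL to each standby over a single TCP connection, so window size matters, and parallel streams can hide a window that is too small. Check net.core.rmem_max and net.core.wmem_max. Also check the maximums in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, which bound the kernel's automatic window tuning.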
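The bandwidth-delay product shows why window size matters: one TCP stream can carry at most window / RTT. Here is a minimal sketch of those standard formulas. The 64 KiB window is a hypothetical example. The 1 Gbps, 2 ms and 50 MB/s figures are taken from the Black Friday case study.

```python
def bdp_bytes(bandwidth_mbit_per_sec: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes one TCP stream must keep in flight to fill the link."""
    return bandwidth_mbit_per_sec * 1_000_000 / 8 * (rtt_ms / 1000)


def single_stream_cap_mbit(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on one TCP stream's throughput: window size divided by round-trip time."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


# 1 Gbps at 2 ms RTT needs roughly 250 KB in flight.
print(round(bdp_bytes(1000, 2)))  # 250000
# A 64 KiB window caps one stream near 262 Mbit/s, below 50 MB/s (400 Mbit/s) of WAL.
print(round(single_stream_cap_mbit(65536, 2), 1))  # 262.1
```

If the window the kernel allows is smaller than the bandwidth-delay product, a single WAL stream cannot use the full link, even when a multi-stream iperf3 run reports full speed.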
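On the standby, ensure the data directory is on fast storage (SSD). WAL arrives sequentially, but the walreceiver must fsync it, and replay reads and writes data pages scattered across the data directory. Use iostat -x to check await and %util. If await stays above about 10 ms, storage is slow.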
Consider wal_compression = on to reduce WAL volume and network bandwidth, at some CPU cost. With multiple standbys, make sure max_wal_senders is high enough for all of them plus any base backups.
Synchronous replication guarantees no data loss but amplifies lag problems: if the standby cannot keep up, primary writes block. Monitor sync_state in pg_stat_replication.
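List more than one standby in synchronous_standby_names so that another can take over if the synchronous standby fails. PostgreSQL does not fall back to asynchronous replication on its own. If no listed standby is available, commits wait until you change synchronous_standby_names (for example, to an empty string) and reload the configuration.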
For multi-datacenter setups, use multiple synchronous standbys with 'ANY' quorum to balance latency and durability.
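The difference between priority (FIRST) and quorum (ANY) lists comes down to which acknowledgement a commit waits for. With ANY, it waits for the k-th fastest standby. With FIRST, it waits for the slowest of the top-priority ones. Here is a simplified model. It assumes every listed standby is connected and ignores how the synchronous_commit level changes what is acknowledged. The function and standby names are hypothetical.

```python
from typing import Dict


def commit_wait_ms(ack_ms: Dict[str, float], num_sync: int, method: str = "ANY") -> float:
    """Approximate time a commit waits for synchronous standbys.

    ack_ms maps each standby to how long it takes to confirm a commit, in the order
    the standbys are listed in synchronous_standby_names (all assumed connected).
    """
    latencies = list(ack_ms.values())
    if not 1 <= num_sync <= len(latencies):
        raise ValueError("num_sync must be between 1 and the number of standbys")
    if method == "ANY":
        # Quorum: any num_sync standbys will do, so wait for the num_sync-th fastest.
        return sorted(latencies)[num_sync - 1]
    if method == "FIRST":
        # Priority: wait for the first num_sync standbys in list order, i.e. the slowest of them.
        return max(latencies[:num_sync])
    raise ValueError("method must be 'ANY' or 'FIRST'")


# Hypothetical round trips: one remote data center listed first, two local standbys.
acks = {"dc2_a": 40.0, "dc1_a": 1.0, "dc1_b": 1.5}
# Like 'FIRST 2 (dc2_a, dc1_a, dc1_b)'
print(commit_wait_ms(acks, 2, "FIRST"))  # 40.0
# Like 'ANY 2 (dc2_a, dc1_a, dc1_b)'
print(commit_wait_ms(acks, 2, "ANY"))  # 1.5
```

Note the trade-off in this example. ANY 2 can be satisfied by the two local standbys alone, so it keeps latency low but does not guarantee a copy in the second data center for every commit.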
Frequently asked questions
What is the difference between write_lag, flush_lag, and replay_lag?
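All three are measured from the moment the primary flushed a WAL record locally. write_lag is the time until the standby reports it has written (received) the record. flush_lag is the time until it has flushed the record to disk. replay_lag is the time until it has applied the changes to its data files. Because the columns are cumulative, look at the gaps between them. High values across all three with little gap point to network issues. A large flush_lag minus write_lag gap points to slow standby storage. A large replay_lag minus flush_lag gap points to replay conflicts or a standby I/O or CPU bottleneck.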
Why does a long-running query on the standby cause replication lag?
When hot_standby_feedback is off, a long-running query holds a snapshot that conflicts with WAL replay (e.g., vacuum removing rows the query can still see). PostgreSQL pauses replay until the query finishes, or until max_standby_streaming_delay expires and the query is canceled, so replay_lag spikes in the meantime. Enable hot_standby_feedback so the standby sends its oldest xmin to the primary, which prevents vacuum from removing rows that standby queries still need.
How do I calculate the required wal_keep_segments value?
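Measure your WAL generation rate at peak. Run SELECT pg_current_wal_lsn(); twice, for example 60 seconds apart, and compute pg_wal_lsn_diff(second, first) / 60 / (1024*1024) to get MB per second. Do not divide the total LSN by server uptime, because the LSN counts WAL written since initdb. Multiply the rate by the expected maximum downtime in seconds, then divide by 16 (the default 16 MB segment size) to get wal_keep_segments. On PostgreSQL 13 and later, wal_keep_segments has been replaced by wal_keep_size, which you set in MB directly.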
Should I use synchronous replication to avoid lag?
No. Synchronous replication ensures no data loss but does not reduce lag; it makes lag affect primary writes. If the standby is slow, primary writes block. Use synchronous only if you need zero data loss and can ensure the standby can keep up. For read scaling, asynchronous replication is usually sufficient.
How can I monitor replication lag proactively?
Use pg_stat_replication on the primary and pg_stat_wal_receiver on the standby. Set up alerts on replay_lag (e.g., > 5 seconds), since it includes the write and flush delay and also catches stalled replay. Also monitor pg_stat_database_conflicts on the standby for conflicts. Tools like Prometheus with postgres_exporter or pgDash can track trends.