What this usually means
Slow queries in Postgres usually result from inadequate indexing, poor query plans due to outdated statistics, I/O starvation, or bloated tables. Sometimes, suboptimal SQL statements or misplaced functions (e.g., using LIKE '%foo%' on large text columns) are to blame. Less often, resource starvation (CPU, RAM, or disk) or lock contention causes cascading slowdowns. The trick is isolating the real cause, not just the query shape.
The first ten minutes — establish facts before touching code.
- 1Run SELECT * FROM pg_stat_activity WHERE state = 'active' AND now() - query_start > interval '2 seconds';
- 2Capture the exact slow statement from the log: grep 'duration:' /var/log/postgresql/postgresql-*.log | tail -20
- 3Use EXPLAIN (ANALYZE, BUFFERS) <query> to see the actual plan and hotspots
- 4Check for missing or misused indexes: \\di+ or \\d <table>
- 5Inspect pg_stat_user_tables for sequential scans: SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables WHERE seq_scan > 0;
- 6Verify autovacuum health: SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables ORDER BY last_autovacuum ASC;
The specific files, logs, configs, and dashboards that usually own this bug.
- search/var/log/postgresql/postgresql-*.log (or your log_directory)
- searchpg_stat_activity for current queries
- searchpg_stat_statements (if enabled) for historical slow queries
- searchEXPLAIN (ANALYZE) output for the target query
- searchTable statistics: pg_class, pg_stat_user_tables
- searchSchema: \\d+ <table>, to check column types and existing indexes
- searchDisk and I/O metrics (e.g., iostat, vmstat, or cloud disk dashboards)
Practical causes, not theory. These are the things you will actually find.
- warningMissing composite or partial indexes on WHERE/JOIN columns
- warningFunctions, casts, or wildcards in WHERE clauses preventing index use
- warningOutdated table statistics (ANALYZE not run recently)
- warningTable or index bloat causing extra I/O (often after mass deletes/updates)
- warningHotspotting/lock contention on a single row or table
- warningUnder-provisioned storage (cloud EBS in 'cold' state, insufficient IOPS)
- warningInefficient query patterns (unbounded CTEs, unnecessary subqueries)
Concrete fix directions. Pick the one that matches your root cause.
- buildCreate the minimal covering or partial index (e.g., CREATE INDEX idx ON users(email) WHERE active = true;)
- buildRewrite queries to avoid non-SARGable predicates—move functions/casts out of WHERE
- buildRun ANALYZE or VACUUM ANALYZE to refresh statistics
- buildRebuild bloated tables: VACUUM FULL <table>; or pg_repack for less downtime
- buildTune work_mem and effective_cache_size appropriately (see current settings: SHOW work_mem;)
- buildDecompose large, monolithic queries into smaller steps where possible
A fix you cannot prove is a guess. Close the loop.
- verifiedRerun EXPLAIN (ANALYZE) and compare the new cost and actual time
- verifiedCheck pg_stat_statements for decreased mean and max execution time
- verifiedMonitor application-layer metrics for improved latency
- verifiedConfirm index usage in the plan (look for Index Scan, not Seq Scan)
- verifiedWatch disk and CPU utilization for normalization
- verifiedValidate with representative production traffic loads—not just single test queries
Things that make this bug worse or harder to find.
- warningBlindly adding indexes without checking their actual impact or maintenance cost
- warningIgnoring autovacuum settings—leaving bloated tables unfixed
- warningTesting only with small datasets or unrepresentative parameter values
- warningNot enabling pg_stat_statements for query insight
- warningRelying on EXPLAIN output without (ANALYZE)—estimated plans often mislead
- warningChanging work_mem or shared_buffers globally without understanding workload characteristics
Customer Search Endpoint Slows Down After Product Launch
Timeline
- 10:02PagerDuty alert: /search API latency exceeds 4s, major customer impact.
- 10:05grep 'duration:' in RDS logs shows repeated 6s+ queries on users table.
- 10:07pg_stat_activity confirms 7 queries running over 5s, all hitting WHERE LOWER(email) = $1
- 10:10EXPLAIN (ANALYZE) shows full table scan, 900k rows, no index use.
- 10:12Realize the query cannot use the (email) index due to LOWER().
- 10:16Create functional index: CREATE INDEX users_email_lower_idx ON users (LOWER(email));
- 10:19API latency drops back to baseline (<400ms).
I was on-call when the /search endpoint crawled to a halt minutes after a marketing campaign hit. At first, I suspected downstream service issues, but tracing pointed to a specific query against our users table.
Digging into RDS logs, I saw every search was spending 6-7 seconds on a SELECT WHERE LOWER(email) = $1. I initially thought the email index was broken, but EXPLAIN quickly showed a sequential scan. I realized the LOWER function was preventing index use.
Adding a functional index on LOWER(email) instantly fixed the endpoint for all queries. I’d seen missing indexes before, but misapplied function usage was a subtler miss. Updating our query patterns and review process caught similar problems elsewhere.
Root cause
A function (LOWER) in the WHERE clause blocked use of the existing index, causing full table scans under high load.
The fix
Added a functional index on LOWER(email), allowing the planner to choose an index scan.
The lesson
Always check for function calls in query predicates and use functional indexes when necessary. Review explain plans, not just schema.
Don’t just skim for 'Seq Scan' vs 'Index Scan'. Look at actual time per node, rows returned, and buffers read. If a node returns far more rows than expected, statistics may be stale.
Watch for 'Rows Removed by Filter'—a sign your index isn’t covering all filter clauses, or the planner underestimated selectivity. Analyze misestimates by manually running ANALYZE or updating statistics.
Beware of wildcards at the start of LIKE ('%foo'): these always force a seq scan. Where possible, switch to trigram indexes (pg_trgm), or redesign the search.
Avoid casting indexed columns in WHERE clauses. Instead, cast parameter values at the application layer, or use expression indexes.
Cloud providers may throttle disk IOPS; check with iostat or provider dashboards if buffer read times spike. Slow disks make every sequential scan much worse.
Check for lock contention: SELECT * FROM pg_locks WHERE granted = false;—especially during large migrations or bulk data jobs.
Frequently asked questions
Can EXPLAIN lie about query performance?
EXPLAIN without ANALYZE gives estimated costs, not real timing. Always use EXPLAIN ANALYZE to measure actual execution time and row counts.
How do I know if I need to VACUUM?
If pg_stat_user_tables shows n_dead_tup much higher than live rows, or autovacuum hasn’t run in days, run VACUUM ANALYZE. Frequent slowdowns after large deletes/updates are another sign.
Does increasing work_mem always help slow queries?
No. For some sorts or aggregations, it helps, but excessive work_mem in parallel queries can exhaust memory and hurt overall performance. Tune cautiously.
Is adding an index always the right solution?
No. Too many indexes slow down writes and bloats storage. Only add indexes based on actual query patterns and their observed frequency.