I once spent three days chasing a query that looked fine in development but fell apart in production. The query joined five tables, filtered on a few indexed columns, and returned 200 rows. In staging, it ran in 80ms. In production, it peaked at 12 seconds during peak traffic.
The culprit? A single column with a histogram that was uniformly distributed in test data but heavily skewed in real data. The query planner chose a nested loop join based on bad row estimates, and the result was disaster.
This is why profiling SQL queries is not a one-time task. You need to do it under realistic conditions, with real data, and with the right tools. Here's how I approach it.
Start with the Slow Query Log
Before you optimize anything, know what to optimize. The slow query log is your first stop. In PostgreSQL, set log_min_duration_statement to a value that catches outliers without flooding your logs. I start at 1000ms and lower it later.
For MySQL, enable slow_query_log and set long_query_time = 1. Then parse the log with mysqldumpslow or pt-query-digest from Percona Toolkit.
# PostgreSQL: enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();
# MySQL: enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';The slowest 5% of queries usually cause 95% of the pain. Find them first.
Aggregate Statistics with Extensions
Slow logs give you individual queries, but you also need aggregate stats. In PostgreSQL, pg_stat_statements is essential. It tracks query execution counts, total time, mean time, and more. Install it, then query the view ordering by total_time desc.
In MySQL, the performance_schema provides similar data. Query events_statements_summary_by_digest to find queries consuming the most cumulative time.
-- PostgreSQL: top 10 queries by total execution time
SELECT query, calls, total_time, mean_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;EXPLAIN ANALYZE: The Real Deal
Once you have a candidate query, run EXPLAIN ANALYZE. This actually executes the query and shows the execution plan with real timings. Pay attention to:
- Actual time per node (startup and total) - Actual rows vs estimated rows (large discrepancies indicate bad statistics) - Loops: a nested loop with many loops can kill performance
EXPLAIN ANALYZE
SELECT o.id, c.name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at > '2025-01-01';EXPLAIN ANALYZE runs the query. For writes or slow queries, wrap in a transaction and rollback: BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;
The 12-Second Join
- 00:00Alert: API endpoint p99 latency spikes to 5s
- 00:05Checked pg_stat_statements: a join query accounts for 80% of DB time
- 00:10EXPLAIN ANALYZE shows nested loop join with 50k loops on orders table
- 00:12Noticed estimated rows=10 but actual rows=5000 on customers filter
- 00:15Ran ANALYZE on customers table to refresh statistics
- 00:20Query now uses hash join, completes in 200ms
Lesson
Stale statistics can cause catastrophic plan choices. Regular ANALYZE (or autovacuum in PostgreSQL) is not optional.
Look for Sequential Scans and Missing Indexes
A sequential scan on a large table is often a red flag. But not always — if you're reading 30% of a table, a sequential scan can be faster than an index scan. The key is row selectivity.
When you see a sequential scan on a table with millions of rows and the filter returns few rows, you need an index. Use pg_stat_user_indexes to check if indexes are used.
-- Check index usage statistics in PostgreSQL
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;Profiling Under Load
A query that runs fast in isolation might blow up under concurrency. Lock contention, I/O queue depth, and memory pressure only show up with realistic load. Use tools like pgbench or sysbench to simulate traffic while you profile.
I once saw a query that took 50ms alone but degraded to 2s under 100 concurrent connections because of a lock on a heavily inserted table. The slow query log never caught it because each individual execution was fast. Only load testing revealed the issue.
Use pg_stat_activity to monitor currently running queries. Look for queries in 'waiting' state or with long durations. This helps you catch blocking and contention.
Parameterized Queries and Bind Variables
Many ORMs and connection pools use bind parameters. EXPLAIN ANALYZE with literal values may give a different plan than with parameters because the planner peeks at bind values in some databases (e.g., PostgreSQL's generic plan vs custom plan).
To profile parameterized queries, either enable auto_explain in PostgreSQL (which logs plan for slow queries) or use pg_stat_statements with the queryid to correlate.
# PostgreSQL auto_explain: log plans for queries over 1 second
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000;
SET auto_explain.log_analyze = on;Fix the Root Cause, Not the Symptom
Once you find the slow query, the fix isn't always an index. Maybe you need to rewrite the query, add a covering index, denormalize, or change the join order with a hint (though I avoid hints unless necessary).
Apply the fix, then profile again. Compare before and after with the same dataset and load. Use EXPLAIN ANALYZE to verify the plan changed. Then monitor the slow query log and pg_stat_statements to confirm the improvement in production.
- 1Enable slow query logging and aggregate stats.
- 2Identify top offenders by total time or frequency.
- 3Run EXPLAIN ANALYZE with real parameters.
- 4Look for mismatches in row estimates, sequential scans, and high loop counts.
- 5Apply the fix (index, rewrite, statistics update).
- 6Profile again under similar conditions.
- 7Deploy and monitor for regression.
of database slowdowns are caused by fewer than 5% of queries
Frequently asked questions
What is the difference between EXPLAIN and EXPLAIN ANALYZE?
EXPLAIN shows the query plan with estimated costs, but EXPLAIN ANALYZE actually executes the query and shows actual times, rows, and loops. Use ANALYZE unless your query is destructive (then wrap in a transaction and rollback).
How do I find slow queries in PostgreSQL?
Enable slow query logging by setting log_min_duration_statement = 1000 (milliseconds). Then check the postgres log. For aggregate stats, install the pg_stat_statements extension and query pg_stat_statements.
What should I look for in an execution plan?
Focus on nodes with high actual time, large row count mismatches between estimate and actual, sequential scans on large tables, and nested loop joins with many loops. These indicate missing indexes or bad join order.
Can profiling an index improve query performance?
Yes — but only if the index is selective and the query uses it. Use EXPLAIN to verify index usage. Over-indexing can hurt write performance, so profile before adding indexes.