I remember the exact moment I learned that indexes aren't magic. It was 3 AM, and our main PostgreSQL database was gasping under a query that should have been fast. The table had an index. The query was simple. But the response time was 12 seconds instead of 10 milliseconds.
I'd spent two hours tweaking configuration parameters, restarting the database, and blaming the hardware. Eventually, I ran EXPLAIN ANALYZE and saw something I didn't expect: the query plan showed an Index Scan, but the actual rows returned were 5 million out of 10 million. The index was being used, but it was almost as slow as a full table scan.
That night, I dug into how B-tree indexes actually work. And I realized that debugging a slow query isn't about guessing — it's about understanding the mechanics of the index itself.
The Anatomy of a B-tree Index
A B-tree index is a balanced tree data structure that stores key-value pairs sorted by key. In PostgreSQL, each index entry points to a row in the table's heap via a tuple ID (ctid). The tree has three levels: root, internal branches, and leaf pages. The leaf pages are doubly linked, allowing efficient range scans.
But here's the key: the index only contains the indexed columns (and the ctid). If your query selects columns not in the index, the database must fetch those columns from the heap. That's a random I/O per row. For a query returning many rows, those random I/Os add up fast.
In PostgreSQL, an Index Scan requires a heap lookup for every matching row. An Index Only Scan can skip the heap if all needed columns are in the index and the visibility map says the page is all-visible.
The War Story: A Missing Covering Index
Our table was an orders table with columns: id, customer_id, status, created_at, total_amount. We had an index on (customer_id). The query was:
"SELECT id, status, total_amount FROM orders WHERE customer_id = 12345 AND created_at > '2024-01-01';"
The index on customer_id was used, but the query returned 50,000 rows. For each row, PostgreSQL had to fetch id, status, total_amount from the heap. Those 50,000 random reads took 8 seconds.
Adding a composite index on (customer_id, created_at) would help with the WHERE clause, but we still needed id, status, total_amount. The real solution was a covering index: (customer_id, created_at) INCLUDE (id, status, total_amount). This allowed an Index Only Scan — zero heap fetches. The query dropped to 20 ms.
-- Original index
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Better index: covers the WHERE clause
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at);
-- Best index: covering index to avoid heap lookups
CREATE INDEX idx_orders_customer_created_covering ON orders (customer_id, created_at) INCLUDE (id, status, total_amount);An index that is used but still slow is often a sign that your index doesn't cover the query.
Debugging Index Performance with EXPLAIN
The first tool in any index debugging session is EXPLAIN (ANALYZE, BUFFERS). The BUFFERS option tells you how many shared buffers were hit or read. If you see a high number of heap fetches, your index isn't covering.
Look for these patterns in the plan:
- "Index Scan" with many rows and high "rows removed by filter" — your index is not selective enough. - "Bitmap Heap Scan" — the planner chose to gather many index pointers and then read heap pages sequentially. This can be efficient for large result sets, but if you see it with a small result set, something is off. - "Index Only Scan" with "Heap Fetches: 0" is ideal. If you see Heap Fetches > 0, check the visibility map or consider a REINDEX.
The Slow Dashboard Query
- 00:00Alert: Dashboard query for daily revenue takes 45 seconds.
- 00:15Run EXPLAIN (ANALYZE, BUFFERS) on the query.
- 00:20Plan shows Index Scan on idx_revenue_date with 1.2M rows, 1.1M heap fetches.
- 00:25Check index definition: only covers date column. Query selects total_amount, currency.
- 00:30Add covering index: (date) INCLUDE (total_amount, currency).
- 00:35Query now uses Index Only Scan, 60 ms.
Lesson
Always verify that the index covers all columns in SELECT and WHERE. A covering index can turn a slow disk-bound query into a fast memory-bound one.
When Indexes Make Things Worse
Not all indexes are beneficial. A classic example is a high-write table with many indexes. Each INSERT, UPDATE, or DELETE must update every index. If you have 10 indexes on a table, write throughput drops significantly.
I once saw a team add an index on every column 'just in case.' Writes slowed to a crawl. Removing the unused indexes fixed it. Use pg_stat_user_indexes to find indexes that are never scanned.
Another trap: indexes on low-cardinality columns (e.g., boolean, status with few values). The B-tree still works, but the planner may ignore it because a sequential scan is cheaper — scanning 50% of the table via an index is slower than scanning the whole table sequentially.
Speed improvement from adding a covering index in the war story
Systematic Steps to Debug Index Issues
- 1Capture the slow query and run EXPLAIN (ANALYZE, BUFFERS) on a representative dataset.
- 2Check the plan type: Seq Scan, Index Scan, Index Only Scan, Bitmap Scan.
- 3If Index Scan: note the number of rows and heap fetches. If heap fetches >> 0, the index is not covering.
- 4Check if the WHERE clause columns are the leading columns of the index. PostgreSQL can use index columns only from left to right.
- 5Verify index selectivity: query pg_stats for the column to see distinct values. Low distinct count may lead to planner choosing sequential scan.
- 6Consider adding INCLUDE columns for all SELECT columns to enable Index Only Scan.
- 7Monitor index usage over time with pg_stat_user_indexes to remove unused indexes.
-- Example: check index usage for a table
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'orders'
ORDER BY idx_scan DESC;Index Debugging Beyond Basics
Sometimes the problem isn't the index definition but the query itself. Functions on indexed columns (e.g., WHERE DATE(created_at) = '2024-01-01') prevent index usage because the database sees a computed value, not the column. Use a range condition instead.
Another advanced topic: partial indexes. If you only query active orders, create an index WHERE status = 'active'. This keeps the index small and fast.
Finally, remember that index maintenance matters. Over time, B-tree pages become fragmented. REINDEX can reclaim space and improve scan performance. In PostgreSQL, you can use pg_repack for zero-downtime reindexing.
Always test index changes on a staging environment with production-like data volume. An index that works on 10,000 rows may behave very differently on 10 million rows.
The next time you're debugging a slow query, don't assume the index is the problem. Look at the plan. Check the heap fetches. Question whether the index actually covers the query. That 3 AM outage taught me that understanding the internals of a B-tree index is the most practical skill for a database engineer.
Start with EXPLAIN, think in terms of pages and buffers, and never forget: an index that's used but slow is a symptom of an incomplete index design.
Frequently asked questions
What is the difference between a B-tree index and a hash index?
B-tree indexes are ordered and support range queries (>, <, BETWEEN) and sorting, while hash indexes only support equality comparisons (=, IN). B-trees are the default in most databases and are more versatile for typical workloads.
Why does my query still use a sequential scan even though I have an index?
The query planner may estimate that reading a large portion of the table via the index is slower than a full sequential scan. This can happen if the index is not selective enough (e.g., low cardinality column) or if the query returns a large fraction of rows. Use EXPLAIN ANALYZE to see the planner's cost estimates.
What is an index-only scan and when does it occur?
An index-only scan occurs when all columns required by the query are present in the index itself, so the database can avoid fetching rows from the heap table. This requires a covering index (e.g., INCLUDE columns in PostgreSQL) and visibility map optimization.
How can I debug a slow query related to indexes?
Start by running EXPLAIN (ANALYZE, BUFFERS) to get the query plan and actual execution time. Look for sequential scans, large row estimates, and high buffer hit counts. Then check index usage, index condition, and whether a covering index could reduce heap fetches.