LEARN · DEBUGGING GUIDE

Why MongoDB Queries Miss the Index — and How to Catch It

If your MongoDB query is hitting COLLSCAN despite an index existing, it's usually a shape mismatch, a type coercion, or a sort/select conflict. Here's how to prove it and fix it.

IntermediateDatabase8 min read

What this usually means

The index exists on disk but the query optimizer decided it cannot use it, or the query shape doesn't match the index key pattern. Common causes: the query condition uses a different type than the index (e.g., string vs int), the query includes a sort or projection that forces a collection scan, or the query uses operators like $nin, $ne, $not or regex with leading wildcards that the optimizer cannot satisfy efficiently. Also, compound indexes require an exact prefix match; a query that filters on a second field without the first may be ignored unless a separate index exists. The query planner does not 'forget' indexes—it actively selects COLLSCAN when it estimates it's cheaper, which usually means the selectivity is low or the index doesn't cover the result set.

( 01 )Fast diagnosis

The first ten minutes — establish facts before touching code.

  • 1Run db.collection.find(<query>).explain("executionStats") and check winningPlan.stage
  • 2Confirm the index exists: db.collection.getIndexes() — verify key pattern and options
  • 3For compound indexes, check if the query includes a prefix of the index keys in order
  • 4Test with hint(): db.collection.find(<query>).hint({field:1}).explain("executionStats") — if hint forces IXSCAN but it's slower, the optimizer chose correctly
  • 5Check for type mismatches: db.collection.find({field: {$type: "string"}}) vs actual index type
  • 6Enable the profiler: db.setProfilingLevel(1, {slowms: 50}) then tail the system.profile collection
( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • search/var/log/mongodb/mongod.log — look for slow queries or index creation warnings
  • searchdb.currentOp() — see running operations with planSummary
  • searchsystem.profile collection — detailed query shapes and timings
  • searchdb.collection.getIndexes() — list all indexes
  • searchexplain('executionStats') output — especially rejectedPlans array
  • searchmongostat or mongotop — disk I/O and page fault spikes
( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

  • warningQuery uses operators that suppress index usage: $nin, $ne, $not, $exists (on sparse indexes), regex with leading '^'
  • warningType mismatch: index stores numbers but query passes strings (or vice versa) — MongoDB does implicit coercion but may skip index
  • warningCompound index but query doesn't include the leftmost prefix field, or uses it with $in/$or that break prefix
  • warningSort and projection: index doesn't cover the sort order (direction mismatch) or projection includes fields not in index (non-covered query forces document fetch)
  • warningLow cardinality / selectivity — optimizer estimates COLLSCAN is cheaper due to small index range
  • warningIndex was built in background and is not fully replicated or read barrier not met
( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • buildAdd a covered query index: include all fields in query + projection + sort, in correct order
  • buildAlign types: ensure query values match the BSON type of the indexed field (e.g., use NumberLong() for int64 fields)
  • buildRestructure compound index to match the most common query patterns (prefix match)
  • buildReplace $nin/$ne with $gt/$lt or $in with positive values if possible
  • buildUse $hint sparingly but validate optimizer choice before forcing
  • buildDrop and recreate index if it was built in background on a high-write collection (or use foreground rebuild during maintenance)
( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

  • verifiedRun explain("executionStats") before and after: confirm stage changes from COLLSCAN to IXSCAN and totalKeysExamined drops
  • verifiedMeasure query latency in application: 5 runs before, 5 after index fix — average should drop significantly
  • verifiedConfirm index usage with db.collection.find(...).explain().queryPlanner.winningPlan.inputStage.stage
  • verifiedCheck totalDocsExamined: if it equals nreturned, index is used; if it's far larger, still scanning
  • verifiedUse profiler to confirm no more slow queries on the same namespace
( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

  • warningAdding an index on every field individually — use compound indexes carefully
  • warningForgetting that sort direction matters: index {a:1, b:-1} does not support sort {a:1, b:1}
  • warningAssuming a text index works for exact field match — text indexes are for $text only
  • warningCreating an index with sparse:true then querying for {field: null} — sparse indexes skip nulls
  • warningRelying on hint() as a permanent fix without understanding why optimizer chose different plan
  • warningNot checking for type coercion: query {age: "30"} on an integer field — index may be ignored
( 07 )War story

Inventory Query Slows to Crawl After Adding Text Index

Backend EngineerMongoDB 4.4, Node.js Mongoose, AWS EC2

Timeline

  1. 09:15PagerDuty alert: inventory API latency > 5s
  2. 09:18Checked mongos log: slow query on inventory collection — stage COLLSCAN, nscanned 2.4M
  3. 09:22getIndexes() shows compound index {warehouse:1, sku:1} and a new text index on description
  4. 09:25explain() on the query {warehouse: 'WH1', sku: 'ABC123'} — rejectedPlans shows IXSCAN on compound index, but winning plan is COLLSCAN
  5. 09:30Hinted the compound index — query runs in 3ms. So optimizer chose wrong?
  6. 09:35Re-examined optimizer: the compound index is built on warehouse (string) and sku (string). But the query had sku: 'ABC123' as a string — correct. However, warehouse field in documents stored as 'WH1' but some had leading spaces
  7. 09:40Found that the text index on description was being considered by optimizer but actually not used — the real issue: the compound index was inefficient because warehouse had low cardinality (only 10 values) and sku had high. Optimizer estimated COLLSCAN cheaper for 'WH1' because it would scan many docs anyway
  8. 09:50Added a new index {sku:1, warehouse:1} to match the query order of selectivity
  9. 10:00Query latency dropped to 2ms. Resolved.

The alert came in around 9:15 AM — inventory endpoint p95 latency jumped from 50ms to 5 seconds. I opened mongos log and found a query on the inventory collection doing a full collection scan of 2.4 million documents. The query was a simple find by warehouse and SKU, and we had an index on both fields. I assumed it was a stale index or corruption.

I ran getIndexes() and saw the compound index {warehouse:1, sku:1} plus a recently added text index on description. The explain() output showed the optimizer considered the compound index but then chose COLLSCAN. I tried hint() and the query ran fast — so the optimizer was making a bad decision? That felt wrong. I looked closer at the query statistics: warehouse 'WH1' matched about 40% of the collection. The optimizer's cost model estimated that scanning 40% via index would be more expensive than a full scan because of random I/O. That was the first insight: the index prefix was too low-cardinality.

I reversed the index order to {sku:1, warehouse:1}, because SKU is highly selective (almost unique). After rebuilding the index, the query plan used IXSCAN naturally, and latency dropped to 2ms. I learned that index prefix order matters not just for query shape but also for selectivity — the optimizer will reject an index if it estimates the range too large. The text index was a red herring.

Root cause

The compound index {warehouse:1, sku:1} had low cardinality on the prefix (warehouse), causing the query optimizer to estimate a full collection scan was cheaper than index scan for a high-selectivity query.

The fix

Created a new compound index {sku:1, warehouse:1} with the most selective field first, and dropped the old index.

The lesson

Always match index prefix order to the query's most selective field, not the order of fields in the query. Use explain() to see why the optimizer rejects an index — often it's cardinality, not a bug.

( 08 )Understanding the Query Planner's Cost Model

MongoDB's query planner uses a cost-based approach: it evaluates multiple candidate plans by executing them for a short trial period and picks the one with the lowest total cost (I/O + CPU). If an index exists but the optimizer estimates that scanning it would require reading a large fraction of the collection (due to low selectivity), it may choose COLLSCAN. This is not a bug — it's by design.

To inspect the planner's decision, look at explain() output: winningPlan and rejectedPlans. The rejectedPlans array shows other plans considered and why they were rejected (though the exact reason isn't explicitly stated). Pay attention to inputStage.stage and the number of keys examined vs documents examined. A common pattern: winningPlan uses IXSCAN but still examines many documents (non-covered query), indicating the index is used but not efficiently.

( 09 )Type Coercion: The Silent Index Killer

MongoDB is schemaless, but indexes are typed. If you have an index on a field that stores integers, and your query passes a string (e.g., {age: "30"}), the query may still work due to implicit coercion, but the index may be ignored. This happens because the BSON type of the query value doesn't match the indexed value's type. The optimizer sees a type mismatch and may fall back to COLLSCAN.

To detect this, run a type check: db.collection.find({field: {$type: "string"}}).count() vs {$type: "number"}. If you find mixed types, you need to either homogenize the data or explicitly cast in queries. Use explain() to see if the index is used — if not, type mismatch is a likely cause.

( 10 )Compound Index Prefix and Sort Direction Gotchas

A compound index can support queries on any prefix of its keys. For index {a:1, b:1, c:1}, queries on {a}, {a,b}, and {a,b,c} can use the index. But queries on {b} alone or {b,c} cannot. This is the 'leftmost prefix rule'. Many engineers forget this and create indexes out of order.

Sort direction also matters. An index {a:1, b:-1} can support sort {a:1, b:-1} but not {a:1, b:1}. If your query includes a sort that mismatches the index direction, MongoDB may do an in-memory sort after the index scan (blocking sort), or it may skip the index entirely. Check explain() for 'sort' stage and 'sort' memory usage.

( 11 )The Impact of Covered Queries on Index Selection

A covered query is one where all fields required (query filter, projection, sort) are contained within the index itself. In that case, MongoDB never needs to fetch the document — it reads only the index. This is the most efficient scenario. To achieve a covered query, your index must include all fields in the query, projection, and sort, in the correct order.

If your query is not covered, the index scan will be followed by a FETCH stage to retrieve the full documents. That's fine, but if the index is not selective enough, the overhead of random document fetches may make COLLSCAN cheaper. Use explain() to see if 'totalDocsExamined' is high — that's a sign your query is not covered and the index is not selective.

( 12 )When the Optimizer Chooses Wrong: Forcing with hint()

Sometimes the optimizer's cost estimation is wrong due to stale statistics or uneven data distribution. You can force a specific index with hint(). However, this is a double-edged sword: if data distribution changes later, the hinted index may become slower. Use hint() only as a temporary measure or after thorough benchmarking.

Before hinting, verify that the forced index plan is indeed faster by comparing execution times. Also, consider updating index statistics with db.collection.reIndex() or waiting for the background stats refresh (MongoDB 4.2+ has automatic stats collection). Long-term, redesign the index to match the query pattern so the optimizer naturally chooses it.

Frequently asked questions

Why does my query use an index sometimes and not other times?

Usually because of type coercion: if the field stores mixed types (e.g., some numbers, some strings), the optimizer may decide not to use the index when the query value type doesn't match the majority type. Also, if the query uses operators like $in with a large array, the planner may estimate a full scan is cheaper. Check explain() for both cases.

Can a text index interfere with regular field indexes?

A text index is only used for $text queries. It does not interfere directly, but the query planner may consider it as a candidate plan and then reject it. That adds overhead but doesn't cause a COLLSCAN. If you see the text index in rejectedPlans, it's harmless.

How do I force a query to use an index?

Use the hint() method: db.collection.find(...).hint({field:1}). However, only do this after verifying with explain() that the hinted plan is faster. If the optimizer still chooses COLLSCAN even with hint, check that the index exists and the query shape matches the index keys.

What does 'totalKeysExamined' much larger than 'nreturned' mean?

It means the index scan examined many keys but only a few matched the query. This indicates low selectivity — the index is not filtering well. Consider a more selective index or adding additional fields to the compound index to narrow down the scan range.

Should I drop and recreate an index if it's not being used?

Not immediately. First, understand why it's not used. Use explain() to see if the optimizer considered it. If it's in rejectedPlans, the optimizer decided it was more expensive. That could be due to low selectivity or because the query doesn't match the index prefix. Recreate only if the index key pattern is wrong.