What this usually means
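The consumer might be connected to the wrong queue, might be stuck on a poison message that crashes it before acknowledgement, or might have a misconfigured prefetch count. Brokers track delivery differently: RabbitMQ holds a delivered message as unacked until the consumer acknowledges it or its channel closes, SQS hides a received message until it is deleted or its visibility timeout expires, and Kafka tracks committed offsets per consumer group rather than per-message acknowledgements. In each case, a consumer that crashes before acknowledging (or committing) receives the same message again. If it crashes on every attempt, the message is redelivered in a loop, either indefinitely or until a configured retry limit routes it to a dead-letter queue, and the consumer appears idle.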
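The redelivery loop can be sketched with an in-memory queue. This is a simulation, not broker code: `handle`, `drain`, and `MAX_DELIVERIES` are hypothetical names, and the limit of three deliveries is an assumption standing in for whatever retry policy the broker is configured with.

```python
from collections import deque

MAX_DELIVERIES = 3  # assumed redelivery limit; real brokers configure this differently


def handle(body: str) -> None:
    """Hypothetical handler that cannot process one specific payload."""
    if body == "poison":
        raise ValueError("cannot parse message body")


def drain(messages: list[str]) -> tuple[list[str], list[str]]:
    """Process messages in order, requeueing failures at the head of the queue."""
    queue = deque((body, 0) for body in messages)
    processed, dead_lettered = [], []
    while queue:
        body, deliveries = queue.popleft()
        try:
            handle(body)
            processed.append(body)  # success: the message is acknowledged
        except ValueError:
            deliveries += 1
            if deliveries >= MAX_DELIVERIES:
                dead_lettered.append(body)  # limit reached: route to the DLQ
            else:
                # Requeued at the head: everything behind it waits.
                queue.appendleft((body, deliveries))
    return processed, dead_lettered
```

Without the `MAX_DELIVERIES` check, the `while` loop would never get past the poison message, which is the "consumer appears idle" symptom.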
The first ten minutes: establish facts before touching code.
1. Check the queue dashboard (RabbitMQ Management, AWS SQS console, Kafka UI). What state are the messages in: ready, unacked, or in flight?
2. Check whether the consumer process is actually running: `ps aux | grep consumer`, or check your process manager or Kubernetes pod status.
3. Check the consumer logs for the last processed message. If the logs stop at, or keep repeating, one message ID, the consumer may be stuck on that single message.
4. Check that the consumer is connected to the right queue, exchange, or topic. A misconfigured routing key or topic subscription means messages go elsewhere.
5. Check for a dead-letter queue (DLQ). Are messages being routed there after repeated failures?
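One way to turn the dashboard numbers into a first hypothesis is a small decision function. This is a rough sketch: `QueueSnapshot` and `triage` are hypothetical names, the counts are assumed to be read by hand from the dashboard, and the rules are a starting heuristic, not a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    ready: int      # messages waiting for delivery
    unacked: int    # delivered but not yet acknowledged ("in flight" in SQS terms)
    consumers: int  # consumers currently attached to the queue
    dlq_depth: int  # messages sitting in the dead-letter queue


def triage(s: QueueSnapshot) -> str:
    """Map a snapshot of queue counters to the most likely area to investigate."""
    if s.consumers == 0:
        return "no-consumer"       # process down, or attached to a different queue
    if s.unacked > 0:
        return "stuck-processing"  # messages delivered but never acknowledged
    if s.ready > 0:
        return "not-receiving"     # attached, but nothing is being delivered
    if s.dlq_depth > 0:
        return "dead-lettering"    # messages fail repeatedly and leave the queue
    return "queue-empty"           # messages may be routed elsewhere


print(triage(QueueSnapshot(ready=120, unacked=1, consumers=1, dlq_depth=0)))
```

The order of the checks matters: a missing consumer explains every other symptom, so it is ruled out first.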
The specific files, logs, configs, and dashboards that usually own this bug.
- Message broker dashboard: queue depth, message states, consumer count, DLQ depth
- Consumer connection settings: host, port, queue name, routing key, consumer group
- Consumer logs: last processed message timestamp, any error patterns
- Consumer code: prefetch count, acknowledgement mode (auto vs manual), error handling
- Message payload: is the consumer failing to parse the message body?
- DLQ: are messages accumulating there?
Practical causes, not theory. These are the things you will actually find.
- Consumer is connected to a different queue or topic than the one receiving messages
- Consumer crashes on a poison message and restarts, creating a crash loop
- Consumer has no error handling, so an uncaught exception kills the process
- Prefetch count is misconfigured: a very low value throttles throughput, and in RabbitMQ a value of 0 means no limit, so one stalled consumer can hold every message as unacked
- Consumer is stuck waiting for an external resource (database, API) that is unavailable
- For Kafka: a consumer group rebalance is in progress, and under eager rebalancing no consumer is assigned partitions until it completes
- For SQS: the visibility timeout is too long, so a message stays hidden for the whole timeout after a failed processing attempt
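For the SQS case, rough arithmetic shows why a long visibility timeout can look like an idle consumer. The sketch assumes the consumer fails on every attempt without deleting the message or shortening its visibility, and that a redrive policy with `maxReceiveCount` is configured. The function name is hypothetical and the result is an approximation that ignores polling delays.

```python
from datetime import timedelta


def hidden_time_before_dlq(visibility_timeout_s: int, max_receive_count: int) -> timedelta:
    """Approximate total time a always-failing message spends hidden before dead-lettering."""
    if visibility_timeout_s < 0 or max_receive_count < 1:
        raise ValueError("timeout must be >= 0 and max_receive_count >= 1")
    # Each failed receive hides the message for one full visibility timeout.
    return timedelta(seconds=visibility_timeout_s * max_receive_count)


# A one-hour timeout with five allowed receives: about five hours of apparent silence.
print(hidden_time_before_dlq(3600, 5))
```

With the SQS default timeout of 30 seconds the same message would surface in the DLQ after roughly two and a half minutes, which is much easier to notice.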
Concrete fix directions. Pick the one that matches your root cause.
- Add a health check endpoint to the consumer that reports last-processed timestamp and queue lag
- Wrap message processing in a try/catch with a dead-letter queue for messages that fail repeatedly
- Set a reasonable prefetch count (e.g. 10-50) so the consumer pulls multiple messages
- Add structured logging: log message ID on receive, on processing start, on success, and on failure
- For Kafka, increase `max.poll.interval.ms` if processing takes longer than the default 5 minutes
- For SQS, set the visibility timeout to exceed the maximum expected processing time
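Several of these directions (error handling, dead-lettering, message-ID logging, and a last-processed timestamp for a health check) can live in one wrapper around the handler. The sketch below is broker-agnostic and makes assumptions: `SafeConsumer`, `handler`, and `dead_letter` are hypothetical names, retries happen in-process rather than through broker redelivery, and the caller is expected to acknowledge the original message once `on_message` returns.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("consumer")


class SafeConsumer:
    """Wraps a handler with logging, bounded retries, and dead-lettering."""

    def __init__(
        self,
        handler: Callable[[bytes], None],
        dead_letter: Callable[[str, bytes], None],
        max_attempts: int = 3,
    ) -> None:
        self.handler = handler
        self.dead_letter = dead_letter
        self.max_attempts = max_attempts
        self.last_processed_at: Optional[float] = None

    def on_message(self, message_id: str, body: bytes) -> bool:
        """Return True if handled, False if dead-lettered. Handler errors never escape."""
        log.info("received id=%s", message_id)
        for attempt in range(1, self.max_attempts + 1):
            log.info("processing id=%s attempt=%d", message_id, attempt)
            try:
                self.handler(body)
            except Exception:
                log.exception("failed id=%s attempt=%d", message_id, attempt)
                continue
            self.last_processed_at = time.time()
            log.info("succeeded id=%s", message_id)
            return True
        # Every attempt failed: park the message instead of blocking the queue.
        self.dead_letter(message_id, body)
        log.error("dead-lettered id=%s after %d attempts", message_id, self.max_attempts)
        return False

    def health(self) -> dict:
        """Data a health endpoint could expose; queue lag would need a broker query."""
        last = self.last_processed_at
        age = None if last is None else time.time() - last
        return {"last_processed_at": last, "seconds_since_last_success": age}
```

Retrying in-process keeps the example short. Relying on the broker's own redelivery and dead-letter configuration is an equally valid design, and it survives a process crash, which this wrapper does not.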
A fix you cannot prove is a guess. Close the loop.
- Publish a test message and observe it flowing through to the consumer logs and being acknowledged.
- Check the queue dashboard: the message should move from ready to unacked to gone.
- Publish a message that the consumer would fail to process. Confirm it lands in the DLQ rather than staying stuck in the main queue.
- Run the consumer locally against a development queue and verify processing.
- Monitor queue depth over time: it should stabilise or decrease, not grow indefinitely.
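The queue-depth check can be made less subjective by classifying a series of depth samples. This is a minimal sketch: `depth_trend` is a hypothetical helper, the samples are assumed to be taken at regular intervals from the broker dashboard or metrics, and comparing the averages of the two halves is just one simple way to smooth out noise.

```python
from statistics import mean


def depth_trend(samples: list[int], tolerance: float = 0.0) -> str:
    """Compare the average depth of the earlier half with the later half."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mid = len(samples) // 2
    change = mean(samples[mid:]) - mean(samples[:mid])
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "draining"
    return "stable"


print(depth_trend([500, 400, 300, 100]))  # depth is falling after the fix
```

A result of "growing" after a fix means the consumer is still slower than the producers, even if individual messages now succeed.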
Things that make this bug worse or harder to find.
- Not setting up a dead-letter queue from the start
- Using auto-ack mode without understanding its implications (message is lost on crash)
- Not logging message IDs, which makes it impossible to trace a specific message through the system
- Assuming the consumer is fine because the process is running
- Deploying a consumer without monitoring queue depth and consumer lag