What this usually means
Tokio runtime errors fall into two broad categories: configuration mismatches and blocking violations. The most common is calling `block_on` from code that is already running on a Tokio runtime, which triggers the 'Cannot start a runtime from within a runtime' panic. Another classic is holding a `std::sync::MutexGuard` across an `.await` point: the task is suspended while still owning the lock, so any other task that calls `lock()` blocks its whole worker thread and can stop the lock holder from ever being resumed. Missing wake-ups, where a future returns `Poll::Pending` and nobody later calls `wake()` on its stored waker, cause tasks to hang silently. Less frequent but equally painful: mixing runtime-specific primitives across executors (for example, polling Tokio I/O or timer types from a non-Tokio executor), or forgetting to enable the `rt` and `macros` features in Cargo.toml.
The first ten minutes — establish facts before touching code.
- 1. Set `RUST_BACKTRACE=1` and reproduce the hang or panic; examine the full backtrace for the exact panic message and location.
- 2. Run `cargo tree -e features -i tokio` to see which Tokio features are enabled and which crate enables them: `rt`, `macros`, `time`, `sync` should be present if used.
- 3. If hanging, take a thread dump. A Rust process does not print thread stacks on SIGQUIT (the default action terminates it with a core dump), so attach a debugger instead: on Linux, `gdb -p <pid> -batch -ex "thread apply all bt"`. Look for worker threads stuck in a lock, a blocking syscall, or `block_on`; threads sitting in `park` or `epoll_wait` are usually just idle.
- 4. Add `tokio-console` (via `console-subscriber`) to your app and inspect spawned tasks, their states, and poll counts.
- 5. Search the codebase for `std::sync::Mutex` or `std::sync::RwLock`; replace with `tokio::sync::Mutex` or `tokio::sync::RwLock` if the guard is held across `.await`.
- 6. Check for nested runtimes: grep for `block_on` inside a function that is itself called from an async context.
The specific files, logs, configs, and dashboards that usually own this bug.
- Cargo.toml: verify Tokio features enabled
- Thread dumps (e.g., from `gdb`, `lldb`, or `eu-stack -p <pid>`)
- `tokio-console` output: task list, poll times, wake counts
- All `.await` points: ensure no blocking locks held across them
- `panic!()` messages in stderr or log files
- `RUST_LOG=tokio=trace` output to see task scheduling events (needs Tokio's `tracing` feature, `--cfg tokio_unstable`, and a `tracing` subscriber that honors `RUST_LOG`)
- Any custom `Future` implementations: check `poll()` returns `Poll::Pending` correctly and stores wakers
Practical causes, not theory. These are the things you will actually find.
- Calling `block_on` inside an async function already running on a Tokio runtime
- Using `std::sync::Mutex` and holding the lock across an `.await`
- Missing Tokio features: `rt-multi-thread`, `macros`, `time` not enabled in Cargo.toml
- Forgetting to call `wake()` in a custom future or manual waker implementation
- Using a synchronous blocking call (like `std::thread::sleep`) inside an async task
- Creating a new runtime inside a task (e.g., `tokio::runtime::Runtime::new().unwrap().block_on(...)`)
Concrete fix directions. Pick the one that matches your root cause.
- Remove nested `block_on`: inside async code, `.await` the future directly. Keep `block_on` at the top-level synchronous entry point, and note that `tokio::runtime::Handle::current().block_on()` panics the same way when called from async context.
- Refactor `std::sync::Mutex` to `tokio::sync::Mutex` for any lock held across `.await`.
- Ensure Cargo.toml includes `tokio = { version = "1", features = ["full"] }` or at least the specific features you need.
- In custom futures, store `cx.waker().clone()` before returning `Poll::Pending` and call `wake()` on it when the future can make progress again.
- Replace blocking calls with async alternatives (e.g., `tokio::time::sleep` instead of `std::thread::sleep`).
- Use `tokio::task::spawn_blocking` for CPU-heavy or blocking I/O work.
A fix you cannot prove is a guess. Close the loop.
- Run the application under `tokio-console`; confirm all tasks reach completion and none are stuck in 'pending' for too long.
- Write a minimal reproduction test: spawn a task that does a simple `tokio::time::sleep` and assert it completes within a timeout.
- Add a periodic health-check endpoint that logs the number of active tasks (`tokio::runtime::Handle::current().metrics().num_alive_tasks()`).
- Run with `RUST_LOG=tokio=trace` (with Tokio's `tracing` feature, `--cfg tokio_unstable`, and a `tracing` subscriber installed) and verify that tasks are polled and woken as expected.
- Use `cargo test` with `#[tokio::test]` to ensure no panics related to runtime context.
Things that make this bug worse or harder to find.
- Do not use `std::thread::sleep` in async code; use `tokio::time::sleep`.
- Do not create a new `Runtime` inside a task; pass the handle instead.
- Do not ignore the `Send` bound: a future passed to `tokio::spawn` must be `Send + 'static`, so it cannot hold non-`Send` values (such as a `std::sync::MutexGuard`) across an `.await`.
- Avoid holding a `tokio::sync::Mutex` lock across `.await` if possible; consider `tokio::sync::Semaphore` or restructure.
- Do not assume `#[tokio::main]` automatically enables `rt-multi-thread`; check your features.
The Silent Hang: A Tokio Runtime Starvation
Timeline
- 09:15 Deploy new version of auth service to staging
- 09:18 Health check endpoint starts returning 503 after 30s timeout
- 09:20 Check logs: no errors, just normal INFO lines
- 09:25 Run `curl -v` against health endpoint; hangs forever
- 09:30 Take thread dump by attaching a debugger (`gdb -p <pid>`); see two worker threads stuck in `park`
- 09:35 Inspect dump: one thread at `tokio::runtime::block_on` on a `JoinHandle`
- 09:40 Discover recently added code: `std::sync::Mutex` locked in an async handler, held across an `.await` to a DB query
- 09:45 Replace `std::sync::Mutex` with `tokio::sync::Mutex`
- 09:47 Redeploy; health check responds in <5ms
I had just added a new endpoint that cached some user permissions in a `HashMap` behind a `std::sync::Mutex`. The handler would lock the mutex, check the cache, and if missing, async-query the database with `.await` while holding the lock. That mutex is not designed for async—it blocks the thread. Since Tokio's multi-thread runtime has a fixed number of worker threads, one thread blocking on that mutex meant all other tasks on that thread starved.
The health check endpoint was a simple 'OK' response, but it was spawned onto the same worker thread that was blocked. So the health check never got polled. The thread dump showed two workers parked—one waiting on `epoll_wait`, the other stuck on the mutex. The third worker was idle. The blocking was subtle: only requests hitting that specific endpoint would trigger the deadlock.
I replaced `std::sync::Mutex` with `tokio::sync::Mutex`, which is designed to yield the task while waiting for the lock, not block the thread. After redeploy, everything worked. The lesson: never hold a blocking lock across `.await`. Use Tokio's sync primitives or restructure to avoid holding locks across suspension points.
Root cause
`std::sync::Mutex` held across `.await` point, blocking the Tokio worker thread and starving all tasks on that thread.
The fix
Replaced `std::sync::Mutex` with `tokio::sync::Mutex`.
The lesson
Blocking a Tokio worker thread is like pulling a single chair from a musical chairs game—every task on that thread loses. Use async-aware synchronization or spawn_blocking for blocking work.
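Tokio's multi-thread runtime maintains a global (injection) queue and per-worker local queues. A task spawned from inside a worker goes onto that worker's local queue; a task spawned from outside the runtime goes onto the global queue, which workers check periodically. Idle workers steal tasks from other workers' local queues. A blocking call (like `std::sync::Mutex::lock`) on a worker thread prevents that worker from polling anything. Other workers can steal most of its queued tasks, but in current Tokio versions a task sitting in the blocked worker's LIFO slot cannot be stolen and waits until that worker is free again. If enough workers block at once, or the runtime is single-threaded, every task starves.
Useful signals come from `tokio::runtime::RuntimeMetrics` (via `Handle::current().metrics()`): `num_alive_tasks()` shows whether tasks are piling up, and the per-worker counters, most of which require building with `--cfg tokio_unstable`, show whether a worker has stopped polling. Note that `num_blocking_threads()` (also unstable) counts threads in the `spawn_blocking` pool; a high value means work is being offloaded there, and it does not reveal blocking calls made directly inside async tasks. Use `tokio-console` to visualize task poll durations and identify tasks that are pending for suspiciously long times.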
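The 'Cannot start a runtime from within a runtime' panic happens when `block_on` is called on a thread that is already driving a Tokio runtime, for example `tokio::runtime::Runtime::new().unwrap().block_on(...)` or a `#[tokio::main]` function invoked from inside an async task. Tokio forbids this because blocking the worker thread would stall the tasks it is supposed to drive. `tokio::runtime::Handle::current().block_on(...)` does not help here: it panics in the same way when called from async context. The fix is to `.await` the future directly in async code, keep `block_on` at the top-level synchronous entry point, and move unavoidable synchronous callers onto `tokio::task::spawn_blocking` (or wrap them in `tokio::task::block_in_place` on the multi-thread runtime).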
Common scenarios: a library function that creates its own runtime for convenience, or a test that uses `#[tokio::test]` and also calls a function that does `block_on`. To debug, set a breakpoint on the panic location and inspect the call stack to identify the offending call.
If you implement a custom `Future`, then before returning `Poll::Pending` you must store a clone of the waker (`cx.waker().clone()`) and make sure that whatever event unblocks the future later calls `wake()` (or `wake_by_ref()`) on it. Forgetting this results in a future that is never polled again, causing hangs. The symptom is a task that is 'pending' forever in tokio-console, with zero wakeups.
A minimal example: a oneshot channel whose receiver stores its waker, but whose sender never calls it after sending. To debug, add logging in your `poll` method to confirm it returns `Poll::Pending` and that the waker is stored (and refreshed on every poll, since the task may be polled with a different waker). Use `tokio::sync::Notify` instead of rolling your own if possible.
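The store-then-wake pattern can be sketched with the standard library alone. `Flag` and `Shared` are hypothetical names; the future completes once `set()` has been called, and `set()` is responsible for waking whichever task last polled it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

#[derive(Default)]
struct Shared {
    ready: bool,
    waker: Option<Waker>,
}

/// Hypothetical one-shot flag: awaiting it completes after `set()`.
#[derive(Clone, Default)]
pub struct Flag(Arc<Mutex<Shared>>);

impl Flag {
    pub fn set(&self) {
        // Take the waker while locked, but wake after unlocking.
        let waker = {
            let mut shared = self.0.lock().unwrap();
            shared.ready = true;
            shared.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake(); // omit this call and the waiting task hangs forever
        }
    }
}

impl Future for Flag {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.0.lock().unwrap();
        if shared.ready {
            Poll::Ready(())
        } else {
            // Refresh the stored waker on every poll before returning Pending.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}
```

A task would `flag.clone().await` while another task or thread calls `flag.set()`. The short critical sections here never span an `.await`, which is why a `std::sync::Mutex` is acceptable inside the future.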
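Missing features usually show up at compile time: without `time`, `tokio::time::sleep` does not exist; without `rt`, neither do `spawn` and `block_on`; the multi-thread scheduler needs `rt-multi-thread`. Runtime panics have a different cause: 'there is no reactor running' means a Tokio I/O or timer type was used outside a runtime context, and a message saying that IO or timers are disabled means the runtime was built without `enable_io()` / `enable_time()` (or `enable_all()`). Always check Cargo.toml: `tokio = { version = "1", features = ["full"] }` is the safest start.
If you are writing a library, avoid depending on the `rt` features where you can: accept `tokio::io::AsyncRead` / `AsyncWrite` values, return futures, and let the application supply the runtime. Use `cfg` attributes (behind a Cargo feature) to conditionally compile any runtime-dependent code.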
tokio-console is a diagnostics tool that shows the state of all tasks in a Tokio runtime. Add `console-subscriber` to your dependencies, enable the `tracing` feature on Tokio, build with `RUSTFLAGS="--cfg tokio_unstable"`, and call `console_subscriber::init()` at startup. Then run your app (the subscriber listens on `127.0.0.1:6669` by default; set `TOKIO_CONSOLE_BIND` to change the address) and connect with `tokio-console`. You'll see a list of tasks, their state (idle, running, pending), poll counts, and wakeups.
For hangs, look for tasks in 'pending' state with no wakeups—they are stuck because a waker was never invoked. For performance issues, look for tasks with high poll counts or long durations. tokio-console can also show resource usage per task, helping you identify hot loops.
Frequently asked questions
Why does my Tokio application hang when I use `std::sync::Mutex`?
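`std::sync::Mutex::lock` blocks the current thread until the lock is free. In Tokio's async runtime, worker threads must not block, otherwise they cannot poll other tasks. If a task holds the guard across an `.await`, it is suspended while still owning the lock; the next task that calls `lock()` blocks its worker thread, and if the lock holder needs that thread to resume, neither can proceed. The compiler often catches this for you: `std::sync::MutexGuard` is not `Send`, so a future holding one across `.await` cannot be passed to `tokio::spawn`. Use `tokio::sync::Mutex`, whose `lock().await` yields the task instead of blocking the thread, or release the std lock before awaiting.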
What does 'there is no reactor running' mean?
This panic occurs when you try to use async I/O (like `TcpStream::connect`) or timers without a Tokio runtime being active. Usually you forgot to annotate your `main` with `#[tokio::main]` or you are calling async functions from a synchronous context. Ensure you have a runtime and that you are inside a task spawned by that runtime.
How do I fix 'Cannot start a runtime from within a runtime'?
This means `block_on` was called from code that is already running on a Tokio runtime, typically because something tried to create and drive a new runtime inside an async function. In async code, `.await` the future instead; `tokio::runtime::Handle::current().block_on()` panics the same way there, so keep `block_on` at the top-level synchronous entry point. In library code, expose an `async fn` and let the caller drive it, or use `Handle::try_current()` to detect an existing runtime before building your own.
Why does `tokio::time::sleep` never complete?
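Check that the timer is actually being driven. A missing `time` feature is a compile error rather than a hang, so first confirm the dependency line is valid, e.g. `tokio = { version = "1", features = ["time"] }`, and that a runtime built by hand calls `enable_time()`. Also ensure you are inside a Tokio runtime context. If you are in a test, use `#[tokio::test]` instead of `#[test]`. If the sleep is in a spawned task, make sure the runtime isn't dropped before the sleep completes (e.g., the `main` function exits), and that no blocking call is stalling the worker thread that should resume it.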
How can I detect if my async code is blocking the runtime?
Enable `tokio-console` to see task states and wakeups. Set `RUST_LOG=tokio=trace` (with Tokio's `tracing` feature, `--cfg tokio_unstable`, and a `tracing` subscriber installed) to log scheduling events. `RuntimeMetrics::num_blocking_threads()` only reports the size of the `spawn_blocking` pool, so it will not reveal blocking calls made inside async tasks. Write a simple test that spawns multiple tasks and measures their completion time; if they take longer than expected, you likely have blocking.