What this usually means
Cloudflare Workers run in V8 isolates with strict CPU, memory, and network limits. Most errors stem from hitting these resource caps (10 ms of CPU per request on the Free plan, 128 MB of memory per isolate, and a plan-dependent cap on subrequests), from uncaught exceptions in async code, or from misconfigured bindings (KV, R2, D1). An exception that escapes your handler fails the request, and the client usually sees only a generic 500. Debugging therefore requires inspecting the real-time logs, not just the HTTP response.
The first ten minutes — establish facts before touching code.
1. Run `wrangler tail --format pretty` in your terminal to stream live logs from the worker.
2. Check worker metrics in Cloudflare Dashboard > Workers > your worker > Metrics: look for 'Errors' and 'CPU Time' spikes.
3. Reproduce the failure with `curl -v https://your-worker.example.com` and note the 'cf-ray' header.
4. Filter the tail to failed invocations only: `wrangler tail --status error` (the `--status` filter takes `ok`, `error`, or `canceled`, not HTTP status classes).
5. Enable 'Log on exceptions' in the worker's Dashboard > Logs > Settings to capture stack traces.
The specific files, logs, configs, and dashboards that usually own this bug.
- Cloudflare Dashboard > Workers > [your worker] > Logs: real-time and historical logs with stack traces
- `wrangler tail` output (CLI): live stream of console.log, errors, and fetch events
- Worker runtime metrics: CPU time (ms), Memory (MB), Errors count under Dashboard > Metrics
- KV namespace operations: check KV metrics for read/write latency and error rates
- wrangler.toml: verify bindings (KV namespaces, R2 buckets, D1 databases) and compatibility date
- Error stacks in `wrangler tail` often point to the exact line in your bundled script (source map available)
Practical causes, not theory. These are the things you will actually find.
- Unhandled promise rejection inside a fetch event handler (most common)
- CPU limit exceeded: synchronous loops or heavy computation (e.g., 5MB JSON parse)
- Memory limit exceeded: building large objects or caching responses in global scope
- KV reads that return `null` for a missing or expired key, used without a null check
- Subrequest limit exceeded: too many fetch calls in one request (retries and redirects count too)
- API that depends on a newer compatibility date or flag (e.g., a Node.js built-in that needs the `nodejs_compat` flag)
Concrete fix directions. Pick the one that matches your root cause.
- Wrap all async event handlers in try/catch and return a meaningful error response
- Move heavy computation to a separate service or use Durable Objects for stateful processing
- Handle promises deliberately: `await` anything the response depends on, and pass background tasks to `ctx.waitUntil()` (`event.waitUntil()` in the older Service Worker syntax) so they are not cancelled when the response returns
- Add KV read retries with exponential backoff for transient failures
- Use `response.clone()` before consuming body to avoid 'body already consumed' errors
- Set `compatibility_date` to the latest in wrangler.toml and adjust deprecated APIs
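The first and third fix directions can be sketched together as follows. This is a minimal illustration, not a definitive implementation: `withErrorBoundary`, `handleRequest`, `recordHit`, and the `ExecutionContextLike` interface are hypothetical names introduced here, and the interface is only a stand-in for the real Workers execution context type.

```typescript
// Stand-in for the Workers execution context so the sketch is self-contained.
// In a real project this type would come from the Workers type definitions.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

type Handler = (
  request: Request,
  env: unknown,
  ctx: ExecutionContextLike,
) => Promise<Response>;

// Hypothetical helper: converts any thrown error or rejected promise into a
// logged, structured 500 instead of letting it escape the handler.
function withErrorBoundary(handler: Handler): Handler {
  return async (request, env, ctx) => {
    try {
      return await handler(request, env, ctx);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // console.error output is what `wrangler tail` and Dashboard logs show.
      console.error("unhandled error", { url: request.url, message });
      return new Response(JSON.stringify({ error: "internal_error" }), {
        status: 500,
        headers: { "content-type": "application/json" },
      });
    }
  };
}

// Hypothetical background task; the response does not depend on it.
async function recordHit(url: string): Promise<void> {
  console.log("hit", url);
}

const handleRequest = withErrorBoundary(async (request, _env, ctx) => {
  // Register background work so it is not dropped after the response returns.
  ctx.waitUntil(recordHit(request.url));
  if (new URL(request.url).pathname === "/boom") {
    throw new Error("simulated failure");
  }
  return new Response("ok");
});

// In a module Worker this would be wired up as:
//   export default { fetch: handleRequest };
```

The wrapper keeps the error detail in the logs and out of the response body, so clients get a stable shape while the stack trace stays visible in `wrangler tail`.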
A fix you cannot prove is a guess. Close the loop.
- Deploy the fix and run `wrangler tail --status error` while hitting the endpoint with load (e.g., `hey -n 100 -c 10 https://your-worker.example.com`).
- Monitor the 'Errors' metric in Dashboard > Metrics: should drop to zero under normal load.
- Check 'CPU Time' metric: should stay well below 10ms per request (or your plan's limit).
- For KV issues: verify reads return correct data by checking KV Browser in Dashboard.
- Run a smoke test with a script that simulates the failing scenario (e.g., missing key, large payload).
Things that make this bug worse or harder to find.
- Don't rely on `console.log` alone in production: its output is only visible while `wrangler tail` is running or when Dashboard logging is enabled, so set up logging before you need it.
- Don't cache the entire response in a global variable: it will persist across requests and cause memory leaks.
- Don't ignore the `ctx` parameter: `ctx.waitUntil()` is required for background tasks; without it the runtime may cancel the promise silently once the response is returned.
- Don't assume `fetch` always succeeds: always check `response.ok` or catch network errors.
- Don't set a huge manual `fetch` timeout: a hung upstream then keeps the request open and delays the error you need to see. Keep timeouts short and explicit.
- Don't use synchronous `XMLHttpRequest` or `localStorage`: they don't exist in Workers; use `fetch` and KV.
The Case of the Silent 500s: A KV Binding Gone Wrong
Timeline
- 09:15 Deployed a new worker version that fetches user preferences from KV on every request.
- 09:18 PagerDuty alert: 500 error rate jumped to 12% on the user-facing API.
- 09:20 Checked worker logs via `wrangler tail --status error`: no output (logs not streaming).
- 09:22 Realized logs were disabled; enabled 'Log on exceptions' in Dashboard.
- 09:25 Dashboard logs showed: 'Uncaught (in promise) TypeError: Cannot read properties of null (reading 'preferences')'.
- 09:27 Found that KV key for new users was missing; code assumed it always exists.
- 09:30 Added null check and default value for missing KV keys.
- 09:32 Deployed fix; error rate dropped to 0% immediately.
- 09:35 Added KV read retry logic for transient failures as a preventive measure.
We were rolling out a new feature: personalized dashboard based on user preferences stored in KV. I had tested locally with my own account, which had preferences set. Production had many new users with no KV entry. The code did `prefs = await KV.get(userId)` and then accessed `prefs.preferences` without checking for null. The worker threw a TypeError, but the error wasn't caught anywhere, so it returned a generic 500.
The real pain was that `wrangler tail` showed nothing initially because our logging level was set to 'error' only, and the TypeError was an uncaught promise rejection, which didn't print to console.log. I had to enable 'Log on exceptions' in the Dashboard to see the stack trace. Once I saw the null reference, I added a simple guard: `prefs = (await KV.get(userId)) || { preferences: {} }`.
I also added a retry wrapper for KV reads because we had seen occasional transient failures (KV is eventually consistent). The fix was deployed within 10 minutes. Afterward, I wrote a unit test that simulates missing KV keys. The lesson: always assume external data can be null, and always catch unhandled rejections in event handlers.
Root cause
Unhandled promise rejection caused by accessing a property on a null value from KV.
The fix
Added null check and default value for KV reads; added try/catch around the fetch event handler.
The lesson
Always validate external data and catch promise rejections. Enable exception logging in production.
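The guard from this incident can be sketched a little more defensively than the one-line `||` fallback. The sketch below assumes preferences are stored as JSON under the user ID; `KVLike`, `UserPrefs`, `isUserPrefs`, and `loadPrefs` are hypothetical names, and `KVLike` is only a stand-in for the KV binding's `get(key, "json")` call.

```typescript
// Stand-in for the KV binding: only the JSON read used here.
interface KVLike {
  get(key: string, type: "json"): Promise<unknown>;
}

// Assumed shape of the stored value.
interface UserPrefs {
  preferences: Record<string, unknown>;
}

// Type guard: true only for a non-null object with an object `preferences`.
function isUserPrefs(value: unknown): value is UserPrefs {
  if (typeof value !== "object" || value === null) return false;
  const prefs = (value as { preferences?: unknown }).preferences;
  return typeof prefs === "object" && prefs !== null;
}

// Returns stored preferences, or an empty default when the key is missing
// (null) or the stored value does not have the expected shape.
async function loadPrefs(kv: KVLike, userId: string): Promise<UserPrefs> {
  const stored = await kv.get(userId, "json");
  return isUserPrefs(stored) ? stored : { preferences: {} };
}
```

Checking the shape as well as the null case covers a key that exists but holds something unexpected, which would otherwise fail in the same way one property access later.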
Cloudflare Workers run on a global network of V8 isolates. Each request gets a hard CPU time limit that depends on your plan: 10 ms on the Free plan, 50 ms under the legacy Bundled usage model, and up to 30 s under Unbound (or a custom limit). If your code exceeds the limit, the runtime aborts the request and Cloudflare returns error 1102 ('Worker exceeded resource limits'). Because CPU time varies with input size, this is a common cause of 'random' failures.
Memory is capped at 128 MB per isolate. The limit applies to the isolate, not to each request, and one isolate can serve many requests concurrently, so the budget available to a single request may be much lower under load. A common culprit is building a large JSON object (e.g., parsing a 10 MB response) or accidentally storing data in a global variable that persists across requests. Use `globalThis` sparingly and avoid caching response bodies in memory.
`wrangler tail` streams logs in real time from your worker. It shows `console.log`, `console.error`, and uncaught exceptions. Narrow the stream with filters such as `--status error` or `--ip`. The output includes the request URL, the outcome, and any stack trace. However, it only captures logs while the command is running; nothing is persisted.
The Dashboard logs keep recent logs for a limited retention window (check what your plan provides) and support search. You can also enable 'Log on exceptions' to capture stack traces for unhandled rejections. For production debugging, I always have a separate 'tail' session open during deployments. If logs don't appear, check that the worker is actually receiving traffic and that you have the correct environment selected.
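KV reads are eventually consistent: they can return stale data, and they return `null` for missing keys. Always handle `null` returns gracefully. KV also limits writes to the same key (about one per second), so rapid writes to one key can fail with a rate-limit error; catch and handle it rather than assuming the write landed. Pass the expected type to `KV.get()`, for example `KV.get(key, { type: 'json' })` or `{ type: 'text' }`, so the value arrives in the form your code expects.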
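A retry wrapper for KV reads might look like the sketch below. It assumes a transient failure surfaces as a thrown error from `get()`; `KVTextReader` and `getWithRetry` are hypothetical names, and the attempt count and delays are illustrative defaults rather than recommended values.

```typescript
// Stand-in for the KV binding: only the text read used here.
interface KVTextReader {
  get(key: string): Promise<string | null>;
}

// Retries a KV read with exponential backoff (base, 2x base, 4x base, ...).
// A null result means "key not found": that is a valid answer, so it is
// returned immediately and never retried.
async function getWithRetry(
  kv: KVTextReader,
  key: string,
  attempts = 3,
  baseDelayMs = 50,
): Promise<string | null> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await kv.get(key);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Keeping the attempt count small matters here: the time spent waiting adds to request latency, and a retry does not help with a missing key, which is why `null` short-circuits.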
R2 (object storage) and D1 (SQLite) have their own quirks. Both need a binding declared in `wrangler.toml`; if the binding name in your code does not match the configuration, the property on `env` is simply undefined and the first call on it throws. D1 enforces a maximum query duration, so keep queries small and indexed, and use prepared statements with bound parameters to avoid SQL injection. Check that each binding is defined at the start of the handler and fail with a clear error message if it is not.
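A binding check can be as small as the following sketch. `requireBindings` is a hypothetical helper, and the binding names in the usage comment are placeholders, not names from any real configuration.

```typescript
// Throws a descriptive error when any expected binding is absent from env.
// A misnamed or unconfigured binding shows up as an undefined property.
function requireBindings(env: Record<string, unknown>, names: string[]): void {
  const missing = names.filter((name) => env[name] === undefined || env[name] === null);
  if (missing.length > 0) {
    throw new Error(
      `Missing bindings: ${missing.join(", ")} (check the binding names in wrangler.toml)`,
    );
  }
}

// Usage at the top of a fetch handler (placeholder binding names):
//   requireBindings(env, ["PREFS_KV", "ASSETS_BUCKET", "APP_DB"]);
```

Failing early with the binding name in the message turns an opaque "cannot read properties of undefined" into something that points straight at the configuration.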
Workers cap the number of subrequests a single request can make, redirects included; the cap depends on your plan (historically 50 on the Free plan and 1,000 on paid plans). If you exceed it, the runtime throws an error. This often happens when retrying failed fetches in a loop. `fetch()` has no default timeout, so bound each call with an `AbortSignal` and limit retries to 2-3.
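One way to bound both the time and the number of attempts is sketched below. `fetchWithBudget`, `FetchLike`, and `BudgetOptions` are hypothetical names, and the defaults are illustrative. The fetch function is injected so the helper can be exercised without a network; every attempt counts as one subrequest.

```typescript
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<Response>;

interface BudgetOptions {
  fetchImpl: FetchLike; // in a Worker: (url, init) => fetch(url, init)
  timeoutMs?: number;   // per-attempt timeout
  maxAttempts?: number; // total attempts, including the first
}

// Fetches a URL with a per-attempt timeout and a hard cap on attempts.
// Retries on thrown errors (including timeouts) and on 5xx responses.
async function fetchWithBudget(url: string, options: BudgetOptions): Promise<Response> {
  const { fetchImpl, timeoutMs = 5000, maxAttempts = 3 } = options;
  let lastResponse: Response | undefined;
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const response = await fetchImpl(url, { signal: controller.signal });
      if (response.status < 500) return response; // success or client error
      lastResponse = response;
    } catch (err) {
      lastError = err;
      lastResponse = undefined;
    } finally {
      clearTimeout(timer);
    }
  }

  // Out of attempts: hand back the last 5xx if there was one, else rethrow.
  if (lastResponse) return lastResponse;
  throw lastError;
}
```

Returning the final 5xx response instead of throwing lets the caller decide how to surface the upstream failure, while a timeout or network error still propagates as an exception.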
Another common issue is 'body already consumed' when you try to read `response.text()` twice. Always clone the response if you need to read it multiple times: `const cloned = response.clone();`. Also, remember that `fetch` in Workers follows redirects by default, which counts toward the subrequest limit.
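The clone-before-read pattern can be shown in a few lines. `logAndReturn` is a hypothetical helper; the 200-character preview length is arbitrary.

```typescript
// Logs a preview of an upstream body and still returns a readable response.
// The clone is consumed for logging; the original body stays untouched.
async function logAndReturn(upstream: Response): Promise<Response> {
  const copy = upstream.clone(); // must happen before any read of upstream
  const preview = (await copy.text()).slice(0, 200);
  console.log("upstream body preview:", preview);
  return upstream;
}
```

Cloning is not free: the body ends up held in memory while both copies are read, so for large payloads it may be better to read once and build a new response from the text.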
Frequently asked questions
Why does my worker return 1101 (internal error) but no stack trace?
Error 1101 means the worker threw a JavaScript exception that nothing caught. To see the stack trace, enable 'Log on exceptions' in the Dashboard (Logs > Settings) or run `wrangler tail --format pretty` while reproducing the request. The most common cause is a throw or rejected promise inside the fetch handler (e.g., accessing a property on undefined).
How do I debug a worker that works locally but fails in production?
Local dev runs in a simulator (Miniflare) that may not perfectly replicate production limits. Check production metrics for CPU and memory usage. Also verify bindings: a KV namespace that exists locally may not be bound in production. Use `wrangler tail --env production` to see production logs. Another difference is the compatibility date: make sure the `compatibility_date` in your `wrangler.toml` is the one you actually test with.
What does 'Script too large' mean during deployment?
Cloudflare Workers enforce a limit on compressed script size that depends on your plan (historically 1 MB on the Free plan). If your bundle exceeds it, split the code into multiple workers or move heavy logic to a separate service. Common culprits are large npm libraries (e.g., lodash, moment.js). Wrangler bundles with esbuild, which tree-shakes unused exports, but only if you import the specific functions you need rather than entire libraries. Setting `minify = true` in wrangler.toml can also reduce the bundle size.
How do I handle rate limiting from external APIs in a worker?
Workers can't hold state across requests (unless using Durable Objects). To rate-limit outbound requests, use a combination of KV (with TTL) to track tokens and a simple counter. However, be aware that KV is eventually consistent, so you may overshoot slightly. For precise rate limiting, consider using a Durable Object as a rate limiter. Alternatively, offload the rate-limited calls to a separate service.
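The counting logic itself is small. The sketch below is a plain in-memory token bucket of the kind a Durable Object could host; on its own it does not persist or coordinate across isolates. `TokenBucket` and its parameters are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```typescript
// In-memory token bucket: holds up to `capacity` tokens and refills at
// `refillPerSecond`. Each permitted call consumes one token.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes a token if the call is allowed at `nowMs`.
  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: allow bursts of 2 calls, refilling 1 token per second.
//   const bucket = new TokenBucket(2, 1, Date.now());
//   if (!bucket.tryTake(Date.now())) { /* skip or delay the outbound call */ }
```

Kept in a worker's global scope, this state would be per-isolate and could vanish at any time, which is why the precise variant needs a single coordinating object rather than KV.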
Can I use npm packages in Cloudflare Workers?
Yes, but with caveats. Workers run in a V8 isolate, not Node, so packages that rely on Node.js built-in modules (fs, http, net) may not work, or may work only with the `nodejs_compat` compatibility flag enabled, depending on which APIs they use. Prefer packages that are compatible with the Workers runtime (e.g., pure JS or those that use Web APIs). Wrangler bundles your code with esbuild, which tree-shakes automatically. Check the package's documentation for Workers compatibility.