LEARN · DEBUGGING GUIDE

Celery Beat Periodic Task Not Running: A Field Guide

When a scheduled task just doesn't run, the cause is almost never 'Celery is broken'. It's a misconfigured scheduler, a stale lock, a timezone trap, or a worker that never received the task. Here's how to find out which.

Intermediate · Python · 9 min read

What this usually means

Celery Beat is a separate process that reads the schedule from your configuration or database, then sends tasks to the message broker when they fall due. If tasks aren't running, one of four things is broken: (1) The Beat process itself is dead, stuck, or not running the schedule you think it is. (2) The Beat process cannot reach the broker (Redis/RabbitMQ), or the broker is rejecting messages. (3) The task is sent correctly, but no worker is consuming the queue it was routed to. (4) If your scheduler backend uses a lock to stop several Beat nodes from scheduling the same work, that lock is held by a process that no longer exists, blocking all new schedules. In multi-node deployments that rely on such a lock, the non-obvious culprit is often #4: a stale scheduler lock that silently prevents the scheduler from advancing.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

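  1. Confirm what Beat has loaded. Beat does not answer remote-control commands, so `celery -A your_app inspect scheduled` will not show its schedule (that command lists ETA/countdown tasks already reserved by workers). Instead, read the startup banner Beat prints (its `scheduler ->` line shows which scheduler class is in use) and print `app.conf.beat_schedule` from a Python shell. If your task is missing there, Beat is not reading your config.
  2. Check Beat's logs explicitly: `journalctl -u celery-beat -n 50 --no-pager` (use your unit's name) or `tail -f` on the file passed to `--logfile`. A healthy Beat logs 'Scheduler: Sending due task <entry> (<task>)' each time an entry fires; with django-celery-beat you will also see 'DatabaseScheduler: Schedule changed.' when it reloads entries. Note any tracebacks or broker connection errors.
  3. Verify the Beat process is running: `pgrep -af 'celery.*beat'` (quoting stops the shell from expanding the pattern, and pgrep does not list itself). There should be exactly one Beat across all hosts, not zero and not two. If you embed Beat in a worker with `-B`, look for that worker instead.
  4. Send a test task directly, bypassing Beat: `celery -A your_app call myapp.tasks.my_task`, and watch the worker logs to confirm a worker can receive and execute it.
  5. If you use a database scheduler, inspect its periodic-task table (`django_celery_beat_periodictask` under django-celery-beat; the name differs for other backends): `SELECT name, enabled, last_run_at FROM django_celery_beat_periodictask WHERE enabled AND last_run_at IS NOT NULL ORDER BY last_run_at DESC LIMIT 5;`
  6. Check the broker queue: `redis-cli llen celery` (the default queue is named `celery`) or `rabbitmqctl list_queues name messages consumers` to see whether tasks are being enqueued but not consumed.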
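If you want the "exactly one Beat" check from step 3 in a script (for example, a health check that runs on each host), the counting rule can be sketched in Python. This is a minimal sketch that assumes `ps aux`-style text as input; the function names and the match pattern are illustrative, not part of Celery.

```python
import re

# A line counts as Beat if it runs "celery ... beat", or a worker with an
# embedded Beat ("-B"). The pattern is an assumption; adjust it to match
# how your deployment launches Celery.
BEAT_PATTERN = re.compile(r"\bcelery\b.*(\bbeat\b|\s-B\b)")


def count_beat_processes(ps_output: str) -> int:
    """Count Beat processes in `ps aux` output, ignoring grep/pgrep lines."""
    count = 0
    for line in ps_output.splitlines():
        if "grep" in line:
            # The search command itself also matches the pattern; skip it.
            continue
        if BEAT_PATTERN.search(line):
            count += 1
    return count


def beat_status(ps_output: str) -> str:
    """Exactly one Beat is healthy; zero or more than one is not."""
    n = count_beat_processes(ps_output)
    if n == 1:
        return "ok"
    return "no beat running" if n == 0 else f"{n} beat processes"
```

Feed it `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout` on each host and sum the counts across hosts.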
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Celery Beat log: the file passed to `--logfile` (commonly /var/log/celery/beat.log), or `journalctl -u celery-beat` under systemd; without `--logfile`, Beat logs to stderr
  • Worker log: the file passed to the worker's `--logfile` (commonly /var/log/celery/worker.log)
  • Broker monitoring: Redis MONITOR or the RabbitMQ management UI
  • Database: your scheduler's tables (`django_celery_beat_periodictask` under django-celery-beat; this guide's examples use `celery_periodic_tasks` and `celery_schedule_entries`, and names vary by package)
  • Application config: `celery.py` or wherever `beat_schedule` is defined
  • System time and timezone: `timedatectl` and `date`
  • Process list: `ps aux | grep celery`, checking for more than one Beat process
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Stale scheduler lock: with a scheduler backend that locks its schedule (here, a row in `celery_schedule_entries`), a lock that was never released after a previous Beat crash prevents any new schedule from being written.
  • Timezone mismatch between Celery's `timezone` setting and the schedule you wrote: Celery uses UTC by default, so if your crontab hours are written in local time and `timezone` is not set to match, tasks fire hours early or late.
  • Multiple Beat instances running: in a multi-server deployment, two Beat processes each send every due task, leading to duplicate runs and, with a shared scheduler store, conflicting schedule state.
  • Broker connection lost: Beat loses its connection to Redis/RabbitMQ but doesn't crash; it stops sending tasks until the connection is restored, and the only sign is connection errors in its log.
  • Task routed to a queue nobody consumes: the periodic task is sent to a queue no worker is listening on (e.g., the entry sets `'options': {'queue': 'reports'}` but workers were started with `-Q celery`, the default queue).
  • Task import failure: Beat sends the task successfully, but the task's module failed to import on the worker (or was never included), so the worker logs 'Received unregistered task of type ...' and discards the message; the error appears only in worker logs.
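The queue-mismatch cause is easy to check mechanically: compare the queue each `beat_schedule` entry is sent to with the queues workers actually consume. A minimal sketch, assuming routing is set only through each entry's `options` (routes from `task_routes` are ignored here) and that the default queue is Celery's default, `celery`. The input shape mirrors what `app.control.inspect().active_queues()` returns: a dict of worker names to lists of queue dicts, or `None` when no worker replies. Entry, task, and queue names are hypothetical.

```python
from datetime import timedelta

# Hypothetical schedule in the shape of Celery's `beat_schedule` setting.
BEAT_SCHEDULE = {
    "daily-report": {
        "task": "myapp.tasks.daily_report",
        "schedule": timedelta(hours=1),
        "options": {"queue": "reports"},  # routed explicitly
    },
    "cleanup": {
        "task": "myapp.tasks.cleanup",
        "schedule": timedelta(minutes=10),  # no queue: uses the default
    },
}


def consumed_queues(active_queues_reply):
    """Flatten {worker: [{'name': queue, ...}, ...]} into a set of queue names."""
    return {
        q["name"]
        for queues in (active_queues_reply or {}).values()
        for q in queues
    }


def orphaned_entries(beat_schedule, queues, default_queue="celery"):
    """Return {entry_name: queue} for entries sent to a queue no worker consumes."""
    orphans = {}
    for name, entry in beat_schedule.items():
        queue = entry.get("options", {}).get("queue", default_queue)
        if queue not in queues:
            orphans[name] = queue
    return orphans
```

An empty result means every entry has at least one consumer; it does not prove those workers have the task registered.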
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Clear a stale scheduler lock: stop every Beat instance first, back up the table, delete only the row in `celery_schedule_entries` that holds the stale lock (not rows that are merely old), then start a single Beat.
  • Set the timezone explicitly: add `timezone = 'America/New_York'` (or your zone) to your Celery app config (`CELERY_TIMEZONE` in Django settings when using the `CELERY` namespace), and write all crontab schedules in that zone.
  • Ensure a single Beat instance: on one host, a process supervisor (systemd, supervisord) plus `--pidfile` prevents a second launch; across hosts you need a distributed lock, e.g. Redis `SET key value NX PX <ttl>`.
  • Configure Beat to keep retrying the broker: set `broker_connection_retry = True` and `broker_connection_max_retries = None` (retry forever) in your Celery config.
  • Set the task queue explicitly in the `beat_schedule` entry: add `'options': {'queue': 'my_queue'}` (options are passed to `apply_async`) and make sure a worker consumes that queue (`-Q my_queue`). The old `@periodic_task` decorator was removed in Celery 5.
  • Verify the task import path: run `celery -A your_app inspect registered` to list the tasks each worker has registered. If your task isn't listed, fix the import (or add its module to `include`/autodiscovery).
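Several of these fixes live in configuration. The sketch below gathers them into a plain settings dict you could pass to `app.conf.update(**CELERY_SETTINGS)`, plus a helper that compares the scheduled task names with the reply from `app.control.inspect().registered()` (a dict of worker names to lists of registered task names, or `None` if no worker answers). The zone, queue, and task names are placeholders; `timedelta` intervals keep the snippet runnable without Celery installed, and a `crontab(...)` from `celery.schedules` would go in the same `schedule` slot.

```python
from datetime import timedelta

# Placeholder values; pass with app.conf.update(**CELERY_SETTINGS).
CELERY_SETTINGS = {
    "timezone": "America/New_York",         # crontab hours are read in this zone
    "broker_connection_retry": True,
    "broker_connection_max_retries": None,  # None: retry forever
    "beat_schedule": {
        "hourly-report": {
            "task": "myapp.tasks.daily_report",
            "schedule": timedelta(hours=1),
            "options": {"queue": "my_queue"},  # forwarded to apply_async
        },
        "cleanup": {
            "task": "myapp.tasks.cleanup",
            "schedule": timedelta(minutes=10),
        },
    },
}


def unregistered_tasks(beat_schedule, registered_reply):
    """Sorted task names Beat will send but no responding worker has registered."""
    known = {
        name
        for names in (registered_reply or {}).values()
        for name in names
    }
    scheduled = {entry["task"] for entry in beat_schedule.values()}
    return sorted(scheduled - known)
```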
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • After a fix, restart Beat and check its startup banner and first log lines: the expected scheduler is listed and no errors follow. (`celery inspect scheduled` does not help here; Beat does not answer inspect commands.)
  • Temporarily set the schedule to run every minute and watch the Beat log for 'Scheduler: Sending due task' messages.
  • Check the worker log for the task execution: you should see 'Task myapp.tasks.my_task[<task_id>] succeeded in ...'.
  • Verify only one Beat process is running: on each host, `pgrep -fc 'celery.*beat'` prints the number of matches (pgrep never counts itself); the total across hosts should be 1.
  • For a database scheduler, query the periodic-task table (`celery_periodic_tasks` in this guide) and confirm `last_run_at` advances as expected.
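"`last_run_at` advances as expected" can be turned into a concrete check: an entry that should run every `interval` is suspicious once `now - last_run_at` exceeds that interval plus some slack. A minimal sketch; the two-minute grace period is an arbitrary assumption, the rows would come from your scheduler's periodic-task table, and all timestamps must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_run_at, interval, now=None, grace=timedelta(minutes=2)):
    """True if an entry expected every `interval` has not run within interval + grace."""
    if now is None:
        now = datetime.now(timezone.utc)
    if last_run_at is None:
        return True  # never ran at all
    return now - last_run_at > interval + grace


def stale_entries(rows, intervals, now=None):
    """rows: iterable of (name, last_run_at); intervals: {name: timedelta}.

    Returns the sorted names whose last run is overdue.
    """
    return sorted(
        name
        for name, last_run_at in rows
        if name in intervals and is_stale(last_run_at, intervals[name], now)
    )
```

With the war story's numbers, an hourly entry last run at 02:13 and checked at 09:05 is flagged immediately.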
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Don't blindly restart Beat without checking the lock table: if the database still holds the stale lock, the new process will hit it just as the old one did.
  • Don't set `timezone = 'UTC'` and then write crontab hours in local time expecting Celery to convert them; it won't.
  • Don't run Beat as root if your app runs as a different user: ownership of log files, PID files, or the `celerybeat-schedule` file can then cause failures that are easy to miss.
  • Don't rely solely on `celery -A proj status` to check Beat health; it only pings workers, not Beat.
  • Don't use the default file-based `PersistentScheduler` if you need coordinated multi-node scheduling: it keeps state in a local file with no cross-host coordination. Run exactly one Beat, or use a scheduler backend that provides proper locking (for example RedBeat, which holds its lock in Redis).
  • Don't ignore worker logs when debugging Beat: the task may be sent but fail on execution with an import error that only appears in worker logs.
( 07 ) War story

The Silent Midnight Scheduler

Senior Backend Engineer · Django 3.2, Celery 5.2, Redis 6, PostgreSQL 13, Ubuntu 20.04

Timeline

  1. 08:45 Alert: 'daily_report' task not run since 02:13 AM. Task should run every hour.
  2. 08:50 Check Beat process: `ps aux | grep beat` shows two Beat processes running (one from the old deploy, one new).
  3. 08:52 Kill both Beat processes, restart a single Beat. Task runs once, then stops again.
  4. 09:00 Inspect `celery_periodic_tasks` table: `last_run_at` for 'daily_report' is 02:13, but `enabled=1`.
  5. 09:05 Check `celery_schedule_entries`: found a row with `lock=True` and `last_run_at=02:13`. Stale lock.
  6. 09:07 Delete the stale lock row manually: `DELETE FROM celery_schedule_entries WHERE lock=True AND id=42;`
  7. 09:08 Restart Beat. Task runs at 09:10 as scheduled.
  8. 09:15 Verify with `celery inspect scheduled`: 'daily_report' shows next run at 10:00.
  9. 09:20 Add a systemd unit with `KillMode=process` and a PID file to prevent duplicate beats.

The night before, we had a rolling deploy of the Django app. The old Beat process was still running when the new one started, because our supervisor config didn't stop the old process gracefully. Now we had two Beat processes both trying to manage the schedule. One of them acquired a lock on the `celery_schedule_entries` table to write the 'daily_report' schedule, then crashed during the deploy. The lock was never released.

When I arrived at 8:45 AM, the daily report hadn't run since 2:13 AM. My first instinct was to restart Beat, which I did. But the stale lock prevented the new Beat from updating the schedule — it kept seeing the old lock and silently skipped scheduling. The task ran once because Beat tried to send the missed task, but then fell back to waiting for the lock to clear, which never happened.

I eventually found the stale lock by querying the database directly. Deleting the lock row and restarting Beat fixed it immediately. The lesson: always use a proper process supervisor with pre-stop signals and implement a distributed lock with a timeout (e.g., Redis SETNX with TTL) so stale locks expire. I also added monitoring on the `celery_schedule_entries` table to alert if a lock is held for more than 5 minutes without a corresponding Beat process.

Root cause

Two Beat instances running concurrently due to a rolling deploy without proper process termination; one crashed while holding a database lock for the scheduler, leaving a stale lock that blocked all subsequent schedule updates.

The fix

Delete the stale lock row from `celery_schedule_entries` table, then restart Beat with a single-instance guarantee (systemd PID file). Implement distributed lock with TTL via Redis.

The lesson

Never assume a single Beat instance. Use a PID file or distributed lock with expiration. Monitor lock age and alert on anomalies.

( 08 ) Understanding Beat's Scheduler Lock

A database-backed scheduler that coordinates several Beat instances does so with a lock. In the setup described here, a `DatabaseScheduler` keeps its entries in a table (`celery_schedule_entries`) and takes the lock by setting a column (e.g., `lock=True`) on a row before modifying the schedule, which prevents two instances from writing conflicting schedules. Celery's default `PersistentScheduler` takes no such lock, and whether your database scheduler does depends on the package, so check its documentation and schema before hunting for a lock row.

The problem: if the Beat process crashes or is killed without releasing the lock, the lock stays set indefinitely. Later Beat instances see it and skip their schedule update, effectively pausing all periodic tasks. Unless the lock carries an expiry, it is cleared only when the Beat that took it restarts and releases it, or when the row is deleted by hand. In multi-server deployments that use such a lock, this is one of the most common causes of 'tasks not running'.
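The monitoring rule from the war story (alert when a lock has been held for more than five minutes by a Beat that is no longer running) can be sketched as follows. The `LockRow` fields are hypothetical: real column names depend on your scheduler backend, and how you build the set of live holders (for example, `host:pid` values collected on each Beat host) is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Set


@dataclass
class LockRow:
    """Hypothetical lock record; adapt the fields to your backend's schema."""
    holder: str            # e.g. "beat-host-1:4242" (host:pid that took the lock)
    acquired_at: datetime  # timezone-aware


def is_stale_lock(
    lock: Optional[LockRow],
    live_holders: Set[str],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(minutes=5),
) -> bool:
    """Stale means: held longer than max_age by a holder that is not running."""
    if lock is None:
        return False  # nothing is locked
    if now is None:
        now = datetime.now(timezone.utc)
    too_old = now - lock.acquired_at > max_age
    return too_old and lock.holder not in live_holders
```

Requiring both conditions avoids paging on a lock that a live Beat took a moment ago.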

( 09 ) Timezone: The Silent Off-by-One

Celery uses UTC by default for all schedule calculations, regardless of the system timezone: the `timezone` setting defaults to 'UTC'. If you define a schedule with `crontab(hour=9, minute=0)` expecting it to run at 9 AM local time on a server at UTC+5, it will run at 9 AM UTC, which is 2 PM local. Worse, if you set `timezone='America/New_York'` in the Celery config but wrote your crontab hours with UTC in mind, every task shifts by New York's offset (four or five hours, depending on daylight saving time).

The fix: always set `timezone` explicitly in your Celery app config to the zone your schedules are written in. Crontab hours and minutes are then interpreted in that zone; interval schedules (`timedelta` or a number of seconds) are relative and unaffected. A common mistake is to write local times in the crontab entries but forget to set `timezone`, so tasks fire hours early or late.
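The arithmetic in the UTC+5 example can be checked with Python's standard library. This sketch uses fixed offsets (`datetime.timezone`) so the result is deterministic; named zones with daylight saving time would use `zoneinfo.ZoneInfo` instead. The helper name is illustrative, and it models only the hour and minute of a daily crontab, not Celery's scheduler itself.

```python
from datetime import date, datetime, timedelta, timezone

UTC_PLUS_5 = timezone(timedelta(hours=5))  # the server's local offset in the example


def local_fire_time(hour, minute, celery_tz, local_tz, day):
    """Local wall-clock time at which crontab(hour=hour, minute=minute) fires
    on `day`, when Celery's `timezone` setting corresponds to celery_tz."""
    fire = datetime(day.year, day.month, day.day, hour, minute, tzinfo=celery_tz)
    return fire.astimezone(local_tz)


day = date(2024, 1, 15)
# timezone left at the default 'UTC': 09:00 UTC is 14:00 on the UTC+5 server.
print(local_fire_time(9, 0, timezone.utc, UTC_PLUS_5, day).strftime("%H:%M"))  # 14:00
# timezone set to the server's own zone: 09:00 means 09:00 local.
print(local_fire_time(9, 0, UTC_PLUS_5, UTC_PLUS_5, day).strftime("%H:%M"))  # 09:00
```

Equivalently, if you insist on leaving `timezone` at UTC, a 9 AM local run on that server needs `crontab(hour=4, minute=0)`.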

( 10 ) Tracing the Task Lifecycle: From Beat to Worker

When a task fails to run, the problem could be at any stage: (1) Beat doesn't send it (schedule not loaded, or scheduler blocked), (2) Beat tries to send it but the message never reaches the broker (connection issue), (3) the broker holds the message but no worker consumes it (queue mismatch), or (4) a worker consumes it but the task fails quietly (import error, swallowed exception).

To isolate the stage, start with Beat's log: a 'Scheduler: Sending due task' line at the expected time proves stage 1 worked. On the worker side, `celery -A proj inspect active` shows tasks currently running, `celery -A proj inspect reserved` shows tasks fetched from the broker but not yet started, and `celery -A proj inspect scheduled` shows tasks workers are holding for a future ETA or countdown (not Beat's schedule; Beat does not answer inspect commands). If Beat logged the send but the task never shows up on a worker, the broker or queue routing is the issue; if Beat never logged it, the scheduler is the problem.
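The four-stage isolation above can be written down as a small decision function. This is a sketch of the reasoning, not a Celery API: the inputs are observations you collect by hand (a 'Sending due task' line in Beat's log, a non-zero queue length from `redis-cli llen` or `rabbitmqctl`, a received or succeeded line in worker logs), and the stage numbers match the list above.

```python
STAGES = {
    1: "Beat never sent it: schedule not loaded, or scheduler blocked",
    2: "Sent but never reached the broker: connection problem",
    3: "Waiting in the broker: no worker consumes that queue",
    4: "Worker received it but it failed: import error or exception",
}


def failing_stage(beat_sent, message_waiting, worker_received, succeeded):
    """Return the first lifecycle stage that broke (1-4), or None if all worked.

    beat_sent:       Beat logged 'Scheduler: Sending due task' for the entry
    message_waiting: the message is sitting in the broker queue
    worker_received: a worker log shows the task was received
    succeeded:       a worker log shows it succeeded
    """
    if not beat_sent:
        return 1
    if worker_received:
        return None if succeeded else 4
    return 3 if message_waiting else 2
```

A message that is neither queued nor received may also have gone to a worker whose log you did not check, so treat stage 2 as "look at the broker connection and every worker first".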

( 11 ) Using Beat's Logging for Deep Diagnostics

Beat logs are notoriously terse by default. For more detail, start Beat with `--loglevel=DEBUG`. Then look for 'Scheduler: Sending due task' (confirms a task is being dispatched; it is logged at INFO, so it appears without DEBUG too) and, with django-celery-beat, 'DatabaseScheduler: Schedule changed.' (the scheduler reloaded its entries from the database). Timezone problems do not produce a dedicated log line; compare the times of the 'Sending due task' lines with when you expected the task to fire.

Common log patterns: a healthy Beat logs a 'Sending due task' message at each scheduled time. If your scheduler backend logs lock warnings (wording varies by package), the lock may be stale. If you see nothing at the scheduled time, Beat may not have the task in its schedule at all; check the database or config file.
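In a long Beat log, checking for 'Sending due task' lines by eye gets tedious. The sketch below parses them, assuming lines in Celery's default log format, e.g. `[2024-01-01 09:00:00,003: INFO/MainProcess] Scheduler: Sending due task ...`; if you changed the log format, adjust the regular expression. The entry names are hypothetical.

```python
import re
from datetime import datetime

# Matches e.g.
# [2024-01-01 09:00:00,003: INFO/MainProcess] Scheduler: Sending due task daily-report (myapp.tasks.daily_report)
SENT = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+: \w+/[^\]]+\] "
    r"Scheduler: Sending due task (?P<entry>.+) \((?P<task>[^()]+)\)$"
)


def last_sent(lines):
    """Map entry name -> time of its latest send (assumes chronological lines)."""
    sent = {}
    for line in lines:
        m = SENT.match(line.rstrip())
        if m:
            sent[m.group("entry")] = datetime.strptime(
                m.group("ts"), "%Y-%m-%d %H:%M:%S"
            )
    return sent


def never_sent(lines, expected_entries):
    """Entry names that should have fired but have no send line at all."""
    return sorted(set(expected_entries) - set(last_sent(lines)))
```

Schedule entry names may contain spaces, which is why the entry group is greedy and the task name is taken from the final parentheses.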

( 12 ) Preventing Duplicate Beats in Production

The safest way to run Beat in production is under a process supervisor that keeps exactly one instance alive. With systemd, run Beat as a single service and give it a PID file: `ExecStart=/usr/bin/celery -A proj beat --pidfile=/var/run/celery/beat.pid`. If that PID file exists and its process is still alive, a second Beat on the same host refuses to start. The directory must exist and be writable by the service user (systemd's `RuntimeDirectory=celery` creates `/run/celery` for you). A PID file only protects one host; it cannot see a Beat running on another server.

For high-availability setups where you need Beat on multiple nodes, use a distributed lock so only one instance schedules at a time, e.g. Redis `SET <key> <token> NX PX 60000` (a 60-second TTL; plain `SETNX` cannot set a TTL atomically). The holder must refresh the TTL periodically (e.g., every 30 seconds) so it does not expire while Beat is healthy, and the refresh must check the token so one instance cannot extend another's lock. If Beat crashes, the lock expires on its own, allowing another instance to take over.
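The acquire-and-refresh protocol can be sketched without a running Redis. `FakeRedis` below is a hypothetical in-memory stand-in that mimics `SET key value NX PX ttl` and a compare-then-extend refresh, with an injectable clock so expiry is testable. Against real Redis you would use redis-py's `set(key, token, nx=True, px=ttl_ms)` to acquire and a small Lua script (run via `eval`) for the atomic compare-and-extend; redis-py's `Redis.lock()` helper implements the same pattern. The key name and TTL are assumptions.

```python
import time
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in for the two operations a leader lock
    needs. Not a Redis client; the clock is injectable for testing."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at in seconds)
        self._clock = clock

    def _live(self, key):
        """Return (value, expires_at) if the key exists and has not expired."""
        item = self._data.get(key)
        if item is not None and item[1] <= self._clock():
            del self._data[key]  # expired, like a Redis TTL running out
            return None
        return item

    def set_nx_px(self, key, value, px):
        """Mimics SET key value NX PX px: set only if absent; True on success."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + px / 1000)
        return True

    def extend_if_owner(self, key, value, px):
        """Reset the TTL only if `value` still owns the key. In real Redis this
        compare-and-extend must run atomically, e.g. as a Lua script."""
        item = self._live(key)
        if item is None or item[0] != value:
            return False
        self._data[key] = (value, self._clock() + px / 1000)
        return True


class BeatLeaderLock:
    """Leader lock for Beat: acquire once, refresh every ttl/2 while healthy."""

    def __init__(self, store, key="beat:leader", ttl_ms=60_000):
        self.store = store
        self.key = key
        self.ttl_ms = ttl_ms
        self.token = uuid.uuid4().hex  # unique per Beat instance

    def acquire(self):
        return self.store.set_nx_px(self.key, self.token, self.ttl_ms)

    def refresh(self):
        # False means the lock expired or another instance now holds it:
        # this instance must stop scheduling.
        return self.store.extend_if_owner(self.key, self.token, self.ttl_ms)
```

Beat would call `acquire()` at startup (retrying until it succeeds) and `refresh()` every 30 seconds; the random token is what keeps a slow, recovered instance from extending a lock that has already passed to another node.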

Frequently asked questions

Why does my task run once after restarting Beat but then stops again?

This is a classic sign of a stale scheduler lock. When Beat restarts, it may send the missed task immediately, but then it tries to update the schedule and encounters the lock. Since the lock is held by a ghost process, Beat cannot update the schedule and stops sending future tasks. Delete the stale lock from `celery_schedule_entries` and restart Beat again.

Can I use a file-based scheduler in production?

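You can, but only with exactly one Beat. The `PersistentScheduler` stores schedule state in a local file (`celerybeat-schedule` by default). If you run multiple Beat instances (e.g., due to auto-scaling), each keeps its own file and each sends every due task, so jobs run more than once. If the file becomes corrupted, Beat loses its record of last run times. A database-backed scheduler keeps the schedule in one shared place, which helps once you have more than one server, but it does not by itself stop two Beat processes from both sending tasks; still run a single Beat, or use a scheduler with a distributed lock.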

How do I check if my Beat process is actually running the schedule?

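Beat does not respond to `celery inspect` commands (`celery -A your_app inspect scheduled` lists ETA/countdown tasks held by workers, not Beat's schedule). Instead, check Beat's startup banner for the scheduler in use and watch its log: each time an entry falls due, Beat logs 'Scheduler: Sending due task <entry> (<task>)'. If no such line appears when your task is due, Beat has not loaded the entry, is blocked, or is not running. With django-celery-beat, 'DatabaseScheduler: Schedule changed.' messages also show it picking up edits from the database.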

What happens if the broker goes down while Beat is running?

Beat retries the broker connection according to `broker_connection_retry` (default `True`) and `broker_connection_max_retries` (default 100; set it to `None` to retry forever). Tasks that fall due during the outage are missed; they are not queued later. Once the broker is back, Beat resumes scheduling future tasks, but missed runs are lost unless you have a catch-up mechanism. `worker_cancel_long_running_tasks_on_connection_loss` is a worker setting and does not change this Beat behavior.

My task runs locally but not in production. What's different?

Common differences: (1) Timezone: local dev may use UTC while production uses local time. (2) Database scheduler vs file scheduler: production may use a database scheduler with stale locks. (3) Queue names: production workers may listen on different queues. (4) Import paths: a module that exists in dev may be missing in production. Compare Beat's startup banner and log, and the output of `celery inspect registered`, between environments.