Task queues

A task queue runs units of work (jobs) in the background on a pool of workers, off the request path, with scheduling, retries, and result tracking built in.

Why it matters

Anything slow or flaky (sending email, resizing images, generating a PDF, calling a third-party API) should not run inside the user’s HTTP request. A task queue moves that work to a worker pool, so the web tier responds in milliseconds and stays responsive under load. The task queue also layers job-level features on top of a raw broker: retries with backoff, scheduled and periodic (cron-style) execution, priorities, and a place to store results. On a bare message queue you would have to hand-roll all of these yourself.

How it works

A web process enqueues a task (the function name plus its serialized arguments); workers pull tasks off the queue and execute them. Frameworks such as Celery, Sidekiq, and Resque add operational machinery on top:

Feature                What it gives you
Retry + backoff        re-run failed jobs after 1 s, 4 s, 16 s…
Scheduled / periodic   run at a set time, or every N minutes (e.g. Celery beat)
Priorities             drain critical jobs before low-priority ones
Result backend         store each job’s return value and status by job id
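The 1 s, 4 s, 16 s retry schedule is exponential backoff with a factor of 4. A minimal sketch of that schedule, assuming a hypothetical `backoff_delay` helper with a 1 s base delay, a factor of 4, and a 300 s cap; real frameworks expose their own settings for this and often add random jitter so that retries from many workers do not fire in lockstep.

```python
def backoff_delay(attempt: int, base: float = 1.0, factor: float = 4.0,
                  cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    Hypothetical helper: delay = base * factor ** (attempt - 1), capped at `cap`.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * factor ** (attempt - 1))


if __name__ == "__main__":
    # Delays before the first five retries.
    print([backoff_delay(n) for n in range(1, 6)])  # → [1.0, 4.0, 16.0, 64.0, 256.0]
```

The cap keeps a long-failing job from waiting hours between attempts; from the sixth retry onward this sketch waits a flat 300 s.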

The flow:

enqueue("resize_image", {photo_id: 42}) → broker
worker: deserialize → run → on error, retry with backoff → after N failed attempts, move to the dead-letter queue (DLQ)
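That flow can be sketched in a single process, assuming Python’s `queue.Queue` as a stand-in for the broker and a plain list as the DLQ. The `task` decorator, `enqueue`, `work_once`, and `drain` names are hypothetical and are not the API of Celery or any other framework. For brevity the worker sleeps inline before re-enqueueing a failed job; a real framework schedules the retry for later and frees the worker in the meantime.

```python
from __future__ import annotations

import json
import queue
import time
from typing import Any, Callable

# Hypothetical registry: task name -> function, filled by the @task decorator.
TASKS: dict[str, Callable[..., Any]] = {}

broker: queue.Queue[str] = queue.Queue()  # stand-in for Redis / RabbitMQ
dead_letters: list[dict] = []             # stand-in for the dead-letter queue


def task(fn: Callable[..., Any]) -> Callable[..., Any]:
    TASKS[fn.__name__] = fn
    return fn


def enqueue(name: str, kwargs: dict, attempt: int = 1) -> None:
    # Only the function name and JSON-serialized arguments travel through the broker.
    broker.put(json.dumps({"task": name, "kwargs": kwargs, "attempt": attempt}))


def work_once(max_attempts: int = 5, sleep: Callable[[float], Any] = time.sleep) -> None:
    msg = json.loads(broker.get())            # deserialize
    try:
        TASKS[msg["task"]](**msg["kwargs"])    # run
    except Exception as exc:
        if msg["attempt"] >= max_attempts:
            msg["error"] = repr(exc)
            dead_letters.append(msg)           # give up: park it in the DLQ
        else:
            sleep(4 ** (msg["attempt"] - 1))   # backoff: 1 s, 4 s, 16 s, ...
            enqueue(msg["task"], msg["kwargs"], msg["attempt"] + 1)
    finally:
        broker.task_done()


def drain(**kwargs: Any) -> None:
    while not broker.empty():
        work_once(**kwargs)


# Usage: one task recovers after two transient failures, one never recovers.
attempts_seen: list[int] = []


@task
def resize_image(photo_id: int) -> None:
    attempts_seen.append(photo_id)
    if len(attempts_seen) < 3:                 # simulate two transient failures
        raise ConnectionError("storage timeout")


@task
def always_broken(photo_id: int) -> None:
    raise ConnectionError("storage down")


delays: list[float] = []
enqueue("resize_image", {"photo_id": 42})
enqueue("always_broken", {"photo_id": 7})
drain(sleep=delays.append)                     # record delays instead of sleeping
print(len(attempts_seen), [m["task"] for m in dead_letters])  # → 3 ['always_broken']
```

Because the message carries an attempt counter, any worker can pick up a retry and still know when to stop and dead-letter the job.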

A task queue uses a broker (Redis, RabbitMQ) for transport but owns the job lifecycle. Keep tasks small, make them idempotent (a retry re-runs the whole task), and pass ids rather than payloads: fetch fresh data inside the worker so a job that sits in the queue never carries stale state. When producers consistently outpace workers, apply back-pressure (cap the queue length and make producers wait, shed, or reject new work) rather than letting the queue grow without bound.
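One simple form of back-pressure is a bounded queue whose producers wait briefly for a free slot and then give up. A minimal sketch, assuming a `queue.Queue` with a hypothetical limit of 3 pending jobs standing in for the broker and a hypothetical `try_enqueue` helper; each job carries only a photo id, as recommended above.

```python
import queue

# Hypothetical bounded stand-in for the broker: at most 3 pending jobs.
jobs: "queue.Queue[dict]" = queue.Queue(maxsize=3)


def try_enqueue(photo_id: int, timeout: float = 0.05) -> bool:
    """Enqueue a job that carries only an id; return False if the queue stays full."""
    try:
        # put() blocks for up to `timeout` seconds waiting for a free slot,
        # which slows the producer down instead of growing the queue.
        jobs.put({"task": "make_thumbnails", "photo_id": photo_id}, timeout=timeout)
        return True
    except queue.Full:
        # Shed load: the caller might answer 503 with Retry-After instead.
        return False


# No worker is consuming here, so the 4th and 5th enqueues are rejected.
accepted = [try_enqueue(i) for i in range(5)]
print(accepted)  # → [True, True, True, False, False]
```

Rejecting at the edge turns an invisible, ever-growing backlog into an explicit signal the caller can act on.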

Example

User uploads a profile photo:

POST /avatar  → save original to S3
              → enqueue("make_thumbnails", {photo_id: 42})
              → return 202 Accepted immediately

worker pool (×8):
  load photo 42 → generate 64/128/512px → write to S3 → mark done
  on transient S3 error → retry with backoff; after 5 failed attempts → DLQ + alert

The user sees “processing…” until the job finishes, and then the UI swaps in the thumbnails; the request itself returned in ~20 ms.
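The upload flow can be sketched with in-memory stand-ins, assuming a dict for S3, a `deque` for the broker, and a dict as the result backend that the UI polls by job id. The names `post_avatar`, `make_thumbnails`, and `worker_step` are hypothetical, and the resize step is a placeholder that copies bytes rather than scaling an image.

```python
from __future__ import annotations

import uuid
from collections import deque

# Hypothetical in-memory stand-ins for S3, the broker, and a result backend.
object_store: dict[str, bytes] = {}
broker: deque[dict] = deque()
job_status: dict[str, str] = {}  # job id -> "queued" | "done"


def post_avatar(photo_id: int, data: bytes) -> tuple[int, dict]:
    object_store[f"originals/{photo_id}"] = data   # save the original
    job_id = str(uuid.uuid4())
    job_status[job_id] = "queued"
    # Enqueue only the id; the worker re-reads the photo itself.
    broker.append({"job_id": job_id, "task": "make_thumbnails", "photo_id": photo_id})
    return 202, {"job_id": job_id}                   # respond before any resizing


def make_thumbnails(photo_id: int) -> None:
    original = object_store[f"originals/{photo_id}"]
    for size in (64, 128, 512):
        # Placeholder: a real task would resize `original` to `size` px here.
        object_store[f"thumbs/{photo_id}/{size}"] = original


def worker_step() -> None:
    job = broker.popleft()
    make_thumbnails(job["photo_id"])
    job_status[job["job_id"]] = "done"               # what the UI polls for


status, body = post_avatar(42, b"raw-jpeg-bytes")
before = job_status[body["job_id"]]   # "queued": the response is already sent
worker_step()
after = job_status[body["job_id"]]    # "done": the UI can swap in thumbnails
print(status, before, after)          # → 202 queued done
```

The request handler does nothing slow: it writes the original, records a status, and enqueues an id, which is why it can return before any thumbnail exists.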

Pitfalls

  • Fat payloads. Serializing whole objects into the job bloats the broker and goes stale; enqueue an id and re-read inside the task.
  • Non-idempotent jobs. With at-least-once execution, a retried “charge card” job double-charges unless guarded by an idempotency key.
  • Silent failures. Without DLQ monitoring and alerts, failed jobs pile up unnoticed; track failure rate, queue latency, and DLQ depth.
  • Long-running tasks blocking workers. One 10-minute job ties up a worker for the full 10 minutes; chunk big work into many small tasks so short jobs are not stuck waiting behind it.
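An idempotency key guards the “charge card” case from the list above. A minimal sketch, assuming a dict of completed charges and a list that stands in for calls to a payment provider; `charge_card` and the `charge:<order_id>` key format are hypothetical. The key is derived from the job’s own input, so every retry of the same job computes the same key.

```python
from __future__ import annotations

# Hypothetical stand-ins: completed charges keyed by idempotency key, and a log of
# calls to the payment provider (the side effect that must not repeat).
completed: dict[str, str] = {}
provider_calls: list[tuple[str, int]] = []


def charge_card(order_id: int, amount_cents: int) -> str:
    key = f"charge:{order_id}"         # same input -> same key on every retry
    if key in completed:               # an earlier attempt already charged this order
        return completed[key]
    provider_calls.append((key, amount_cents))
    receipt = f"receipt-{order_id}"
    completed[key] = receipt           # record success so a retry becomes a no-op
    return receipt


first = charge_card(7, 1999)
retry = charge_card(7, 1999)           # at-least-once delivery runs the job again
print(first == retry, len(provider_calls))  # → True 1
```

This in-memory check is not atomic across concurrent workers, and a crash between the provider call and the record would still allow a second charge. A production version would claim the key atomically (for example, with a unique constraint in the database) or pass it to a payment provider that deduplicates requests by idempotency key.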

See also