Oban and Reliable Background Jobs
Use Oban for durable background jobs in Elixir. Covers retries, scheduling, uniqueness, workflows, idempotency, and operational monitoring.
Oban is the de facto background job system for Elixir applications that need reliability and operational visibility. It stores jobs in PostgreSQL and executes them with supervised workers.
Core Worker Example
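defmodule MyApp.Workers.SendReceipt do
  use Oban.Worker, queue: :mailers, max_attempts: 10

  # Orders and Mailer are application modules, assumed here to live under MyApp.
  alias MyApp.{Mailer, Orders}

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"order_id" => order_id}}) do
    with {:ok, order} <- Orders.fetch(order_id),
         :ok <- Mailer.send_receipt(order) do
      :ok
    else
      # The order no longer exists, so retrying cannot succeed.
      {:error, :not_found} -> :discard
      # Any other error is treated as transient and retried with backoff.
      {:error, reason} -> {:error, reason}
    end
  end
end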
Key return values:
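- :ok marks the job as completed,
- {:error, reason} records the error and schedules a retry,
- :discard stops the job without further retries, for non-retryable failures (recent Oban releases prefer {:cancel, reason} for this case),
- {:snooze, seconds} reschedules the job to run again after the given delay.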
Enqueueing Jobs
%{"order_id" => order.id}
|> MyApp.Workers.SendReceipt.new(unique: [period: 300, keys: [:order_id]])
|> Oban.insert()
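The unique option prevents duplicate inserts for the same logical task within a time window. Here, a second job with the same order_id inserted within 300 seconds is treated as a duplicate instead of being enqueued again.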
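To make the effect of uniqueness visible to the caller, the insert result can be inspected. The following is a minimal sketch, not a definitive implementation: MyApp.Receipts and enqueue_receipt/1 are hypothetical names, and the 60-second schedule_in delay is an illustrative value. It relies on Oban marking a job returned from a uniqueness conflict with conflict?: true.

```elixir
defmodule MyApp.Receipts do
  # Hypothetical context module. Only Oban.insert/1, the :unique and
  # :schedule_in options, and the conflict? field come from Oban itself.
  def enqueue_receipt(order_id) do
    %{"order_id" => order_id}
    |> MyApp.Workers.SendReceipt.new(
      unique: [period: 300, keys: [:order_id]],
      # Delay execution by 60 seconds (illustrative value).
      schedule_in: 60
    )
    |> Oban.insert()
    |> case do
      # An equivalent job already exists inside the uniqueness window.
      {:ok, %Oban.Job{conflict?: true} = job} -> {:duplicate, job}
      {:ok, %Oban.Job{} = job} -> {:enqueued, job}
      {:error, changeset} -> {:error, changeset}
    end
  end
end
```

Returning a distinct :duplicate tuple lets callers log or count suppressed inserts instead of silently treating them as new work.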
Idempotency First
Jobs must be safe to run more than once.
Practical strategies:
- use unique constraints for side effects,
- store external provider ids/status,
- make worker operations state-aware (for example, already_sent checks).
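The strategies above can be sketched as a state-aware variant of the receipt worker. Everything application-specific here is an assumption for illustration: the receipt_sent_at field, Orders.mark_receipt_sent/2, and a Mailer.send_receipt/1 that returns {:ok, provider_id} (which differs from the plain :ok used in the core example).

```elixir
defmodule MyApp.Workers.SendReceiptOnce do
  use Oban.Worker, queue: :mailers, max_attempts: 10

  # Hypothetical application modules.
  alias MyApp.{Mailer, Orders}

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"order_id" => order_id}}) do
    case Orders.fetch(order_id) do
      # Not sent yet: perform the side effect.
      {:ok, %{receipt_sent_at: nil} = order} -> deliver(order)
      # Already sent by an earlier attempt or a replay: nothing to do.
      {:ok, _already_sent} -> :ok
      {:error, :not_found} -> :discard
    end
  end

  defp deliver(order) do
    # A non-matching step (for example {:error, reason}) is returned as-is,
    # which Oban treats as a failed attempt and retries.
    with {:ok, provider_id} <- Mailer.send_receipt(order),
         {:ok, _order} <- Orders.mark_receipt_sent(order, provider_id) do
      :ok
    end
  end
end
```

One gap remains in this sketch: if the email is sent but recording it fails, a retry would send again. Passing an idempotency key to the provider, where the provider supports one, is a common way to close that window.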
Queue and Retry Design
Split workloads by behavior:
- :default for normal jobs,
- :mailers for external providers,
- :critical for time-sensitive jobs,
- :low for backfill tasks.
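As a sketch of how such a split might look in configuration, assuming an application named :my_app with a MyApp.Repo Ecto repo (both hypothetical) and concurrency limits chosen only for illustration:

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  # Each value is the number of jobs the queue runs concurrently per node.
  queues: [default: 10, mailers: 5, critical: 20, low: 2],
  plugins: [
    # Delete completed, cancelled, and discarded jobs older than 7 days (in seconds).
    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24 * 7}
  ]
```

Keeping the mailers limit low bounds the pressure on the external provider, while a small low queue keeps backfills from competing with time-sensitive work.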
Tune:
- queue concurrency,
- retry backoff,
- max attempts,
- dead/failed job handling policies.
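Retry backoff is one of the easier settings to reason about in isolation. The sketch below computes a capped exponential delay; the module name, the 15-second base, and the one-hour cap are assumptions, not Oban defaults. The trailing comment shows how it could be wired into a worker through the backoff/1 callback.

```elixir
defmodule MyApp.Backoff do
  @moduledoc "Capped exponential backoff in seconds (illustrative values)."

  # attempt is 1-based, matching the attempt field on Oban.Job.
  def seconds(attempt, base \\ 15, cap \\ 3600)
      when is_integer(attempt) and attempt >= 1 do
    min(cap, base * Integer.pow(2, attempt - 1))
  end
end

# Wiring into a worker (kept as a comment so the module above runs on its own):
#
#   @impl Oban.Worker
#   def backoff(%Oban.Job{attempt: attempt}), do: MyApp.Backoff.seconds(attempt)
```

With these values the delays run 15, 30, 60 seconds and so on until they reach the one-hour cap, which limits how hard a recovering provider is hit after an outage.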
How Oban compares with job systems in other ecosystems:
- Celery (Python): broker and worker model, often backed by Redis or RabbitMQ. Reliability depends on broker durability and task design.
- BullMQ / Agenda (JavaScript): queue libraries with retries and scheduling. BullMQ is Redis-backed; Agenda stores jobs in MongoDB.
- Oban (Elixir): PostgreSQL-backed durability, Ecto integration, and rich job controls.
Testing Workers
Use Oban test helpers to assert inserts and execution behavior.
Test both:
- happy path side effects,
- retries/discards for known failure classes.
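A minimal test sketch using Oban.Testing follows. It assumes a Phoenix-style MyApp.DataCase, a MyApp.Repo, Oban configured with testing: :manual in the test environment, and an Orders.fetch/1 that returns {:error, :not_found} for an unknown id; all of these are assumptions about the surrounding application.

```elixir
defmodule MyApp.Workers.SendReceiptTest do
  # MyApp.DataCase is assumed to set up the Ecto sandbox.
  use MyApp.DataCase, async: true
  use Oban.Testing, repo: MyApp.Repo

  alias MyApp.Workers.SendReceipt

  test "enqueues a receipt job for an order" do
    {:ok, _job} = %{"order_id" => 123} |> SendReceipt.new() |> Oban.insert()

    # With testing: :manual the job stays in the database for assertions.
    assert_enqueued worker: SendReceipt, args: %{order_id: 123}
  end

  test "discards when the order does not exist" do
    # perform_job/2 builds the job, runs perform/1, and validates the return value.
    assert :discard = perform_job(SendReceipt, %{order_id: -1})
  end
end
```

A retry test would follow the same shape: stub the mailer to return {:error, reason} and assert that perform_job/2 returns that tuple.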
Exercise
Build a Resilient Email Job Flow
Implement a worker flow for transactional emails:
- Enqueue with uniqueness by recipient and template.
- Handle transient provider failures with retries.
- Discard permanent failures (invalid address).
- Record delivery attempts and final status in DB.
- Add tests for success, retry, and discard cases.
Production Clinic: Oban Operations
Oban problems in production usually come down to queue design and idempotency.
Common failure modes:
- queue starvation where low-priority jobs block high-priority work,
- non-idempotent workers causing duplicate side effects during retries,
- retry storms after downstream provider outages,
- missing visibility into failed/snoozed job trends.
Decision checklist:
- Are queues split by SLA and failure profile (critical, default, backfill)?
- Do workers guarantee idempotency for all external side effects?
- Are retry backoff and max-attempt settings tuned by error class?
- Is there an explicit policy for dead/failed job replay?
- Are queue depth, retry rate, and failure rate visible in dashboards/alerts?
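For the visibility item in the checklist, one option is a telemetry handler on Oban's job exception event. The sketch below only logs; the module name and handler id are hypothetical, and in practice the same handler would likely feed a metrics system instead.

```elixir
defmodule MyApp.ObanFailureLogger do
  require Logger

  # Call once at application start, for example from Application.start/2.
  def attach do
    :telemetry.attach(
      "my-app-oban-failures",
      [:oban, :job, :exception],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:oban, :job, :exception], measurements, %{job: job}, _config) do
    # Durations are reported in native time units.
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

    Logger.warning(
      "oban job failed worker=#{job.worker} queue=#{job.queue} " <>
        "attempt=#{job.attempt}/#{job.max_attempts} duration_ms=#{duration_ms}"
    )
  end
end
```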
Runbook snippet:
- Check queue depth by queue and state (available, scheduled, retryable).
- Identify top failing workers and classify transient vs permanent failures.
- Throttle or pause non-critical queues during incident containment.
- Apply provider fallback/degraded mode and watch retry drain behavior.
- Replay dead jobs only after idempotency safeguards are validated.
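The runbook steps above can be sketched as commands for a remote IEx session. MyApp.Repo, the queue names, and the worker name are assumptions; the queries read Oban's jobs table through the Oban.Job schema, where "dead" jobs appear in the discarded state.

```elixir
import Ecto.Query

# Queue depth by queue and state.
Oban.Job
|> where([j], j.state in ["available", "scheduled", "retryable"])
|> group_by([j], [j.queue, j.state])
|> select([j], {j.queue, j.state, count(j.id)})
|> MyApp.Repo.all()

# Workers with the most retryable or discarded jobs.
Oban.Job
|> where([j], j.state in ["retryable", "discarded"])
|> group_by([j], j.worker)
|> select([j], {j.worker, count(j.id)})
|> order_by([j], desc: count(j.id))
|> limit(10)
|> MyApp.Repo.all()

# Contain the incident: pause backfills and throttle the affected queue.
Oban.pause_queue(queue: :low)
Oban.scale_queue(queue: :mailers, limit: 1)

# After recovery: resume, then replay discarded jobs for one worker.
Oban.resume_queue(queue: :low)

Oban.Job
|> where([j], j.state == "discarded" and j.worker == "MyApp.Workers.SendReceipt")
|> Oban.retry_all_jobs()
```

Scoping the replay to a single worker keeps it reviewable, which matters because replayed jobs run the side effect again unless the worker is idempotent.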
FAQ and Troubleshooting
Why are jobs stuck in available or scheduled?
Check queue configuration, plugin startup, and node clock synchronization. Common causes are misconfigured queues (for example, a queue name that no node runs) and an environment where job execution is disabled.
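A quick way to check these from a running node, sketched for an IEx session and assuming the default Oban instance name and a :mailers queue:

```elixir
# State of the local queue producer; the :paused and :limit keys are the
# first things to look at. A nil result or an error suggests the queue is
# not running on this node.
Oban.check_queue(queue: :mailers)

# Effective configuration: confirm the queue is listed and that the
# :testing mode has not disabled execution.
Oban.config() |> Map.take([:queues, :plugins, :testing])
```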
Why do duplicate emails still occur?
Uniqueness windows only prevent duplicate inserts within configured constraints. You still need idempotent worker logic for retries and out-of-band replays.
When should I split into multiple queues?
Split when workloads have different latency, retry, or resource profiles.
Key Takeaways
- Oban provides durable, database-backed background processing with predictable retry behavior
- Idempotent worker design is required for correctness under retries
- Queues, uniqueness, and scheduling settings are core operational controls