Oban and Reliable Background Jobs

Use Oban for durable background jobs in Elixir. This lesson covers retries, scheduling, uniqueness, workflows, idempotency, and operational monitoring.

Oban is the de facto background job system for Elixir applications that need reliability and operational visibility. It stores jobs in PostgreSQL and executes them with supervised workers.

Core Worker Example

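defmodule MyApp.Workers.SendReceipt do
  use Oban.Worker, queue: :mailers, max_attempts: 10

  # Application modules assumed by this example (hypothetical names).
  alias MyApp.{Mailer, Orders}

  @impl Oban.Worker
  # Args are stored as JSON, so keys arrive as strings ("order_id").
  def perform(%Oban.Job{args: %{"order_id" => order_id}}) do
    with {:ok, order} <- Orders.fetch(order_id),
         :ok <- Mailer.send_receipt(order) do
      :ok
    else
      # The order no longer exists, so retrying cannot help.
      {:error, :not_found} -> :discard
      # Any other error is treated as transient and retried with backoff.
      {:error, reason} -> {:error, reason}
    end
  end
end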

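Key return values from `perform/1`:

`:ok` marks the job completed,
`{:error, reason}` records the error and retries the job until `max_attempts` is exhausted,
`:discard` stops retrying a job that cannot succeed (recent Oban versions recommend `{:cancel, reason}` for this),
`{:snooze, seconds}` reschedules the job to run again after `seconds`.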
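To make these return values concrete, here is a minimal sketch of a helper that translates a mail provider's result into a `perform/1` return value. The module name `MyApp.ProviderResult` and the error shapes (`:invalid_address`, `{:rate_limited, seconds}`) are hypothetical assumptions about a provider client, not part of Oban.

```elixir
defmodule MyApp.ProviderResult do
  @moduledoc """
  Hypothetical helper: maps an email provider's result to an Oban
  `perform/1` return value. The error shapes are illustrative assumptions.
  """

  # Failures that no amount of retrying can fix.
  @permanent [:invalid_address, :not_found]

  def to_job_result(:ok), do: :ok

  # The provider asked us to back off: reschedule after its suggested delay.
  def to_job_result({:error, {:rate_limited, retry_after_seconds}}),
    do: {:snooze, retry_after_seconds}

  # Permanent failures stop the job instead of burning retries.
  def to_job_result({:error, reason}) when reason in @permanent, do: :discard

  # Everything else is treated as transient and retried.
  def to_job_result({:error, reason}), do: {:error, reason}
end
```

A worker would call this on the provider's response as its last step, keeping the retry policy in one place instead of scattered across `else` clauses.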

Enqueueing Jobs

%{"order_id" => order.id}
|> MyApp.Workers.SendReceipt.new(unique: [period: 300, keys: [:order_id]])
|> Oban.insert()

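The `unique` option prevents duplicate inserts for the same logical task within a time window: with `period: 300` and `keys: [:order_id]`, inserting another `SendReceipt` job with the same `order_id` within 300 seconds does not create a second job.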
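The effect of `keys` and `period` can be illustrated with a simplified, in-memory model. This is not Oban's implementation (Oban checks uniqueness in the database at insert time and also considers job state); `MyApp.UniqueSketch`, its plain-map jobs, and integer-second timestamps are hypothetical.

```elixir
defmodule MyApp.UniqueSketch do
  @moduledoc """
  Toy model of a uniqueness check, for illustration only.
  Jobs are plain maps with :worker, :args, and :inserted_at (seconds).
  """

  # A new job is a duplicate if an existing job for the same worker has equal
  # values for every key in `keys` and was inserted within `period` seconds.
  # Arg keys outside `keys` are ignored in the comparison.
  def duplicate?(existing_jobs, new_job, keys, period) do
    Enum.any?(existing_jobs, fn job ->
      job.worker == new_job.worker and
        Map.take(job.args, keys) == Map.take(new_job.args, keys) and
        new_job.inserted_at - job.inserted_at <= period
    end)
  end
end
```

Because only `order_id` is compared, two receipts for the same order with different extra args still count as duplicates inside the window, while the same order enqueued after the window has passed is accepted.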

Idempotency First

Jobs must be safe to run more than once: Oban retries failed jobs, and a job interrupted by a crash or deploy may be executed again.

Practical strategies:

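use database unique constraints on side-effect records (for example, one receipt row per order),
store external provider IDs and delivery status,
make worker operations state-aware (for example, check an `already_sent` flag before sending).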
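A minimal sketch of the state-aware strategy, assuming a hypothetical `deliveries` map that stands in for a database table keyed by order id and a `deliver` function that stands in for the provider call:

```elixir
defmodule MyApp.IdempotentSend do
  @moduledoc """
  Hypothetical sketch of an idempotent send. `deliveries` models a table of
  delivery records; `deliver` models the provider call and returns
  {:ok, provider_id} or {:error, reason}.
  """

  def send_once(deliveries, order_id, deliver) do
    case Map.get(deliveries, order_id) do
      %{status: :sent} ->
        # Already delivered on an earlier attempt: skip the side effect.
        {:ok, :already_sent, deliveries}

      _ ->
        case deliver.(order_id) do
          {:ok, provider_id} ->
            # Store the provider id and status so later runs can detect it.
            record = %{status: :sent, provider_id: provider_id}
            {:ok, :sent, Map.put(deliveries, order_id, record)}

          {:error, reason} ->
            # Nothing recorded; the job can safely be retried.
            {:error, reason, deliveries}
        end
    end
  end
end
```

With a real database, the check and the write can race between concurrent executions, so this pattern is usually paired with a unique constraint or a row lock on the delivery record.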

Queue and Retry Design

Split workloads by behavior:

:default for normal jobs,
:mailers for external providers,
:critical for time-sensitive jobs,
:low for backfill tasks.
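In Oban's configuration each queue name maps to a concurrency limit; in open-source Oban these limits apply per node. A sketch of the split above, assuming an application named `:my_app` with an Ecto repo `MyApp.Repo`; the limits are placeholders to tune, not recommendations:

```elixir
import Config

# Queue name => maximum concurrently executing jobs per node (placeholder values).
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10, mailers: 5, critical: 20, low: 2]
```

Keeping `low` small prevents backfills from crowding out other queues, and a modest `mailers` limit keeps pressure on an external provider bounded.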

Tune:

queue concurrency,
retry backoff,
max attempts,
handling policies for discarded and cancelled jobs.
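Retry backoff is where retry storms are usually tamed. A minimal sketch of capped exponential backoff with jitter; the module name `MyApp.Backoff`, the 15-second base, and the one-hour cap are illustrative assumptions, not Oban defaults:

```elixir
defmodule MyApp.Backoff do
  @moduledoc """
  Hypothetical capped exponential backoff with jitter (illustrative values).
  """

  @base_seconds 15
  @max_seconds 3_600

  @doc "Seconds to wait after failed attempt number `attempt` (1-based)."
  def delay(attempt) when is_integer(attempt) and attempt >= 1 do
    # 15, 30, 60, 120, ... seconds, capped at one hour before jitter.
    base = min(@max_seconds, @base_seconds * Integer.pow(2, attempt - 1))

    # Add up to 10% random jitter so jobs that failed together
    # do not all retry at the same instant.
    base + :rand.uniform(div(base, 10) + 1) - 1
  end
end
```

Oban workers can override the `backoff/1` callback, so a worker could delegate to it with `def backoff(%Oban.Job{attempt: attempt}), do: MyApp.Backoff.delay(attempt)`.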

How Oban compares with job systems in other ecosystems:

Celery (Python): a broker-and-worker model, often backed by Redis or RabbitMQ; reliability depends on broker durability and task design.
BullMQ / Agenda (Node.js): queue libraries with retries and scheduling; BullMQ is backed by Redis, Agenda by MongoDB.
Oban (Elixir): PostgreSQL-backed durability, Ecto integration, and rich job controls.

Testing Workers

Use the `Oban.Testing` helpers (such as `assert_enqueued/1` and `perform_job/2`) to assert that jobs were inserted and to execute workers directly in tests.

Test both:

happy path side effects,
retries/discards for known failure classes.
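In a real project, `perform_job/2` runs the actual worker against your test database. The sketch below avoids a database entirely: it mirrors the `with` logic of the `SendReceipt` worker in a hypothetical `ReceiptLogic.run/3` that takes the order lookup and mail delivery as functions, so ExUnit tests can stub both and cover the success, retry, and discard paths.

```elixir
ExUnit.start()

defmodule ReceiptLogic do
  # Hypothetical pure core of a receipt worker. Dependencies are injected as
  # functions so tests can stub the order lookup and the mail provider.
  def run(order_id, fetch, deliver) do
    with {:ok, order} <- fetch.(order_id),
         :ok <- deliver.(order) do
      :ok
    else
      {:error, :not_found} -> :discard
      {:error, reason} -> {:error, reason}
    end
  end
end

defmodule ReceiptLogicTest do
  use ExUnit.Case, async: true

  defp found(id), do: {:ok, %{id: id}}

  test "happy path sends the receipt exactly once" do
    deliver = fn order ->
      send(self(), {:sent, order.id})
      :ok
    end

    assert :ok = ReceiptLogic.run(1, &found/1, deliver)
    assert_received {:sent, 1}
    refute_received {:sent, _}
  end

  test "transient provider errors are returned for retry" do
    assert {:error, :timeout} =
             ReceiptLogic.run(1, &found/1, fn _ -> {:error, :timeout} end)
  end

  test "missing orders are discarded" do
    assert :discard =
             ReceiptLogic.run(1, fn _ -> {:error, :not_found} end, fn _ -> :ok end)
  end
end
```

Running the file with `elixir` executes the suite when the script exits.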

Exercise

Build a Resilient Email Job Flow

Implement a worker flow for transactional emails:

Enqueue with uniqueness by recipient and template.
Handle transient provider failures with retries.
Discard permanent failures (invalid address).
Record delivery attempts and final status in DB.
Add tests for success, retry, and discard cases.
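As a possible starting point for step 4, here is a hypothetical in-memory model of the delivery record: a list of attempt outcomes plus a derived final status. The outcome shapes (`:delivered`, `{:transient, reason}`, `{:permanent, reason}`) are assumptions; in the exercise this state would live in a database table.

```elixir
defmodule DeliveryLog do
  @moduledoc """
  Hypothetical in-memory delivery record for one email: attempts plus status.
  """

  def new, do: %{attempts: [], status: :pending}

  # Append one attempt outcome and derive the overall status from it.
  def record(log, outcome) do
    status =
      case outcome do
        :delivered -> :delivered
        {:permanent, _reason} -> :failed
        {:transient, _reason} -> :retrying
      end

    %{log | attempts: log.attempts ++ [outcome], status: status}
  end
end
```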

Production Clinic: Oban Operations

In production, most Oban problems trace back to queue design (how work is split and how much concurrency each queue gets) and to missing idempotency.

Common failure modes:

queue starvation where low-priority jobs block high-priority work,
non-idempotent workers causing duplicate side effects during retries,
retry storms after downstream provider outages,
missing visibility into failed/snoozed job trends.

Decision checklist:

Are queues split by SLA and failure profile (critical, default, backfill)?
Do workers guarantee idempotency for all external side effects?
Are retry backoff and max-attempt settings tuned by error class?
Is there an explicit policy for replaying discarded or cancelled jobs?
Are queue depth, retry rate, and failure rate visible in dashboards/alerts?

Runbook snippet:

Check queue depth by queue and state (available, scheduled, retryable).
Identify top failing workers and classify transient vs permanent failures.
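Throttle or pause non-critical queues during incident containment (for example, `Oban.scale_queue(queue: :low, limit: 1)` or `Oban.pause_queue(queue: :low)`).
Apply provider fallback/degraded mode and watch retry drain behavior.
Retry discarded jobs (for example, with `Oban.retry_job`) only after idempotency safeguards are validated.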

FAQ and Troubleshooting

Why are jobs stuck in `available` or `scheduled`?

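Check that the job's queue is configured on at least one running node, that Oban and its plugins started, and that node clocks are synchronized. Misconfigured queues, or an environment where execution is disabled (for example, `queues: false` or `testing: :manual`), are common causes.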

Why do duplicate emails still occur?

Uniqueness windows only prevent duplicate inserts within configured constraints. You still need idempotent worker logic for retries and out-of-band replays.

When should I split into multiple queues?

Split when workloads have different latency, retry, or resource profiles.
