LLM Application Architecture in Elixir

Design reliable LLM-backed features with Phoenix, Oban, retries, and provider abstraction in Elixir systems.

LLM features are production features. They need explicit architecture boundaries, failure handling, and operational visibility.

Core Architecture Pattern

A practical baseline for Elixir applications:

Phoenix edge for request validation, auth, and UX response contracts.
AI service layer for prompt assembly and provider routing.
Oban workers for long-running or retryable operations.
Persistence layer for prompt version, output metadata, cost, and auditability.

Synchronous vs Asynchronous Boundaries

Use synchronous calls only when users need immediate feedback in the same interaction. For expensive generation, tool calls, or multi-step workflows, enqueue a job and return a job ID that the client can poll or subscribe to for the result.
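
As a rough sketch of that boundary, the module below routes a request to the synchronous or asynchronous path and shapes the response that carries the job handle. BoundarySketch, the 3-second budget, the :estimated_ms and :steps fields, and the /api/jobs/ URL are illustrative assumptions, not part of any library.

```elixir
defmodule BoundarySketch do
  # Hypothetical latency budget for answering within the same HTTP request.
  @sync_budget_ms 3_000

  # Single-step work that fits the budget stays synchronous.
  def route(%{estimated_ms: ms, steps: 1}) when ms <= @sync_budget_ms, do: :sync

  # Anything slower or multi-step goes to a background job.
  def route(%{estimated_ms: _ms, steps: _steps}), do: :async

  # Response returned after enqueueing; the client polls status_url.
  def accepted(job_id) do
    %{status: 202, body: %{job_id: job_id, status_url: "/api/jobs/#{job_id}"}}
  end
end
```

In a Phoenix controller, the :async branch would typically insert an Oban job and render this body with HTTP 202.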

Provider Abstraction with Behaviors

Define one provider contract as a behaviour (Elixir's term for a callback interface):

defmodule MyApp.AI.Provider do
  @type request :: %{model: String.t(), messages: list(map())}
  @type response :: %{content: String.t(), usage: map(), raw: map()}

  @callback chat(request(), keyword()) :: {:ok, response()} | {:error, term()}
end

Implement adapters per vendor:

defmodule MyApp.AI.Providers.OpenAI do
  @behaviour MyApp.AI.Provider

  @impl true
  def chat(request, opts) do
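    timeout = Keyword.get(opts, :timeout, 15_000)
    # Assumes the API key is configured under :my_app, :openai_api_key.
    api_key = Application.fetch_env!(:my_app, :openai_api_key)

    body = %{
      model: request.model,
      messages: request.messages
    }

    # Without the bearer token the API rejects the request as unauthorized.
    case Req.post("https://api.openai.com/v1/chat/completions",
           json: body,
           auth: {:bearer, api_key},
           receive_timeout: timeout
         ) do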
      {:ok, %{status: 200, body: payload}} ->
        {:ok, %{content: extract_text(payload), usage: payload["usage"] || %{}, raw: payload}}

      {:ok, %{status: status, body: payload}} ->
        {:error, {:provider_error, status, payload}}

      {:error, reason} ->
        {:error, reason}
    end
  end

  defp extract_text(payload) do
    payload
    |> get_in(["choices", Access.at(0), "message", "content"])
    |> to_string()
  end
end

Now application code depends on MyApp.AI.Provider, not a vendor SDK.
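
One practical payoff of the contract is a stub adapter for tests and local development. The sketch below is a minimal example under assumptions: SketchProvider is a compact copy of the contract so the snippet compiles on its own, and StubProvider and its :fail option are hypothetical names.

```elixir
defmodule SketchProvider do
  # Compact copy of the provider contract so this snippet stands alone.
  @callback chat(map(), keyword()) :: {:ok, map()} | {:error, term()}
end

defmodule StubProvider do
  @behaviour SketchProvider

  @impl true
  def chat(%{messages: messages}, opts) do
    # :fail is a hypothetical test-only option that forces the error path.
    case Keyword.get(opts, :fail) do
      nil ->
        last = List.last(messages) || %{}
        {:ok, %{content: "stub: #{Map.get(last, :content, "")}", usage: %{}, raw: %{}}}

      reason ->
        {:error, reason}
    end
  end
end
```

The active adapter can then be chosen per environment, for example by reading a module name from application config, so tests never touch the network.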

Service Layer and Fallback Routing

defmodule MyApp.AI.ChatService do
  alias MyApp.AI.Providers.{OpenAI, Anthropic}

  @providers [OpenAI, Anthropic]

  def generate(request, opts \\ []) do
    Enum.reduce_while(@providers, {:error, :no_provider_available}, fn provider, _acc ->
      case provider.chat(request, opts) do
        {:ok, response} -> {:halt, {:ok, %{provider: provider, response: response}}}
        # Keep the last failure reason so callers can see why every provider failed.
        {:error, reason} -> {:cont, {:error, {:all_providers_failed, reason}}}
      end
    end)
  end
end

Fallback order should be explicit and tied to cost, latency, and quality constraints.
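
One way to make that policy explicit is to describe each route as data. The sketch below assumes a hypothetical route shape (%{name, chat, model, timeout}) and an assumed rule that 400 and 422 responses should not trigger fallback, because another vendor would likely reject the same request. It also sets the model per route, since model names are vendor-specific.

```elixir
defmodule FallbackPolicySketch do
  # Assumed policy: these statuses mean the request itself is invalid.
  @no_fallback_statuses [400, 422]

  # routes: ordered list of %{name, chat, model, timeout}, preferred route first.
  def generate(routes, request) do
    initial = {:error, {:all_providers_failed, []}}

    Enum.reduce_while(routes, initial, fn route, {:error, {_tag, failures}} ->
      # Each route supplies its own model name.
      routed = Map.put(request, :model, route.model)

      case route.chat.(routed, timeout: route.timeout) do
        {:ok, response} ->
          {:halt, {:ok, %{provider: route.name, response: response}}}

        {:error, {:provider_error, status, _body} = reason}
        when status in @no_fallback_statuses ->
          {:halt, {:error, reason}}

        {:error, reason} ->
          # Record the failure and try the next route.
          {:cont, {:error, {:all_providers_failed, failures ++ [{route.name, reason}]}}}
      end
    end)
  end
end
```

Collecting every failure, rather than only the last one, makes it easier to tell a single flaky vendor from a wider outage.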

Async Orchestration with Oban

defmodule MyApp.Workers.GenerateSummary do
  use Oban.Worker,
    queue: :ai,
    max_attempts: 8

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"document_id" => id, "prompt_version" => version}}) do
    with {:ok, doc} <- MyApp.Documents.fetch(id),
         request <- MyApp.AI.Prompts.build_summary_request(doc, version),
         {:ok, %{provider: provider, response: response}} <- MyApp.AI.ChatService.generate(request, timeout: 20_000),
         :ok <- MyApp.AI.Results.store(id, version, provider, response) do
      :ok
    else
      {:error, :invalid_input} -> :discard
      {:error, reason} -> {:error, reason}
    end
  end
end

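Use :discard for non-retryable errors such as invalid input, so Oban does not spend attempts on a job that cannot succeed. Newer Oban releases mark :discard as deprecated in favor of {:cancel, reason}, so check the version you run. Return {:error, reason} for transient provider and network failures so Oban retries them with backoff.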
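
The retry rules can be kept in one pure function that maps a failure reason to a worker return value. This is a sketch under assumptions: RetryPolicySketch and its classification are hypothetical, the {:provider_error, status, body} shape follows the adapter above, and {:cancel, reason} and {:snooze, seconds} are the return values documented for Oban 2.x workers.

```elixir
defmodule RetryPolicySketch do
  # Maps a failure reason to a value a worker can return from perform/1.
  # The classification is an assumed policy, not an Oban default.

  # Rate limited: reschedule the job to run again in 60 seconds.
  def to_oban_result({:provider_error, 429, _body}), do: {:snooze, 60}

  # Other 4xx responses will not succeed on retry, so stop the job.
  def to_oban_result({:provider_error, status, _body} = reason) when status in 400..499,
    do: {:cancel, reason}

  # Bad input detected before calling the provider.
  def to_oban_result(:invalid_input), do: {:cancel, :invalid_input}

  # 5xx responses, timeouts, and connection errors: let Oban retry with backoff.
  def to_oban_result(reason), do: {:error, reason}
end
```

A worker's else branch could then reduce to a single clause that passes the reason through this function.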

Data Model for Traceability

Persist enough data to debug and audit:

prompt version id,
provider name and model,
response content hash,
token usage and unit cost,
latency and retry count,
safety decision metadata.

Store raw provider payloads only when policy allows it.
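
A minimal sketch of building such a record follows. TraceSketch, its field names, and the price units (integer micro-USD per 1,000 tokens) are assumptions chosen for illustration. Safety metadata is left out for brevity.

```elixir
defmodule TraceSketch do
  # Builds a storable trace row from a finished call.
  # Prices are integers in micro-USD per 1,000 tokens to avoid float rounding.
  def build(call, prices) do
    %{
      prompt_version: call.prompt_version,
      provider: call.provider,
      model: call.model,
      # A hash lets rows be compared without retaining the text itself.
      content_sha256: sha256(call.content),
      tokens_in: call.tokens_in,
      tokens_out: call.tokens_out,
      cost_microusd:
        div(call.tokens_in * prices.input + call.tokens_out * prices.output, 1_000),
      latency_ms: call.latency_ms,
      attempt: call.attempt
    }
  end

  defp sha256(text) do
    :crypto.hash(:sha256, text) |> Base.encode16(case: :lower)
  end
end
```

Because the row holds only a hash of the output, it can usually be retained longer than the raw payload.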

Important

Do not log full prompts or outputs by default when they may contain sensitive user data. Log structured metadata first, then add redacted payload capture behind explicit controls.
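
The two-tier approach might look like the sketch below. RedactionSketch, the :capture_payloads option, and the email pattern are hypothetical, and the pattern is deliberately simple. Real redaction needs a reviewed rule set.

```elixir
defmodule RedactionSketch do
  # Metadata that is safe to log by default: sizes and identifiers, no text.
  def log_metadata(request, response) do
    %{
      model: request.model,
      message_count: length(request.messages),
      output_bytes: byte_size(response.content)
    }
  end

  # Opt-in payload capture: returns nil unless capture is explicitly enabled.
  def capture(text, opts \\ []) do
    if Keyword.get(opts, :capture_payloads, false) do
      Regex.replace(email_pattern(), text, "[REDACTED_EMAIL]")
    end
  end

  # Simplified email matcher, used only to illustrate the redaction step.
  defp email_pattern, do: ~r/[\w.+-]+@[\w-]+(\.[\w-]+)+/
end
```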

Telemetry Events

Emit events at clear boundaries:

:telemetry.execute(
  [:my_app, :ai, :request],
  %{duration_ms: duration_ms, tokens_in: in_tokens, tokens_out: out_tokens, cost_usd: cost},
  %{provider: provider, model: model, status: status}
)

This enables dashboards for latency, error rate, and spend trends.
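
To produce those measurements consistently, the call can be wrapped in a small helper. The sketch below is an assumption-laden example: TelemetrySketch is a hypothetical name, the emit function is passed in so the snippet runs without dependencies, and the "prompt_tokens" and "completion_tokens" keys follow the OpenAI usage payload.

```elixir
defmodule TelemetrySketch do
  # Runs fun, measures wall time, and reports one event through emit.
  # In an application, pass &:telemetry.execute/3 as emit.
  def instrument(meta, fun, emit) do
    started = System.monotonic_time()
    result = fun.()
    elapsed = System.monotonic_time() - started

    {status, usage} =
      case result do
        {:ok, %{usage: usage}} -> {:ok, usage}
        {:ok, _other} -> {:ok, %{}}
        {:error, _reason} -> {:error, %{}}
      end

    measurements = %{
      duration_ms: System.convert_time_unit(elapsed, :native, :millisecond),
      # Usage key names differ between vendors; these match OpenAI.
      tokens_in: Map.get(usage, "prompt_tokens", 0),
      tokens_out: Map.get(usage, "completion_tokens", 0)
    }

    emit.([:my_app, :ai, :request], measurements, Map.put(meta, :status, status))
    result
  end
end
```

Emitting on both success and failure keeps error rate and latency on the same event, which simplifies dashboard queries.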

Architecture Decision Checklist

Before shipping an AI endpoint, decide:

sync vs async boundary,
timeout and retry budgets,
fallback provider policy,
cost guardrails and tenant limits,
data retention and redaction rules,
quality evaluation threshold for release.
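
One lightweight way to enforce the checklist is to record each decision as data and reject an endpoint whose policy is incomplete. PolicySketch and its key names are hypothetical and exist only to illustrate the idea.

```elixir
defmodule PolicySketch do
  # One key per checklist item; the names are illustrative.
  @required [
    :mode,
    :timeout_ms,
    :max_attempts,
    :fallback_order,
    :tenant_daily_limit_usd,
    :retention_days,
    :min_eval_score
  ]

  # Returns :ok when every decision is recorded, or lists the missing ones.
  def validate(policy) do
    case Enum.reject(@required, &Map.has_key?(policy, &1)) do
      [] -> :ok
      missing -> {:error, {:undecided, missing}}
    end
  end
end
```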

For comparison, typical stacks in three ecosystems:

Python: FastAPI + Celery + provider SDK + Redis queue. Works well, but process supervision has to be chosen and configured separately.
JavaScript: API route + BullMQ + provider SDK. Strong ecosystem, but runtime isolation and retries need careful operational discipline.
Elixir: Phoenix + Oban + behaviour adapters + Telemetry. Strong fit for supervised retries, observability, and runtime resilience.

Exercise

Design a Production-Grade AI Endpoint

Design a “summarize document” feature and include:

sync API contract and async fallback path,
provider behaviour and two adapters,
Oban worker with retry and discard rules,
telemetry event schema,
data retention and redaction policy.

Write one architecture note explaining tradeoffs and why you chose them.

Summary

Reliable LLM architecture in Elixir comes from clear boundaries: Phoenix at the edge, provider abstraction in the service layer, and durable async orchestration with Oban. Design retries, fallbacks, telemetry, and data policy before launch.

FAQ and Troubleshooting

Should I start with synchronous or asynchronous workflows?

Start synchronous for short, low-risk interactions that must return immediately. Use asynchronous jobs when calls are slow, retry-prone, or involve multi-step orchestration.

How many providers should I support initially?

One provider is fine to start if your abstraction boundary is clean. Add a second provider when reliability, cost, or policy requirements justify fallback complexity.
