Workflow vs agent: when to use durable execution vs an LLM loop
"If you can write the steps as a flowchart, use a workflow. If the steps depend on what the LLM thinks at each step, use an agent. Don't mix them up."
This is a critical chapter for production LLM systems work. The distinction between workflows and agents is one of the most consequential architectural decisions in any LLM application, and most teams get it wrong by defaulting to “agent” when “workflow” would be a better fit.
By the end of this chapter you’ll know exactly when each is appropriate, what each gives up, and how to combine them when needed.
Outline:
- The fundamental distinction.
- What a workflow is.
- What an agent is.
- The reliability difference.
- The cost difference.
- The flexibility trade-off.
- Hybrid patterns.
- The decision rubric.
70.1 The fundamental distinction
A workflow is a deterministic state machine. The steps are defined in code, the transitions between steps are defined in code, and the LLM’s role (if any) is to fill in specific blanks within steps. The control flow does not depend on the LLM’s decisions.
An agent is a non-deterministic policy. The LLM decides what step to take next based on the current state. The control flow is whatever the LLM decides it should be at each iteration.
The distinction is about who decides what happens next: code (workflow) or the LLM (agent).
Consider a simple example: "process a customer support ticket".
As a workflow:
- Classify the ticket (LLM call: pick one of 5 categories).
- If category == “billing”, call the billing tool.
- Else if category == “technical”, call the troubleshooting tool.
- Else if category == “general”, call the FAQ tool.
- Use the tool result to draft a response (LLM call).
- Send the response.
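The workflow version can be sketched in plain Python. Every function name here (`classify_ticket`, `billing_tool`, and so on) is a hypothetical stand-in; the LLM calls are stubbed with keyword rules so the sketch runs without an API key, but the point is the shape: code owns the control flow, and the LLM only fills in blanks.

```python
# Ticket workflow as deterministic control flow. The two LLM calls
# (classify_ticket, draft_response) are stubbed; the tool functions
# stand in for real integrations. All names are hypothetical.

def classify_ticket(ticket: str) -> str:
    # In production: one focused LLM call constrained to a fixed label set.
    text = ticket.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"

def billing_tool(ticket): return f"billing record for: {ticket}"
def troubleshooting_tool(ticket): return f"diagnostics for: {ticket}"
def faq_tool(ticket): return f"FAQ matches for: {ticket}"

def draft_response(ticket: str, tool_result: str) -> str:
    # In production: a second focused LLM call; stubbed as a template.
    return f"Re: {ticket}\nBased on {tool_result}, here is our answer."

def process_ticket(ticket: str) -> str:
    category = classify_ticket(ticket)      # step 1: LLM fills a blank
    if category == "billing":               # steps 2-4: code decides the path
        result = billing_tool(ticket)
    elif category == "technical":
        result = troubleshooting_tool(ticket)
    else:
        result = faq_tool(ticket)
    return draft_response(ticket, result)   # step 5: LLM fills a blank
```

Note that the branch on `category` is still deterministic: given the classifier's output, the path is fixed and inspectable.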
As an agent:
- Give the LLM the ticket and a set of tools.
- The LLM decides which tool to call.
- The LLM uses the result to decide what to do next.
- Loop until the LLM decides it’s done.
- Send the response.
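The agent version, by contrast, is a loop around the model. In this sketch `call_llm` is a hypothetical stand-in for a real chat API; it is stubbed to emit one tool call and then a final answer so the loop terminates, but the structure is the point: the LLM, not the code, picks the next action.

```python
# Agent loop sketch: the model decides which tool to call and when to
# stop. `call_llm` is a stub standing in for a real chat-completions API.

def call_llm(messages, tools):
    # Fake model: call one tool first, then finish. A real model would
    # return either a tool call or a final answer based on the context.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "faq_tool",
                "args": {"q": messages[0]["content"]}}
    return {"type": "final", "content": "Here is your answer."}

def run_agent(ticket, tools, max_steps=10):
    messages = [{"role": "user", "content": ticket}]
    for _ in range(max_steps):                 # loop until the LLM stops
        decision = call_llm(messages, tools)
        if decision["type"] == "final":        # the LLM decides it's done
            return decision["content"]
        tool = tools[decision["name"]]         # the LLM decides the tool
        result = tool(**decision["args"])
        messages.append({"role": "tool", "content": str(result)})
    return "Escalated: step budget exhausted."  # guardrail, not LLM choice

tools = {"faq_tool": lambda q: f"FAQ matches for: {q}"}
```

The `max_steps` budget is the one piece of control flow the code keeps; everything else is delegated to the model.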
Both approaches “work” in the sense that they produce a response. But they have different reliability, cost, and maintainability characteristics. Picking the wrong one is a real production cost.
70.2 What a workflow is
A workflow is a structured execution graph. Each node is a step (either a code function or an LLM call); each edge is a transition. The graph can have branches (if/else), loops, parallel paths, error handling, and retries.
The key properties of a workflow:
(1) Deterministic structure. The graph is defined ahead of time. The same input always takes the same path (modulo branches based on data values, which are still deterministic given the data).
(2) Durable execution. Each step’s result is persisted. If the workflow crashes mid-execution, it can resume from the last completed step instead of restarting.
(3) Observable. You can see exactly which step is currently running, which steps have completed, and which are pending. Debugging is straightforward.
(4) Testable. You can test each step in isolation. You can mock the LLM calls and verify the structural logic.
(5) Versionable. When you change the workflow, you can run the old version and the new version in parallel and compare.
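Property 2, durable execution, is the least obvious of the five, so here is a minimal sketch of the idea: persist each step's result under a key, and on a re-run replay completed steps from storage instead of executing them again. Engines like Temporal implement this with an event history; this dict-backed version only illustrates the mechanism.

```python
# Durable execution in miniature: completed steps are replayed from a
# persistent store, so a crash-and-restart never re-runs finished work.
# The dict stands in for a database; this is a sketch, not an engine.

class DurableRun:
    def __init__(self, store):
        self.store = store               # persisted step results

    def step(self, key, fn, *args):
        if key in self.store:            # completed before a crash:
            return self.store[key]       # replay from storage, don't re-run
        result = fn(*args)
        self.store[key] = result         # persist before moving on
        return result

calls = []
def classify(ticket):
    calls.append("classify")             # count real executions
    return "billing"

store = {}                               # survives the "crash" below
DurableRun(store).step("classify", classify, "ticket-42")

# Simulate a crash and restart: a fresh run over the same store
# skips the already-completed step.
DurableRun(store).step("classify", classify, "ticket-42")
assert calls == ["classify"]             # executed exactly once
```

In a real engine the store is the workflow's event history and the replay rule is enforced for you; the requirement it imposes is the same, though: steps must be deterministic or their results must be recorded.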
The technologies for workflows include Temporal, AWS Step Functions, Apache Airflow, Prefect, Dagster, and LangGraph (which is workflow-shaped despite being marketed for agents). For LLM workflows specifically, Temporal and LangGraph are the most common choices.
Temporal is the most mature: it provides durable execution, retries, signals, queries, and versioning out of the box. We’ll cover Temporal in more detail in Chapter 80.
For LLM applications, most “agent” use cases are actually workflows in disguise. The decisions aren’t really agentic — they’re conditional logic that could be encoded in code. Recognizing this lets you build more reliable systems.
70.3 What an agent is
An agent (Chapter 67) is the opposite: the LLM decides at each step what to do. The control flow is determined at runtime by the LLM’s outputs.
The key properties:
(1) Non-deterministic. The same input can produce different paths through the agent’s logic, because the LLM might decide differently each time.
(2) Flexible. The agent can handle unanticipated situations because it doesn’t have a fixed set of steps.
(3) Hard to test. You can’t enumerate the possible execution paths because they depend on LLM decisions.
(4) Hard to debug. When something goes wrong, you have to trace through what the LLM was thinking at each step.
(5) Stateful in ways that are hard to manage. The LLM’s context grows with each step, and managing it (truncating, summarizing) is its own problem.
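To make property 5 concrete, here is the crudest common mitigation: keep the system prompt and only the most recent turns. This is a lossy sketch (real systems often summarize dropped turns instead of discarding them), and the message format is an assumption.

```python
# Context growth mitigation, sketched: retain the system prompt plus the
# last N turns. Lossy by design; summarization is the heavier alternative.

def trim_context(messages, keep_last=6):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]    # drop everything but recent turns

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": f"step {i}"} for i in range(20)]

trimmed = trim_context(history, keep_last=6)
assert len(trimmed) == 7                 # system prompt + last 6 turns
```

Note that a workflow sidesteps this entirely: each step gets a fresh, purpose-built prompt, so there is no accumulating context to manage.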
Agents are the right tool when you genuinely don’t know in advance what steps will be needed — when the task is open-ended and the LLM has to figure out the structure.
70.4 The reliability difference
This is the biggest practical difference. Workflows are dramatically more reliable than agents.
The reasons:
(1) Workflows have explicit error handling. Each step has explicit retry policies, fallbacks, and error handlers. When a step fails, the workflow knows exactly what to do.
(2) Agents have implicit error handling. When an agent’s tool call fails, the LLM has to figure out what to do. Sometimes it retries; sometimes it gives up; sometimes it goes off the rails.
(3) Workflows are deterministic. Same input → same execution path. Bugs are reproducible. Debugging is straightforward.
(4) Agents are non-deterministic. Same input → potentially different execution paths. Bugs are hard to reproduce.
(5) Workflows have crash recovery. Temporal-style durable execution means a workflow can survive a server crash and resume.
(6) Agents typically don’t have crash recovery. If the agent crashes mid-execution, you usually start over.
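Reason 1 is worth seeing in code: a workflow step carries an explicit retry policy, so a transient failure is handled deterministically by code rather than left to the model's judgment. A minimal sketch with exponential backoff (the delays are illustrative):

```python
# Explicit, code-owned error handling: retry a step with exponential
# backoff, then fail loudly. Sketch only; a real engine (e.g. Temporal)
# declares this as a per-step retry policy.

import time

def with_retries(fn, attempts=3, base_delay=0.01):
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:         # in production: catch narrowly
            last_exc = exc
            time.sleep(base_delay * 2 ** attempt)   # 0.01s, 0.02s, 0.04s
    raise last_exc                       # budget exhausted: surface the error

failures = {"n": 0}
def flaky_tool():
    if failures["n"] < 2:
        failures["n"] += 1
        raise TimeoutError("transient")
    return "ok"

assert with_retries(flaky_tool) == "ok"  # succeeds on the third attempt
```

Contrast this with the agent case: the model sees a tool error as text in its context and must improvise a response, with no guarantee it retries at all.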
The empirical observation: agent-based systems have 5-10× more production incidents than workflow-based systems doing the same tasks. The reliability gap is significant.
For tasks where reliability matters (customer-facing automation, billing, anything regulated), start with a workflow. Add agentic flexibility only where necessary.
70.5 The cost difference
Workflows are also cheaper to run. The reasons:
(1) Workflows make fewer LLM calls. Each step has a specific, focused LLM call (or no LLM call at all). Agents make many LLM calls per task (5-12 typical, Chapter 67).
(2) Workflows can use cheap models for simple steps. A classification step can use a small fast model; only the synthesis step needs a big model. Agents typically use one model for everything.
(3) Workflows have better caching. Determinism enables caching: if you’ve seen this exact input before, return the cached result. Agents are harder to cache because the path through the graph differs.
The cost ratio: a workflow doing the same task as an agent typically costs 3-10× less in LLM compute.
For high-volume applications, this matters. A million queries per day at $0.005/query (agent) is $5000/day. The same workload at $0.001/query (workflow) is $1000/day. The savings are real.
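Reason 3, caching, follows directly from determinism: the exact input is a valid cache key for the whole workflow result. A sketch of the idea (a production version would use a shared store such as Redis, plus a TTL and cache invalidation on workflow version changes):

```python
# Determinism enables whole-result caching: same input, same path, same
# output, so a hash of the input is a safe cache key. Sketch only.

import hashlib, json

class WorkflowCache:
    def __init__(self):
        self.hits = 0
        self._store = {}

    def _key(self, payload):
        blob = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get_or_run(self, payload, run):
        k = self._key(payload)
        if k in self._store:
            self.hits += 1               # cache hit: zero LLM spend
            return self._store[k]
        self._store[k] = run(payload)    # miss: pay for the LLM calls once
        return self._store[k]

cache = WorkflowCache()
llm_calls = {"n": 0}
def run_workflow(payload):
    llm_calls["n"] += 1                  # stands in for paid LLM calls
    return f"response for {payload['ticket']}"

for _ in range(3):
    cache.get_or_run({"ticket": "password reset"}, run_workflow)

assert llm_calls["n"] == 1               # paid once
assert cache.hits == 2                   # served twice for free
```

An agent cannot be cached this way: the same input may take a different path each run, so there is no stable input-to-output mapping to memoize.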
70.6 The flexibility trade-off
The downside of workflows: they’re less flexible.
A workflow can only handle the cases its designer anticipated. When a customer query comes in that doesn’t fit any of the predefined categories, the workflow fails or routes to a default that may be wrong.
An agent can handle unanticipated cases because it can decide to do whatever seems right. It can call tools the workflow designer didn’t think to include. It can recover from unexpected errors by trying something else.
This flexibility is the main argument for agents. For tasks where the input space is unbounded and the right action is hard to predict, agents handle the long tail better than workflows.
The trade-off is real, but the question is how much flexibility you actually need. For most production tasks, the input space is more bounded than people think. Categorizing customer queries into 20 buckets covers 95%+ of real traffic. The 5% that don’t fit can be routed to human review.
The mistake teams make: they design for the 5% with an agent, then discover the agent fails the 95% (the easy cases) more often than a workflow would. The flexibility costs reliability.
70.7 Hybrid patterns
The right answer is often both: a workflow as the primary structure, with agent-like flexibility at specific points where it’s needed.
Example: a customer support workflow:
1. Classify the ticket (LLM call, deterministic categories).
2. If a known category, use the workflow's branch logic.
3. If "unknown" or "complex", spawn a sub-agent that has flexibility to handle the case.
4. Whatever the sub-agent produces, run it through the workflow's quality check.
5. If the quality check passes, send the response. Else, escalate to human.
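The five steps above can be sketched as code. All function names are hypothetical stubs; what matters is the shape: deterministic branches for known categories, a sub-agent only for the long tail, and a quality gate on everything the sub-agent produces.

```python
# Hybrid pattern sketch: workflow handles known categories, a sub-agent
# handles the rest, and a quality gate decides send vs escalate.

KNOWN = {"billing", "technical", "general"}

def classify(ticket):                    # step 1: focused LLM call (stubbed)
    return "billing" if "invoice" in ticket.lower() else "unknown"

def workflow_branch(category, ticket):   # step 2: deterministic path
    return f"[{category} playbook] {ticket}"

def sub_agent(ticket):                   # step 3: flexible fallback (stubbed)
    return f"[agent draft] {ticket}"

def quality_check(response):             # step 4: gate on the output
    return response.startswith("[") and len(response) < 500

def handle(ticket):
    category = classify(ticket)
    if category in KNOWN:
        response = workflow_branch(category, ticket)
    else:
        response = sub_agent(ticket)
    if quality_check(response):          # step 5: send or escalate to human
        return ("send", response)
    return ("escalate", ticket)

assert handle("Invoice is wrong")[0] == "send"      # common case: workflow
assert handle("x" * 600)[0] == "escalate"           # gate rejects the draft
```

The quality gate is what makes the sub-agent safe to include: its output never reaches a customer without passing a deterministic check.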
The workflow handles the bulk (95% of traffic) reliably and cheaply. The sub-agent handles the long tail. The quality check catches sub-agent failures. The whole system is more reliable than either pure workflow or pure agent.
This hybrid pattern is the production reality. Pure workflows are too rigid; pure agents are too unreliable. The combination gives you both reliability and flexibility.
In LangGraph, this is straightforward — you build a workflow graph with most nodes being deterministic functions, and a few nodes being agent loops. In Temporal, the same: the workflow definition is mostly deterministic, with an “agent” activity that runs an LLM loop when needed.
70.8 The decision rubric
Use a workflow when:
- The task has a clear structure.
- You can enumerate the steps in advance.
- Reliability matters (customer-facing, billing, regulated).
- Cost matters (high volume).
- Debuggability matters.
- The input space is bounded.
Use an agent when:
- The task is genuinely open-ended.
- You can’t enumerate the steps in advance.
- The input space is unbounded.
- Some reliability and cost overhead is acceptable.
- The task is exploratory or research-heavy.
Use a hybrid when:
- You have a mix of common and unusual cases.
- You need workflow reliability for the common case and agent flexibility for the unusual.
- You can quality-gate the agent’s output before committing it.
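The rubric can be compressed into a checklist function. This is a mnemonic, not a prescription; the question names are mine, and real decisions weigh more factors than four booleans.

```python
# The decision rubric as a checklist: answer the questions, get the
# default architecture the chapter argues for. Mnemonic only.

def choose_architecture(*, steps_enumerable: bool, input_bounded: bool,
                        reliability_critical: bool, has_long_tail: bool) -> str:
    if steps_enumerable and input_bounded and not has_long_tail:
        return "workflow"                # clear structure, bounded inputs
    if not steps_enumerable and not input_bounded and not reliability_critical:
        return "agent"                   # genuinely open-ended, overhead OK
    return "hybrid"                      # common case + quality-gated agent

assert choose_architecture(steps_enumerable=True, input_bounded=True,
                           reliability_critical=True,
                           has_long_tail=False) == "workflow"
assert choose_architecture(steps_enumerable=True, input_bounded=True,
                           reliability_critical=True,
                           has_long_tail=True) == "hybrid"
```

Note that the function falls through to "hybrid" by default, which mirrors the chapter's advice: when the answers are mixed, combine the two.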
For most production LLM applications, start with a workflow. Add an agent only where the workflow can’t handle the case. Use the hybrid pattern to combine them.
The opposite (“start with an agent, fall back to a workflow”) usually doesn’t work — agents that haven’t been constrained by a workflow tend to drift in production.
70.9 The mental model
Eight points to take into Chapter 71:
- Workflow vs agent is about who decides what happens next: code or the LLM.
- Workflows are deterministic, durable, observable, testable, versionable.
- Agents are non-deterministic, flexible, hard to debug, expensive.
- Workflows are 5-10× more reliable in production.
- Workflows are 3-10× cheaper to run.
- Agents are more flexible for unanticipated cases.
- Hybrid patterns combine workflow reliability with agent flexibility.
- Default to workflow. Use agents only where flexibility is essential.
In Chapter 71 we look at the failure modes of agents in production.
Read it yourself
- The Temporal documentation, especially the introduction to durable execution.
- The LangGraph documentation on workflow-style agents.
- The Anthropic blog post “Building effective agents” (which makes a strong workflow case).
- The AWS Step Functions documentation.
- Chapter 11 of Designing Data-Intensive Applications, on stream processing and fault-tolerant, durable state.
Practice
- Take a customer support task and design it as a workflow. Then design it as an agent. Compare.
- Why are workflows more reliable than agents in production? List three specific reasons.
- Why are workflows cheaper than agents? Trace the LLM call count for a typical task.
- Construct a use case where pure agent (no workflow) is the right choice.
- Design a hybrid workflow + agent system for a research assistant. Where would you use each?
- Why does “default to workflow” beat “default to agent” in most production cases? Argue.
- Stretch: Implement the same task as a Temporal workflow and as a LangChain agent. Compare reliability over 100 runs.