Agent SDKs and frameworks: from primitives to production
"Every six months a new agent framework promises to 'make agents easy.'"
Chapters 128–130 gave you the conceptual machinery — tool use, memory, planning, and evaluation. This chapter is about the concrete tooling you reach for when it is time to wire those ideas into production code. We will walk through the major SDKs and frameworks (Anthropic Claude Agent SDK, OpenAI Agents SDK, LangGraph, and several lighter-weight libraries), write real comparison code, and — most importantly — build a decision framework so you pick the right level of abstraction for your problem, rather than the one with the best README.
131.1 — Build-vs-buy for agent frameworks
The first question is whether you need a framework at all. Build-vs-buy analysis for agent tooling is not the same as for, say, web frameworks. A web framework handles thousands of edge cases around HTTP, routing, middleware, and connection pooling; an agent framework handles… a while-loop and some JSON.
That sounds reductive, but it captures a real tension:
| Factor | Build from scratch | Use a framework |
|---|---|---|
| Loop control | Full | Partial — you rely on the framework’s loop semantics |
| Tool integration | You own the serialisation contract | Framework dictates schema conventions |
| Observability | You instrument what you want | Framework may provide tracing, or may obscure it |
| Multi-agent | You design the topology | Framework enforces a topology (graph, handoff, crew) |
| Time to first demo | Hours | Minutes |
| Time to production | Weeks | Weeks (you still need evals, guardrails, deployment) |
| Upgrade risk | None (you own it) | Breaking changes in fast-moving libraries |
The honest answer: most teams should start with a thin SDK (Anthropic or OpenAI), add a framework only when they hit a structural need — persistent state machines, human-in-the-loop checkpoints, or multi-agent orchestration — and even then they should be prepared to eject.
131.2 — The minimal agent: 50 lines of Python
Before adopting any framework, understand the minimal agent loop. Every framework is a decoration on this skeleton:
```python
"""minimal_agent.py — a complete agent in ~50 lines."""
from __future__ import annotations

import json
from typing import Any

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"

# ── Tool registry ──────────────────────────────────────────────
TOOLS: list[dict[str, Any]] = [
    {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]


def dispatch_tool(name: str, args: dict) -> str:
    """Route tool calls to implementations."""
    if name == "get_weather":
        return json.dumps({"temp_f": 72, "condition": "sunny",
                           "city": args["city"]})
    raise ValueError(f"Unknown tool: {name}")


# ── Agent loop ─────────────────────────────────────────────────
def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})
        # If the model stopped normally, we are done
        if response.stop_reason == "end_turn":
            return "".join(
                blk.text for blk in response.content if blk.type == "text"
            )
        # Otherwise, process every tool_use block
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = dispatch_tool(block.name, block.input)
                tool_results.append(
                    {"type": "tool_result",
                     "tool_use_id": block.id,
                     "content": result}
                )
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("Agent exceeded max turns")


if __name__ == "__main__":
    print(run_agent("What is the weather in Tokyo and Paris?"))
```
Key observations:
- The loop is a `for` with a budget. Every framework wraps this.
- Tool dispatch is a function mapping names → implementations. Every framework wraps this too.
- Messages accumulate. The full conversation is the agent's "state."
The OpenAI equivalent is structurally identical — swap `anthropic.Anthropic` for `openai.OpenAI`, adjust the message schema, and change `stop_reason` to `finish_reason`. The loop does not change.
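To make that concrete, here is a sketch of the same skeleton against the Chat Completions schema. This is illustrative rather than official sample code: the client is injected as a parameter so the loop can be exercised without a network call; in production you would pass an `openai.OpenAI()` instance.

```python
"""openai_loop_sketch.py — the minimal loop re-targeted at Chat Completions.
Sketch: `client` is injected; in production it is an openai.OpenAI() instance."""
import json

TOOLS = [{
    "type": "function",          # Chat Completions wraps tools in a "function" envelope
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {          # note: "parameters", not "input_schema"
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def dispatch_tool(name: str, args: dict) -> str:
    if name == "get_weather":
        return json.dumps({"temp_f": 72, "condition": "sunny", "city": args["city"]})
    raise ValueError(f"Unknown tool: {name}")


def run_agent(client, user_message: str, model: str = "gpt-4o",
              max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, tools=TOOLS, messages=messages)
        choice = response.choices[0]
        msg = choice.message
        assistant_msg = {"role": "assistant", "content": msg.content}
        if msg.tool_calls:                       # only include when present
            assistant_msg["tool_calls"] = msg.tool_calls
        messages.append(assistant_msg)
        if choice.finish_reason != "tool_calls":  # "stop" plays the role of end_turn
            return msg.content
        for call in msg.tool_calls:               # one role="tool" message per call
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": dispatch_tool(call.function.name,
                                         json.loads(call.function.arguments)),
            })
    raise RuntimeError("Agent exceeded max turns")
```

The only real deltas are the tool-schema envelope, the `role="tool"` result messages keyed by `tool_call_id`, and `finish_reason` in place of `stop_reason`.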
131.3 — Anthropic Claude Agent SDK
The Claude Agent SDK (claude-agent-sdk) is Anthropic’s opinionated
layer on top of the Messages API. It targets single- and multi-agent
workflows with first-class support for tool registration, hooks,
guardrails, and sub-agents.
Core concepts
| Concept | Role |
|---|---|
| `Agent` | Wraps a system prompt, model, tools, and optional sub-agents |
| `AgentLoop` | The run-time loop that calls the model, dispatches tools, and checks guardrails |
| `Tool` | A Python callable decorated with `@tool`; schema is inferred from type hints |
| `Hook` | A callback fired at well-defined lifecycle points (pre-tool, post-tool, pre-response) |
| `Guardrail` | A check that can block or rewrite a tool call or final response |
| `SubAgent` | A child agent that can be invoked as a tool by the parent |
Typical usage
```python
"""claude_agent_sdk_example.py — multi-tool agent with guardrail."""
from claude_agent_sdk import Agent, tool, guardrail, run


@tool
def search_docs(query: str, top_k: int = 5) -> list[dict]:
    """Search internal documentation.

    Args:
        query: Natural-language search query.
        top_k: Number of results to return.
    """
    # In production, call your vector store here.
    return [{"title": "Deployment guide", "score": 0.92,
             "snippet": "To deploy, run `make deploy`..."}]


@tool
def run_sql(query: str) -> list[dict]:
    """Execute a read-only SQL query against the analytics warehouse."""
    # Placeholder — real implementation uses a DB connection pool.
    return [{"count": 42}]


@guardrail
def block_mutation(tool_name: str, tool_input: dict) -> None:
    """Reject any SQL that is not a SELECT."""
    if tool_name == "run_sql":
        sql = tool_input.get("query", "").strip().upper()
        if not sql.startswith("SELECT"):
            raise ValueError("Only SELECT queries are permitted.")


agent = Agent(
    name="analyst",
    model="claude-sonnet-4-20250514",
    system="You are a data analyst assistant. Use tools to answer questions.",
    tools=[search_docs, run_sql],
    guardrails=[block_mutation],
    max_turns=15,
)

result = run(agent, "How many deployments happened last week?")
print(result.final_text)
```
Hooks — lifecycle control
Hooks let you inject behaviour without subclassing:
```python
from claude_agent_sdk import Agent, Hook, HookEvent


class AuditHook(Hook):
    """Log every tool invocation to an audit trail."""

    def on(self, event: HookEvent) -> None:
        if event.kind == "pre_tool":
            log_to_audit_trail(
                tool=event.tool_name,
                input=event.tool_input,
                agent=event.agent_name,
                timestamp=event.timestamp,
            )


agent = Agent(
    name="audited-agent",
    model="claude-sonnet-4-20250514",
    tools=[search_docs, run_sql],
    hooks=[AuditHook()],
)
```
Available hook points: `pre_tool`, `post_tool`, `pre_response`, `post_response`, `on_error`, `on_turn_start`, `on_turn_end`.
Sub-agents
A sub-agent is a child agent the parent can delegate to. The parent sees the sub-agent as a tool; the sub-agent runs its own loop with its own tools and guardrails, then returns a summary to the parent.
```python
researcher = Agent(
    name="researcher",
    model="claude-sonnet-4-20250514",
    system="You are a researcher. Search docs and summarise findings.",
    tools=[search_docs],
)

orchestrator = Agent(
    name="orchestrator",
    model="claude-sonnet-4-20250514",
    system="You coordinate research and SQL analysis.",
    tools=[run_sql],
    sub_agents=[researcher],
    max_turns=20,
)

result = run(orchestrator, "Summarise last week's deployment failures.")
```
result = run(orchestrator, "Summarise last week's deployment failures.")
The SDK serialises the sub-agent boundary cleanly: the parent’s context never sees the child’s internal tool calls, only the final summary. This keeps context window budgets manageable.
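The boundary is easy to replicate in the minimal loop from §131.2. The following is a sketch of the pattern, not the SDK's internals: `model_step` is a stand-in for an LLM call (it maps the child's messages to a text plus an optional tool call) so the control flow runs without an API key.

```python
"""Sub-agent-as-tool: the child runs its own loop over its own message list;
only the final summary crosses back to the parent."""


def run_child(task: str, model_step, tools: dict, max_turns: int = 5):
    """Child loop. Returns (summary, internal_message_count)."""
    messages = [{"role": "user", "content": task}]   # fresh context, not the parent's
    for _ in range(max_turns):
        text, tool_name, args = model_step(messages)
        messages.append({"role": "assistant", "content": text})
        if tool_name is None:                        # child is done: text is the summary
            return text, len(messages)
        messages.append({"role": "user", "content": tools[tool_name](**args)})
    raise RuntimeError("Child exceeded max turns")


def make_subagent_tool(model_step, tools: dict):
    """Expose the child loop to the parent as an ordinary tool. The parent's
    message list grows by at most this one result string per delegation."""
    def delegate(task: str) -> str:
        summary, _internal_turns = run_child(task, model_step, tools)
        return summary
    return delegate
```

Registered in the parent's tool registry, `delegate` guarantees that the child's intermediate tool chatter never enters the parent's context window — the budget-control property described above.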
131.4 — OpenAI Agents SDK
The OpenAI Agents SDK (openai-agents) takes a slightly different
architectural stance. Where Anthropic centres on an agent-loop with hooks,
OpenAI’s SDK builds around four primitives: Agent, Runner, Handoff,
and Guardrail, plus a first-class tracing layer.
Architecture at a glance
| Primitive | Purpose |
|---|---|
| `Agent` | Declares a model, instructions, tools, handoffs, and output schema |
| `Runner` | Executes one or more agents; manages the turn loop and tool dispatch |
| `Handoff` | A typed edge from one agent to another — the first agent yields control |
| `Guardrail` | An input or output validator that can reject, rewrite, or escalate |
| `Trace` | Structured telemetry emitted automatically during a run |
Code walkthrough
```python
"""openai_agents_example.py — triage + specialist pattern."""
import asyncio

from openai_agents import Agent, Runner, Handoff, InputGuardrail

# ── Specialist agents ──────────────────────────────────────────
billing_agent = Agent(
    name="billing",
    instructions="You handle billing questions. Be concise.",
    model="gpt-4o",
)

tech_agent = Agent(
    name="tech_support",
    instructions="You handle technical support. Ask for logs if needed.",
    model="gpt-4o",
)

# ── Triage agent with handoffs ─────────────────────────────────
triage_agent = Agent(
    name="triage",
    instructions=(
        "You are a triage agent. Determine whether the user needs "
        "billing help or technical support, then hand off."
    ),
    model="gpt-4o",
    handoffs=[
        Handoff(target=billing_agent),
        Handoff(target=tech_agent),
    ],
)


# ── Guardrail ──────────────────────────────────────────────────
class TopicGuardrail(InputGuardrail):
    """Block requests that are not about our product."""

    async def run(self, text: str) -> None:
        if "competitor" in text.lower():
            raise self.reject("We can only help with our own products.")


# ── Execution ──────────────────────────────────────────────────
async def main():
    runner = Runner(
        agent=triage_agent,
        guardrails=[TopicGuardrail()],
    )
    result = await runner.run("I was double-charged on my last invoice.")
    print(result.final_output)
    # Inspect trace
    for span in result.trace.spans:
        print(f"  {span.name}: {span.duration_ms}ms")


if __name__ == "__main__":
    asyncio.run(main())
```
Handoff vs. sub-agent
The handoff model differs from Anthropic’s sub-agent model in an important way: a handoff transfers the conversation, whereas a sub-agent delegates a subtask and returns. Handoffs suit customer-service routing; sub-agents suit divide-and-conquer research.
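The structural difference fits in a dozen lines of plain Python. This is a sketch independent of either SDK; `respond` stands in for a model call.

```python
"""Two delegation shapes. A handoff moves the WHOLE conversation to the target;
a sub-agent call opens a FRESH conversation and returns only a summary."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleAgent:
    name: str
    respond: Callable[[list[dict]], str]   # stand-in for an LLM call


def handoff(messages: list[dict], target: SimpleAgent) -> str:
    # Target inherits the full history and owns the conversation from here on.
    return target.respond(messages)


def call_subagent(parent_messages: list[dict], child: SimpleAgent, task: str) -> str:
    # Child sees only the delegated task, never the parent's history.
    summary = child.respond([{"role": "user", "content": task}])
    parent_messages.append(
        {"role": "user", "content": f"[{child.name} result] {summary}"})
    return summary
```

Routing workflows want `handoff` (the specialist needs the full conversation); fan-out research wants `call_subagent` (the parent needs only the answer).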
Tracing
Every `Runner.run()` call produces a `Trace` object with spans for each LLM call, tool invocation, guardrail check, and handoff. The trace is OpenTelemetry-compatible and can be exported to any OTLP backend:

```python
from openai_agents.tracing import export_otlp

export_otlp(result.trace, endpoint="http://localhost:4318")
```
This is one of the strongest reasons to adopt the SDK even if you only need a single agent: production observability out of the box.
131.5 — LangGraph — graph-as-agent
LangGraph (from LangChain, Inc.) models agents as state machines expressed via a directed graph. Each node is a function that reads and writes a typed State object; edges can be conditional. This makes it the natural choice when your workflow has explicit branching, looping, human-in-the-loop pauses, or long-running persistence.
Core concepts
- `StateGraph` — the graph definition. Parameterised by a typed `State` (usually a `TypedDict`).
- Nodes — Python functions `(state) -> partial_state`.
- Edges — static or conditional transitions between nodes.
- `Checkpointer` — serialises state after every node so the graph can be paused, resumed, or replayed.
- `interrupt()` — pauses execution and waits for external input (human-in-the-loop).
Example: research agent with human approval
```python
"""langgraph_research.py — graph-based agent with human-in-the-loop."""
from __future__ import annotations

from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command, interrupt


class ResearchState(TypedDict):
    question: str
    sources: list[dict]
    draft: str
    approved: bool


def search_node(state: ResearchState) -> dict:
    """Call a retrieval API and store sources."""
    sources = vector_search(state["question"], top_k=8)  # placeholder retrieval call
    return {"sources": sources}


def draft_node(state: ResearchState) -> dict:
    """Ask the LLM to draft an answer from sources."""
    draft = llm_draft(state["question"], state["sources"])  # placeholder LLM call
    return {"draft": draft}


def human_review_node(state: ResearchState) -> dict:
    """Pause for human approval."""
    decision = interrupt(
        {"draft": state["draft"], "prompt": "Approve this draft? (yes/no)"}
    )
    return {"approved": decision.lower().strip() == "yes"}


def route_after_review(state: ResearchState) -> str:
    return END if state["approved"] else "draft"


# ── Build graph ────────────────────────────────────────────────
graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("draft", draft_node)
graph.add_node("review", human_review_node)
graph.add_edge(START, "search")
graph.add_edge("search", "draft")
graph.add_edge("draft", "review")
graph.add_conditional_edges("review", route_after_review)

checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# First invocation — pauses at the interrupt() inside human_review_node
thread = {"configurable": {"thread_id": "t-001"}}
result = app.invoke({"question": "How does RLHF work?"}, config=thread)

# Later — resume by supplying the human decision, which becomes the
# return value of the pending interrupt()
result = app.invoke(Command(resume="yes"), config=thread)
```
Checkpointing and persistence
The `Checkpointer` interface has pluggable backends — `MemorySaver` for development, `SqliteSaver` or `PostgresSaver` for production. Every node execution writes a checkpoint; the graph can be rewound to any prior state for debugging or replay.
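Swapping backends is confined to the compile step. A sketch, assuming the separately packaged `langgraph-checkpoint-sqlite` backend; in recent releases `from_conn_string` is used as a context manager:

```python
from langgraph.checkpoint.sqlite import SqliteSaver

# Replaces MemorySaver in the example above; checkpoints now survive restarts.
with SqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
    app = graph.compile(checkpointer=checkpointer)
    app.invoke({"question": "How does RLHF work?"},
               config={"configurable": {"thread_id": "t-001"}})
```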
When LangGraph earns its complexity
LangGraph adds real value when you need:
- Durable, resumable workflows — the checkpointer handles crash recovery.
- Human-in-the-loop — `interrupt()` is a first-class primitive.
- Explicit control flow — conditional edges make branching visible.
- Multi-agent graphs — each sub-graph is a node in a parent graph.

It is overkill for a single-turn tool-calling agent. Use the minimal loop (§131.2) or a thin SDK (§131.3/§131.4) instead.
131.6 — Lighter-weight: Pydantic AI, Instructor, Mirascope, Magentic
Not every agent needs a state machine. Several libraries occupy the sweet spot between “raw SDK” and “full framework.”
Pydantic AI
Pydantic AI wires tool calls into Pydantic models with zero boilerplate. It supports Anthropic, OpenAI, and other providers behind a unified interface.
```python
"""pydantic_ai_example.py — structured tool agent."""
from pydantic_ai import Agent
from pydantic import BaseModel


class WeatherReport(BaseModel):
    city: str
    temp_f: float
    condition: str


agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    system_prompt="You report weather data using the provided tool.",
    result_type=WeatherReport,
)


@agent.tool_plain
def get_weather(city: str) -> dict:
    """Fetch current weather for a city."""
    return {"city": city, "temp_f": 72.0, "condition": "sunny"}


result = agent.run_sync("Weather in Berlin?")
print(result.data)  # WeatherReport(city='Berlin', temp_f=72.0, ...)
```
Key selling point: the result is a validated Pydantic model, not a raw string. This eliminates a whole class of parsing bugs in downstream code.
Instructor
Instructor focuses on structured output extraction. It patches the underlying SDK client to add automatic retries, validation, and streaming of Pydantic models. It is not an agent framework per se, but it solves the “get JSON out of the model reliably” problem better than anything else:
```python
import instructor
import anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())


class Entity(BaseModel):
    name: str
    entity_type: str
    confidence: float


entities = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract entities: 'Apple released the iPhone in Cupertino.'",
    }],
    response_model=list[Entity],
)
# [Entity(name='Apple', entity_type='COMPANY', confidence=0.98), ...]
```
Mirascope
Mirascope provides decorators that turn ordinary Python functions into LLM-backed calls. It supports multiple providers, tool calling, and structured extraction with a lighter API surface than LangChain.
```python
from mirascope.core import anthropic, prompt_template


@anthropic.call("claude-sonnet-4-20250514")
@prompt_template("Summarise the following text in {n} bullet points: {text}")
def summarise(text: str, n: int = 3): ...


response = summarise("LangGraph models agents as state machines...")
print(response.content)
```
Magentic
Magentic uses Python’s type system to bind LLM outputs. Its signature
feature is @prompt — a decorator that makes an LLM call look like a
regular function:
```python
from magentic import prompt


@prompt("Create a list of {n} names for a {animal} character.")
def character_names(animal: str, n: int) -> list[str]: ...


names = character_names("cat", 5)
# e.g. ['Whiskers', 'Luna', 'Shadow', 'Mittens', 'Cleo']
```
Comparison matrix
| Library | Agent loop | Tool calling | Structured output | Multi-provider | Multi-agent |
|---|---|---|---|---|---|
| Pydantic AI | Yes | Yes | Native | Yes | No |
| Instructor | No | No | Native | Yes | No |
| Mirascope | Minimal | Yes | Yes | Yes | No |
| Magentic | No | Partial | Native | Partial | No |
Use these when your agent is one model, a few tools, and a need for typed outputs. They compose well with a hand-rolled outer loop if you need multi-step reasoning.
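One way to see the composition, as a sketch with hypothetical names: `plan_step` below stands in for an Instructor or Pydantic AI call that returns a validated model, and the hand-rolled outer loop supplies the multi-step reasoning.

```python
"""Hand-rolled outer loop over typed planning steps. In production, plan_step
would be a structured-output LLM call; the loop itself never parses JSON,
because validation happened inside that call."""
from dataclasses import dataclass, field


@dataclass
class NextAction:
    kind: str                      # "tool" or "finish"
    tool: str = ""
    args: dict = field(default_factory=dict)
    answer: str = ""


def run_typed_agent(plan_step, tools: dict, question: str,
                    max_steps: int = 8) -> str:
    history = [f"question: {question}"]
    for _ in range(max_steps):
        action = plan_step(history)        # returns a validated NextAction
        if action.kind == "finish":
            return action.answer
        result = tools[action.tool](**action.args)
        history.append(f"{action.tool}({action.args}) -> {result}")
    raise RuntimeError("Exceeded step budget")
```

The typed boundary is the point: downstream code branches on `action.kind`, not on whatever string the model happened to emit.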
131.7 — CrewAI and AutoGen — when multi-agent frameworks earn their keep
When the problem genuinely decomposes into multiple personas with different tools and knowledge — think “researcher + writer + editor” — multi-agent frameworks can reduce the orchestration burden.
CrewAI
CrewAI organises agents into a Crew with a defined Process (sequential or hierarchical):
```python
"""crewai_example.py — research crew."""
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Researcher",
    goal="Find the latest papers on RLHF.",
    backstory="You are an ML researcher with 10 years of experience.",
    tools=[arxiv_search],  # assumes an arxiv_search tool defined elsewhere
    llm="anthropic/claude-sonnet-4-20250514",
)

writer = Agent(
    role="Technical Writer",
    goal="Write a clear summary for an engineering audience.",
    backstory="You translate complex research into actionable briefs.",
    llm="anthropic/claude-sonnet-4-20250514",
)

research_task = Task(
    description="Find 5 recent RLHF papers and summarise key findings.",
    agent=researcher,
    expected_output="A bullet-point summary of 5 papers.",
)

writing_task = Task(
    description="Write a one-page brief from the research summary.",
    agent=writer,
    expected_output="A one-page Markdown document.",
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)
```
AutoGen
AutoGen (Microsoft) supports conversational multi-agent patterns. Agents message each other in a group chat, with an optional GroupChatManager that controls turn-taking:
```python
"""autogen_example.py — group chat with two specialists."""
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

coder = AssistantAgent(
    name="coder",
    system_message="You write Python code to solve data analysis tasks.",
    llm_config={"model": "gpt-4o"},
)

reviewer = AssistantAgent(
    name="reviewer",
    system_message="You review code for correctness and style.",
    llm_config={"model": "gpt-4o"},
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "/tmp/autogen"},
)

group_chat = GroupChat(
    agents=[user_proxy, coder, reviewer],
    messages=[],
    max_round=10,
)

manager = GroupChatManager(groupchat=group_chat)
user_proxy.initiate_chat(manager, message="Analyse sales.csv and plot trends.")
```
When multi-agent earns its keep
Multi-agent frameworks add value when:
- Different agents need different tools or permissions. A “reader” agent that can only query and an “executor” agent that can write reduces blast radius.
- The workflow is naturally conversational. Debate, critique, and refinement loops benefit from role separation.
- You need parallelism. CrewAI’s hierarchical process can fan out tasks.
They are not worth the complexity when a single agent with good tools can do the job. The overhead of serialising context between agents, managing turn order, and debugging cross-agent failures is significant.
131.8 — The framework decision tree
Use this decision tree to pick the right level of abstraction. Start at the
top and follow the first yes.
```
┌─────────────────────────────────────────────────────────┐
│ Do you need durable, resumable workflows with           │
│ human-in-the-loop or complex branching?                 │
│                                                         │
│   YES → LangGraph                                       │
│   NO ↓                                                  │
├─────────────────────────────────────────────────────────┤
│ Do you need multiple agents with different roles,       │
│ tools, or permissions collaborating?                    │
│                                                         │
│   YES → CrewAI / AutoGen / Claude sub-agents /          │
│         OpenAI Handoffs                                 │
│   NO ↓                                                  │
├─────────────────────────────────────────────────────────┤
│ Do you need structured (typed) outputs with             │
│ automatic validation and retry?                         │
│                                                         │
│   YES → Pydantic AI / Instructor                        │
│   NO ↓                                                  │
├─────────────────────────────────────────────────────────┤
│ Do you want hooks, guardrails, tracing, or sub-agents   │
│ without building them yourself?                         │
│                                                         │
│   YES → Anthropic Claude Agent SDK / OpenAI Agents SDK  │
│   NO ↓                                                  │
├─────────────────────────────────────────────────────────┤
│ Minimal agent loop (§ 131.2)                            │
│ 50 lines. Full control. Ship it.                        │
└─────────────────────────────────────────────────────────┘
```
The right framework is the least framework that solves your structural problem. Features like “easy tool registration” do not justify a dependency; a decorator over a dict does the same thing in 10 lines.
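For the record, here is that decorator. A sketch: it infers an Anthropic-style `input_schema` from type hints for scalar parameters only, which covers most real tools.

```python
"""tool_registry.py — 'easy tool registration' without a framework."""
import inspect

TOOLS: dict[str, dict] = {}        # name -> {"fn": ..., "schema": ...}
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool(fn):
    """Register fn and derive a JSON schema from its signature."""
    params = inspect.signature(fn).parameters
    TOOLS[fn.__name__] = {
        "fn": fn,
        "schema": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "input_schema": {
                "type": "object",
                "properties": {n: {"type": _JSON_TYPES.get(p.annotation, "string")}
                               for n, p in params.items()},
                "required": [n for n, p in params.items()
                             if p.default is inspect.Parameter.empty],
            },
        },
    }
    return fn
```

The registry plugs straight into the minimal loop: pass `[t["schema"] for t in TOOLS.values()]` as `tools=` and dispatch via `TOOLS[name]["fn"]`.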
131.9 — Building from scratch — when and how
Sometimes the answer is “none of the above.” You should build from scratch when:
- Latency is critical and the framework adds measurable overhead to the hot path (serialisation, checkpointing).
- Your orchestration topology is novel — e.g., a tree-of-thought search with backtracking, or a mixture-of-agents ensemble.
- You need deep integration with an existing system (job scheduler, internal RPC framework, custom auth) and wrapping a framework’s abstractions is harder than writing the loop.
- Compliance or security policy prohibits third-party orchestration libraries in your runtime.
A production-grade scratch build
If you go this route, build these layers explicitly:
```python
"""scratch_agent/core.py — production agent loop from scratch."""
from __future__ import annotations

import json
import time
import logging
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol

import anthropic

logger = logging.getLogger(__name__)


# ── Types ──────────────────────────────────────────────────────
class ToolFn(Protocol):
    def __call__(self, **kwargs: Any) -> Any: ...


@dataclass
class ToolDef:
    name: str
    description: str
    input_schema: dict
    fn: ToolFn


@dataclass
class TurnRecord:
    turn: int
    role: str
    tool_calls: list[dict] = field(default_factory=list)
    latency_ms: float = 0.0


@dataclass
class AgentResult:
    text: str
    turns: list[TurnRecord]
    trace_id: str


# ── Tool registry ──────────────────────────────────────────────
class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolDef] = {}

    def register(self, tool_def: ToolDef) -> None:
        self._tools[tool_def.name] = tool_def

    def dispatch(self, name: str, args: dict) -> str:
        if name not in self._tools:
            raise ValueError(f"Unknown tool: {name}")
        result = self._tools[name].fn(**args)
        return json.dumps(result, default=str)

    def schemas(self) -> list[dict]:
        return [
            {"name": t.name, "description": t.description,
             "input_schema": t.input_schema}
            for t in self._tools.values()
        ]


# ── Guardrail interface ────────────────────────────────────────
class Guardrail(Protocol):
    def check_tool_call(self, name: str, args: dict) -> None: ...
    def check_response(self, text: str) -> None: ...


# ── Agent loop ─────────────────────────────────────────────────
class AgentLoop:
    def __init__(
        self,
        model: str,
        system: str,
        registry: ToolRegistry,
        guardrails: list[Guardrail] | None = None,
        max_turns: int = 15,
        pre_tool_hook: Callable | None = None,
        post_tool_hook: Callable | None = None,
    ):
        self.client = anthropic.Anthropic()
        self.model = model
        self.system = system
        self.registry = registry
        self.guardrails = guardrails or []
        self.max_turns = max_turns
        self.pre_tool_hook = pre_tool_hook
        self.post_tool_hook = post_tool_hook

    def run(self, user_message: str) -> AgentResult:
        trace_id = str(uuid.uuid4())
        messages: list[dict] = [{"role": "user", "content": user_message}]
        turns: list[TurnRecord] = []
        for turn_idx in range(self.max_turns):
            t0 = time.perf_counter()
            response = self.client.messages.create(
                model=self.model,
                system=self.system,
                max_tokens=4096,
                tools=self.registry.schemas(),
                messages=messages,
            )
            latency = (time.perf_counter() - t0) * 1000
            messages.append({"role": "assistant",
                             "content": response.content})
            record = TurnRecord(turn=turn_idx, role="assistant",
                                latency_ms=latency)
            if response.stop_reason == "end_turn":
                text = "".join(
                    b.text for b in response.content if b.type == "text"
                )
                for g in self.guardrails:
                    g.check_response(text)
                record.role = "final"
                turns.append(record)
                return AgentResult(text=text, turns=turns,
                                   trace_id=trace_id)
            # Process tool calls
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                for g in self.guardrails:
                    g.check_tool_call(block.name, block.input)
                if self.pre_tool_hook:
                    self.pre_tool_hook(block.name, block.input)
                result = self.registry.dispatch(block.name, block.input)
                if self.post_tool_hook:
                    self.post_tool_hook(block.name, block.input, result)
                record.tool_calls.append(
                    {"tool": block.name, "input": block.input}
                )
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": block.id,
                     "content": result}
                )
            messages.append({"role": "user", "content": tool_results})
            turns.append(record)
            logger.info("turn=%d tools=%d latency=%.0fms trace=%s",
                        turn_idx, len(tool_results), latency, trace_id)
        raise RuntimeError(
            f"Agent exceeded {self.max_turns} turns (trace={trace_id})"
        )
```
This is roughly 120 lines and gives you:
- Typed tool registry with schema generation.
- Guardrails as a protocol — plug in any checker.
- Pre/post tool hooks for logging, metrics, or side effects.
- Turn-level telemetry (latency, tool call counts, trace IDs).
- Budgeted loop with a hard turn limit.
You don’t get persistence, human-in-the-loop, or multi-agent orchestration. If you need those, reach for LangGraph or the SDKs. If you don’t, this loop is easier to debug than any framework.
131.10 — Mental model
- The agent loop is a while-loop with a budget. Every framework wraps this; none transcend it. Understand the loop before you abstract it.
- Tools are functions with JSON schemas. Registration is syntactic sugar. The hard part is designing good tools, not registering them.
- Guardrails belong at the loop boundary, not inside tools. Check inputs before dispatch and outputs before returning to the user.
- Hooks are the escape hatch. When a framework does 90% of what you need, hooks cover the remaining 10% without forking the library.
- Handoffs transfer; sub-agents delegate. Choose the pattern that matches your control flow — routing (handoff) vs. fan-out (sub-agent).
- State machines earn their cost when state must survive process restarts. If your agent is stateless and single-turn, a graph is overhead.
- Multi-agent is a last resort, not a first design. A single agent with well-chosen tools beats a committee of poorly-tooled specialists in most real-world benchmarks.
- The right framework is the least framework. Start with the minimal loop. Add a library when you hit a structural problem it solves. Eject when the library becomes the structural problem.
Read it yourself
- Anthropic Claude Agent SDK — official documentation and source at `github.com/anthropics/claude-code` and the Anthropic docs site.
- OpenAI Agents SDK — `github.com/openai/openai-agents-python` and the OpenAI platform documentation.
- LangGraph — `langchain-ai.github.io/langgraph/` for the concepts guide, tutorials, and API reference.
- Pydantic AI — `ai.pydantic.dev` for quickstart and tool-calling docs.
- Instructor — `python.useinstructor.com` for structured output patterns.
- CrewAI — `docs.crewai.com` for multi-agent orchestration patterns.
- AutoGen — `microsoft.github.io/autogen/` for conversational agents.
- Harrison Chase, "LangGraph: Multi-Actor Programs with LLMs" (2024) — design philosophy behind the graph-as-agent paradigm.
Practice
- Implement the minimal agent loop (§131.2) from memory, targeting the OpenAI API instead of Anthropic. Verify it handles parallel tool calls (where the model returns multiple `tool_use` blocks in a single turn).
- Add a guardrail to the minimal loop that rejects any tool call whose string-serialised input exceeds 10 KB. Where in the loop does the check belong, and why?
- Build a two-agent system using the Anthropic Claude Agent SDK where a "planner" agent breaks a complex question into sub-tasks and a "worker" sub-agent executes each one. Compare the token usage against a single-agent baseline that receives the same question.
- Port the LangGraph research example (§131.5) to use a PostgreSQL-backed checkpointer. Simulate a crash by killing the process mid-run, then resume from the last checkpoint. Measure how much state is recovered.
- Use Instructor to extract structured data from 100 product reviews (sentiment, key topics, star rating). Compare the retry/validation rate against a hand-rolled `json.loads()` approach with manual re-prompting.
- Benchmark latency overhead of three approaches for the same single-tool agent task: (a) raw Anthropic SDK, (b) Claude Agent SDK, (c) LangGraph with a two-node graph. Report per-turn overhead in milliseconds and identify where the framework spends it.
- Stretch: Design and implement a framework-ejection layer — a thin adapter interface that lets your agent code swap between the Anthropic Claude Agent SDK, the OpenAI Agents SDK, and a hand-rolled loop without changing any tool implementations or guardrail logic. Define the minimal interface, write a concrete adapter for each backend, and demonstrate with a three-tool agent that runs identically on all three.