Part XI · Building Agents and Agent Infrastructure
Chapter 131 · ~21 min read

Agent SDKs and frameworks: from primitives to production

"Every six months a new agent framework promises to 'make agents easy.'"

Chapters 128–130 gave you the conceptual machinery — tool use, memory, planning, and evaluation. This chapter is about the concrete tooling you reach for when it is time to wire those ideas into production code. We will walk through the major SDKs and frameworks (Anthropic Claude Agent SDK, OpenAI Agents SDK, LangGraph, and several lighter-weight libraries), write real comparison code, and — most importantly — build a decision framework so you pick the right level of abstraction for your problem, rather than the one with the best README.


131.1 — Build-vs-buy for agent frameworks

The first question is whether you need a framework at all. Build-vs-buy analysis for agent tooling is not the same as for, say, web frameworks. A web framework handles thousands of edge cases around HTTP, routing, middleware, and connection pooling; an agent framework handles… a while-loop and some JSON.

That sounds reductive, but it captures a real tension:

Factor             | Build from scratch                 | Use a framework
Loop control       | Full                               | Partial — you rely on the framework’s loop semantics
Tool integration   | You own the serialisation contract | Framework dictates schema conventions
Observability      | You instrument what you want       | Framework may provide tracing, or may obscure it
Multi-agent        | You design the topology            | Framework enforces a topology (graph, handoff, crew)
Time to first demo | Hours                              | Minutes
Time to production | Weeks                              | Weeks (you still need evals, guardrails, deployment)
Upgrade risk       | None (you own it)                  | Breaking changes in fast-moving libraries

The honest answer: most teams should start with a thin SDK (Anthropic or OpenAI), add a framework only when they hit a structural need — persistent state machines, human-in-the-loop checkpoints, or multi-agent orchestration — and even then they should be prepared to eject.
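Being "prepared to eject" is cheapest when agent code depends on a small seam rather than on any one SDK. A minimal sketch of such a seam; the names (AgentBackend, HandRolledBackend, answer) are illustrative, not from any library:

```python
"""A hypothetical ejection seam: application code depends on a Protocol,
never on a specific SDK, so swapping backends is a one-class change."""
from typing import Protocol


class AgentBackend(Protocol):
    def run(self, user_message: str) -> str: ...


class HandRolledBackend:
    """Would wrap the minimal loop from this chapter; stubbed here."""

    def run(self, user_message: str) -> str:
        return f"[hand-rolled] answered: {user_message}"


def answer(backend: AgentBackend, question: str) -> str:
    # Call sites only see the seam, so ejecting a framework means
    # writing one new backend class, not rewriting the application.
    return backend.run(question)


print(answer(HandRolledBackend(), "What changed last week?"))
```

A framework-backed class with the same `run` signature slots in beside `HandRolledBackend`; exercise 7 at the end of the chapter builds this out fully.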


131.2 — The minimal agent: 50 lines of Python

Before adopting any framework, understand the minimal agent loop. Every framework is a decoration on this skeleton:

"""minimal_agent.py — a complete agent in ~50 lines."""
from __future__ import annotations

import json
from typing import Any

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"

# ── Tool registry ──────────────────────────────────────────────
TOOLS: list[dict[str, Any]] = [
    {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

def dispatch_tool(name: str, args: dict) -> str:
    """Route tool calls to implementations."""
    if name == "get_weather":
        return json.dumps({"temp_f": 72, "condition": "sunny",
                           "city": args["city"]})
    raise ValueError(f"Unknown tool: {name}")

# ── Agent loop ─────────────────────────────────────────────────
def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_turns):
        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})

        # If the model stopped normally, we are done
        if response.stop_reason == "end_turn":
            return "".join(
                blk.text for blk in response.content if blk.type == "text"
            )

        # Otherwise, process every tool_use block
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = dispatch_tool(block.name, block.input)
                tool_results.append(
                    {"type": "tool_result",
                     "tool_use_id": block.id,
                     "content": result}
                )
        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError("Agent exceeded max turns")

if __name__ == "__main__":
    print(run_agent("What is the weather in Tokyo and Paris?"))

Key observations:

  1. The loop is a for with a budget. Every framework wraps this.
  2. Tool dispatch is a function mapping names → implementations. Every framework wraps this too.
  3. Messages accumulate. The full conversation is the agent’s “state.”

The OpenAI equivalent is structurally identical — swap anthropic.Anthropic for openai.OpenAI, adjust the message schema, and change stop_reason to finish_reason. The loop does not change.
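To make that structural identity concrete, here is the same loop reshaped for the Chat Completions message schema. A stub stands in for the real client so the skeleton runs offline; the stub and its canned replies are illustrative, not OpenAI behaviour:

```python
"""The minimal loop with Chat Completions message conventions.
A stubbed 'create' call returns canned responses — no network needed."""
import json
from types import SimpleNamespace


def fake_create(messages, **_):
    # First call: request a tool. Second call: final answer.
    if not any(m.get("role") == "tool" for m in messages):
        call = SimpleNamespace(
            id="call_1", type="function",
            function=SimpleNamespace(name="get_weather",
                                     arguments='{"city": "Tokyo"}'))
        msg = SimpleNamespace(content=None, tool_calls=[call])
        return SimpleNamespace(
            choices=[SimpleNamespace(message=msg, finish_reason="tool_calls")])
    msg = SimpleNamespace(content="Sunny, 72F in Tokyo.", tool_calls=None)
    return SimpleNamespace(
        choices=[SimpleNamespace(message=msg, finish_reason="stop")])


def dispatch_tool(name: str, args: dict) -> str:
    return json.dumps({"city": args["city"], "temp_f": 72})


def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        choice = fake_create(messages).choices[0]
        if choice.finish_reason == "stop":        # was: stop_reason == "end_turn"
            return choice.message.content
        messages.append({"role": "assistant",
                         "tool_calls": choice.message.tool_calls})
        for call in choice.message.tool_calls:    # results use role "tool"
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": dispatch_tool(
                                 call.function.name,
                                 json.loads(call.function.arguments))})
    raise RuntimeError("Agent exceeded max turns")


print(run_agent("What is the weather in Tokyo?"))
```

Only the message plumbing changed: tool results travel as `role: "tool"` messages keyed by `tool_call_id` instead of `tool_result` blocks inside a user turn.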

Figure 131.1 — The minimal agent loop: user prompt → LLM call → tool_use? If yes, dispatch the tool and append the result; if no, return the final answer. Every framework decorates this cycle.

131.3 — Anthropic Claude Agent SDK

The Claude Agent SDK (claude-agent-sdk) is Anthropic’s opinionated layer on top of the Messages API. It targets single- and multi-agent workflows with first-class support for tool registration, hooks, guardrails, and sub-agents.

Core concepts

Concept   | Role
Agent     | Wraps a system prompt, model, tools, and optional sub-agents
AgentLoop | The run-time loop that calls the model, dispatches tools, and checks guardrails
Tool      | A Python callable decorated with @tool; schema is inferred from type hints
Hook      | A callback fired at well-defined lifecycle points (pre-tool, post-tool, pre-response)
Guardrail | A check that can block or rewrite a tool call or final response
SubAgent  | A child agent that can be invoked as a tool by the parent

Typical usage

"""claude_agent_sdk_example.py — multi-tool agent with guardrail."""
from claude_agent_sdk import Agent, tool, guardrail, run

@tool
def search_docs(query: str, top_k: int = 5) -> list[dict]:
    """Search internal documentation.

    Args:
        query: Natural-language search query.
        top_k: Number of results to return.
    """
    # In production, call your vector store here.
    return [{"title": "Deployment guide", "score": 0.92,
             "snippet": "To deploy, run `make deploy`..."}]

@tool
def run_sql(query: str) -> list[dict]:
    """Execute a read-only SQL query against the analytics warehouse."""
    # Placeholder — real implementation uses a DB connection pool.
    return [{"count": 42}]

@guardrail
def block_mutation(tool_name: str, tool_input: dict) -> None:
    """Reject any SQL that is not a SELECT."""
    if tool_name == "run_sql":
        sql = tool_input.get("query", "").strip().upper()
        if not sql.startswith("SELECT"):
            raise ValueError("Only SELECT queries are permitted.")

agent = Agent(
    name="analyst",
    model="claude-sonnet-4-20250514",
    system="You are a data analyst assistant. Use tools to answer questions.",
    tools=[search_docs, run_sql],
    guardrails=[block_mutation],
    max_turns=15,
)

result = run(agent, "How many deployments happened last week?")
print(result.final_text)

Hooks — lifecycle control

Hooks let you inject behaviour without subclassing:

from claude_agent_sdk import Agent, Hook, HookEvent

class AuditHook(Hook):
    """Log every tool invocation to an audit trail."""

    def on(self, event: HookEvent) -> None:
        if event.kind == "pre_tool":
            log_to_audit_trail(
                tool=event.tool_name,
                input=event.tool_input,
                agent=event.agent_name,
                timestamp=event.timestamp,
            )

agent = Agent(
    name="audited-agent",
    model="claude-sonnet-4-20250514",
    tools=[search_docs, run_sql],
    hooks=[AuditHook()],
)

Available hook points: pre_tool, post_tool, pre_response, post_response, on_error, on_turn_start, on_turn_end.

Sub-agents

A sub-agent is a child agent the parent can delegate to. The parent sees the sub-agent as a tool; the sub-agent runs its own loop with its own tools and guardrails, then returns a summary to the parent.

researcher = Agent(
    name="researcher",
    model="claude-sonnet-4-20250514",
    system="You are a researcher. Search docs and summarise findings.",
    tools=[search_docs],
)

orchestrator = Agent(
    name="orchestrator",
    model="claude-sonnet-4-20250514",
    system="You coordinate research and SQL analysis.",
    tools=[run_sql],
    sub_agents=[researcher],
    max_turns=20,
)

result = run(orchestrator, "Summarise last week's deployment failures.")

The SDK serialises the sub-agent boundary cleanly: the parent’s context never sees the child’s internal tool calls, only the final summary. This keeps context window budgets manageable.
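The same boundary is easy to see in a hand-rolled form: the child runs its own loop privately, and only its return value crosses into the parent's transcript. A stdlib sketch with a stubbed child (names and the canned search list are illustrative):

```python
"""Sub-agent-as-tool: the parent's transcript records only the child's
final summary, never the child's internal tool calls."""


def researcher_subagent(task: str) -> str:
    # Everything inside this function is the child's private state;
    # in a real build it would run its own full agent loop here.
    internal_tool_calls = ["search_docs('deployment failures')",
                          "search_docs('rollback log')"]
    return f"Summary for {task!r} ({len(internal_tool_calls)} searches run)."


parent_messages = [{"role": "user", "content": "Summarise last week."}]

# The parent invokes the child exactly like any other tool...
summary = researcher_subagent("last week's deployment failures")

# ...and only the summary enters the parent's context window.
parent_messages.append({"role": "user",
                        "content": [{"type": "tool_result",
                                     "tool_use_id": "toolu_1",
                                     "content": summary}]})
print(summary)
```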


131.4 — OpenAI Agents SDK

The OpenAI Agents SDK (openai-agents) takes a slightly different architectural stance. Where Anthropic centres on an agent-loop with hooks, OpenAI’s SDK builds around four primitives: Agent, Runner, Handoff, and Guardrail, plus a first-class tracing layer.

Architecture at a glance

Primitive | Purpose
Agent     | Declares a model, instructions, tools, handoffs, and output schema
Runner    | Executes one or more agents; manages the turn loop and tool dispatch
Handoff   | A typed edge from one agent to another — the first agent yields control
Guardrail | An input or output validator that can reject, rewrite, or escalate
Trace     | Structured telemetry emitted automatically during a run

Code walkthrough

"""openai_agents_example.py — triage + specialist pattern."""
from openai_agents import Agent, Runner, Handoff, InputGuardrail

# ── Specialist agents ──────────────────────────────────────────
billing_agent = Agent(
    name="billing",
    instructions="You handle billing questions. Be concise.",
    model="gpt-4o",
)

tech_agent = Agent(
    name="tech_support",
    instructions="You handle technical support. Ask for logs if needed.",
    model="gpt-4o",
)

# ── Triage agent with handoffs ─────────────────────────────────
triage_agent = Agent(
    name="triage",
    instructions=(
        "You are a triage agent. Determine whether the user needs "
        "billing help or technical support, then hand off."
    ),
    model="gpt-4o",
    handoffs=[
        Handoff(target=billing_agent),
        Handoff(target=tech_agent),
    ],
)

# ── Guardrail ──────────────────────────────────────────────────
class TopicGuardrail(InputGuardrail):
    """Block requests that are not about our product."""

    async def run(self, text: str) -> None:
        if "competitor" in text.lower():
            raise self.reject("We can only help with our own products.")

# ── Execution ──────────────────────────────────────────────────
async def main():
    runner = Runner(
        agent=triage_agent,
        guardrails=[TopicGuardrail()],
    )
    result = await runner.run("I was double-charged on my last invoice.")
    print(result.final_output)
    # Inspect trace
    for span in result.trace.spans:
        print(f"  {span.name}: {span.duration_ms}ms")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Handoff vs. sub-agent

The handoff model differs from Anthropic’s sub-agent model in an important way: a handoff transfers the conversation, whereas a sub-agent delegates a subtask and returns. Handoffs suit customer-service routing; sub-agents suit divide-and-conquer research.
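In control-flow terms the distinction is a tail call versus an ordinary function call. A toy sketch with plain functions standing in for agents (all names illustrative):

```python
"""Handoff = transfer (the caller never resumes); sub-agent = delegate
(the caller resumes with the result and keeps working)."""


def billing_agent(conversation: list[str]) -> str:
    return "billing: refund issued"


def triage_with_handoff(conversation: list[str]) -> str:
    # Handoff: triage yields control entirely; the specialist owns the
    # conversation from here on. Structurally, a tail call.
    return billing_agent(conversation)


def researcher(subtask: str) -> str:
    return f"findings for {subtask!r}"


def orchestrator_with_subagent(question: str) -> str:
    # Delegation: the parent calls the child, receives a result, and
    # continues its own turn. Structurally, an ordinary function call.
    findings = researcher(question)
    return f"report based on {findings}"


print(triage_with_handoff(["I was double-charged."]))
print(orchestrator_with_subagent("deployment failures"))
```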

Tracing

Every Runner.run() call produces a Trace object with spans for each LLM call, tool invocation, guardrail check, and handoff. The trace is OpenTelemetry-compatible and can be exported to any OTLP backend:

from openai_agents.tracing import export_otlp

export_otlp(result.trace, endpoint="http://localhost:4318")

This is one of the strongest reasons to adopt the SDK even if you only need a single agent: production observability out of the box.


131.5 — LangGraph — graph-as-agent

LangGraph (from LangChain, Inc.) models agents as state machines expressed via a directed graph. Each node is a function that reads and writes a typed State object; edges can be conditional. This makes it the natural choice when your workflow has explicit branching, looping, human-in-the-loop pauses, or long-running persistence.

Core concepts

  • StateGraph — the graph definition. Parameterised by a typed State (usually a TypedDict).
  • Nodes — Python functions (state) -> partial_state.
  • Edges — static or conditional transitions between nodes.
  • Checkpointer — serialises state after every node so the graph can be paused, resumed, or replayed.
  • interrupt() — pauses execution and waits for external input (human-in-the-loop).

Example: research agent with human approval

"""langgraph_research.py — graph-based agent with human-in-the-loop."""
from __future__ import annotations

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt

class ResearchState(TypedDict):
    question: str
    sources: list[dict]
    draft: str
    approved: bool

def search_node(state: ResearchState) -> dict:
    """Call a retrieval API and store sources."""
    sources = vector_search(state["question"], top_k=8)  # app-level helper, not shown
    return {"sources": sources}

def draft_node(state: ResearchState) -> dict:
    """Ask the LLM to draft an answer from sources."""
    draft = llm_draft(state["question"], state["sources"])
    return {"draft": draft}

def human_review_node(state: ResearchState) -> dict:
    """Pause for human approval."""
    decision = interrupt(
        {"draft": state["draft"], "prompt": "Approve this draft? (yes/no)"}
    )
    return {"approved": decision.lower().strip() == "yes"}

def route_after_review(state: ResearchState) -> str:
    return END if state["approved"] else "draft"

# ── Build graph ────────────────────────────────────────────────
graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("draft", draft_node)
graph.add_node("review", human_review_node)

graph.add_edge(START, "search")
graph.add_edge("search", "draft")
graph.add_edge("draft", "review")
graph.add_conditional_edges("review", route_after_review)

checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# First invocation — will pause at human_review_node
thread = {"configurable": {"thread_id": "t-001"}}
result = app.invoke({"question": "How does RLHF work?"}, config=thread)

# Later — resume by replying to the interrupt with a Command
from langgraph.types import Command

result = app.invoke(Command(resume="yes"), config=thread)

Checkpointing and persistence

The Checkpointer interface has pluggable backends — MemorySaver for development, SqliteSaver or PostgresSaver for production. Every node execution writes a checkpoint; the graph can be rewound to any prior state for debugging or replay.
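The underlying mechanism is simple enough to sketch with stdlib sqlite3: serialise the state after every node, keyed by thread and step, so any prior state can be reloaded. This illustrates the idea only; it is not LangGraph's actual storage schema:

```python
"""Checkpointing sketch: persist state after each node so a run can be
resumed or rewound. Not LangGraph's real schema — a toy illustration."""
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ckpt (thread TEXT, step INT, state TEXT)")


def save(thread: str, step: int, state: dict) -> None:
    db.execute("INSERT INTO ckpt VALUES (?, ?, ?)",
               (thread, step, json.dumps(state)))


def load_latest(thread: str) -> dict:
    row = db.execute("SELECT state FROM ckpt WHERE thread = ? "
                     "ORDER BY step DESC LIMIT 1", (thread,)).fetchone()
    return json.loads(row[0])


state = {"question": "How does RLHF work?"}
nodes = [lambda s: {**s, "sources": ["paper-1"]},   # stand-in "search"
         lambda s: {**s, "draft": "RLHF is..."}]    # stand-in "draft"
for step, node in enumerate(nodes):
    state = node(state)
    save("t-001", step, state)   # checkpoint written after every node

# A restarted process can rebuild exactly where the run left off.
print(load_latest("t-001"))
```

Replay and rewind fall out of the same table: load any earlier `step` instead of the latest one.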

When LangGraph earns its complexity

LangGraph adds real value when you need:

  1. Durable, resumable workflows — the checkpointer handles crash recovery.
  2. Human-in-the-loop — interrupt() is a first-class primitive.
  3. Explicit control flow — conditional edges make branching visible.
  4. Multi-agent graphs — each sub-graph is a node in a parent graph.

It is overkill for a single-turn tool-calling agent. Use the minimal loop (§131.2) or a thin SDK (§131.3/§131.4) instead.

Figure 131.2 — A LangGraph state machine with human-in-the-loop: START → search → draft → review; "approved" routes to END, "rejected" loops back to draft. Dashed lines indicate checkpoint writes after each node.

131.6 — Lighter-weight: Pydantic AI, Instructor, Mirascope, Magentic

Not every agent needs a state machine. Several libraries occupy the sweet spot between “raw SDK” and “full framework.”

Pydantic AI

Pydantic AI wires tool calls into Pydantic models with zero boilerplate. It supports Anthropic, OpenAI, and other providers behind a unified interface.

"""pydantic_ai_example.py — structured tool agent."""
from pydantic_ai import Agent
from pydantic import BaseModel

class WeatherReport(BaseModel):
    city: str
    temp_f: float
    condition: str

agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    system_prompt="You report weather data using the provided tool.",
    result_type=WeatherReport,
)

@agent.tool_plain
def get_weather(city: str) -> dict:
    """Fetch current weather for a city."""
    return {"city": city, "temp_f": 72.0, "condition": "sunny"}

result = agent.run_sync("Weather in Berlin?")
print(result.data)  # WeatherReport(city='Berlin', temp_f=72.0, ...)

Key selling point: the result is a validated Pydantic model, not a raw string. This eliminates a whole class of parsing bugs in downstream code.

Instructor

Instructor focuses on structured output extraction. It patches the underlying SDK client to add automatic retries, validation, and streaming of Pydantic models. It is not an agent framework per se, but it solves the “get JSON out of the model reliably” problem better than anything else:

import instructor
import anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class Entity(BaseModel):
    name: str
    entity_type: str
    confidence: float

entities = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract entities: 'Apple released the iPhone in Cupertino.'"}],
    response_model=list[Entity],
)
# [Entity(name='Apple', entity_type='COMPANY', confidence=0.98), ...]

Mirascope

Mirascope provides decorators that turn ordinary Python functions into LLM-backed calls. It supports multiple providers, tool calling, and structured extraction with a lighter API surface than LangChain.

from mirascope.core import anthropic, prompt_template

@anthropic.call("claude-sonnet-4-20250514")
@prompt_template("Summarise the following text in {n} bullet points: {text}")
def summarise(text: str, n: int = 3): ...

response = summarise("LangGraph models agents as state machines...")
print(response.content)

Magentic

Magentic uses Python’s type system to bind LLM outputs. Its signature feature is @prompt — a decorator that makes an LLM call look like a regular function:

from magentic import prompt

@prompt("Create a list of {n} names for a {animal} character.")
def character_names(animal: str, n: int) -> list[str]: ...

names = character_names("cat", 5)
# ['Whiskers', 'Luna', 'Shadow', 'Mittens', 'Cleo']

Comparison matrix

Library     | Agent loop | Tool calling | Structured output | Multi-provider | Multi-agent
Pydantic AI | Yes        | Yes          | Native            | Yes            | No
Instructor  | No         | No           | Native            | Yes            | No
Mirascope   | Minimal    | Yes          | Yes               | Yes            | No
Magentic    | No         | Partial      | Native            | Partial        | No

Use these when your agent is one model, a few tools, and a need for typed outputs. They compose well with a hand-rolled outer loop if you need multi-step reasoning.
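The composition these libraries enable is: each step of your hand-rolled outer loop becomes a single typed call, validated before the loop continues. A stdlib sketch of that validate-then-retry shape; the stub replies (one deliberately truncated) stand in for model output:

```python
"""Validate-and-retry around a typed step — the shape that Instructor
and Pydantic AI automate. Stub replies stand in for an LLM."""
import json
from dataclasses import dataclass

replies = iter(['{"sentiment": "positiv',                 # truncated JSON
                '{"sentiment": "positive", "stars": 5}'])  # valid on retry


@dataclass
class Review:
    sentiment: str
    stars: int


def typed_step(max_retries: int = 3) -> Review:
    for _ in range(max_retries):
        raw = next(replies)            # one "model" call per attempt
        try:
            data = json.loads(raw)
            return Review(sentiment=str(data["sentiment"]),
                          stars=int(data["stars"]))
        except (json.JSONDecodeError, KeyError, ValueError):
            continue                   # bad output → re-prompt and retry
    raise RuntimeError("No valid structured output")


result = typed_step()
print(result)
```

An outer reasoning loop then consumes `Review` objects instead of raw strings, which is exactly the parsing-bug class these libraries eliminate.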


131.7 — CrewAI and AutoGen — when multi-agent frameworks earn their keep

When the problem genuinely decomposes into multiple personas with different tools and knowledge — think “researcher + writer + editor” — multi-agent frameworks can reduce the orchestration burden.

CrewAI

CrewAI organises agents into a Crew with a defined Process (sequential or hierarchical):

"""crewai_example.py — research crew."""
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Researcher",
    goal="Find the latest papers on RLHF.",
    backstory="You are an ML researcher with 10 years of experience.",
    tools=[arxiv_search],
    llm="anthropic/claude-sonnet-4-20250514",
)

writer = Agent(
    role="Technical Writer",
    goal="Write a clear summary for an engineering audience.",
    backstory="You translate complex research into actionable briefs.",
    llm="anthropic/claude-sonnet-4-20250514",
)

research_task = Task(
    description="Find 5 recent RLHF papers and summarise key findings.",
    agent=researcher,
    expected_output="A bullet-point summary of 5 papers.",
)

writing_task = Task(
    description="Write a one-page brief from the research summary.",
    agent=writer,
    expected_output="A one-page Markdown document.",
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)

AutoGen

AutoGen (Microsoft) supports conversational multi-agent patterns. Agents message each other in a group chat, with an optional GroupChatManager that controls turn-taking:

"""autogen_example.py — group chat with two specialists."""
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

coder = AssistantAgent(
    name="coder",
    system_message="You write Python code to solve data analysis tasks.",
    llm_config={"model": "gpt-4o"},
)

reviewer = AssistantAgent(
    name="reviewer",
    system_message="You review code for correctness and style.",
    llm_config={"model": "gpt-4o"},
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "/tmp/autogen"},
)

group_chat = GroupChat(
    agents=[user_proxy, coder, reviewer],
    messages=[],
    max_round=10,
)
manager = GroupChatManager(groupchat=group_chat)

user_proxy.initiate_chat(manager, message="Analyse sales.csv and plot trends.")

When multi-agent earns its keep

Multi-agent frameworks add value when:

  1. Different agents need different tools or permissions. A “reader” agent that can only query and an “executor” agent that can write reduces blast radius.
  2. The workflow is naturally conversational. Debate, critique, and refinement loops benefit from role separation.
  3. You need parallelism. CrewAI’s hierarchical process can fan out tasks.

They are not worth the complexity when a single agent with good tools can do the job. The overhead of serialising context between agents, managing turn order, and debugging cross-agent failures is significant.


131.8 — The framework decision tree

Use this decision tree to pick the right level of abstraction. Start at the top and follow the first yes.

 ┌─────────────────────────────────────────────────────────┐
 │  Do you need durable, resumable workflows with          │
 │  human-in-the-loop or complex branching?                │
 │                                                         │
 │  YES → LangGraph                                        │
 │  NO  ↓                                                  │
 ├─────────────────────────────────────────────────────────┤
 │  Do you need multiple agents with different roles,      │
 │  tools, or permissions collaborating?                   │
 │                                                         │
 │  YES → CrewAI / AutoGen / Claude sub-agents /           │
 │        OpenAI Handoffs                                  │
 │  NO  ↓                                                  │
 ├─────────────────────────────────────────────────────────┤
 │  Do you need structured (typed) outputs with            │
 │  automatic validation and retry?                        │
 │                                                         │
 │  YES → Pydantic AI / Instructor                         │
 │  NO  ↓                                                  │
 ├─────────────────────────────────────────────────────────┤
 │  Do you want hooks, guardrails, tracing, or sub-agents  │
 │  without building them yourself?                        │
 │                                                         │
 │  YES → Anthropic Claude Agent SDK / OpenAI Agents SDK   │
 │  NO  ↓                                                  │
 ├─────────────────────────────────────────────────────────┤
 │  Minimal agent loop (§ 131.2)                           │
 │  50 lines. Full control. Ship it.                       │
 └─────────────────────────────────────────────────────────┘

The right framework is the least framework that solves your structural problem. Features like “easy tool registration” do not justify a dependency; a decorator over a dict does the same thing in 10 lines.
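For the record, the decorator over a dict in full (schema inference omitted for brevity):

```python
"""Tool registration without a framework: a decorator over a dict."""
TOOLS: dict = {}


def tool(fn):
    """Register fn under its own name; the agent loop looks it up here."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> dict:
    return {"city": city, "temp_f": 72}


# Dispatch is a dict lookup — the entire "tool registry" feature.
print(TOOLS["get_weather"]("Tokyo"))
```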


131.9 — Building from scratch — when and how

Sometimes the answer is “none of the above.” You should build from scratch when:

  1. Latency is critical and the framework adds measurable overhead to the hot path (serialisation, checkpointing).
  2. Your orchestration topology is novel — e.g., a tree-of-thought search with backtracking, or a mixture-of-agents ensemble.
  3. You need deep integration with an existing system (job scheduler, internal RPC framework, custom auth) and wrapping a framework’s abstractions is harder than writing the loop.
  4. Compliance or security policy prohibits third-party orchestration libraries in your runtime.

A production-grade scratch build

If you go this route, build these layers explicitly:

"""scratch_agent/core.py — production agent loop from scratch."""
from __future__ import annotations

import json
import time
import logging
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol

import anthropic

logger = logging.getLogger(__name__)

# ── Types ──────────────────────────────────────────────────────
class ToolFn(Protocol):
    def __call__(self, **kwargs: Any) -> Any: ...

@dataclass
class ToolDef:
    name: str
    description: str
    input_schema: dict
    fn: ToolFn

@dataclass
class TurnRecord:
    turn: int
    role: str
    tool_calls: list[dict] = field(default_factory=list)
    latency_ms: float = 0.0

@dataclass
class AgentResult:
    text: str
    turns: list[TurnRecord]
    trace_id: str

# ── Tool registry ──────────────────────────────────────────────
class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolDef] = {}

    def register(self, tool_def: ToolDef) -> None:
        self._tools[tool_def.name] = tool_def

    def dispatch(self, name: str, args: dict) -> str:
        if name not in self._tools:
            raise ValueError(f"Unknown tool: {name}")
        result = self._tools[name].fn(**args)
        return json.dumps(result, default=str)

    def schemas(self) -> list[dict]:
        return [
            {"name": t.name, "description": t.description,
             "input_schema": t.input_schema}
            for t in self._tools.values()
        ]

# ── Guardrail interface ────────────────────────────────────────
class Guardrail(Protocol):
    def check_tool_call(self, name: str, args: dict) -> None: ...
    def check_response(self, text: str) -> None: ...

# ── Agent loop ─────────────────────────────────────────────────
class AgentLoop:
    def __init__(
        self,
        model: str,
        system: str,
        registry: ToolRegistry,
        guardrails: list[Guardrail] | None = None,
        max_turns: int = 15,
        pre_tool_hook: Callable | None = None,
        post_tool_hook: Callable | None = None,
    ):
        self.client = anthropic.Anthropic()
        self.model = model
        self.system = system
        self.registry = registry
        self.guardrails = guardrails or []
        self.max_turns = max_turns
        self.pre_tool_hook = pre_tool_hook
        self.post_tool_hook = post_tool_hook

    def run(self, user_message: str) -> AgentResult:
        trace_id = str(uuid.uuid4())
        messages: list[dict] = [{"role": "user", "content": user_message}]
        turns: list[TurnRecord] = []

        for turn_idx in range(self.max_turns):
            t0 = time.perf_counter()
            response = self.client.messages.create(
                model=self.model,
                system=self.system,
                max_tokens=4096,
                tools=self.registry.schemas(),
                messages=messages,
            )
            latency = (time.perf_counter() - t0) * 1000

            messages.append({"role": "assistant",
                             "content": response.content})

            record = TurnRecord(turn=turn_idx, role="assistant",
                                latency_ms=latency)

            if response.stop_reason == "end_turn":
                text = "".join(
                    b.text for b in response.content if b.type == "text"
                )
                for g in self.guardrails:
                    g.check_response(text)
                record.role = "final"
                turns.append(record)
                return AgentResult(text=text, turns=turns,
                                   trace_id=trace_id)

            # Process tool calls
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                for g in self.guardrails:
                    g.check_tool_call(block.name, block.input)
                if self.pre_tool_hook:
                    self.pre_tool_hook(block.name, block.input)

                result = self.registry.dispatch(block.name, block.input)

                if self.post_tool_hook:
                    self.post_tool_hook(block.name, block.input, result)

                record.tool_calls.append(
                    {"tool": block.name, "input": block.input}
                )
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": block.id,
                     "content": result}
                )

            messages.append({"role": "user", "content": tool_results})
            turns.append(record)

            logger.info("turn=%d tools=%d latency=%.0fms trace=%s",
                        turn_idx, len(tool_results), latency, trace_id)

        raise RuntimeError(
            f"Agent exceeded {self.max_turns} turns (trace={trace_id})"
        )

This is roughly 120 lines and gives you:

  • Typed tool registry with schema generation.
  • Guardrails as a protocol — plug in any checker.
  • Pre/post tool hooks for logging, metrics, or side effects.
  • Turn-level telemetry (latency, tool call counts, trace IDs).
  • Budgeted loop with a hard turn limit.

You don’t get persistence, human-in-the-loop, or multi-agent orchestration. If you need those, reach for LangGraph or the SDKs. If you don’t, this loop is easier to debug than any framework.


131.10 — Mental model

  1. The agent loop is a while-loop with a budget. Every framework wraps this; none transcend it. Understand the loop before you abstract it.

  2. Tools are functions with JSON schemas. Registration is syntactic sugar. The hard part is designing good tools, not registering them.

  3. Guardrails belong at the loop boundary, not inside tools. Check inputs before dispatch and outputs before returning to the user.

  4. Hooks are the escape hatch. When a framework does 90% of what you need, hooks cover the remaining 10% without forking the library.

  5. Handoffs transfer; sub-agents delegate. Choose the pattern that matches your control flow — routing (handoff) vs. fan-out (sub-agent).

  6. State machines earn their cost when state must survive process restarts. If your agent is stateless and single-turn, a graph is overhead.

  7. Multi-agent is a last resort, not a first design. A single agent with well-chosen tools usually beats a committee of poorly-tooled specialists.

  8. The right framework is the least framework. Start with the minimal loop. Add a library when you hit a structural problem it solves. Eject when the library becomes the structural problem.


Read it yourself

  • Anthropic Claude Agent SDK — official documentation and source at github.com/anthropics/claude-code and the Anthropic docs site.
  • OpenAI Agents SDK — github.com/openai/openai-agents-python and the OpenAI platform documentation.
  • LangGraph — langchain-ai.github.io/langgraph/ for the concepts guide, tutorials, and API reference.
  • Pydantic AI — ai.pydantic.dev for quickstart and tool-calling docs.
  • Instructor — python.useinstructor.com for structured output patterns.
  • CrewAI — docs.crewai.com for multi-agent orchestration patterns.
  • AutoGen — microsoft.github.io/autogen/ for conversational agents.
  • Harrison Chase, “LangGraph: Multi-Actor Programs with LLMs” (2024) — design philosophy behind the graph-as-agent paradigm.

Practice

  1. Implement the minimal agent loop (§131.2) from memory, targeting the OpenAI API instead of Anthropic. Verify it handles parallel tool calls (where the model returns multiple tool_use blocks in a single turn).

  2. Add a guardrail to the minimal loop that rejects any tool call whose string-serialised input exceeds 10 KB. Where in the loop does the check belong, and why?

  3. Build a two-agent system using the Anthropic Claude Agent SDK where a “planner” agent breaks a complex question into sub-tasks and a “worker” sub-agent executes each one. Compare the token usage against a single-agent baseline that receives the same question.

  4. Port the LangGraph research example (§131.5) to use a PostgreSQL-backed checkpointer. Simulate a crash by killing the process mid-run, then resume from the last checkpoint. Measure how much state is recovered.

  5. Use Instructor to extract structured data from 100 product reviews (sentiment, key topics, star rating). Compare the retry/validation rate against a hand-rolled json.loads() approach with manual re-prompting.

  6. Benchmark latency overhead of three approaches for the same single-tool agent task: (a) raw Anthropic SDK, (b) Claude Agent SDK, (c) LangGraph with a two-node graph. Report per-turn overhead in milliseconds and identify where the framework spends it.

  7. Stretch: Design and implement a framework-ejection layer — a thin adapter interface that lets your agent code swap between the Anthropic Claude Agent SDK, the OpenAI Agents SDK, and a hand-rolled loop without changing any tool implementations or guardrail logic. Define the minimal interface, write a concrete adapter for each backend, and demonstrate with a three-tool agent that runs identically on all three.