Tool calling and function calling: the wire protocols
"Tool calling is what turns an LLM from a chatbot into a system that does things"
We open Part V — Agents, Tool Use, and Workflow Orchestration — with the foundational mechanism: tool calling. This is the technique that lets an LLM invoke external functions (call APIs, query databases, run code) instead of just producing text. It’s the foundation of every agent, every plugin system, every “AI assistant that actually does something.”
By the end of this chapter you’ll know:
- How tool calling is implemented at the wire-protocol level.
- The major protocols (OpenAI function calling, Anthropic tool use, MCP).
- How to define a tool schema.
- How to handle tool calls in your application.
- How parallel tool calls work.
- The structured-output guarantees from Chapter 43 applied here.
Outline:
- The tool-calling problem.
- The basic mechanism.
- OpenAI function calling.
- Anthropic tool use.
- The MCP protocol.
- Tool schemas in detail.
- Parallel tool calls.
- Structured generation for tool calls.
- Error handling and retries.
66.1 The tool-calling problem
A plain LLM can produce text. But text is the wrong output for many real tasks:
- “What’s the weather in Tokyo?” — the LLM doesn’t know; it needs to call a weather API.
- “Send a message to John saying I’ll be late” — the LLM can’t send messages; it needs to call a messaging tool.
- “Search the database for orders from Q3” — the LLM doesn’t have database access; it needs to call a SQL query tool.
- “Run this Python code and tell me the result” — the LLM doesn’t have an interpreter; it needs to call a code execution tool.
For all of these, the LLM’s job isn’t to answer the question — it’s to decide which tool to call and with what arguments. The actual work is done by the tool (the weather API, the messaging system, the database, the interpreter). The LLM is the orchestrator.
This is tool calling (or function calling). The model emits structured output describing a function invocation, the application code parses that output, calls the function, and returns the result to the model. The model then either calls another tool or produces a final answer.
This is the foundation of every agent system. Without tool calling, an LLM is just a text generator. With it, an LLM can interact with the world.
66.2 The basic mechanism
The flow:
- Define the tools the model can call. Each tool has a name, a description, and a JSON schema for its arguments.
- Send the tools to the model as part of the system prompt or via a structured API.
- The model decides whether to call a tool. If it does, it emits a structured output that says “call function X with arguments Y.”
- The application parses the output, calls the function, and gets the result.
- The application sends the result back to the model as a “tool result” message.
- The model incorporates the result into its next response, which may be another tool call or a final answer to the user.
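The loop above can be sketched in a few lines. This is a minimal sketch assuming a generic `llm` client with a `chat` method and a `tools` dict mapping names to Python functions — both hypothetical stand-ins for whichever SDK you actually use:

```python
import json

def agent_turn(llm, messages: list, tool_schemas: list, tools: dict) -> str:
    """Run the tool-calling loop until the model produces a final answer."""
    while True:
        reply = llm.chat(messages=messages, tools=tool_schemas)
        if not reply.tool_calls:          # no tool needed: final answer
            return reply.content
        # Real APIs also require echoing the assistant tool-call message
        # back into the history before appending the results.
        for call in reply.tool_calls:     # run each requested tool
            result = tools[call.name](**json.loads(call.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
```

The exact field names (`tool_calls`, `arguments`, the `"tool"` role) vary by provider, as the next sections show; the shape of the loop does not.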
The key technical challenge: the model has to emit structured output reliably. This is where the structured generation techniques from Chapter 43 come in. Modern serving stacks (vLLM, SGLang) use guided decoding to ensure the model emits valid tool-call JSON essentially every time rather than 95% of the time, and the hosted APIs from OpenAI and Anthropic offer similar reliability.
The wire protocol specifies how tools are described and how tool calls are encoded. The major standards:
- OpenAI function calling — the original, widely adopted.
- Anthropic tool use — slightly different format, similar semantics.
- MCP (Model Context Protocol) — Anthropic’s open standard for tool integration.
We’ll cover each in detail.
66.3 OpenAI function calling
OpenAI introduced function calling in mid-2023 as part of the chat completions API. The format:
You define functions (tools) with a JSON schema:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
}]
You pass these to the chat completion API:
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"  # or "required" to force a tool call
)
The model decides whether to call a tool. If it does, the response contains:
response.choices[0].message.tool_calls = [{
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"location": "Tokyo, Japan", "unit": "celsius"}'
    }
}]
The arguments field is a JSON string — it has to be parsed to get the actual arguments dict.
You parse the call, run the function, and send the result back:
result = get_weather(location="Tokyo, Japan", unit="celsius")
# returns: {"temperature": 22, "conditions": "sunny"}

messages.append(response.choices[0].message)  # echo the assistant tool-call turn
messages.append({
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": json.dumps(result)
})

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
Now the model has the tool result and produces a final response: “It’s currently 22°C and sunny in Tokyo.”
That’s the OpenAI function calling protocol. It’s the de facto standard, supported by OpenAI-compatible APIs across the ecosystem (vLLM, OpenRouter, Together, and others).
66.4 Anthropic tool use
Anthropic’s protocol is similar but with different field names. Tools are defined as:
tools = [{
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
    }
}]
Sent to the messages API:
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the messages API
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
The response contains tool use blocks:
response.content = [
    {
        "type": "tool_use",
        "id": "toolu_abc123",
        "name": "get_weather",
        "input": {"location": "Tokyo, Japan", "unit": "celsius"}
    }
]
Note that input is a dict, not a JSON string (an improvement over OpenAI’s format).
You append the tool result:
messages.append({"role": "assistant", "content": response.content})  # echo the tool-use turn
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_abc123",
        "content": json.dumps(result)
    }]
})
The next model call produces the final answer.
The two protocols are functionally equivalent. The differences are stylistic (input as dict vs JSON string, naming conventions). Most agent frameworks abstract over both.
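Because the differences are mechanical, the abstraction layer can be tiny. A sketch of the conversion one direction, using the field names from the examples above:

```python
def openai_to_anthropic(tool: dict) -> dict:
    """Convert an OpenAI-style tool definition to Anthropic's shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }
```

The payload that matters — the JSON Schema — passes through unchanged; only the envelope differs.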
66.5 The MCP protocol
MCP (Model Context Protocol) is Anthropic’s open standard for tool integration, released in late 2024. It’s not just a wire format — it’s a protocol for connecting LLMs to external systems in a standardized way.
The pitch: instead of every application implementing its own tool integration, a tool provider can implement the MCP server interface once and any MCP-compatible client (Claude Desktop, agent frameworks, custom apps) can use it.
We’ll cover MCP in detail in Chapter 69. For this chapter, the key point: MCP defines how a tool is exposed to a model, including:
- The tool’s name and description.
- The input schema.
- How the tool result is returned.
- Optional resources, prompts, and sampling primitives.
MCP servers can be run locally (stdio transport) or remotely (HTTP transport). They can be implemented in any language. They expose tools that any MCP client can discover and call.
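On the wire, MCP is JSON-RPC 2.0: a client discovers tools with a `tools/list` request and invokes one with `tools/call`. A call to the weather tool from earlier would look roughly like this (the shape follows the MCP specification; the id and arguments are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": {"location": "Tokyo, Japan"}
  }
}
```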
The protocol is gaining adoption in 2025 as the standard way to extend LLMs with external capabilities, and most major agent frameworks now support it.
66.6 Tool schemas in detail
The schema you provide for a tool matters more than people think. The model uses the schema to understand what the tool does, when to call it, and how to format the arguments. A bad schema produces bad tool calls.
The components of a good tool schema:
(1) A clear, descriptive name. get_weather is clear; func1 is not. The name appears in the model’s output and influences its decisions.
(2) A complete description. What does the tool do? When should it be called? What does it return? The description is the model’s primary guide. Be explicit:
"Get the current weather conditions in a specific location.
Use this when the user asks about weather, temperature, or atmospheric
conditions in any city. Returns temperature, humidity, and conditions."
(3) Typed parameters with descriptions. Each parameter should have a type (string, integer, boolean, array, object) and a description. Example:
"location": {
    "type": "string",
    "description": "The city and state or country, e.g. 'San Francisco, CA' or 'Tokyo, Japan'"
}
The description tells the model what to put in the field.
(4) Constraints and enums. If a parameter has a fixed set of valid values, use enum:
"unit": {
    "type": "string",
    "enum": ["celsius", "fahrenheit"]
}
This both communicates the constraint to the model and lets the structured generation enforce it.
(5) Required vs optional fields. List required fields in the required array; anything not listed is optional, and the model may omit it.
(6) Nested objects. Complex tools can have nested object parameters. JSON Schema supports this; the model handles it.
(7) Default values where useful. Some tools have sensible defaults (e.g., unit: "celsius"). The schema can document them.
The schema is the contract between your code and the LLM. Spend time on it. A good schema can mean the difference between a tool that works 95% of the time and one that works 99.9%.
graph TD
T[Tool definition]
T --> N[name: clear verb-noun]
T --> D[description: when to call + what it returns]
T --> P[parameters: typed + described]
P --> E[enum for fixed values]
P --> R[required vs optional flags]
style D fill:var(--fig-accent-soft),stroke:var(--fig-accent)
The description field has the highest leverage: a vague description causes the model to call the wrong tool or skip the right one — writing it as “when to call” prose rather than “what it does” significantly improves selection accuracy.
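Even with server-side enforcement, it can be worth validating model-produced arguments against the schema before running the tool. Here is a belt-and-braces sketch covering only the JSON Schema subset used in this chapter (type, enum, required); production code would use a library such as jsonschema:

```python
# Map JSON Schema type names to the Python types they accept.
TYPES = {"string": str, "integer": int, "boolean": bool,
         "number": (int, float), "array": list, "object": dict}

def validate_args(schema: dict, args: dict) -> list:
    """Return a list of problems; an empty list means the args are valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")
        elif not isinstance(value, TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors
```

A failed validation makes a good tool result: return the error list to the model and let it retry with corrected arguments (the pattern in the error-handling section below).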
66.7 Parallel tool calls
Modern LLMs can request multiple tool calls in a single response. The model decides “I need to call get_weather and also call get_news” and emits both calls simultaneously. The application can then run both in parallel and return both results in the next message.
OpenAI’s protocol returns a list of tool calls:
response.choices[0].message.tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": ...}},
    {"id": "call_2", "function": {"name": "get_news", "arguments": ...}}
]
The application runs them concurrently:
import asyncio
import json

async def run_tool(tool_call):
    func = TOOL_REGISTRY[tool_call.function.name]   # your name -> async function map
    args = json.loads(tool_call.function.arguments)
    return await func(**args)

results = await asyncio.gather(*[run_tool(tc) for tc in tool_calls])
And appends both results to the message history before the next model call.
Parallel tool calls are a real performance improvement for agent loops. Without them, the agent has to do a round-trip per tool call, adding LLM latency between each. With parallel calls, multiple independent tool calls happen in one round trip.
The model decides which calls can be parallelized. A capable model bundles independent calls (weather + news) and serializes dependent calls (search, then fetch a result of the search). This is part of why modern LLMs are noticeably more useful for agent work than older ones.
66.8 Structured generation for tool calls
Tool calling is the most important production application of structured generation (Chapter 43). When you ask a model to emit a tool call, you need the JSON to be valid 100% of the time, matching the schema you defined. Naive prompting gets you 95-99%; the remaining failures cause production bugs.
The fix: use guided decoding (Outlines, XGrammar) to enforce the schema at the sampler level. The model literally cannot emit invalid JSON because invalid tokens are masked out.
Modern serving stacks have this built in. vLLM’s guided_json parameter takes a schema and enforces it. OpenAI’s API offers strict schema adherence for tool calls (the strict flag on a function definition), and Anthropic’s models are heavily trained for schema conformance. With enforcement on, tool calls are always parseable: you don’t have to write error handlers for malformed JSON, because the model is constrained to produce only valid output.
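The masking idea can be illustrated with a toy character-level decoder. This is purely pedagogical — real engines (Outlines, XGrammar) mask token IDs against a compiled grammar, not characters — but the principle is the same: at each step, anything that cannot extend the output into a valid value is forbidden.

```python
# Toy guided decoding for an enum field: the "model" proposes characters,
# the mask only admits ones that keep the output a prefix of a valid value.
ALLOWED = ["celsius", "fahrenheit"]

def valid_next_chars(prefix: str) -> set:
    """Characters the model is allowed to emit after `prefix`."""
    return {v[len(prefix)] for v in ALLOWED
            if v.startswith(prefix) and len(v) > len(prefix)}

def constrained_decode(model_prefs: str) -> str:
    """Greedily take the model's preferred char if the mask allows it,
    otherwise fall back to any allowed char: invalid output is impossible."""
    out = ""
    while (allowed := valid_next_chars(out)):
        pick = next((c for c in model_prefs if c in allowed), None)
        out += pick if pick is not None else sorted(allowed)[0]
    return out

print(constrained_decode("f"))   # -> "fahrenheit"
print(constrained_decode("x"))   # invalid preference masked out -> "celsius"
```

Note that even a model “trying” to emit garbage (`"x"`) ends up with a valid enum value — that is the guarantee guided decoding provides for tool-call JSON.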
For self-hosted serving, make sure your runtime supports guided decoding for tool calls. vLLM since v0.4 does this; SGLang since v0.2; TensorRT-LLM has its own implementation.
66.9 Error handling and retries
Tool calls fail. The function might error, the API might be down, the arguments might be wrong. The application has to handle errors gracefully.
The standard pattern: catch errors and return them as the tool result. The model then sees the error and can decide what to do (retry with different arguments, try a different tool, give up).
try:
    args = json.loads(tool_call.function.arguments)  # arguments is a JSON string
    result = get_weather(**args)
except Exception as e:
    result = {"error": str(e)}

messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})
The model is good at handling errors when they’re returned this way. It will often retry with corrected arguments, or use a different tool, or give up gracefully.
The bad pattern: silently fail and don’t tell the model. The model thinks the tool succeeded, produces a final answer based on imaginary data, and the user gets garbage.
A few specific error patterns:
Argument validation errors. The tool expects a specific format (e.g., a date in YYYY-MM-DD), and the model gives it something else. Return a clear error message; the model will retry with the right format.
Authentication errors. The tool requires permissions the user doesn’t have. Return the error; the model will tell the user.
Rate limit errors. The tool has been called too many times. Return the error with a retry-after; the model can tell the user to wait.
Tool unavailable. The backend is down. Return the error; the model can either retry, use a fallback, or tell the user.
Timeout. The tool didn’t return in time. Return a timeout error; the model can retry or give up.
The pattern is always: surface the error to the model, let the model decide what to do. The model is generally smarter than your hand-written error handling logic.
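The pattern generalizes to a small wrapper that turns any tool failure into a structured result the model can read. A hedged sketch — `get_orders` and its date format are hypothetical examples, not a real API:

```python
import json
from datetime import datetime

def get_orders(since: str) -> dict:
    """Hypothetical tool that requires a YYYY-MM-DD date."""
    datetime.strptime(since, "%Y-%m-%d")  # raises ValueError on bad format
    return {"orders": []}

def run_tool(func, arguments: str) -> str:
    """Execute a tool call; errors become content, never exceptions."""
    try:
        result = func(**json.loads(arguments))
    except Exception as e:
        # The model sees this message and can retry with fixed arguments.
        result = {"error": f"{type(e).__name__}: {e}"}
    return json.dumps(result)

print(run_tool(get_orders, '{"since": "Q3"}'))          # error surfaced as JSON
print(run_tool(get_orders, '{"since": "2024-07-01"}'))  # normal result
```

Given the error message, a model will typically retry with a correctly formatted date — exactly the argument-validation recovery described above.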
66.10 The mental model
Eight points to take into Chapter 67:
- Tool calling turns LLMs from text generators into systems that do things.
- The basic flow: define tools → model decides → emits tool call → application runs the tool → returns result → model continues.
- OpenAI function calling and Anthropic tool use are the dominant protocols. Functionally equivalent.
- MCP is the open standard for tool integration. Covered in Chapter 69.
- Tool schemas matter. Clear names, descriptive parameters, constraints. The schema is the contract.
- Parallel tool calls let the model bundle independent calls into one round trip.
- Structured generation ensures tool calls are always parseable. Use it.
- Surface errors to the model. Don’t silently fail.
In Chapter 67 we look at the agent loop — how tool calling is composed into multi-step reasoning systems.
Read it yourself
- The OpenAI function calling documentation.
- The Anthropic tool use documentation.
- The MCP specification (modelcontextprotocol.io).
- The vLLM documentation on tool calling.
- Examples of tool calling in the OpenAI cookbook.
Practice
- Define a tool schema for a calculator function (add, subtract, multiply, divide).
- Why is the description field of a tool schema critical? Construct a case where a vague description leads to wrong tool selection.
- Implement a simple tool calling loop with the OpenAI API and a fake weather function.
- Why are parallel tool calls a performance improvement? Trace through the latency of serial vs parallel.
- How does structured generation guarantee valid tool calls? Connect to Chapter 43.
- What’s the right way to handle a tool error? Why is silent failure bad?
- Stretch: Build a tool calling agent with three tools (weather, news, calculator) and test it on a multi-step query that requires calling multiple tools.