Function calling (or tool use, depending on which API you are using) is the feature that turns LLMs from text generators into agents capable of taking real-world actions. I have implemented function calling integrations with both the OpenAI and Anthropic APIs for production systems: ERP data lookups, external API calls, database writes, and multi-step orchestration. The APIs look similar on the surface but differ meaningfully in how they handle parallel calls, required versus optional tool use, and error reporting. This post compares them concretely and covers the production patterns that matter.
OpenAI's function calling (now exposed through the tools parameter, which replaced the deprecated functions parameter) and Anthropic's tool use are both based on the same concept: you define a set of functions with JSON Schema descriptions, the model decides when to call them, and you execute the calls and return the results. The main difference in control is how it is expressed. OpenAI offers tool_choice: 'auto' (model decides), 'required' (force the model to call at least one tool), 'none' (no tool calls), or an object naming a specific function. Anthropic offers tool_choice as an object whose type is 'auto', 'any' (must use at least one tool), or 'tool' with a name to force a specific tool. Both support parallel tool calls, but Anthropic's implementation requires careful handling of the content block structure.
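The tool_choice options map onto request fields roughly as sketched here. This shows only the tool_choice fragment of each request, the Mode type and the get_order tool name are placeholders I made up for illustration, and the shapes reflect the two APIs as I understand them at the time of writing, so check the current API references before relying on them.

```typescript
// OpenAI Chat Completions: string modes, or an object naming one function.
type OpenAIToolChoice =
  | "auto"
  | "required"
  | { type: "function"; function: { name: string } }

// Anthropic Messages: always an object with a `type` discriminator.
type AnthropicToolChoice =
  | { type: "auto" }
  | { type: "any" }
  | { type: "tool"; name: string }

// Hypothetical provider-neutral way to express the same three intents.
type Mode = "auto" | "must_use_some_tool" | { forceTool: string }

function toOpenAI(mode: Mode): OpenAIToolChoice {
  if (mode === "auto") return "auto"
  if (mode === "must_use_some_tool") return "required"
  return { type: "function", function: { name: mode.forceTool } }
}

function toAnthropic(mode: Mode): AnthropicToolChoice {
  if (mode === "auto") return { type: "auto" }
  if (mode === "must_use_some_tool") return { type: "any" }
  return { type: "tool", name: mode.forceTool }
}
```

For example, toAnthropic({ forceTool: "get_order" }) produces the object you would pass as tool_choice to force that one tool.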
Anthropic returns tool calls as content blocks inside the assistant message rather than in a separate tool_calls field (function_call in OpenAI's legacy API). The response may contain a mix of text and tool_use blocks. Your code must iterate through the content blocks, identify the tool_use blocks, execute each one, and return a tool_result block for each, matched by tool_use_id, in the next user message. This structure lets the model naturally interleave explanation text with tool calls, which is a UX advantage, but it requires more careful response parsing than OpenAI's simpler structure.
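A minimal sketch of that parsing step follows. The block types are trimmed local stand-ins for the SDK's richer types, and splitContent and the example data are hypothetical names used only for illustration.

```typescript
// Trimmed stand-ins for the two block kinds discussed above.
type TextBlock = { type: "text"; text: string }
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown }
type ContentBlock = TextBlock | ToolUseBlock

// Separate the explanation text from the tool calls in one pass.
function splitContent(content: ContentBlock[]) {
  const text: string[] = []
  const toolCalls: ToolUseBlock[] = []
  for (const block of content) {
    if (block.type === "text") text.push(block.text)
    else toolCalls.push(block)
  }
  return { text: text.join(""), toolCalls }
}

// Hypothetical response content: explanation text followed by one tool call.
const example: ContentBlock[] = [
  { type: "text", text: "Let me look that up." },
  { type: "tool_use", id: "tool_abc", name: "get_order_status", input: { order_id: "123" } },
]
const parsed = splitContent(example)
```

The text part can be shown to the user right away while the tool calls are executed.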
┌─────────────────────────────────────────────────────────────┐
│ Parallel Tool Call Flow (Anthropic)                         │
│                                                             │
│ User: "What is the status of order #123 and invoice #456?"  │
│                      │                                      │
│                      ▼                                      │
│ Model returns TWO tool_use blocks in one response:          │
│ ┌────────────────────┐   ┌────────────────────┐             │
│ │ get_order_status   │   │ get_invoice_status │             │
│ │ id: "tool_abc"     │   │ id: "tool_def"     │             │
│ │ order_id: "123"    │   │ invoice_id: "456"  │             │
│ └─────────┬──────────┘   └──────────┬─────────┘             │
│           │   parallel execution    │                       │
│           ▼                         ▼                       │
│ ┌────────────────────────────────────────────┐              │
│ │ Promise.all([getOrder(123), getInv(456)])  │              │
│ └────────────────────┬───────────────────────┘              │
│                      │                                      │
│                      ▼                                      │
│ Return BOTH tool_result blocks → Model synthesizes answer   │
└─────────────────────────────────────────────────────────────┘

From my experience building multi-tool agents: implement a generic tool execution dispatcher from the start, not per-tool if/else chains. Map tool names to handler functions in a dictionary, validate inputs against the schema before executing, and wrap every handler in try/catch with structured error returns. This pattern scales to dozens of tools without code duplication and makes adding new tools a one-line registration.
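The dispatcher pattern can be sketched as below. Everything here is illustrative: registerTool, dispatch, and the ToolResult shape are names I made up, and the required-field check is a deliberately small stand-in for real JSON Schema validation, which in production you would hand to a proper validator library.

```typescript
type ToolResult =
  | { ok: true; data: unknown }
  | { ok: false; error_type: string; message: string }

type ToolSpec = {
  required: string[] // stand-in for full JSON Schema validation
  handler: (input: Record<string, unknown>) => Promise<unknown>
}

const registry = new Map<string, ToolSpec>()

// Adding a tool is a single registration call.
function registerTool(name: string, spec: ToolSpec) {
  registry.set(name, spec)
}

async function dispatch(name: string, input: unknown): Promise<ToolResult> {
  const spec = registry.get(name)
  if (!spec) return { ok: false, error_type: "unknown_tool", message: `No handler for ${name}` }
  if (typeof input !== "object" || input === null) {
    return { ok: false, error_type: "invalid_input", message: "Input must be an object" }
  }
  const args = input as Record<string, unknown>
  const missing = spec.required.filter((key) => !(key in args))
  if (missing.length > 0) {
    return { ok: false, error_type: "invalid_input", message: `Missing: ${missing.join(", ")}` }
  }
  try {
    return { ok: true, data: await spec.handler(args) }
  } catch (err) {
    // Handler failures come back as data, never as thrown exceptions.
    return { ok: false, error_type: "execution_error", message: String(err) }
  }
}

// Hypothetical tool used only to exercise the dispatcher.
registerTool("get_order", {
  required: ["order_id"],
  handler: async (input) => ({ order_id: input["order_id"], status: "shipped" }),
})
```

Because every failure path returns the same ToolResult shape, the calling loop never needs tool-specific error handling.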
Both APIs support parallel tool calls: the model returns multiple tool calls in a single response (tool_calls entries in OpenAI, tool_use blocks in Anthropic), and you execute them concurrently before returning all results. This dramatically reduces latency for operations that can run in parallel: fetching data from multiple tables, calling multiple external APIs, or running independent computations. Without parallel tool calls, an agent needing 3 data sources makes 3 tool round trips, which means 4 model calls and 3 sequential tool executions. With parallel calls, it is 1 model call for tool selection, 1 concurrent batch of tool executions, and 1 model call to synthesize the answer. If a model call and a tool execution take about the same time, that is 3 time units instead of 7, a latency reduction of a little under 60%.
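One way to sanity-check the latency figure is a back-of-the-envelope model. The assumptions are mine: every model call takes the same time, every tool execution takes the same time, and a concurrent batch takes as long as a single tool. Real numbers will vary with your models and backends.

```typescript
// Time units are arbitrary; modelCall and toolCall are assumed durations.
function sequentialLatency(tools: number, modelCall: number, toolCall: number): number {
  // One model call per tool request, plus a final call to write the answer.
  return (tools + 1) * modelCall + tools * toolCall
}

function parallelLatency(modelCall: number, toolCall: number): number {
  // One call to select tools, one concurrent batch, one call to synthesize.
  return 2 * modelCall + toolCall
}

const sequential = sequentialLatency(3, 1, 1) // 4 model calls + 3 tool calls = 7
const parallel = parallelLatency(1, 1) // 2 model calls + 1 batch = 3
const reduction = 1 - parallel / sequential // 4/7, about 0.57
```

If model calls dominate (tool time near zero), the same formulas give a reduction of 50%, so treat the figure as a range rather than a constant.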
Production tool execution will fail. Network timeouts, database errors, and external API rate limits must all be communicated back to the model so it can handle them gracefully. Return a structured error in the tool result rather than propagating exceptions. Include error_type, message, and a retry_suggested boolean. This lets the model decide whether to retry, use an alternative tool, or explain the failure to the user. For transient failures, retry with exponential backoff inside the handler first, and return an error to the model only once the retries are exhausted.
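Here is a minimal sketch of backoff combined with a structured error. The names callWithBackoff and isTransient, the convention of marking retryable errors with a transient property, and the default delays are all assumptions for illustration, not part of either API.

```typescript
type ToolError = { error_type: string; message: string; retry_suggested: boolean }

// Assumed convention: handlers mark retryable failures with `transient: true`.
function isTransient(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { transient?: boolean }).transient === true
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function callWithBackoff<T>(
  fn: () => Promise<T>,
  options: { retries?: number; baseDelayMs?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<{ ok: true; data: T } | { ok: false; error: ToolError }> {
  const { retries = 3, baseDelayMs = 200, sleep = defaultSleep } = options
  for (let attempt = 0; ; attempt++) {
    try {
      return { ok: true, data: await fn() }
    } catch (err) {
      const transient = isTransient(err)
      if (transient && attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt) // 200ms, 400ms, 800ms with the defaults
        continue
      }
      // Retries exhausted or not retryable: report to the model as data.
      return {
        ok: false,
        error: {
          error_type: transient ? "transient" : "permanent",
          message: err instanceof Error ? err.message : String(err),
          retry_suggested: transient,
        },
      }
    }
  }
}
```

The error object is what you would serialize into the tool result. Injecting sleep keeps the function easy to test without real delays.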
import Anthropic from "@anthropic-ai/sdk"

const client = new Anthropic()

// Generic tool dispatcher: maps each tool name to its handler.
// fetchOrder, fetchInvoice, fetchProducts, toolDescriptions, and toolSchemas
// are application-specific and defined elsewhere.
const toolHandlers: Record<string, (params: unknown) => Promise<unknown>> = {
  get_order: async (p) => fetchOrder(p as { order_id: string }),
  get_invoice: async (p) => fetchInvoice(p as { invoice_id: string }),
  list_products: async (p) => fetchProducts(p as { category?: string }),
}

// Agentic loop with parallel tool execution
async function runAgent(userMessage: string, maxIterations = 10) {
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: "user", content: userMessage },
  ]

  for (let i = 0; i < maxIterations; i++) {
    const response = await client.messages.create({
      model: "claude-opus-4-5",
      max_tokens: 4096,
      tools: Object.keys(toolHandlers).map((name) => ({
        name,
        description: toolDescriptions[name],
        input_schema: toolSchemas[name],
      })),
      messages,
    })

    // Collect all tool_use blocks; the type predicate narrows the block union
    const toolUseBlocks = response.content.filter(
      (b): b is Anthropic.Messages.ToolUseBlock => b.type === "tool_use"
    )

    if (toolUseBlocks.length === 0 || response.stop_reason === "end_turn") {
      // No more tool calls: join every text block into the final response
      return response.content
        .filter((b): b is Anthropic.Messages.TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("")
    }

    // Execute all tool calls in PARALLEL; failures become is_error results
    const toolResults: Anthropic.Messages.ToolResultBlockParam[] = await Promise.all(
      toolUseBlocks.map(async (block) => {
        const handler = toolHandlers[block.name]
        try {
          if (!handler) throw new Error(`Unknown tool: ${block.name}`)
          const result = await handler(block.input)
          return { type: "tool_result" as const, tool_use_id: block.id, content: JSON.stringify(result) }
        } catch (err) {
          return {
            type: "tool_result" as const,
            tool_use_id: block.id,
            // Retrying cannot help when the requested tool does not exist
            content: JSON.stringify({ error: String(err), retry_suggested: handler !== undefined }),
            is_error: true,
          }
        }
      })
    )

    // Append assistant response + all tool results to messages
    messages.push({ role: "assistant", content: response.content })
    messages.push({ role: "user", content: toolResults })
  }
  throw new Error("Max iterations reached")
}

A powerful pattern that is often overlooked: use tool calling purely for structured output extraction, without any actual function execution. Define a tool with the schema of the structure you want to extract from text, set tool_choice to force that specific tool, and the model will return its answer as that tool's arguments, shaped by your schema. This is more reliable than asking the model to return JSON in its text response, because the arguments arrive in a dedicated structured field rather than embedded in prose. Still validate the result before using it: unless the provider offers a strict schema mode and you enable it, conformance to the schema is very likely but not guaranteed.
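The extraction pattern might look like this for Anthropic's Messages API. The record_invoice tool, its fields, and the helper names are hypothetical, and the request object is built without the SDK so the sketch stays self-contained. You would pass it to your client's message-creation call.

```typescript
// Hypothetical schema: the fields to extract from an invoice email.
const extractInvoiceTool = {
  name: "record_invoice",
  description: "Record the invoice fields found in the text.",
  input_schema: {
    type: "object" as const,
    properties: {
      invoice_id: { type: "string" },
      amount: { type: "number" },
    },
    required: ["invoice_id", "amount"],
  },
}

// Forcing the tool means the response should carry a tool_use block
// with the extracted fields instead of free-form text.
function buildExtractionRequest(text: string) {
  return {
    model: "claude-opus-4-5",
    max_tokens: 1024,
    tools: [extractInvoiceTool],
    tool_choice: { type: "tool" as const, name: extractInvoiceTool.name },
    messages: [{ role: "user" as const, content: text }],
  }
}

type Block = { type: string; name?: string; input?: unknown }
type Invoice = { invoice_id: string; amount: number }

// Read the structured payload; no function is executed. The shape is
// re-checked because this sketch does not assume schema adherence.
function readInvoice(content: Block[]): Invoice {
  const block = content.find((b) => b.type === "tool_use" && b.name === extractInvoiceTool.name)
  const input = block?.input
  if (typeof input !== "object" || input === null) {
    throw new Error("No tool_use block with an object input")
  }
  const { invoice_id, amount } = input as Record<string, unknown>
  if (typeof invoice_id !== "string" || typeof amount !== "number") {
    throw new Error("Model output did not match the expected shape")
  }
  return { invoice_id, amount }
}
```

The tool is never "run". Its input is the result.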
Tool definitions count as input tokens on every API call, even when the model does not use those tools. A set of 20 detailed tool definitions can add 2,000-5,000 tokens per call. At $2.50 per million input tokens (the GPT-4o rate these figures assume), that is $0.005-0.0125 added per call: negligible at low volume but significant at scale. Optimization: use tool routing to send only the tool definitions relevant to the user's intent. A pre-classification call with a cheap model (GPT-4o-mini, Claude Haiku) can pick the appropriate tool subset before the expensive model is called.
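To make the cost arithmetic and the routing idea concrete, here is a rough sketch. The price constant is an assumption you should replace with your model's current rate, the intent labels and tool groupings are invented, and the keyword classifier is only a stand-in for the cheap-model classification call described above.

```typescript
// Assumed price: USD per one million input tokens (adjust for your model).
const INPUT_PRICE_PER_MILLION = 2.5

function toolOverheadUsd(toolTokens: number, calls: number): number {
  return (toolTokens / 1_000_000) * INPUT_PRICE_PER_MILLION * calls
}

// Hypothetical tool groups keyed by intent label.
const toolsByIntent: Record<string, string[]> = {
  orders: ["get_order", "list_products"],
  billing: ["get_invoice"],
}

// Stand-in for the pre-classification step. In production this would be a
// call to a small model; keyword matching keeps the sketch self-contained.
function classifyIntent(message: string): string {
  return /invoice|payment|bill/i.test(message) ? "billing" : "orders"
}

// Only the selected subset of definitions is sent to the expensive model.
function selectTools(message: string): string[] {
  return toolsByIntent[classifyIntent(message)] ?? []
}

const perCall = toolOverheadUsd(5000, 1) // about 0.0125 USD
const perDay = toolOverheadUsd(5000, 100_000) // about 1,250 USD at 100k calls per day
```

Routing only pays off when the classification call costs less than the tokens it saves, so measure both sides before adopting it.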
Complex agents require multiple rounds of tool use. The pattern is an agentic loop: call model, check for tool use, execute tools, append results, call model again, repeat until the model returns a text-only response. Implement a maximum iteration limit (typically 10-15 steps) to prevent infinite loops. Log each iteration with the full state for debugging. For long-running tasks, persist the conversation state to a database between iterations so the agent can be interrupted and resumed.
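The loop with checkpointing can be sketched independently of either provider. The AgentDeps interface, runLoop, and the message shapes below are hypothetical. The model call, tool execution, and storage are injected so the same loop works with any backend.

```typescript
type Message = { role: "user" | "assistant"; content: unknown }
type ModelTurn = { content: unknown; toolCalls: { id: string; name: string; input: unknown }[] }

type AgentDeps = {
  callModel: (messages: Message[]) => Promise<ModelTurn>
  runTool: (name: string, input: unknown) => Promise<unknown>
  saveState: (messages: Message[], iteration: number) => Promise<void>
}

// `messages` can be a fresh conversation or one loaded from storage to resume.
async function runLoop(messages: Message[], deps: AgentDeps, maxIterations = 10) {
  for (let i = 0; i < maxIterations; i++) {
    const turn = await deps.callModel(messages)
    messages.push({ role: "assistant", content: turn.content })
    if (turn.toolCalls.length === 0) {
      await deps.saveState(messages, i)
      return turn.content // text-only response ends the loop
    }
    const results = await Promise.all(
      turn.toolCalls.map(async (call) => ({
        tool_use_id: call.id,
        result: await deps.runTool(call.name, call.input),
      })),
    )
    messages.push({ role: "user", content: results })
    // Checkpoint after each completed iteration so a crash loses one step at most.
    await deps.saveState(messages, i)
  }
  throw new Error("Max iterations reached")
}
```

To resume an interrupted task, load the last saved messages array and pass it back into runLoop.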
For new projects, I recommend Anthropic's tool use when you need the most capable model (Claude Opus 4) or when your use case benefits from the model naturally mixing explanation with tool calls. Use OpenAI's function calling when you need the widest ecosystem compatibility (most agent frameworks target OpenAI first), when you need JSON mode for strict structured output, or when you are using fine-tuned models. In practice, the patterns are similar enough that building an abstraction layer that supports both is worth the upfront investment for any system that might need to switch models.
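The core of such an abstraction layer is small. In this sketch the ToolDef and NormalizedCall types and the function names are my own, while the two output shapes follow the tool definition formats of OpenAI's Chat Completions API and Anthropic's Messages API as I understand them.

```typescript
// Provider-neutral tool definition (hypothetical shape for this sketch).
type ToolDef = {
  name: string
  description: string
  parameters: Record<string, unknown> // JSON Schema object
}

// OpenAI Chat Completions wraps each function under `type: "function"`.
function toOpenAITool(tool: ToolDef) {
  return {
    type: "function" as const,
    function: { name: tool.name, description: tool.description, parameters: tool.parameters },
  }
}

// Anthropic Messages uses a flat object with the schema under `input_schema`.
function toAnthropicTool(tool: ToolDef) {
  return { name: tool.name, description: tool.description, input_schema: tool.parameters }
}

type NormalizedCall = { id: string; name: string; input: unknown }

// OpenAI returns arguments as a JSON string; Anthropic returns a parsed object.
function fromOpenAICall(call: { id: string; function: { name: string; arguments: string } }): NormalizedCall {
  return { id: call.id, name: call.function.name, input: JSON.parse(call.function.arguments) }
}

function fromAnthropicBlock(block: { id: string; name: string; input: unknown }): NormalizedCall {
  return { id: block.id, name: block.name, input: block.input }
}
```

With definitions and calls normalized at the boundary, the dispatcher and the agentic loop stay identical across providers.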