Why does the quality of tool descriptions matter so much for Claude's tool use?

Claude relies on your tool descriptions and parameter schemas to decide when and how to call a tool. A vague description causes the model to guess argument values, which lowers accuracy. Adding enum constraints, min/max array bounds, and concrete examples in the description field lets the model self-correct and significantly improves call success rates.

What is the 'done' tool pattern and why is it useful in multi-tool agentic flows?

The 'done' tool is a sentinel tool the agent calls when it has finished its task, taking a 'summary' parameter describing what was accomplished. It solves the problem of knowing when an agentic loop should stop without relying on the model returning a free-text response. This pattern is recommended specifically for complex multi-tool workflows with the Claude API.

How should errors from tool execution be returned to Claude?

Both invalid-argument errors (caught by schema validation) and tool execution failures should be fed back to the model as a tool_result message with is_error set to true. This allows Claude to reason about what went wrong and attempt a corrected tool call or provide a useful response to the user.

Can parallel tool calls cause problems, and how can they be avoided?

Yes — Claude can return multiple tool calls in a single response, and executing them in parallel can cause race conditions when tools share state or have side effects. In AI Gymbro, two simultaneous log_workout calls created duplicate records because both read the same empty state before either had written. The fix is either enforcing sequential execution or designing tools to be idempotent.

How much do tool definitions cost in tokens, and how can that overhead be reduced?

Each tool definition with a detailed schema and description costs roughly 100–300 tokens, adding up to thousands of tokens of overhead per request when many tools are defined. The post reports an average of 180 tokens per tool definition in AI Gymbro. Implementing a context-aware tool selector that only includes relevant tools for each request reduced this overhead by 60%.

Claude API Tool Use in Production: A Real-World Guide

October 2025 · 10 min read

I've been using Claude's tool use in production for my AI Gymbro fitness app for several months. The Anthropic documentation is good, but there's a gap between the tutorial examples and the messiness of real production.

Tool Schema Design: The Details That Matter

Claude's tool use works by giving the model a name, a description, and a JSON Schema of inputs for each available tool. The quality of your tool descriptions is the single most important factor in tool call accuracy. A vague description like 'logs a workout' leaves the model guessing argument values.

Parameter Schema Best Practices

Use enum types wherever possible to constrain the model's choices. Use minItems/maxItems on arrays to prevent the model from generating empty or excessively long arrays. Add examples in the description field; the model uses these to self-correct when unsure of the correct format.
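A minimal sketch of these three practices in one schema. The create_workout_plan tool and its fields are hypothetical, invented here for illustration, and are not part of AI Gymbro's actual tool set.

```typescript
// Hypothetical tool, used only to illustrate enum, minItems/maxItems,
// and in-description examples. Not taken from the AI Gymbro codebase.
const createWorkoutPlanTool = {
  name: "create_workout_plan",
  description:
    "Creates a workout plan for one training day. " +
    "Example input: { focus: 'push', exercise_names: ['Barbell Bench Press', 'Overhead Press'] }.",
  input_schema: {
    type: "object",
    properties: {
      focus: {
        type: "string",
        // enum: the model has to pick one of these values
        enum: ["push", "pull", "legs", "full_body"],
        description: "Training focus for the day",
      },
      exercise_names: {
        type: "array",
        items: { type: "string" },
        // bounds: rule out empty plans and runaway lists
        minItems: 1,
        maxItems: 8,
        description: "Exercise names in order, e.g. ['Barbell Back Squat', 'Leg Press']",
      },
    },
    required: ["focus", "exercise_names"],
  },
};
```

The example values in the description double as format hints: they show the model the casing and naming style expected for each field.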

The tool_choice Parameter

Claude's API supports a tool_choice parameter that controls whether the model decides for itself (type 'auto'), must call some tool (type 'any'), or must call one named tool (type 'tool'). For agentic flows where you expect the model to always call a specific tool to return structured data, set tool_choice to type 'tool' with that tool's name. This eliminates the possibility of the model returning a free-text response instead of a tool call.
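As a rough sketch, forcing a specific tool comes down to one field in the request body. The helper name buildForcedToolRequest is an assumption made for this example, and the model name and token limit are left to the caller rather than guessed.

```typescript
// The three tool_choice variants this sketch deals with.
type ToolChoice =
  | { type: "auto" } // model decides whether to call a tool
  | { type: "any" } // model must call some tool
  | { type: "tool"; name: string }; // model must call this specific tool

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Hypothetical helper: builds a request body that forces a call to `toolName`.
// `model` and `maxTokens` come from the caller; no defaults are assumed.
function buildForcedToolRequest(
  model: string,
  maxTokens: number,
  tools: ToolDefinition[],
  toolName: string,
  userText: string,
) {
  // Fail early if the forced tool is not among the definitions being sent.
  if (!tools.some((t) => t.name === toolName)) {
    throw new Error(`tool_choice names an undefined tool: ${toolName}`);
  }
  const toolChoice: ToolChoice = { type: "tool", name: toolName };
  return {
    model,
    max_tokens: maxTokens,
    tools,
    tool_choice: toolChoice,
    messages: [{ role: "user" as const, content: userText }],
  };
}
```

The returned object can be passed as the body of a Messages API request. The up-front check is a design choice for this sketch: it catches a misspelled tool name locally before the request is sent.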

Claude Tool Use Flow

  User: "Log 3 sets of 8 reps bench press at 80kg"
        │
        ▼
  ┌──────────────────────────────────────┐
  │  Claude receives:                    │
  │  - User message                      │
  │  - Tool definitions (JSON Schema)    │
  │  - System prompt (cached)            │
  └──────────────────┬───────────────────┘
                     │
                     ▼
  ┌──────────────────────────────────────┐
  │  Claude returns:                     │
  │  stop_reason: "tool_use"             │
  │  content: [{                         │
  │    type: "tool_use",                 │
  │    name: "log_workout_set",          │
  │    input: {                          │
  │      exercise: "Barbell Bench Press",│
  │      sets: 3, reps: 8, weight_kg: 80 │
  │    }                                 │
  │  }]                                  │
  └──────────────────┬───────────────────┘
                     │
                     ▼
  ┌──────────────────────────────────────┐
  │  Your App: Execute Tool              │
  │  - Validate args (enum, range)       │
  │  - Write to database                 │
  │  - Return tool_result                │
  └──────────────────┬───────────────────┘
                     │
                     ▼
  ┌──────────────────────────────────────┐
  │  Second Claude call with result      │
  │  → Natural language confirmation     │
  │  "Logged 3x8 Barbell Bench Press..." │
  └──────────────────────────────────────┘

For complex multi-tool workflows with the Claude API, I define a 'done' tool that the agent calls when it has finished its task. This tool takes a 'summary' parameter describing what was accomplished. The agent loop keeps running until it sees that call, which solves the 'how do I know when the agent is done' problem cleanly.
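One way the loop around a 'done' tool might look is sketched below. CallModel and RunTool are hypothetical stand-ins for the real API call and tool dispatcher, so the control flow can be read (and tested) without network access. The turn limit is an added safety net, not something the pattern itself requires.

```typescript
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}
interface TextBlock {
  type: "text";
  text: string;
}
type ContentBlock = ToolUseBlock | TextBlock;

// Hypothetical stand-ins for the API call and the tool dispatcher.
type CallModel = (history: unknown[]) => Promise<ContentBlock[]>;
type RunTool = (call: ToolUseBlock) => Promise<string>;

async function runUntilDone(
  callModel: CallModel,
  runTool: RunTool,
  userText: string,
  maxTurns = 10,
): Promise<string> {
  const history: unknown[] = [{ role: "user", content: userText }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const content = await callModel(history);
    history.push({ role: "assistant", content });
    const calls = content.filter((b): b is ToolUseBlock => b.type === "tool_use");
    if (calls.length === 0) {
      // Free text instead of a tool call; forcing a tool call via tool_choice avoids this.
      throw new Error("Model replied without calling a tool");
    }
    const results: unknown[] = [];
    let summary: string | undefined;
    for (const call of calls) {
      if (call.name === "done") {
        // The sentinel: remember the summary, but finish the other calls first.
        summary = String(call.input.summary ?? "");
        continue;
      }
      results.push({ type: "tool_result", tool_use_id: call.id, content: await runTool(call) });
    }
    if (summary !== undefined) return summary;
    history.push({ role: "user", content: results });
  }
  throw new Error(`Agent did not call 'done' within ${maxTurns} turns`);
}
```

Tool calls in one response are executed in order here, and a 'done' call in the same response only ends the loop after the others have run.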

Error Handling for Tool Calls

Tool calls fail for two reasons: the model calls a tool with invalid arguments (your schema validation catches this), or the tool execution itself fails. In both cases, return the error information to the model in a tool_result with is_error: true, matched to the original call by its tool_use_id.
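The invalid-arguments case might be handled along these lines. The checks are hand-written to mirror the log_workout_set schema used in this post; a JSON Schema validator library could do the same job, but none is assumed here. The function names and the shape of the error payload are illustrative choices.

```typescript
interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

const SET_TYPES = ["working", "warmup", "dropset", "failure"];

// Returns a list of human-readable problems; an empty list means the args are valid.
function validateLogWorkoutSet(args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const { exercise_name, reps, weight_kg, set_type } = args;
  if (typeof exercise_name !== "string" || exercise_name.trim() === "") {
    problems.push("exercise_name must be a non-empty string");
  }
  if (typeof reps !== "number" || !Number.isInteger(reps) || reps < 1 || reps > 100) {
    problems.push("reps must be an integer between 1 and 100");
  }
  if (weight_kg !== undefined && (typeof weight_kg !== "number" || weight_kg < 0 || weight_kg > 1000)) {
    problems.push("weight_kg must be a number between 0 and 1000");
  }
  if (set_type !== undefined && (typeof set_type !== "string" || SET_TYPES.indexOf(set_type) === -1)) {
    problems.push(`set_type must be one of: ${SET_TYPES.join(", ")}`);
  }
  return problems;
}

// Wraps the problems in a tool_result the model can read and act on.
function validationErrorResult(toolUseId: string, problems: string[]): ToolResult {
  return {
    type: "tool_result",
    tool_use_id: toolUseId,
    content: JSON.stringify({ error: "invalid_arguments", problems }),
    is_error: true,
  };
}
```

Listing every problem at once, rather than stopping at the first, gives the model a chance to fix all arguments in a single retry.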

Streaming with Tool Use

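Claude's streaming API sends tool use events as they're generated, but tool call arguments arrive as fragments of JSON (input_json_delta events) that cannot be parsed on their own. Building a streaming parser requires buffering the fragments for each tool_use content block, parsing them once that block's content_block_stop event arrives, and only triggering tool execution once the message ends with stop_reason: 'tool_use'. Anthropic's official SDK handles this buffering automatically.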
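For anyone not using the SDK's helpers, the buffering could be sketched roughly as follows. The event types below model only the fields this example reads, and the ToolCallBuffer class is a name made up for this sketch, not part of any library.

```typescript
// Only the fields this sketch reads are typed; real events carry more.
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string; id?: string; name?: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; partial_json?: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason?: string | null } };

interface PendingCall {
  id: string;
  name: string;
  json: string;
}
interface CompletedCall {
  id: string;
  name: string;
  input: unknown;
}

// Hypothetical helper: accumulates argument fragments per content block.
class ToolCallBuffer {
  private pending = new Map<number, PendingCall>();
  readonly completed: CompletedCall[] = [];
  readyToExecute = false;

  handle(event: StreamEvent): void {
    switch (event.type) {
      case "content_block_start": {
        const block = event.content_block;
        if (block.type === "tool_use" && block.id && block.name) {
          this.pending.set(event.index, { id: block.id, name: block.name, json: "" });
        }
        break;
      }
      case "content_block_delta": {
        const call = this.pending.get(event.index);
        if (call && event.delta.type === "input_json_delta") {
          call.json += event.delta.partial_json ?? "";
        }
        break;
      }
      case "content_block_stop": {
        const call = this.pending.get(event.index);
        if (call) {
          // The fragments only form valid JSON once the block is complete.
          // An empty buffer is treated as a call with no arguments.
          this.completed.push({ id: call.id, name: call.name, input: JSON.parse(call.json || "{}") });
          this.pending.delete(event.index);
        }
        break;
      }
      case "message_delta":
        // Execution waits for the message to end with stop_reason "tool_use".
        if (event.delta.stop_reason === "tool_use") this.readyToExecute = true;
        break;
    }
  }
}
```

Feed every stream event to handle(), then execute the entries in completed once readyToExecute is true.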

// Claude tool definition with precise schema
const tools = [
  {
    name: "log_workout_set",
    description:
      "Records a single completed exercise set to the user's workout log. " +
      "Call this ONCE per set, not once per exercise. " +
      "Use the exact exercise name from our library, e.g. 'Barbell Back Squat'.",
    input_schema: {
      type: "object",
      properties: {
        exercise_name: {
          type: "string",
          description: "Exercise name from library, e.g. 'Barbell Bench Press', 'Cable Row'",
        },
        reps: { type: "integer", minimum: 1, maximum: 100 },
        weight_kg: { type: "number", minimum: 0, maximum: 1000 },
        set_type: {
          type: "string",
          enum: ["working", "warmup", "dropset", "failure"],
          description: "Type of set — default 'working' if not specified",
        },
      },
      required: ["exercise_name", "reps"],
    },
  },
  {
    name: "done",
    description: "Call when you have finished all requested actions.",
    input_schema: {
      type: "object",
      properties: {
        summary: { type: "string", description: "Brief summary of what was accomplished" },
      },
      required: ["summary"],
    },
  },
]

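// Handle tool call errors gracefully.
// toolHandlers (defined elsewhere in the app) maps tool names to async implementations.
// toolUseId is the id of the tool_use block being answered; every tool_result
// needs it so the API can match the result to the call.
async function executeTool(toolUseId: string, name: string, args: Record<string, unknown>) {
  try {
    const handler = toolHandlers[name]
    if (!handler) throw new Error(`Unknown tool: ${name}`)
    const result = await handler(args)
    return {
      type: "tool_result",
      tool_use_id: toolUseId,
      content: JSON.stringify(result),
      is_error: false,
    }
  } catch (error) {
    // Return the failure to the model instead of throwing, so it can retry or explain
    return {
      type: "tool_result",
      tool_use_id: toolUseId,
      content: JSON.stringify({
        error: String(error),
        suggestion: "Try using a different exercise name or check the arguments",
      }),
      is_error: true,
    }
  }
}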

Token Cost of Tool Use

Tool definitions contribute to your input token count. A typical tool with a detailed schema and description costs 100-300 tokens. If you have 10 tools, that's 1,000-3,000 tokens of overhead on every request. I implemented a context-aware tool selector that reduced my tool overhead by 60%.
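This post does not describe how its selector works, so the following is only one possible shape for it: a keyword match against the user's message, with some tools always included. The catalog entries, keywords, and the flat 180-token estimate per tool (the average reported for AI Gymbro) are all assumptions for illustration.

```typescript
interface CatalogEntry {
  name: string;
  keywords: string[]; // hypothetical routing hints, not an API field
  approxTokens: number; // measured or estimated size of the full definition
  alwaysInclude?: boolean;
}

// Keeps tools whose keywords appear in the message, plus the always-on ones.
function selectTools(userText: string, catalog: CatalogEntry[]): CatalogEntry[] {
  const text = userText.toLowerCase();
  return catalog.filter(
    (tool) => tool.alwaysInclude === true || tool.keywords.some((k) => text.indexOf(k) !== -1),
  );
}

function totalTokens(tools: CatalogEntry[]): number {
  return tools.reduce((sum, t) => sum + t.approxTokens, 0);
}

// Illustrative catalog; only log_workout_set and done appear in this post.
const catalog: CatalogEntry[] = [
  { name: "log_workout_set", keywords: ["log", "set", "reps"], approxTokens: 180 },
  { name: "get_progress_chart", keywords: ["progress", "chart", "trend"], approxTokens: 180 },
  { name: "update_profile", keywords: ["profile", "height", "goal"], approxTokens: 180 },
  { name: "done", keywords: [], approxTokens: 180, alwaysInclude: true },
];

const selected = selectTools("Log 3 sets of 8 reps bench press at 80kg", catalog);
// selected holds log_workout_set and done: 360 of the catalog's 720 estimated tokens.
```

Keyword matching is crude and can drop a tool the model needed. A real selector would want a fallback, such as retrying with the full catalog when the model cannot complete the request.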

Claude sometimes returns multiple tool calls in a single response. If your tool implementations have side effects or read from shared state, parallel execution can cause race conditions. I learned this when two parallel log_workout calls created duplicate workout records because both read the same empty workout state before either had written. Either enforce sequential execution or design tools to be idempotent.
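Both fixes can be sketched in a few lines. WorkoutStore is a hypothetical in-memory stand-in for the database layer, and the userId/date key is an assumed notion of "the same workout". The first half makes creation idempotent by sharing one in-flight promise per key. The second half runs calls strictly one after another.

```typescript
interface Workout {
  id: number;
  userId: string;
  date: string;
}

// Hypothetical in-memory stand-in for the app's database layer.
class WorkoutStore {
  private inFlight = new Map<string, Promise<Workout>>();
  created = 0;

  // Simulated async insert; a real app would write to the database here.
  private async insert(userId: string, date: string): Promise<Workout> {
    await Promise.resolve(); // yield once, as a real I/O call would
    this.created += 1;
    return { id: this.created, userId, date };
  }

  // Idempotent: concurrent callers with the same key share one promise,
  // so "read empty state, then write" cannot happen twice.
  getOrCreate(userId: string, date: string): Promise<Workout> {
    const key = `${userId}:${date}`;
    let workout = this.inFlight.get(key);
    if (!workout) {
      workout = this.insert(userId, date);
      this.inFlight.set(key, workout);
      // Forget failed inserts so a later call can retry.
      workout.catch(() => this.inFlight.delete(key));
    }
    return workout;
  }
}

// Alternative: run tool calls one at a time, in the order the model returned them.
async function runSequentially<T, R>(calls: T[], run: (call: T) => Promise<R>): Promise<R[]> {
  const results: R[] = [];
  for (const call of calls) {
    results.push(await run(call));
  }
  return results;
}
```

The promise cache only protects a single process. With several server instances, the same guarantee would have to come from the database, for example through a unique constraint on the key.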

Testing Tool-Heavy Applications

Testing LLM applications that rely on tool calls requires a two-layer strategy: mock tests that stub the LLM and verify tool execution logic, and integration tests that run real LLM calls against test conversations with known correct tool call sequences.
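The mock layer might look something like this. Every name here (dispatchToolCalls, stubModelResponse, the spy handler) is made up for the sketch. The point is that the model's reply is a canned value, so the test exercises only the application's own dispatch logic.

```typescript
interface ToolUse {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}
type Handler = (input: Record<string, unknown>) => unknown;

// The code under test: routes each tool_use block to its handler.
function dispatchToolCalls(content: ToolUse[], handlers: Record<string, Handler>) {
  return content.map((block) => {
    const handler = handlers[block.name];
    if (!handler) {
      return { type: "tool_result", tool_use_id: block.id, content: `Unknown tool: ${block.name}`, is_error: true };
    }
    return {
      type: "tool_result",
      tool_use_id: block.id,
      content: JSON.stringify(handler(block.input)),
      is_error: false,
    };
  });
}

// The stubbed "LLM": always returns the same canned tool call.
function stubModelResponse(): ToolUse[] {
  return [
    {
      type: "tool_use",
      id: "toolu_test_1",
      name: "log_workout_set",
      input: { exercise_name: "Barbell Bench Press", reps: 8, weight_kg: 80 },
    },
  ];
}

// A spy handler records its input instead of writing to a database.
const recorded: Record<string, unknown>[] = [];
const handlers: Record<string, Handler> = {
  log_workout_set: (input) => {
    recorded.push(input);
    return { ok: true };
  },
};

const results = dispatchToolCalls(stubModelResponse(), handlers);
```

A test then asserts on recorded and results. The integration layer would swap stubModelResponse for a real API call and compare the returned tool calls against the expected sequence.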

Real Production Metrics from AI Gymbro

After six months of Claude tool use in production: tool call success rate is 94.2% on first attempt, 98.8% after one retry. Average tokens per tool definition: 180. Average tool calls per user session: 4.3. Most common failure mode: the model calls a tool with a valid schema but semantically wrong arguments.
