
05.4 Tool Calling Architecture

Nikolay Vyahhi edited this page Feb 19, 2026 · 3 revisions



Purpose and Scope

This document describes ZeroClaw's tool calling architecture, which enables LLM providers to invoke agent capabilities (shell commands, file operations, memory access, etc.) through both native API primitives and prompt-guided fallbacks. The architecture supports 28+ providers with varying levels of tool calling support, from fully native (OpenAI, Anthropic, Gemini) to prompt-guided (Ollama, custom endpoints).

For information about the tool registry and individual tool implementations, see Tools. For provider-specific authentication and configuration, see Providers.


Two-Mode Architecture

ZeroClaw's tool calling system operates in two modes based on provider capabilities:

  1. Native Tool Calling: Providers like OpenAI, Anthropic, and Gemini use API primitives (function calling, tool_use blocks, functionDeclarations) to handle tools. The provider returns structured ToolCall objects with IDs that enable precise multi-turn conversations.

  2. Prompt-Guided Tool Calling: Providers without native support receive tool documentation injected into the system prompt. The agent loop parses tool invocations from the LLM's text response using XML tags, JSON blocks, or provider-specific formats.

Sources: src/providers/traits.rs:195-227, src/agent/loop_.rs:873-890


Provider Capability Declaration

The Provider Trait

Every provider implements the Provider trait, which includes capability declaration methods:

fn supports_native_tools(&self) -> bool
fn capabilities(&self) -> ProviderCapabilities
fn convert_tools(&self, tools: &[ToolSpec]) -> ToolsPayload

The ProviderCapabilities struct declares whether a provider supports native tool calling:

pub struct ProviderCapabilities {
    pub native_tool_calling: bool,
}

Sources: src/providers/traits.rs:195-227, src/providers/traits.rs:229-249
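A minimal sketch of this declaration pattern, assuming the trait and struct names from the text; the method bodies and the two example providers are illustrative, not ZeroClaw's actual implementations:

```rust
// Sketch of capability declaration. ProviderCapabilities and the method
// names follow the wiki text; the impls below are hypothetical examples.
pub struct ProviderCapabilities {
    pub native_tool_calling: bool,
}

pub trait Provider {
    fn capabilities(&self) -> ProviderCapabilities;

    // A natural default: derive the boolean query from capabilities().
    fn supports_native_tools(&self) -> bool {
        self.capabilities().native_tool_calling
    }
}

struct NativeProvider;       // stands in for OpenAI/Anthropic/Gemini
struct PromptGuidedProvider; // stands in for Ollama/custom endpoints

impl Provider for NativeProvider {
    fn capabilities(&self) -> ProviderCapabilities {
        ProviderCapabilities { native_tool_calling: true }
    }
}

impl Provider for PromptGuidedProvider {
    fn capabilities(&self) -> ProviderCapabilities {
        ProviderCapabilities { native_tool_calling: false }
    }
}
```

The agent loop only ever needs the boolean, so a single struct field is enough to switch between the two modes.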

Capability Query Flow

flowchart TD
    AgentLoop["Agent Loop<br/>(run_tool_call_loop)"]
    QueryCaps["provider.supports_native_tools()"]
    NativeCheck{native_tool_calling?}
    
    NativeMode["Native Mode:<br/>Pass tools via ChatRequest.tools"]
    PromptMode["Prompt-Guided Mode:<br/>Inject tools into system prompt"]
    
    ConvertTools["provider.convert_tools(specs)"]
    NativePayload["ToolsPayload::OpenAI<br/>ToolsPayload::Anthropic<br/>ToolsPayload::Gemini"]
    PromptPayload["ToolsPayload::PromptGuided<br/>{instructions}"]
    
    AgentLoop --> QueryCaps
    QueryCaps --> NativeCheck
    
    NativeCheck -->|true| NativeMode
    NativeCheck -->|false| PromptMode
    
    NativeMode --> ConvertTools
    PromptMode --> ConvertTools
    
    ConvertTools -->|native_tool_calling=true| NativePayload
    ConvertTools -->|native_tool_calling=false| PromptPayload

Sources: src/agent/loop_.rs:873-890, src/providers/traits.rs:244-249
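The branch in the flowchart can be sketched as a single match on the capability flag. Variant names follow the diagram; the payload contents here are simplified stand-ins, not the real types:

```rust
// Illustrative dispatch mirroring the flowchart above. The real
// ToolsPayload carries provider-specific JSON; strings are stand-ins.
enum ToolsPayload {
    OpenAI(Vec<String>),                      // native tool specs
    PromptGuided { instructions: String },    // documentation for the prompt
}

fn convert_tools(native_tool_calling: bool, tool_names: &[&str]) -> ToolsPayload {
    if native_tool_calling {
        ToolsPayload::OpenAI(tool_names.iter().map(|n| n.to_string()).collect())
    } else {
        // Prompt-guided: fold the tool list into one instruction string
        // destined for the system prompt.
        ToolsPayload::PromptGuided {
            instructions: format!("You have access to: {}", tool_names.join(", ")),
        }
    }
}
```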


Native Tool Calling Flow

OpenAI/OpenRouter Native Format

OpenAI and OpenRouter use the OpenAI function calling format:

{
  "model": "gpt-4",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "shell",
        "description": "Execute a shell command",
        "parameters": { "type": "object", "properties": {...} }
      }
    }
  ]
}

The LLM response includes structured tool_calls:

{
  "choices": [{
    "message": {
      "content": "Let me check the date.",
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "shell",
            "arguments": "{\"command\": \"date\"}"
          }
        }
      ]
    }
  }]
}

Sources: src/providers/openai.rs:57-77, src/providers/openai.rs:92-105, src/providers/openrouter.rs:44-76
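The response shape above can be mirrored with a pair of small structs. These are hypothetical types for illustration (field names match the wire format, but they are not ZeroClaw's); the helper flattens structured calls into triples the loop can execute:

```rust
// Sketch of the OpenAI tool_calls wire shape. Note that `arguments` is a
// JSON-encoded *string*, not an object -- a common source of bugs.
struct FunctionCall {
    name: String,
    arguments: String, // e.g. "{\"command\": \"date\"}"
}

struct NativeToolCall {
    id: String,
    function: FunctionCall,
}

// Flatten structured calls into (id, name, raw-arguments) triples.
fn flatten(calls: &[NativeToolCall]) -> Vec<(String, String, String)> {
    calls
        .iter()
        .map(|c| (c.id.clone(), c.function.name.clone(), c.function.arguments.clone()))
        .collect()
}
```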

Anthropic Native Format

Anthropic uses content blocks with tool_use and tool_result types:

{
  "model": "claude-3-5-sonnet-20241022",
  "messages": [...],
  "tools": [
    {
      "name": "shell",
      "description": "Execute a shell command",
      "input_schema": { "type": "object", "properties": {...} }
    }
  ]
}

Response with tool calls:

{
  "content": [
    {
      "type": "text",
      "text": "Let me check the date."
    },
    {
      "type": "tool_use",
      "id": "toolu_123",
      "name": "shell",
      "input": { "command": "date" }
    }
  ]
}

Tool results are sent back as user messages with tool_result blocks:

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_123",
      "content": "Fri Jan 10 14:30:00 UTC 2025"
    }
  ]
}

Sources: src/providers/anthropic.rs:45-95, src/providers/anthropic.rs:126-145, src/providers/anthropic.rs:209-230
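The tool_result round trip above can be sketched with a hand-built serializer. This is a minimal illustration (real code would use serde and escape the content properly); the function name is hypothetical:

```rust
// Build an Anthropic-style tool_result user message, mirroring the JSON
// shape above. Assumes `content` contains no characters needing escaping.
fn tool_result_message(tool_use_id: &str, content: &str) -> String {
    format!(
        r#"{{"role":"user","content":[{{"type":"tool_result","tool_use_id":"{}","content":"{}"}}]}}"#,
        tool_use_id, content
    )
}
```

The essential invariant is that `tool_use_id` echoes the `id` from the preceding `tool_use` block, which is what lets the model pair results with calls across turns.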

Gemini Native Format

Gemini uses functionDeclarations in the tools field:

{
  "model": "models/gemini-2.0-flash-exp",
  "contents": [...],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "shell",
          "description": "Execute a shell command",
          "parameters": { "type": "OBJECT", "properties": {...} }
        }
      ]
    }
  ]
}

Sources: src/providers/gemini.rs:1-11 (full native tool calling support is referenced here; the implementation details are in the body of the file)

Message History Encoding

For providers with native tool calling, the agent loop stores tool calls and results in a structured format that preserves call IDs:

Assistant message with tool calls:

{
  "content": "Let me check that for you.",
  "tool_calls": [
    {
      "id": "call_abc123",
      "name": "shell",
      "arguments": "{\"command\": \"date\"}"
    }
  ]
}

Tool result message:

{
  "tool_call_id": "call_abc123",
  "content": "Fri Jan 10 14:30:00 UTC 2025"
}

This JSON is stored in the content field of ChatMessage and parsed by provider-specific message converters.

Sources: src/agent/loop_.rs:764-787, src/agent/loop_.rs:789-808, src/providers/openai.rs:169-234, src/providers/anthropic.rs:232-261
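The storage convention can be sketched as follows. Both function names are hypothetical; real converters parse the JSON rather than probing strings, and the encoder here assumes the text needs no escaping:

```rust
// Encode structured tool-call data into ChatMessage.content, per the
// convention above: plain text for ordinary turns, a JSON object with a
// "tool_calls" key for turns that invoked tools.
fn encode_assistant_history(text: &str, id: &str, name: &str, arguments: &str) -> String {
    format!(
        r#"{{"content":"{}","tool_calls":[{{"id":"{}","name":"{}","arguments":"{}"}}]}}"#,
        text,
        id,
        name,
        arguments.replace('"', "\\\"") // arguments is itself a JSON string
    )
}

// Converters detect stored tool calls before deciding how to rebuild
// native messages (string probe here; real code deserializes).
fn has_stored_tool_calls(content: &str) -> bool {
    content.trim_start().starts_with('{') && content.contains("\"tool_calls\"")
}
```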


Prompt-Guided Tool Calling

System Prompt Injection

When supports_native_tools() returns false, the agent loop injects tool documentation into the system prompt:

// From Provider::chat default implementation
if let Some(tools) = request.tools {
    if !tools.is_empty() && !self.supports_native_tools() {
        let tool_instructions = match self.convert_tools(tools) {
            ToolsPayload::PromptGuided { instructions } => instructions,
            _ => bail!("Expected PromptGuided payload")
        };
        // Inject into system message
        system_message.content.push_str("\n\n");
        system_message.content.push_str(&tool_instructions);
    }
}

The build_tool_instructions_text function generates documentation like:

You have access to the following tools:

## shell
Execute a shell command in the workspace.
Parameters:
- command (string, required): The shell command to execute

To use a tool, respond with:
<tool_call>
{"name": "shell", "arguments": {"command": "date"}}
</tool_call>

Sources: src/providers/traits.rs:305-327, src/providers/traits.rs:362-397
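A sketch of how such documentation could be generated from tool specs. The `ToolSpec` layout here is a simplified assumption (the real struct holds a JSON schema, not a tuple list); only the output shape follows the text above:

```rust
// Hypothetical spec shape for illustration: (name, type, required) tuples
// stand in for the real JSON-schema parameters.
struct ToolSpec<'a> {
    name: &'a str,
    description: &'a str,
    params: &'a [(&'a str, &'a str, bool)],
}

fn build_tool_instructions_text(tools: &[ToolSpec]) -> String {
    let mut out = String::from("You have access to the following tools:\n");
    for t in tools {
        out.push_str(&format!("\n## {}\n{}\nParameters:\n", t.name, t.description));
        for (p, ty, required) in t.params {
            let req = if *required { "required" } else { "optional" };
            out.push_str(&format!("- {} ({}, {})\n", p, ty, req));
        }
    }
    // Teach the model the wrapper format the parser recognizes.
    out.push_str("\nTo use a tool, respond with:\n<tool_call>\n{\"name\": \"...\", \"arguments\": {...}}\n</tool_call>\n");
    out
}
```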


Tool Call Parsing Strategies

The agent loop parses tool calls from LLM responses using multiple strategies, tried in order:

Strategy 1: Native JSON with tool_calls Array

Minimax and some OpenAI-compatible providers return tool calls in native JSON format:

{
  "content": "Let me check that.",
  "tool_calls": [
    {
      "id": "call_123",
      "function": {
        "name": "shell",
        "arguments": "{\"command\": \"date\"}"
      }
    }
  ]
}

Sources: src/agent/loop_.rs:600-613, src/agent/loop_.rs:289-316

Strategy 2: XML-Style Tags

The primary prompt-guided format uses XML-style tags. Supported tags: <tool_call>, <toolcall>, <tool-call>, <invoke>:

Let me check the current date.

<tool_call>
{"name": "shell", "arguments": {"command": "date"}}
</tool_call>

The result will show the current time.

The parser extracts the JSON from within the tags and handles unclosed tags gracefully.

Sources: src/agent/loop_.rs:349-365, src/agent/loop_.rs:582-671
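The core of this strategy can be sketched with std string search alone. This minimal version handles one `<tool_call>` pair and tolerates a missing close tag, as described above; alias tags and multiple calls are omitted:

```rust
// Extract the JSON payload from the first <tool_call>...</tool_call> span.
// Returns None if no opening tag is present or the payload is empty.
fn extract_tool_call(text: &str) -> Option<String> {
    let open = "<tool_call>";
    let close = "</tool_call>";
    let start = text.find(open)? + open.len();
    let rest = &text[start..];
    // Unclosed tag: take everything to the end of the response.
    let end = rest.find(close).unwrap_or(rest.len());
    let payload = rest[..end].trim();
    if payload.is_empty() { None } else { Some(payload.to_string()) }
}
```

Requiring the opening tag, rather than scanning for any JSON object, is what keeps this strategy consistent with the security stance described below under Parsing Security.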

Strategy 3: Markdown Code Blocks

Models behind OpenRouter sometimes output tool calls in markdown code blocks:

Let me check that for you.

```tool_call
{"name": "shell", "arguments": {"command": "date"}}
```

The regex pattern r"(?s)```(?:tool[_-]?call|invoke)\s*\n(.*?)(?:```|</tool[_-]?call>)" matches hybrid formats.

Sources: src/agent/loop_.rs:676-709

Strategy 4: GLM-Style Line Formats

GLM (Zhipu) models use proprietary line-based formats:

browser_open/url>https://example.com
shell/command>ls -la
http_request/url>https://api.example.com

The parser maps aliases (such as browser_open) onto registered tool names, constructs the arguments object, and converts each line to the standard tool call format.

Sources: src/agent/loop_.rs:486-580
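A single line of this format splits cleanly on `/` and `>`. The sketch below shows the shape of that parse; alias mapping and multi-argument handling from the real parser are omitted:

```rust
// Parse one GLM-style line of the form "tool/arg_key>value" into a
// (tool, key, value) triple. Returns None for lines that don't match.
fn parse_glm_line(line: &str) -> Option<(String, String, String)> {
    let (head, value) = line.split_once('>')?;
    let (tool, key) = head.split_once('/')?;
    if tool.is_empty() || key.is_empty() {
        return None;
    }
    Some((tool.to_string(), key.to_string(), value.trim().to_string())) 
}
```

Splitting on the first `>` keeps URLs in the value intact (e.g. `http_request/url>https://api.example.com`).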

Parsing Security

The parser does not extract arbitrary JSON from responses to prevent prompt injection attacks. Tool calls must be wrapped in one of the recognized formats:

// SECURITY: We do NOT fall back to extracting arbitrary JSON from the response
// here. That would enable prompt injection attacks where malicious content
// (e.g., in emails, files, or web pages) could include JSON that mimics a
// tool call.

This prevents an attacker from injecting tool call payloads into content that the LLM reads (e.g., email bodies, web pages, file contents).

Sources: src/agent/loop_.rs:732-740


Tool Call Execution Flow

sequenceDiagram
    participant Loop as "run_tool_call_loop"
    participant Provider as "Provider::chat"
    participant Parser as "parse_tool_calls<br/>parse_structured_tool_calls"
    participant Tool as "Tool Registry"
    participant Security as "ApprovalManager"
    
    Loop->>Provider: "chat(messages, tools, model, temp)"
    Provider->>Provider: "Check supports_native_tools()"
    
    alt Native Tools Supported
        Provider->>Provider: "Send tools in API request"
        Provider-->>Loop: "ChatResponse{tool_calls: [...]}"
        Loop->>Parser: "parse_structured_tool_calls(resp.tool_calls)"
    else Prompt-Guided
        Provider->>Provider: "Inject tools into system prompt"
        Provider-->>Loop: "ChatResponse{text: '...'}"
        Loop->>Parser: "parse_tool_calls(resp.text)"
    end
    
    Parser-->>Loop: "Vec<ParsedToolCall>"
    
    loop For each ParsedToolCall
        Loop->>Security: "needs_approval(tool_name)?"
        alt Approval Required
            Security-->>Loop: "Prompt user (CLI) or auto-approve (channels)"
        end
        
        Loop->>Tool: "find_tool(registry, name)"
        Tool-->>Loop: "dyn Tool"
        
        Loop->>Tool: "tool.execute(arguments)"
        Tool-->>Loop: "ToolResult{success, output}"
        
        Loop->>Loop: "scrub_credentials(output)"
        Loop->>Loop: "Append to history"
    end
    
    alt Has Tool Calls
        Loop->>Loop: "Continue loop (next iteration)"
    else Text Only
        Loop->>Loop: "Return final text response"
    end

Sources: src/agent/loop_.rs:851-1094


Provider-Specific Implementation Details

OpenAI and OpenRouter

Both providers use identical native tool calling formats. Message conversion logic reconstructs tool_calls from JSON-encoded assistant messages:

fn convert_messages(messages: &[ChatMessage]) -> Vec<NativeMessage> {
    messages.iter().map(|m| {
        if m.role == "assistant" {
            if let Ok(value) = serde_json::from_str::<serde_json::Value>(&m.content) {
                if let Some(tool_calls_value) = value.get("tool_calls") {
                    // Parse and reconstruct NativeToolCall structs
                }
            }
        }
        // ... handle tool results similarly
    })
}

Sources: src/providers/openai.rs:169-234, src/providers/openrouter.rs:138-203

Anthropic

Anthropic's message converter parses the native assistant history format to reconstruct tool_use and tool_result content blocks:

fn convert_messages(messages: &[ChatMessage]) -> (Option<SystemPrompt>, Vec<NativeMessage>) {
    for msg in messages {
        match msg.role.as_str() {
            "assistant" => {
                if let Some(blocks) = Self::parse_assistant_tool_call_message(&msg.content) {
                    native_messages.push(NativeMessage {
                        role: "assistant",
                        content: blocks, // Vec<NativeContentOut::ToolUse>
                    });
                }
            }
            "tool" => {
                if let Some(tool_result) = Self::parse_tool_result_message(&msg.content) {
                    native_messages.push(tool_result);
                }
            }
        }
    }
}

Anthropic also supports prompt caching via cache_control fields on tool definitions, system prompts, and conversation messages.

Sources: src/providers/anthropic.rs:232-282, src/providers/anthropic.rs:284-350

OpenAI-Compatible Providers

The OpenAiCompatibleProvider implements native tool calling for 20+ providers (Venice, Groq, Mistral, DeepSeek, xAI, etc.) using the OpenAI function calling format:

fn convert_tools(tools: Option<&[ToolSpec]>) -> Option<Vec<serde_json::Value>> {
    tools.map(|items| {
        items.iter().map(|tool| {
            serde_json::json!({
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters
                }
            })
        }).collect()
    })
}

The provider handles responses with reasoning_content fallback for thinking models (Qwen3, GLM-4):

fn effective_content(&self) -> String {
    match &self.content {
        Some(c) if !c.is_empty() => c.clone(),
        _ => self.reasoning_content.clone().unwrap_or_default(),
    }
}

Sources: src/providers/compatible.rs:187-201, src/providers/compatible.rs:234-262

Ollama

Ollama returns false for supports_native_tools(), forcing prompt-guided mode. However, some Ollama models (especially those fine-tuned on tool calling) return native tool_calls in responses. The provider converts these to JSON format that parse_tool_calls understands:

fn format_tool_calls_for_loop(&self, tool_calls: &[OllamaToolCall]) -> String {
    let formatted_calls: Vec<serde_json::Value> = tool_calls.iter().map(|tc| {
        let (tool_name, tool_args) = self.extract_tool_name_and_args(tc);
        serde_json::json!({
            "id": tc.id,
            "type": "function",
            "function": {
                "name": tool_name,
                "arguments": serde_json::to_string(&tool_args).unwrap_or("{}".to_string())
            }
        })
    }).collect();
    
    serde_json::json!({
        "content": "",
        "tool_calls": formatted_calls
    }).to_string()
}

The provider also handles quirky model behavior where tool calls are wrapped:

  • {"name": "tool_call", "arguments": {"name": "shell", "arguments": {...}}}
  • {"name": "tool.shell", "arguments": {...}}

Sources: src/providers/ollama.rs:191-217, src/providers/ollama.rs:219-256, src/providers/ollama.rs:369-375


Tool Call Loop Iteration

The agent loop executes tool calls iteratively until the LLM produces a text-only response:

for _iteration in 0..max_iterations {
    let (response_text, parsed_text, tool_calls, assistant_history_content, native_tool_calls) =
        provider.chat(ChatRequest { messages: history, tools: request_tools }, model, temperature).await?;
    
    if tool_calls.is_empty() {
        // No tool calls: this is the final response
        history.push(ChatMessage::assistant(response_text.clone()));
        return Ok(response_text);
    }
    
    // Execute each tool call
    let mut tool_results = String::new();
    for call in &tool_calls {
        let result = find_tool(tools_registry, &call.name)?.execute(call.arguments.clone()).await?;
        let scrubbed = scrub_credentials(&result.output);
        tool_results.push_str(&format!("<tool_result name=\"{}\">\n{}\n</tool_result>", call.name, scrubbed));
    }
    
    // Append tool calls and results to history
    history.push(ChatMessage::assistant(assistant_history_content));
    history.push(ChatMessage::user(tool_results));
}

The loop terminates when:

  1. The LLM produces text without tool calls
  2. max_tool_iterations is reached (default: 10)
  3. A tool execution error occurs (non-retryable)

Sources: src/agent/loop_.rs:851-1094, src/agent/loop_.rs:960-1094
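The iteration logic reduces to a small loop. The sketch below is synchronous and replaces the provider and tool registry with closures so the control flow is visible in isolation; approval, error handling, and credential scrubbing are omitted:

```rust
// Simplified sketch of the tool-call loop. `provider` takes the history and
// returns (text, tool_calls); `execute` runs one tool call and returns its
// output. Returns None if max_iterations is exhausted.
fn run_loop(
    mut provider: impl FnMut(&[String]) -> (String, Vec<String>),
    mut execute: impl FnMut(&str) -> String,
    max_iterations: usize,
) -> Option<String> {
    let mut history: Vec<String> = Vec::new();
    for _ in 0..max_iterations {
        let (text, calls) = provider(&history);
        if calls.is_empty() {
            return Some(text); // text-only response terminates the loop
        }
        history.push(text);
        for call in &calls {
            // Tool output is fed back as the next user turn.
            history.push(format!("<tool_result>{}</tool_result>", execute(call)));
        }
    }
    None // gave up after max_iterations (cf. max_tool_iterations)
}
```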


Capability Matrix

| Provider | Native Tools | Format | Notes |
|---|---|---|---|
| OpenAI | ✓ | OpenAI function calling | Supports tool_calls array with IDs |
| Anthropic | ✓ | tool_use content blocks | Supports tool_result blocks with IDs |
| OpenRouter | ✓ | OpenAI function calling | Proxies native tool support from backend models |
| Gemini | ✓ | functionDeclarations | Uses functionCall responses |
| Venice | ✓ | OpenAI-compatible | Via OpenAiCompatibleProvider |
| Groq | ✓ | OpenAI-compatible | Via OpenAiCompatibleProvider |
| Mistral | ✓ | OpenAI-compatible | Via OpenAiCompatibleProvider |
| DeepSeek | ✓ | OpenAI-compatible | Via OpenAiCompatibleProvider |
| xAI (Grok) | ✓ | OpenAI-compatible | Via OpenAiCompatibleProvider |
| Moonshot (Kimi) | ✓ | OpenAI-compatible | Via OpenAiCompatibleProvider |
| GLM (Zhipu) | ✓ | OpenAI-compatible + GLM-style | Dual format support |
| MiniMax | ✓ | Native JSON tool_calls | Parsed via Strategy 1 |
| Ollama | ✗ | Prompt-guided + quirk handling | Converts model tool_calls to JSON |
| Custom | ✗ | Prompt-guided | XML tags, markdown blocks |

Sources: src/providers/openai.rs:257-377, src/providers/anthropic.rs:387-466, src/providers/openrouter.rs:228-403, src/providers/compatible.rs:552-1061, src/providers/ollama.rs:259-375


History Management and Fidelity

Native vs. Prompt-Guided History

For native tool calling, the agent loop stores assistant messages with structured tool call data:

let assistant_history_content = if resp.tool_calls.is_empty() {
    response_text.clone()
} else {
    build_native_assistant_history(&response_text, &resp.tool_calls)
};
history.push(ChatMessage::assistant(assistant_history_content));

This preserves call IDs and enables proper role: tool responses.

For prompt-guided providers, assistant messages store XML-wrapped tool calls:

fn build_assistant_history_with_tool_calls(text: &str, tool_calls: &[ToolCall]) -> String {
    let mut parts = Vec::new();
    if !text.trim().is_empty() {
        parts.push(text.trim().to_string());
    }
    for call in tool_calls {
        parts.push(format!("<tool_call>\n{}\n</tool_call>", serde_json::to_string(&call).unwrap_or_default()));
    }
    parts.join("\n")
}

Sources: src/agent/loop_.rs:927-931, src/agent/loop_.rs:789-808

Credential Scrubbing

Tool outputs are scrubbed before being sent back to the LLM to prevent credential exfiltration:

fn scrub_credentials(input: &str) -> String {
    SENSITIVE_KV_REGEX.replace_all(input, |caps: &regex::Captures| {
        let key = &caps[1];
        let val = /* extract value */;
        let prefix = if val.len() > 4 { &val[..4] } else { "" };
        format!("{}: {}*[REDACTED]", key, prefix)
    }).to_string()
}

Patterns matched: token, api_key, password, secret, bearer, credential.

Sources: src/agent/loop_.rs:25-77, src/agent/loop_.rs:1039-1044
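The behavior can be sketched without a regex, line by line. This std-only version is an illustration of the redaction scheme described above, not the real implementation (which uses SENSITIVE_KV_REGEX and handles more key/value syntaxes):

```rust
// Keys whose values are redacted, per the patterns listed above.
const SENSITIVE_KEYS: &[&str] = &["token", "api_key", "password", "secret", "bearer", "credential"];

// Scrub one "key: value" line: keep a 4-character prefix of the value so the
// LLM can still refer to it, redact the rest. Assumes ASCII values.
fn scrub_line(line: &str) -> String {
    if let Some((key, val)) = line.split_once(':') {
        let k = key.trim().to_ascii_lowercase();
        if SENSITIVE_KEYS.iter().any(|s| k.contains(s)) {
            let val = val.trim();
            let prefix = if val.len() > 4 { &val[..4] } else { "" };
            return format!("{}: {}*[REDACTED]", key.trim(), prefix);
        }
    }
    line.to_string()
}
```

Keeping the short prefix is a deliberate trade-off: it lets the model distinguish between multiple credentials in the same output without exposing usable secrets.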


Key Takeaways

  1. Unified Interface: The Provider trait abstracts tool calling differences across 28+ providers through supports_native_tools() and convert_tools().

  2. Multiple Parsing Strategies: The agent loop parses tool calls from native JSON, XML tags, markdown blocks, and GLM-style formats in a single unified flow.

  3. Security by Design: Tool calls must be wrapped in recognized formats to prevent prompt injection. Credentials are scrubbed from tool outputs before being sent to the LLM.

  4. History Fidelity: Native providers preserve tool call IDs for multi-turn conversations. Prompt-guided providers use XML-wrapped JSON for history reconstruction.

  5. Graceful Degradation: Providers without native support fall back to prompt-guided mode with tool documentation injected into the system prompt.

  6. Provider-Specific Quirks: Ollama converts native tool_calls to JSON format. GLM supports dual formats. Anthropic uses content blocks with caching.

Sources: src/agent/loop_.rs:851-1094, src/providers/traits.rs:195-397, src/providers/mod.rs:572-840

