
08.2 Browser And HTTP Tools

Nikolay Vyahhi edited this page Feb 19, 2026 · 3 revisions

Browser and HTTP Tools

This document covers ZeroClaw's browser automation, HTTP request, and web search tools. These tools enable the agent to interact with web content, automate browser tasks, make HTTP API calls, and search the internet.

For information about tool registration and the overall tool system architecture, see Tools. For configuration of these tools, see Configuration File Reference.

Overview

The browser and HTTP tools provide three distinct capabilities:

| Tool | Purpose | Backend Options |
|---|---|---|
| browser | Full browser automation with DOM interaction | agent-browser CLI, rust-native (fantoccini), computer-use sidecar |
| http_request | Direct HTTP/HTTPS API calls without browser overhead | reqwest client with proxy support |
| web_search | Internet search via Brave Search API or GLM search | Brave API, GLM built-in |

All three tools enforce domain allowlisting, SSRF protection, and autonomy-level security checks.

Sources: src/tools/browser.rs:1-60, src/tools/mod.rs:186-204

Browser Tool Architecture

Backend Selection and Resolution

The BrowserTool supports three pluggable backends, selected via configuration or auto-detection:

flowchart TD
    Config["BrowserConfig::backend<br/>(from config.toml)"]
    Parse["BrowserBackendKind::parse()"]
    
    Agent["Agent Browser<br/>(Vercel CLI)"]
    Native["Rust Native<br/>(Fantoccini)"]
    Computer["Computer Use<br/>(OS-level sidecar)"]
    Auto["Auto<br/>(try all)"]
    
    CheckAgent{"agent-browser<br/>CLI available?"}
    CheckNative{"browser-native<br/>feature enabled?"}
    CheckComputer{"computer_use.endpoint<br/>reachable?"}
    
    ResolveAgent["ResolvedBackend::AgentBrowser"]
    ResolveNative["ResolvedBackend::RustNative"]
    ResolveComputer["ResolvedBackend::ComputerUse"]
    Error["Error:<br/>No backend available"]
    
    Config --> Parse
    Parse -->|"agent_browser"| CheckAgent
    Parse -->|"rust_native"| CheckNative
    Parse -->|"computer_use"| CheckComputer
    Parse -->|"auto"| Auto
    
    CheckAgent -->|"yes"| ResolveAgent
    CheckAgent -->|"no"| Error
    
    CheckNative -->|"yes"| ResolveNative
    CheckNative -->|"no"| Error
    
    CheckComputer -->|"yes"| ResolveComputer
    CheckComputer -->|"no"| Error
    
    Auto --> CheckNative
    Auto --> CheckAgent
    Auto --> CheckComputer

Sources: src/tools/browser.rs:61-98, src/tools/browser.rs:315-386

The resolve_backend() method performs runtime detection:

  1. Explicit backend: Validates the configured backend is available
  2. Auto mode: Tries rust-native → agent-browser → computer-use in order
  3. Availability checks:
    • agent-browser: Runs agent-browser --version via subprocess
    • rust-native: Checks #[cfg(feature = "browser-native")] and WebDriver endpoint
    • computer-use: Validates endpoint URL and performs connectivity check

Sources: src/tools/browser.rs:229-267, src/tools/browser.rs:315-386
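The resolution order described above can be sketched without any real probing. This is a minimal illustration, not the actual `resolve_backend()`: the `Availability` struct, `parse_backend`, and `resolve` are hypothetical names, and the three booleans stand in for the subprocess, feature-flag, and connectivity checks.

```rust
/// Backend requested in configuration (variants mirror the config strings).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    AgentBrowser,
    RustNative,
    ComputerUse,
    Auto,
}

/// Backend chosen after the availability checks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Resolved {
    AgentBrowser,
    RustNative,
    ComputerUse,
}

/// Hypothetical snapshot of the three availability probes.
#[derive(Debug, Clone, Copy)]
struct Availability {
    agent_browser: bool, // e.g. `agent-browser --version` succeeded
    rust_native: bool,   // feature compiled in and WebDriver endpoint usable
    computer_use: bool,  // sidecar endpoint valid and reachable
}

fn parse_backend(raw: &str) -> Option<BackendKind> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "agent_browser" => Some(BackendKind::AgentBrowser),
        "rust_native" => Some(BackendKind::RustNative),
        "computer_use" => Some(BackendKind::ComputerUse),
        "auto" => Some(BackendKind::Auto),
        _ => None,
    }
}

fn resolve(kind: BackendKind, avail: Availability) -> Result<Resolved, String> {
    // An explicit backend has one candidate; auto mode tries
    // rust-native, then agent-browser, then computer-use.
    let candidates: Vec<(Resolved, bool)> = match kind {
        BackendKind::AgentBrowser => vec![(Resolved::AgentBrowser, avail.agent_browser)],
        BackendKind::RustNative => vec![(Resolved::RustNative, avail.rust_native)],
        BackendKind::ComputerUse => vec![(Resolved::ComputerUse, avail.computer_use)],
        BackendKind::Auto => vec![
            (Resolved::RustNative, avail.rust_native),
            (Resolved::AgentBrowser, avail.agent_browser),
            (Resolved::ComputerUse, avail.computer_use),
        ],
    };
    candidates
        .into_iter()
        .find(|(_, available)| *available)
        .map(|(backend, _)| backend)
        .ok_or_else(|| format!("no browser backend available for {kind:?}"))
}
```

In this sketch an explicitly configured backend never falls back to another one; only `auto` walks the full list.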

Backend Comparison

| Feature | Agent Browser | Rust Native | Computer Use |
|---|---|---|---|
| Implementation | Node.js CLI (agent-browser) | Fantoccini WebDriver client | External HTTP sidecar |
| Setup | npm install -g agent-browser | Build with --features browser-native | Deploy sidecar + configure endpoint |
| DOM Access | Accessibility tree snapshot | Full WebDriver protocol | OS-level screenshot + coordinate-based |
| Performance | Moderate (subprocess overhead) | Fast (native Rust) | Slow (image processing) |
| Headless Mode | Yes | Configurable via native_headless | N/A (OS-level) |
| Session Persistence | --session <name> flag | Managed via NativeBrowserState | Sidecar-managed |
| Action Types | Browser-specific | Browser-specific | OS-level (mouse, keyboard, window) |

Sources: src/tools/browser.rs:48-59, src/tools/browser.rs:186-227

Agent Browser Backend

Uses the Vercel agent-browser CLI for automation. Each action spawns a subprocess:

sequenceDiagram
    participant Tool as BrowserTool
    participant CLI as agent-browser CLI
    participant Browser as Headless Chrome
    
    Tool->>CLI: Command::new("agent-browser")
    Tool->>CLI: Add session flag (optional)
    Tool->>CLI: Add action args + "--json"
    CLI->>Browser: Launch/connect via Playwright
    Browser-->>CLI: Execute action
    CLI-->>Tool: JSON response {success, data, error}
    Tool->>Tool: Parse AgentBrowserResponse
    Tool-->>Agent: ToolResult

Example commands generated:

  • agent-browser open https://example.com --session my-session --json
  • agent-browser snapshot -i -c --json
  • agent-browser click [data-ref="42"] --json

Sources: src/tools/browser.rs:426-472, src/tools/browser.rs:476-612
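The example commands above suggest a simple argument layout: action arguments first, then the optional session flag, then `--json`. The sketch below assumes that ordering; `build_agent_browser_args` and `agent_browser_command` are hypothetical helper names, not functions from `browser.rs`.

```rust
use std::process::Command;

/// Builds the argument list for one `agent-browser` invocation.
fn build_agent_browser_args(action_args: &[&str], session: Option<&str>) -> Vec<String> {
    let mut args: Vec<String> = action_args.iter().map(|a| a.to_string()).collect();
    // The session flag is only added when a session name is configured.
    if let Some(name) = session {
        args.push("--session".to_string());
        args.push(name.to_string());
    }
    // Ask the CLI for machine-readable output on every call.
    args.push("--json".to_string());
    args
}

/// Wraps the arguments in a `Command`; the caller decides when to run it.
fn agent_browser_command(action_args: &[&str], session: Option<&str>) -> Command {
    let mut cmd = Command::new("agent-browser");
    cmd.args(build_agent_browser_args(action_args, session));
    cmd
}
```

Passing arguments as a list rather than a shell string means selectors such as `[data-ref="42"]` need no shell quoting.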

Rust Native Backend

Directly uses the fantoccini WebDriver client when compiled with --features browser-native:

// Conceptual structure (actual implementation in browser_native.rs, not provided)
use fantoccini::{Client, Locator};
use serde_json::Value;

struct NativeBrowserState {
    client: Option<Client>,
    session_url: String,
}

impl NativeBrowserState {
    async fn execute_action(&mut self, action: BrowserAction /* , ... */) -> Result<Value> {
        // Reuse the existing WebDriver session or open a new one
        let client = self.ensure_connected(/* ... */).await?;
        match action {
            BrowserAction::Open { url } => {
                client.goto(&url).await?;
            }
            BrowserAction::Click { selector } => {
                let elem = client.find(Locator::Css(&selector)).await?;
                elem.click().await?;
            }
            // ... other actions
        }
        // Placeholder: the value each action actually returns is not shown here
        Ok(Value::Null)
    }
}

Sources: src/tools/browser.rs:614-646, src/tools/browser.rs:251-267

Computer Use Backend

Delegates to an external HTTP sidecar for OS-level automation:

sequenceDiagram
    participant Tool as BrowserTool
    participant Sidecar as Computer Use Sidecar
    participant OS as Operating System
    
    Tool->>Tool: validate_computer_use_action()
    Tool->>Tool: Check coordinate limits
    Tool->>Sidecar: POST {endpoint}/v1/actions
    Note over Tool,Sidecar: JSON payload with action, params, policy, metadata
    Sidecar->>OS: Execute mouse_move/mouse_click/type_text
    Sidecar->>OS: Capture screenshot
    OS-->>Sidecar: Result + image data
    Sidecar-->>Tool: ComputerUseResponse {success, data, error}
    Tool-->>Agent: ToolResult with screenshot base64

Payload structure:

{
  "action": "mouse_click",
  "params": {"x": 100, "y": 200},
  "policy": {
    "allowed_domains": ["example.com"],
    "window_allowlist": ["Firefox"],
    "max_coordinate_x": 1920,
    "max_coordinate_y": 1080
  },
  "metadata": {
    "session_name": "agent-1",
    "source": "zeroclaw.browser",
    "version": "0.x.x"
  }
}

Sources: src/tools/browser.rs:21-45, src/tools/browser.rs:708-819

Browser Actions

The BrowserAction enum defines all supported operations:

Navigation and State

| Action | Backend Support | Description |
|---|---|---|
| Open { url } | All | Navigate to URL (allowlist-checked) |
| GetUrl | agent-browser, rust-native | Get current page URL |
| GetTitle | agent-browser, rust-native | Get page title |
| Close | agent-browser, rust-native | Close browser session |

DOM Interaction

| Action | Backend Support | Description |
|---|---|---|
| Snapshot { interactive_only, compact, depth } | agent-browser | Get accessibility tree with refs |
| Click { selector } | All (OS-level for computer-use) | Click element by CSS selector or ref |
| Fill { selector, value } | All | Fill form field |
| Type { selector, text } | All | Type text into element |
| GetText { selector } | agent-browser, rust-native | Extract text content |
| Hover { selector } | agent-browser, rust-native | Hover over element |
| Press { key } | All | Press keyboard key |

Semantic Locators

| Action | Parameters | Description |
|---|---|---|
| Find { by, value, action, fill_value } | by: role/text/label/placeholder/testid | Find element semantically and perform action |

Example: Find { by: "role", value: "button", action: "click", fill_value: None }

Sources: src/tools/browser.rs:119-184

Visual and Timing

| Action | Parameters | Description |
|---|---|---|
| Screenshot { path, full_page } | Optional path, full-page flag | Capture screenshot |
| Wait { selector, ms, text } | Selector, milliseconds, or text | Wait for element/time/text |
| Scroll { direction, pixels } | Direction (up/down), optional pixels | Scroll page |
| IsVisible { selector } | Selector | Check element visibility |

Sources: src/tools/browser.rs:119-184

HTTP Request Tool

The HttpRequestTool provides direct HTTP/HTTPS access without browser overhead:

flowchart LR
    Agent["Agent"]
    Tool["HttpRequestTool"]
    Security["SecurityPolicy"]
    Allowlist["Domain Allowlist"]
    Proxy["Runtime Proxy"]
    Target["Target Server"]
    
    Agent -->|"execute({url, method, body})"| Tool
    Tool -->|"enforce_tool_operation()"| Security
    Tool -->|"validate_url()"| Allowlist
    Tool -->|"build_runtime_proxy_client()"| Proxy
    Proxy -->|"reqwest::Client::request()"| Target
    Target -->|"Response"| Tool
    Tool -->|"ToolResult"| Agent

Configuration

[http_request]
enabled = true
allowed_domains = ["api.example.com", "*.github.com"]
max_response_size = 5242880  # 5MB
timeout_secs = 30

Sources: src/tools/mod.rs:187-194

Security Features

  1. Domain Allowlisting: Validates the request URL's host against the allowed_domains patterns
  2. SSRF Protection: Blocks loopback (127.0.0.0/8), private (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and link-local (169.254.0.0/16) address ranges
  3. Response Size Limits: Enforces max_response_size to prevent memory exhaustion
  4. Timeout: Hard timeout via timeout_secs configuration
  5. Proxy Support: Respects ProxyConfig for corporate environments

The tool uses crate::config::build_runtime_proxy_client("tool.http_request") to automatically apply proxy configuration.

Sources: src/tools/mod.rs:187-194
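One way to enforce a response size limit like `max_response_size` is to read one byte past the limit and reject the body if that extra byte arrives. The sketch below uses only `std::io` and works on any reader; `read_capped` is a hypothetical helper, and whether the real tool rejects or truncates oversized bodies is not stated in this page.

```rust
use std::io::{self, Read};

/// Reads a body of at most `max_bytes`; returns an error if it is larger.
fn read_capped<R: Read>(body: R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read one byte past the limit so an oversized body is detected
    // without buffering all of it in memory.
    body.take(max_bytes.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("response exceeds max_response_size ({max_bytes} bytes)"),
        ));
    }
    Ok(buf)
}
```

Memory use stays bounded by `max_bytes + 1` regardless of how much data the server tries to send.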

Web Search Tool

The WebSearchTool provides internet search capabilities via external APIs:

Brave Search Integration

[web_search]
enabled = true
provider = "brave"
brave_api_key = "enc2:..."  # Encrypted in secret store
max_results = 10
timeout_secs = 15

Request flow:

  1. Agent calls web_search tool with query
  2. Tool checks can_act() and record_action(), returning early if either fails
  3. Makes GET request to https://api.search.brave.com/res/v1/web/search
  4. Parses JSON response and extracts web.results array
  5. Returns formatted results with titles, URLs, descriptions
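Step 3 of the flow above amounts to building a query URL. The sketch below shows one way to do that with the standard library only; the `q` and `count` parameter names are assumptions about the Brave API that this page does not confirm, and `percent_encode` and `brave_search_url` are hypothetical helpers.

```rust
/// Percent-encodes a query value, leaving only RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else, including spaces and '&', becomes %XX.
            other => out.push_str(&format!("%{other:02X}")),
        }
    }
    out
}

/// Builds the search URL; `max_results` corresponds to the config key of the same name.
fn brave_search_url(query: &str, max_results: u32) -> String {
    format!(
        "https://api.search.brave.com/res/v1/web/search?q={}&count={}",
        percent_encode(query),
        max_results
    )
}
```

Encoding the query before interpolation keeps characters such as `&` or `=` in the agent's query from being read as extra parameters.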

GLM Provider (Built-in)

Some models (e.g., GLM-4) have native web search:

[web_search]
provider = "glm"
# No API key needed - model has built-in capability

The tool passes the query to the provider's built-in search function instead of external APIs.

Sources: src/tools/mod.rs:196-204

Tool Registration

Browser and HTTP tools are conditionally registered based on configuration:

flowchart TD
    Start["all_tools() function"]
    
    BrowserCheck{"browser_config.enabled?"}
    BrowserAdd["Add BrowserOpenTool<br/>Add BrowserTool"]
    
    HttpCheck{"http_config.enabled?"}
    HttpAdd["Add HttpRequestTool"]
    
    SearchCheck{"web_search.enabled?"}
    SearchAdd["Add WebSearchTool"]
    
    Registry["Tool Registry<br/>Vec<Box<dyn Tool>>"]
    
    Start --> BrowserCheck
    BrowserCheck -->|"true"| BrowserAdd
    BrowserCheck -->|"false"| HttpCheck
    BrowserAdd --> HttpCheck
    
    HttpCheck -->|"true"| HttpAdd
    HttpCheck -->|"false"| SearchCheck
    HttpAdd --> SearchCheck
    
    SearchCheck -->|"true"| SearchAdd
    SearchCheck -->|"false"| Registry
    SearchAdd --> Registry

Code reference:

// src/tools/mod.rs:160-204
if browser_config.enabled {
    tools.push(Box::new(BrowserOpenTool::new(...)));  // Legacy simple URL opener
    tools.push(Box::new(BrowserTool::new_with_backend(...)));  // Full automation
}

if http_config.enabled {
    tools.push(Box::new(HttpRequestTool::new(...)));
}

if root_config.web_search.enabled {
    tools.push(Box::new(WebSearchTool::new(...)));
}

Sources: src/tools/mod.rs:160-204
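The conditional registration pattern can be tried in isolation with stub types. In this sketch the `Tool` trait, `StubTool`, `ToolFlags`, and the `browser_open` tool name are illustrative stand-ins rather than the real definitions in `src/tools/mod.rs`.

```rust
/// Minimal stand-in for the real tool trait.
trait Tool {
    fn name(&self) -> &'static str;
}

/// Stub tool that only carries its registered name.
struct StubTool(&'static str);

impl Tool for StubTool {
    fn name(&self) -> &'static str {
        self.0
    }
}

/// Hypothetical view of the three `enabled` flags from the configuration.
struct ToolFlags {
    browser: bool,
    http_request: bool,
    web_search: bool,
}

fn enabled_tools(flags: &ToolFlags) -> Vec<Box<dyn Tool>> {
    let mut tools: Vec<Box<dyn Tool>> = Vec::new();
    if flags.browser {
        // Enabling the browser registers both the simple opener and full automation.
        tools.push(Box::new(StubTool("browser_open")));
        tools.push(Box::new(StubTool("browser")));
    }
    if flags.http_request {
        tools.push(Box::new(StubTool("http_request")));
    }
    if flags.web_search {
        tools.push(Box::new(StubTool("web_search")));
    }
    tools
}
```

A disabled tool is never constructed, so it cannot appear in the registry the agent sees.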

Security and Validation

URL Validation Pipeline

flowchart TD
    Input["URL from Agent"]
    
    Empty{"Empty?"}
    Protocol{"Starts with<br/>http:// or https://?"}
    FileCheck{"file:// URL?"}
    ExtractHost["extract_host()"]
    PrivateCheck{"is_private_host()?"}
    AllowlistCheck{"host_matches_allowlist()?"}
    
    Valid["✓ Valid URL"]
    
    Error1["✗ Empty URL"]
    Error2["✗ Invalid protocol"]
    Error3["✗ file:// blocked"]
    Error4["✗ Private host"]
    Error5["✗ Not in allowlist"]
    
    Input --> Empty
    Empty -->|"yes"| Error1
    Empty -->|"no"| FileCheck
    FileCheck -->|"yes"| Error3
    FileCheck -->|"no"| Protocol
    Protocol -->|"no"| Error2
    Protocol -->|"yes"| ExtractHost
    ExtractHost --> PrivateCheck
    PrivateCheck -->|"yes"| Error4
    PrivateCheck -->|"no"| AllowlistCheck
    AllowlistCheck -->|"no"| Error5
    AllowlistCheck -->|"yes"| Valid

Sources: src/tools/browser.rs:388-424
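The pipeline in the diagram can be approximated with the standard library alone. The sketch below is an illustration under stated assumptions, not the code in `browser.rs`: the `UrlError` enum is hypothetical, host extraction is deliberately simple, a URL with no host is reported as an invalid protocol, and a `*.example.com` pattern is assumed to match subdomains only.

```rust
use std::net::IpAddr;

#[derive(Debug, PartialEq, Eq)]
enum UrlError {
    Empty,
    FileScheme,
    InvalidProtocol,
    PrivateHost,
    NotAllowed,
}

/// Returns the host part of an http(s) URL, without userinfo or port.
fn extract_host(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    let host_port = authority.rsplit('@').next()?;
    // Bracketed IPv6 literal such as [::1]:8080.
    if let Some(bracketed) = host_port.strip_prefix('[') {
        return bracketed.split(']').next().filter(|h| !h.is_empty());
    }
    host_port.split(':').next().filter(|h| !h.is_empty())
}

fn is_private_host(host: &str) -> bool {
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => {
            ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
        }
        // fe80::/10 is checked by masking the first 10 bits.
        Ok(IpAddr::V6(ip)) => {
            ip.is_loopback() || ip.is_unspecified() || (ip.segments()[0] & 0xffc0) == 0xfe80
        }
        Err(_) => host.eq_ignore_ascii_case("localhost"),
    }
}

fn host_matches_allowlist(host: &str, allowed: &[&str]) -> bool {
    let host = host.to_ascii_lowercase();
    allowed.iter().any(|pattern| {
        let pattern = pattern.to_ascii_lowercase();
        match pattern.strip_prefix("*.") {
            Some(suffix) => {
                host.len() > suffix.len() + 1 && host.ends_with(&format!(".{suffix}"))
            }
            None => host == pattern,
        }
    })
}

fn validate_url(url: &str, allowed: &[&str]) -> Result<(), UrlError> {
    let url = url.trim();
    if url.is_empty() {
        return Err(UrlError::Empty);
    }
    if url.starts_with("file://") {
        return Err(UrlError::FileScheme);
    }
    let host = extract_host(url).ok_or(UrlError::InvalidProtocol)?;
    if is_private_host(host) {
        return Err(UrlError::PrivateHost);
    }
    if !host_matches_allowlist(host, allowed) {
        return Err(UrlError::NotAllowed);
    }
    Ok(())
}
```

The private-host check runs before the allowlist check, mirroring the diagram, so an allowlist entry cannot re-enable a loopback or RFC 1918 address. A string-level check like this does not cover hostnames that resolve to private addresses via DNS.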

Private Host Detection

The is_private_host() function blocks:

  • Loopback: 127.0.0.1, localhost, ::1
  • RFC 1918 Private: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
  • Link-local: 169.254.0.0/16, fe80::/10
  • Special addresses: 0.0.0.0, ::

Implementation:

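use std::net::IpAddr;

fn is_private_host(host: &str) -> bool {
    if let Ok(addr) = host.parse::<IpAddr>() {
        match addr {
            // Loopback, RFC 1918, link-local, and the unspecified address 0.0.0.0
            IpAddr::V4(ip) => {
                ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
            }
            // ::1, fe80::/10, and the unspecified address ::
            IpAddr::V6(ip) => {
                ip.is_loopback() || ip.is_unicast_link_local() || ip.is_unspecified()
            }
        }
    } else {
        // Hostnames: localhost and any name under .local
        host == "localhost" || host.ends_with(".local")
    }
}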

This prevents Server-Side Request Forgery (SSRF) attacks against internal services.

Sources: src/tools/browser.rs:388-424

Computer-Use Coordinate Validation

The computer-use backend enforces screen coordinate limits:

fn validate_coordinate(&self, key: &str, value: i64, max: Option<i64>) -> Result<()> {
    if value < 0 {
        bail!("'{key}' must be >= 0")
    }
    if let Some(limit) = max {
        if limit < 0 {
            bail!("Configured coordinate limit for '{key}' must be >= 0")
        }
        if value > limit {
            bail!("'{key}'={value} exceeds configured limit {limit}")
        }
    }
    Ok(())
}

Configuration example:

[browser.computer_use]
max_coordinate_x = 1920
max_coordinate_y = 1080

This prevents the agent from interacting with UI elements outside the intended screen region.

Sources: src/tools/browser.rs:648-661, src/tools/browser.rs:674-706

Autonomy and Rate Limiting

All action tools (browser, http_request, web_search, composio) enforce:

  1. Autonomy Check: security.can_act() returns false for AutonomyLevel::ReadOnly
  2. Rate Limiting: security.record_action() returns false when max_actions_per_hour is exceeded

Example from tool execution:

if !self.security.can_act() {
    return Ok(ToolResult {
        success: false,
        error: Some("Action blocked: autonomy is read-only".into()),
        ..
    });
}

if !self.security.record_action() {
    return Ok(ToolResult {
        success: false,
        error: Some("Action blocked: rate limit exceeded".into()),
        ..
    });
}

Sources: src/tools/pushover.rs:114-129
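The two checks can be modelled with a small sliding-window counter. The sketch below is an assumption-laden illustration rather than the real `SecurityPolicy`: it takes `&mut self` where the real methods are called on a shared reference, `AutonomyLevel::Full` is a hypothetical second variant, and `record_action_at` exists only so the window can be exercised with explicit timestamps.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AutonomyLevel {
    ReadOnly,
    Full, // hypothetical variant standing in for any level that may act
}

struct SecurityPolicy {
    autonomy: AutonomyLevel,
    max_actions_per_hour: usize,
    recent: VecDeque<Instant>, // timestamps of actions inside the window
}

impl SecurityPolicy {
    fn new(autonomy: AutonomyLevel, max_actions_per_hour: usize) -> Self {
        Self { autonomy, max_actions_per_hour, recent: VecDeque::new() }
    }

    /// Read-only autonomy blocks every action tool.
    fn can_act(&self) -> bool {
        self.autonomy != AutonomyLevel::ReadOnly
    }

    /// Records an action now; returns false when the hourly budget is spent.
    fn record_action(&mut self) -> bool {
        self.record_action_at(Instant::now())
    }

    fn record_action_at(&mut self, now: Instant) -> bool {
        let window = Duration::from_secs(3600);
        // Drop timestamps that have aged out of the one-hour window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) >= window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_actions_per_hour {
            return false;
        }
        self.recent.push_back(now);
        true
    }
}
```

A rejected call is not recorded, so a burst of blocked attempts does not extend the time until the agent may act again.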

Configuration Reference

Browser Configuration

[browser]
enabled = true
backend = "auto"  # agent_browser | rust_native | computer_use | auto
allowed_domains = ["*.example.com", "github.com"]
session_name = "my-session"  # Optional: persist browser state

# Rust-native backend settings (requires --features browser-native)
native_headless = true
native_webdriver_url = "http://127.0.0.1:9515"
native_chrome_path = "/usr/bin/google-chrome"  # Optional

[browser.computer_use]
endpoint = "http://127.0.0.1:8787/v1/actions"
api_key = "enc2:..."  # Optional bearer token
timeout_ms = 15000
allow_remote_endpoint = false  # Enforce HTTPS for non-localhost
window_allowlist = ["Firefox", "Chrome"]  # OS window title filter
max_coordinate_x = 1920
max_coordinate_y = 1080

HTTP Request Configuration

[http_request]
enabled = true
allowed_domains = ["api.example.com", "*.github.com", "httpbin.org"]
max_response_size = 5242880  # 5MB in bytes
timeout_secs = 30

Web Search Configuration

[web_search]
enabled = true
provider = "brave"  # brave | glm
brave_api_key = "enc2:..."  # Encrypted via SecretStore
max_results = 10
timeout_secs = 15

Sources: src/config/mod.rs:1-17, src/tools/mod.rs:160-204

Integration with Secret Store

API keys and tokens are encrypted using SecretStore with ChaCha20-Poly1305 AEAD:

[secrets]
encrypt = true  # Default: true

[web_search]
brave_api_key = "enc2:a1b2c3..."  # Encrypted format

[browser.computer_use]
api_key = "enc2:d4e5f6..."  # Bearer token for sidecar

The enc2: prefix indicates ChaCha20-Poly1305 encryption. Legacy enc: (XOR cipher) values are automatically migrated. See Secret Management for details.

Sources: src/security/secrets.rs:53-93
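Telling the two encrypted formats apart from a plaintext value comes down to a prefix check. The sketch below shows only that classification step and performs no decryption or migration; `SecretForm` and `classify_secret` are hypothetical names, not part of `SecretStore`.

```rust
/// How a configured secret value is stored, judged by its prefix.
#[derive(Debug, PartialEq, Eq)]
enum SecretForm<'a> {
    /// Payload following the `enc2:` prefix (ChaCha20-Poly1305).
    Enc2(&'a str),
    /// Payload following the legacy `enc:` prefix, due for migration.
    LegacyEnc(&'a str),
    /// No known prefix: treated as a plaintext value.
    Plain(&'a str),
}

fn classify_secret(value: &str) -> SecretForm<'_> {
    if let Some(payload) = value.strip_prefix("enc2:") {
        SecretForm::Enc2(payload)
    } else if let Some(payload) = value.strip_prefix("enc:") {
        SecretForm::LegacyEnc(payload)
    } else {
        SecretForm::Plain(value)
    }
}
```

Returning the payload as a borrowed slice avoids copying secret material just to inspect its format.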

