08.2 Browser And HTTP Tools

Browser and HTTP Tools

This document covers ZeroClaw's browser automation, HTTP request, and web search tools. These tools enable the agent to interact with web content, automate browser tasks, make HTTP API calls, and search the internet.

For information about tool registration and the overall tool system architecture, see Tools. For configuration of these tools, see Configuration File Reference.

Overview

The browser and HTTP tools provide three distinct capabilities:

Tool	Purpose	Backend Options
`browser`	Full browser automation with DOM interaction	agent-browser CLI, rust-native (fantoccini), computer-use sidecar
`http_request`	Direct HTTP/HTTPS API calls without browser overhead	reqwest client with proxy support
`web_search`	Internet search via Brave Search API or GLM search	Brave API, GLM built-in

The browser and http_request tools enforce domain allowlisting and SSRF protection on the URLs they are given, and all three tools are subject to autonomy-level security checks.

Sources: src/tools/browser.rs:1-60, src/tools/mod.rs:186-204

Browser Tool Architecture

Backend Selection and Resolution

The BrowserTool supports three pluggable backends, selected via configuration or auto-detection:

flowchart TD
    Config["BrowserConfig::backend<br/>(from config.toml)"]
    Parse["BrowserBackendKind::parse()"]
    
    Agent["Agent Browser<br/>(Vercel CLI)"]
    Native["Rust Native<br/>(Fantoccini)"]
    Computer["Computer Use<br/>(OS-level sidecar)"]
    Auto["Auto<br/>(try all)"]
    
    CheckAgent{"agent-browser<br/>CLI available?"}
    CheckNative{"browser-native<br/>feature enabled?"}
    CheckComputer{"computer_use.endpoint<br/>reachable?"}
    
    ResolveAgent["ResolvedBackend::AgentBrowser"]
    ResolveNative["ResolvedBackend::RustNative"]
    ResolveComputer["ResolvedBackend::ComputerUse"]
    Error["Error:<br/>No backend available"]
    
    Config --> Parse
    Parse -->|"agent_browser"| CheckAgent
    Parse -->|"rust_native"| CheckNative
    Parse -->|"computer_use"| CheckComputer
    Parse -->|"auto"| Auto
    
    CheckAgent -->|"yes"| ResolveAgent
    CheckAgent -->|"no"| Error
    
    CheckNative -->|"yes"| ResolveNative
    CheckNative -->|"no"| Error
    
    CheckComputer -->|"yes"| ResolveComputer
    CheckComputer -->|"no"| Error
    
    Auto --> CheckNative
    Auto --> CheckAgent
    Auto --> CheckComputer

Sources: src/tools/browser.rs:61-98, src/tools/browser.rs:315-386

The resolve_backend() method performs runtime detection:

Explicit backend: Validates that the configured backend is actually available and returns an error if it is not
Auto mode: Tries rust-native → agent-browser → computer-use in order
Availability checks:
- agent-browser: Runs agent-browser --version via subprocess
- rust-native: Checks #[cfg(feature = "browser-native")] and WebDriver endpoint
- computer-use: Validates endpoint URL and performs connectivity check

Sources: src/tools/browser.rs:229-267, src/tools/browser.rs:315-386
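
The resolution order described above can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the code in src/tools/browser.rs: the `Availability` struct and the `resolve` function are hypothetical names, and the three availability probes are reduced to precomputed booleans.

```rust
/// Backend requested in configuration (mirrors the documented `backend` values).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    AgentBrowser,
    RustNative,
    ComputerUse,
    Auto,
}

/// Backend chosen after the availability checks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Resolved {
    AgentBrowser,
    RustNative,
    ComputerUse,
}

/// Hypothetical snapshot of the three probes, taken before resolution.
#[derive(Debug, Clone, Copy)]
struct Availability {
    agent_browser: bool, // e.g. `agent-browser --version` succeeded
    rust_native: bool,   // feature compiled in and WebDriver endpoint usable
    computer_use: bool,  // sidecar endpoint valid and reachable
}

fn resolve(kind: BackendKind, avail: Availability) -> Result<Resolved, String> {
    // Candidates in priority order; an explicit kind has exactly one candidate.
    let order = match kind {
        BackendKind::AgentBrowser => vec![Resolved::AgentBrowser],
        BackendKind::RustNative => vec![Resolved::RustNative],
        BackendKind::ComputerUse => vec![Resolved::ComputerUse],
        BackendKind::Auto => vec![
            Resolved::RustNative,
            Resolved::AgentBrowser,
            Resolved::ComputerUse,
        ],
    };
    // Pick the first candidate whose probe succeeded.
    order
        .into_iter()
        .find(|backend| match backend {
            Resolved::AgentBrowser => avail.agent_browser,
            Resolved::RustNative => avail.rust_native,
            Resolved::ComputerUse => avail.computer_use,
        })
        .ok_or_else(|| format!("no browser backend available for {kind:?}"))
}
```

With every probe succeeding, `resolve(BackendKind::Auto, ..)` picks the rust-native backend; an explicit kind whose probe failed yields an error instead of silently falling back.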

Backend Comparison

Feature	Agent Browser	Rust Native	Computer Use
Implementation	Node.js CLI (`agent-browser`)	Fantoccini WebDriver client	External HTTP sidecar
Setup	`npm install -g agent-browser`	Build with `--features browser-native`	Deploy sidecar + configure endpoint
DOM Access	Accessibility tree snapshot	Full WebDriver protocol	OS-level screenshot + coordinate-based
Performance	Moderate (subprocess overhead)	Fast (native Rust)	Slow (image processing)
Headless Mode	Yes	Configurable via `native_headless`	N/A (OS-level)
Session Persistence	`--session <name>` flag	Managed via `NativeBrowserState`	Sidecar-managed
Action Types	Browser-specific	Browser-specific	OS-level (mouse, keyboard, window)

Sources: src/tools/browser.rs:48-59, src/tools/browser.rs:186-227

Agent Browser Backend

Uses the Vercel agent-browser CLI for automation. Each action spawns a subprocess:

sequenceDiagram
    participant Tool as BrowserTool
    participant CLI as agent-browser CLI
    participant Browser as Headless Chrome
    
    Tool->>CLI: Command::new("agent-browser")
    Tool->>CLI: Add session flag (optional)
    Tool->>CLI: Add action args + "--json"
    CLI->>Browser: Launch/connect via Playwright
    Browser-->>CLI: Execute action
    CLI-->>Tool: JSON response {success, data, error}
    Tool->>Tool: Parse AgentBrowserResponse
    Tool-->>Agent: ToolResult

Example commands generated:

agent-browser open https://example.com --session my-session --json
agent-browser snapshot -i -c --json
agent-browser click [data-ref="42"] --json

Sources: src/tools/browser.rs:426-472, src/tools/browser.rs:476-612
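
As a rough illustration of how such command lines could be assembled, the sketch below maps a reduced action type to an argument vector. The `Action` enum and `build_args` function are hypothetical simplifications; only the subcommands and flags shown in the example commands above (`open`, `snapshot -i -c`, `click`, `--session`, `--json`) are taken from this page.

```rust
/// Simplified subset of browser actions, for illustration only.
enum Action {
    Open { url: String },
    Snapshot { interactive_only: bool, compact: bool },
    Click { selector: String },
}

/// Build the argument list passed to the `agent-browser` executable.
fn build_args(action: &Action, session: Option<&str>) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    match action {
        Action::Open { url } => {
            args.push("open".into());
            args.push(url.clone());
        }
        Action::Snapshot { interactive_only, compact } => {
            args.push("snapshot".into());
            if *interactive_only {
                args.push("-i".into());
            }
            if *compact {
                args.push("-c".into());
            }
        }
        Action::Click { selector } => {
            args.push("click".into());
            args.push(selector.clone());
        }
    }
    // Optional session flag, then the JSON output flag on every call.
    if let Some(name) = session {
        args.push("--session".into());
        args.push(name.to_string());
    }
    args.push("--json".into());
    args
}
```

The vector can be handed to `std::process::Command::new("agent-browser").args(...)`. Because the arguments never pass through a shell, selectors such as `[data-ref="42"]` need no extra quoting.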

Rust Native Backend

Directly uses the fantoccini WebDriver client when compiled with --features browser-native:

// Conceptual structure (actual implementation in browser_native.rs, not provided)
struct NativeBrowserState {
    client: Option<fantoccini::Client>,
    session_url: String,
}

impl NativeBrowserState {
    async fn execute_action(&mut self, action: BrowserAction, ...) -> Result<Value> {
        let client = self.ensure_connected(...).await?;
        match action {
            BrowserAction::Open { url } => client.goto(&url).await?,
            BrowserAction::Click { selector } => {
                let elem = client.find(Locator::Css(&selector)).await?;
                elem.click().await?;
            }
            // ... other actions
        }
    }
}

Sources: src/tools/browser.rs:614-646, src/tools/browser.rs:251-267

Computer Use Backend

Delegates to an external HTTP sidecar for OS-level automation:

sequenceDiagram
    participant Tool as BrowserTool
    participant Sidecar as Computer Use Sidecar
    participant OS as Operating System
    
    Tool->>Tool: validate_computer_use_action()
    Tool->>Tool: Check coordinate limits
    Tool->>Sidecar: POST {endpoint}
    Note over Tool,Sidecar: JSON payload with action, params, policy, metadata
    Sidecar->>OS: Execute mouse_move/mouse_click/type_text
    Sidecar->>OS: Capture screenshot
    OS-->>Sidecar: Result + image data
    Sidecar-->>Tool: ComputerUseResponse {success, data, error}
    Tool-->>Agent: ToolResult with screenshot base64

Payload structure:

{
  "action": "mouse_click",
  "params": {"x": 100, "y": 200},
  "policy": {
    "allowed_domains": ["example.com"],
    "window_allowlist": ["Firefox"],
    "max_coordinate_x": 1920,
    "max_coordinate_y": 1080
  },
  "metadata": {
    "session_name": "agent-1",
    "source": "zeroclaw.browser",
    "version": "0.x.x"
  }
}

Sources: src/tools/browser.rs:21-45, src/tools/browser.rs:708-819

Browser Actions

The BrowserAction enum defines all supported operations:

Navigation and State

Action	Backend Support	Description
`Open { url }`	All	Navigate to URL (allowlist-checked)
`GetUrl`	agent-browser, rust-native	Get current page URL
`GetTitle`	agent-browser, rust-native	Get page title
`Close`	agent-browser, rust-native	Close browser session

DOM Interaction

Action	Backend Support	Parameters
`Snapshot { interactive_only, compact, depth }`	agent-browser	Get accessibility tree with refs
`Click { selector }`	All (OS-level for computer-use)	Click element by CSS selector or ref
`Fill { selector, value }`	All	Fill form field
`Type { selector, text }`	All	Type text into element
`GetText { selector }`	agent-browser, rust-native	Extract text content
`Hover { selector }`	agent-browser, rust-native	Hover over element
`Press { key }`	All	Press keyboard key

Semantic Locators

Action	Parameters	Description
`Find { by, value, action, fill_value }`	by: role/text/label/placeholder/testid	Find element semantically and perform action

Example: Find { by: "role", value: "button", action: "click", fill_value: None }

Sources: src/tools/browser.rs:119-184

Visual and Timing

Action	Parameters	Description
`Screenshot { path, full_page }`	Optional path, full-page flag	Capture screenshot
`Wait { selector, ms, text }`	Selector, milliseconds, or text	Wait for element/time/text
`Scroll { direction, pixels }`	Direction (up/down), optional pixels	Scroll page
`IsVisible { selector }`	Selector	Check element visibility

Sources: src/tools/browser.rs:119-184

HTTP Request Tool

The HttpRequestTool provides direct HTTP/HTTPS access without browser overhead:

flowchart LR
    Agent["Agent"]
    Tool["HttpRequestTool"]
    Security["SecurityPolicy"]
    Allowlist["Domain Allowlist"]
    Proxy["Runtime Proxy"]
    Target["Target Server"]
    
    Agent -->|"execute({url, method, body})"| Tool
    Tool -->|"enforce_tool_operation()"| Security
    Tool -->|"validate_url()"| Allowlist
    Tool -->|"build_runtime_proxy_client()"| Proxy
    Proxy -->|"reqwest::Client::request()"| Target
    Target -->|"Response"| Tool
    Tool -->|"ToolResult"| Agent

Configuration

[http_request]
enabled = true
allowed_domains = ["api.example.com", "*.github.com"]
max_response_size = 5242880  # 5MB
timeout_secs = 30

Sources: src/tools/mod.rs:187-194

Security Features

Domain Allowlisting: Validates the host extracted from the request URL against allowed_domains patterns
SSRF Protection: Blocks private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
Response Size Limits: Enforces max_response_size to prevent memory exhaustion
Timeout: Hard timeout via timeout_secs configuration
Proxy Support: Respects ProxyConfig for corporate environments

The tool uses crate::config::build_runtime_proxy_client("tool.http_request") to automatically apply proxy configuration.

Sources: src/tools/mod.rs:187-194
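
To make the allowlist patterns concrete, here is a minimal sketch of wildcard host matching. The semantics are an assumption rather than a description of the actual matcher: a `*.example.com` pattern is taken to match any subdomain but not the bare domain, and any other pattern must match exactly. The names `pattern_matches` and `host_allowed` are hypothetical.

```rust
/// Return true when `host` matches a single allowlist `pattern`.
/// Assumed semantics: "*.example.com" matches subdomains only;
/// any other pattern must equal the host (case-insensitively).
fn pattern_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.trim().to_ascii_lowercase();
    // Ignore a trailing dot so "example.com." and "example.com" compare equal.
    let host = host.trim().trim_end_matches('.').to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        // The remainder must end at a label boundary ("api." + "github.com"),
        // so "evilgithub.com" does not match "*.github.com".
        Some(suffix) => host
            .strip_suffix(suffix)
            .map_or(false, |rest| rest.len() > 1 && rest.ends_with('.')),
        None => host == pattern,
    }
}

/// Return true when any pattern in `allowed` matches `host`.
fn host_allowed(allowed: &[&str], host: &str) -> bool {
    allowed.iter().any(|pattern| pattern_matches(pattern, host))
}
```

Checking the label boundary is the important design choice here: a plain suffix comparison would accept look-alike domains that merely end with the allowed name.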

Web Search Tool

The WebSearchTool provides internet search capabilities via external APIs:

Brave Search Integration

[web_search]
enabled = true
provider = "brave"
brave_api_key = "enc2:..."  # Encrypted in secret store
max_results = 10
timeout_secs = 15

Request flow:

Agent calls web_search tool with query
Tool checks can_act() and record_action(), and stops if either returns false
Makes GET request to https://api.search.brave.com/res/v1/web/search
Parses JSON response and extracts web.results array
Returns formatted results with titles, URLs, descriptions

GLM Provider (Built-in)

Some models (e.g., GLM-4) have native web search:

[web_search]
provider = "glm"
# No API key needed - model has built-in capability

The tool passes the query to the provider's built-in search function instead of external APIs.

Sources: src/tools/mod.rs:196-204
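
The Brave request step can be sketched as plain URL construction. This is an illustration under stated assumptions: the helper names are hypothetical, the `max_results` setting is assumed to map to Brave's `count` query parameter, and percent-encoding is done by hand to keep the example free of external crates.

```rust
/// Percent-encode a query value, leaving only RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            // Everything else (including UTF-8 continuation bytes) becomes %XX.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Build the GET URL for a Brave web search.
/// Assumption: `max_results` is sent as the `count` parameter.
fn brave_search_url(query: &str, max_results: u32) -> String {
    format!(
        "https://api.search.brave.com/res/v1/web/search?q={}&count={}",
        percent_encode(query),
        max_results
    )
}
```

The API key is not part of the URL; Brave's API expects it in an `X-Subscription-Token` request header, which the HTTP client would add when sending the request.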

Tool Registration

Browser and HTTP tools are conditionally registered based on configuration:

flowchart TD
    Start["all_tools() function"]
    
    BrowserCheck{"browser_config.enabled?"}
    BrowserAdd["Add BrowserOpenTool<br/>Add BrowserTool"]
    
    HttpCheck{"http_config.enabled?"}
    HttpAdd["Add HttpRequestTool"]
    
    SearchCheck{"web_search.enabled?"}
    SearchAdd["Add WebSearchTool"]
    
    Registry["Tool Registry<br/>Vec<Box<dyn Tool>>"]
    
    Start --> BrowserCheck
    BrowserCheck -->|"true"| BrowserAdd
    BrowserCheck -->|"false"| HttpCheck
    BrowserAdd --> HttpCheck
    
    HttpCheck -->|"true"| HttpAdd
    HttpCheck -->|"false"| SearchCheck
    HttpAdd --> SearchCheck
    
    SearchCheck -->|"true"| SearchAdd
    SearchCheck -->|"false"| Registry
    SearchAdd --> Registry

Code reference:

// src/tools/mod.rs:160-204
if browser_config.enabled {
    tools.push(Box::new(BrowserOpenTool::new(...)));  // Legacy simple URL opener
    tools.push(Box::new(BrowserTool::new_with_backend(...)));  // Full automation
}

if http_config.enabled {
    tools.push(Box::new(HttpRequestTool::new(...)));
}

if root_config.web_search.enabled {
    tools.push(Box::new(WebSearchTool::new(...)));
}

Sources: src/tools/mod.rs:160-204

Security and Validation

URL Validation Pipeline

flowchart TD
    Input["URL from Agent"]
    
    Empty{"Empty?"}
    Protocol{"Starts with<br/>http:// or https://?"}
    FileCheck{"file:// URL?"}
    ExtractHost["extract_host()"]
    PrivateCheck{"is_private_host()?"}
    AllowlistCheck{"host_matches_allowlist()?"}
    
    Valid["✓ Valid URL"]
    
    Error1["✗ Empty URL"]
    Error2["✗ Invalid protocol"]
    Error3["✗ file:// blocked"]
    Error4["✗ Private host"]
    Error5["✗ Not in allowlist"]
    
    Input --> Empty
    Empty -->|"yes"| Error1
    Empty -->|"no"| FileCheck
    FileCheck -->|"yes"| Error3
    FileCheck -->|"no"| Protocol
    Protocol -->|"no"| Error2
    Protocol -->|"yes"| ExtractHost
    ExtractHost --> PrivateCheck
    PrivateCheck -->|"yes"| Error4
    PrivateCheck -->|"no"| AllowlistCheck
    AllowlistCheck -->|"no"| Error5
    AllowlistCheck -->|"yes"| Valid

Sources: src/tools/browser.rs:388-424
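
The check order in the flowchart can be expressed as a chain of early returns. The sketch below is a simplified stand-in, not the source implementation: `UrlError`, the hand-rolled `extract_host`, and the extra `MissingHost` case are assumptions, and the private-host and allowlist checks are passed in as closures so the example stays focused on ordering.

```rust
#[derive(Debug, PartialEq, Eq)]
enum UrlError {
    Empty,
    FileScheme,
    InvalidProtocol,
    MissingHost, // not in the flowchart; added so the sketch is total
    PrivateHost,
    NotAllowlisted,
}

/// Extract the host from an http(s) URL without a URL crate.
/// Simplified: drops userinfo and port, unwraps bracketed IPv6 literals.
fn extract_host(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let host_port = authority.rsplit('@').next()?;
    let host = match host_port.strip_prefix('[') {
        Some(v6) => v6.split(']').next()?,
        None => host_port.split(':').next()?,
    };
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Apply the checks in the same order as the flowchart.
fn validate_url(
    raw: &str,
    is_private: impl Fn(&str) -> bool,
    is_allowed: impl Fn(&str) -> bool,
) -> Result<String, UrlError> {
    let url = raw.trim();
    if url.is_empty() {
        return Err(UrlError::Empty);
    }
    if url.starts_with("file://") {
        return Err(UrlError::FileScheme);
    }
    if !(url.starts_with("http://") || url.starts_with("https://")) {
        return Err(UrlError::InvalidProtocol);
    }
    let host = extract_host(url).ok_or(UrlError::MissingHost)?;
    // The private-host check runs before the allowlist, so an allowlist
    // entry cannot re-enable access to an internal address.
    if is_private(&host) {
        return Err(UrlError::PrivateHost);
    }
    if !is_allowed(&host) {
        return Err(UrlError::NotAllowlisted);
    }
    Ok(url.to_string())
}
```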

Private Host Detection

The is_private_host() function blocks:

Loopback: 127.0.0.1, localhost, ::1
RFC 1918 Private: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Link-local: 169.254.0.0/16, fe80::/10
Special addresses: 0.0.0.0, ::

Implementation:

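use std::net::IpAddr;

fn is_private_host(host: &str) -> bool {
    if let Ok(addr) = host.parse::<IpAddr>() {
        match addr {
            // RFC 1918 ranges, loopback, link-local, and 0.0.0.0
            IpAddr::V4(ip) => {
                ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
            }
            // ::1, fe80::/10, and ::
            IpAddr::V6(ip) => {
                ip.is_loopback() || ip.is_unicast_link_local() || ip.is_unspecified()
            }
        }
    } else {
        // Hostnames: "localhost" and any name under the .local suffix
        host == "localhost" || host.ends_with(".local")
    }
}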

This prevents Server-Side Request Forgery (SSRF) attacks against internal services.

Sources: src/tools/browser.rs:388-424

Computer-Use Coordinate Validation

The computer-use backend enforces screen coordinate limits:

fn validate_coordinate(&self, key: &str, value: i64, max: Option<i64>) -> Result<()> {
    if value < 0 {
        bail!("'{key}' must be >= 0")
    }
    if let Some(limit) = max {
        if limit < 0 {
            bail!("Configured coordinate limit for '{key}' must be >= 0")
        }
        if value > limit {
            bail!("'{key}'={value} exceeds configured limit {limit}")
        }
    }
    Ok(())
}

Configuration example:

[browser.computer_use]
max_coordinate_x = 1920
max_coordinate_y = 1080

This prevents the agent from interacting with UI elements outside the intended screen region.

Sources: src/tools/browser.rs:648-661, src/tools/browser.rs:674-706

Autonomy and Rate Limiting

All action tools (browser, http_request, web_search, composio) enforce:

Autonomy Check: security.can_act() returns false for AutonomyLevel::ReadOnly
Rate Limiting: security.record_action() returns false when max_actions_per_hour is exceeded

Example from tool execution:

if !self.security.can_act() {
    return Ok(ToolResult {
        success: false,
        error: Some("Action blocked: autonomy is read-only".into()),
        ..
    });
}

if !self.security.record_action() {
    return Ok(ToolResult {
        success: false,
        error: Some("Action blocked: rate limit exceeded".into()),
        ..
    });
}

Sources: src/tools/pushover.rs:114-129
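
One way the two checks could behave is sketched below with a sliding one-hour window. This is an assumption about the mechanism, not the actual SecurityPolicy code: `ActionGate`, `record_action_at`, and the `Supervised` variant are hypothetical, and the current time is passed in explicitly so the behavior is easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AutonomyLevel {
    ReadOnly,
    Supervised, // hypothetical stand-in for any level that may act
}

/// Hypothetical gate combining the autonomy check and the hourly budget.
struct ActionGate {
    autonomy: AutonomyLevel,
    max_actions_per_hour: usize,
    recent: VecDeque<Instant>, // timestamps of accepted actions, oldest first
}

impl ActionGate {
    fn new(autonomy: AutonomyLevel, max_actions_per_hour: usize) -> Self {
        Self { autonomy, max_actions_per_hour, recent: VecDeque::new() }
    }

    /// Read-only autonomy never permits actions.
    fn can_act(&self) -> bool {
        self.autonomy != AutonomyLevel::ReadOnly
    }

    /// Record an action at `now`; returns false when the trailing hour's
    /// budget is already spent (the rejected attempt is not recorded).
    fn record_action_at(&mut self, now: Instant) -> bool {
        let window = Duration::from_secs(3600);
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) >= window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_actions_per_hour {
            return false;
        }
        self.recent.push_back(now);
        true
    }
}
```

A tool's execute path would call `can_act()` first and `record_action_at(Instant::now())` second, returning a failed ToolResult as in the snippet above when either check fails.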

Configuration Reference

Browser Configuration

[browser]
enabled = true
backend = "auto"  # agent_browser | rust_native | computer_use | auto
allowed_domains = ["*.example.com", "github.com"]
session_name = "my-session"  # Optional: persist browser state

# Rust-native backend settings (requires --features browser-native)
native_headless = true
native_webdriver_url = "http://127.0.0.1:9515"
native_chrome_path = "/usr/bin/google-chrome"  # Optional

[browser.computer_use]
endpoint = "http://127.0.0.1:8787/v1/actions"
api_key = "enc2:..."  # Optional bearer token
timeout_ms = 15000
allow_remote_endpoint = false  # Enforce HTTPS for non-localhost
window_allowlist = ["Firefox", "Chrome"]  # OS window title filter
max_coordinate_x = 1920
max_coordinate_y = 1080

HTTP Request Configuration

[http_request]
enabled = true
allowed_domains = ["api.example.com", "*.github.com", "httpbin.org"]
max_response_size = 5242880  # 5MB in bytes
timeout_secs = 30

Web Search Configuration

[web_search]
enabled = true
provider = "brave"  # brave | glm
brave_api_key = "enc2:..."  # Encrypted via SecretStore
max_results = 10
timeout_secs = 15

Sources: src/config/mod.rs:1-17, src/tools/mod.rs:160-204

Integration with Secret Store

API keys and tokens are encrypted using SecretStore with ChaCha20-Poly1305 AEAD:

[secrets]
encrypt = true  # Default: true

[web_search]
brave_api_key = "enc2:a1b2c3..."  # Encrypted format

[browser.computer_use]
api_key = "enc2:d4e5f6..."  # Bearer token for sidecar

The enc2: prefix indicates ChaCha20-Poly1305 encryption. Legacy enc: (XOR cipher) values are automatically migrated. See Secret Management for details.

Sources: src/security/secrets.rs:53-93
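
The prefix convention lends itself to a small classification step before any decryption is attempted. The sketch below only inspects the prefix and performs no cryptography; `SecretFormat` and `classify_secret` are hypothetical names, and treating unprefixed values as plaintext is an assumption.

```rust
/// How a configured secret value is stored, judged by its prefix only.
#[derive(Debug, PartialEq, Eq)]
enum SecretFormat<'a> {
    /// `enc2:` prefix: ChaCha20-Poly1305 payload.
    Enc2(&'a str),
    /// Legacy `enc:` prefix: XOR-cipher payload that should be migrated.
    LegacyEnc(&'a str),
    /// No recognized prefix; assumed here to be a plaintext value.
    Plaintext(&'a str),
}

/// Split a config value into its format and payload.
fn classify_secret(value: &str) -> SecretFormat<'_> {
    if let Some(payload) = value.strip_prefix("enc2:") {
        SecretFormat::Enc2(payload)
    } else if let Some(payload) = value.strip_prefix("enc:") {
        SecretFormat::LegacyEnc(payload)
    } else {
        SecretFormat::Plaintext(value)
    }
}
```

A loader could use the `LegacyEnc` case as the trigger for re-encrypting the value in the `enc2:` format.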
