08.2 Browser And HTTP Tools
This document covers ZeroClaw's browser automation, HTTP request, and web search tools. These tools enable the agent to interact with web content, automate browser tasks, make HTTP API calls, and search the internet.
For information about tool registration and the overall tool system architecture, see Tools. For configuration of these tools, see Configuration File Reference.
The browser and HTTP tools provide three distinct capabilities:
| Tool | Purpose | Backend Options |
|---|---|---|
| `browser` | Full browser automation with DOM interaction | agent-browser CLI, rust-native (fantoccini), computer-use sidecar |
| `http_request` | Direct HTTP/HTTPS API calls without browser overhead | reqwest client with proxy support |
| `web_search` | Internet search via Brave Search API or GLM search | Brave API, GLM built-in |
All three tools enforce domain allowlisting, SSRF protection, and autonomy-level security checks.
Sources: src/tools/browser.rs:1-60, src/tools/mod.rs:186-204
The BrowserTool supports three pluggable backends, selected via configuration or auto-detection:
flowchart TD
Config["BrowserConfig::backend<br/>(from config.toml)"]
Parse["BrowserBackendKind::parse()"]
Agent["Agent Browser<br/>(Vercel CLI)"]
Native["Rust Native<br/>(Fantoccini)"]
Computer["Computer Use<br/>(OS-level sidecar)"]
Auto["Auto<br/>(try all)"]
CheckAgent{"agent-browser<br/>CLI available?"}
CheckNative{"browser-native<br/>feature enabled?"}
CheckComputer{"computer_use.endpoint<br/>reachable?"}
ResolveAgent["ResolvedBackend::AgentBrowser"]
ResolveNative["ResolvedBackend::RustNative"]
ResolveComputer["ResolvedBackend::ComputerUse"]
Error["Error:<br/>No backend available"]
Config --> Parse
Parse -->|"agent_browser"| CheckAgent
Parse -->|"rust_native"| CheckNative
Parse -->|"computer_use"| CheckComputer
Parse -->|"auto"| Auto
CheckAgent -->|"yes"| ResolveAgent
CheckAgent -->|"no"| Error
CheckNative -->|"yes"| ResolveNative
CheckNative -->|"no"| Error
CheckComputer -->|"yes"| ResolveComputer
CheckComputer -->|"no"| Error
Auto --> CheckNative
Auto --> CheckAgent
Auto --> CheckComputer
Sources: src/tools/browser.rs:61-98, src/tools/browser.rs:315-386
The resolve_backend() method performs runtime detection:
- Explicit backend: Validates that the configured backend is available
- Auto mode: Tries rust-native → agent-browser → computer-use in order
- Availability checks:
  - agent-browser: Runs `agent-browser --version` via subprocess
  - rust-native: Checks `#[cfg(feature = "browser-native")]` and WebDriver endpoint
  - computer-use: Validates endpoint URL and performs connectivity check
Sources: src/tools/browser.rs:229-267, src/tools/browser.rs:315-386
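The resolution rules above can be sketched in a few lines. This is a minimal illustration, not the code in `browser.rs`: the `BackendKind`, `Resolved`, and `Availability` types and both functions are hypothetical stand-ins, and the three availability probes are reduced to booleans supplied by the caller.

```rust
/// Backend requested in configuration (values mirror the `backend` setting).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    AgentBrowser,
    RustNative,
    ComputerUse,
    Auto,
}

/// Backend selected after the availability checks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Resolved {
    AgentBrowser,
    RustNative,
    ComputerUse,
}

/// Hypothetical snapshot of the three availability probes.
#[derive(Debug, Clone, Copy)]
struct Availability {
    agent_browser: bool,
    rust_native: bool,
    computer_use: bool,
}

/// Map the configured string to a backend kind; unknown values yield `None`.
fn parse_backend(raw: &str) -> Option<BackendKind> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "agent_browser" => Some(BackendKind::AgentBrowser),
        "rust_native" => Some(BackendKind::RustNative),
        "computer_use" => Some(BackendKind::ComputerUse),
        "auto" => Some(BackendKind::Auto),
        _ => None,
    }
}

/// An explicit backend must be available; `Auto` tries
/// rust-native, then agent-browser, then computer-use.
fn resolve_backend(kind: BackendKind, avail: Availability) -> Result<Resolved, String> {
    match kind {
        BackendKind::AgentBrowser if avail.agent_browser => Ok(Resolved::AgentBrowser),
        BackendKind::RustNative if avail.rust_native => Ok(Resolved::RustNative),
        BackendKind::ComputerUse if avail.computer_use => Ok(Resolved::ComputerUse),
        BackendKind::Auto => {
            if avail.rust_native {
                Ok(Resolved::RustNative)
            } else if avail.agent_browser {
                Ok(Resolved::AgentBrowser)
            } else if avail.computer_use {
                Ok(Resolved::ComputerUse)
            } else {
                Err("no browser backend available".to_string())
            }
        }
        other => Err(format!("configured backend {other:?} is not available")),
    }
}
```

With every probe succeeding, `Auto` resolves to the rust-native backend. An explicitly configured backend never falls back to another one; it fails if its own probe fails.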
| Feature | Agent Browser | Rust Native | Computer Use |
|---|---|---|---|
| Implementation | Node.js CLI (`agent-browser`) | Fantoccini WebDriver client | External HTTP sidecar |
| Setup | `npm install -g agent-browser` | Build with `--features browser-native` | Deploy sidecar + configure endpoint |
| DOM Access | Accessibility tree snapshot | Full WebDriver protocol | OS-level screenshot + coordinate-based |
| Performance | Moderate (subprocess overhead) | Fast (native Rust) | Slow (image processing) |
| Headless Mode | Yes | Configurable via `native_headless` | N/A (OS-level) |
| Session Persistence | `--session <name>` flag | Managed via `NativeBrowserState` | Sidecar-managed |
| Action Types | Browser-specific | Browser-specific | OS-level (mouse, keyboard, window) |
Sources: src/tools/browser.rs:48-59, src/tools/browser.rs:186-227
Uses the Vercel agent-browser CLI for automation. Each action spawns a subprocess:
sequenceDiagram
participant Tool as BrowserTool
participant CLI as agent-browser CLI
participant Browser as Headless Chrome
Tool->>CLI: Command::new("agent-browser")
Tool->>CLI: Add session flag (optional)
Tool->>CLI: Add action args + "--json"
CLI->>Browser: Launch/connect via Playwright
Browser-->>CLI: Execute action
CLI-->>Tool: JSON response {success, data, error}
Tool->>Tool: Parse AgentBrowserResponse
Tool-->>Agent: ToolResult
Example commands generated:
- `agent-browser open https://example.com --session my-session --json`
- `agent-browser snapshot -i -c --json`
- `agent-browser click [data-ref="42"] --json`
Sources: src/tools/browser.rs:426-472, src/tools/browser.rs:476-612
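The example commands above suggest how an action is turned into an argument list. The sketch below is an assumption-laden illustration: the `Action` enum is a trimmed, hypothetical subset of `BrowserAction`, `build_args` is not a function from the source, and the position of `--session` after the action arguments follows the first example command rather than confirmed code.

```rust
/// Trimmed, hypothetical subset of browser actions used for this illustration.
#[derive(Debug, Clone)]
enum Action {
    Open { url: String },
    Snapshot { interactive_only: bool, compact: bool },
    Click { selector: String },
}

/// Build the argument vector passed to the `agent-browser` executable.
/// Only flags that appear in the example commands are used here.
fn build_args(action: &Action, session: Option<&str>) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    match action {
        Action::Open { url } => {
            args.push("open".to_string());
            args.push(url.clone());
        }
        Action::Snapshot { interactive_only, compact } => {
            args.push("snapshot".to_string());
            if *interactive_only {
                args.push("-i".to_string());
            }
            if *compact {
                args.push("-c".to_string());
            }
        }
        Action::Click { selector } => {
            args.push("click".to_string());
            args.push(selector.clone());
        }
    }
    // Optional session flag, then the JSON output flag.
    if let Some(name) = session {
        args.push("--session".to_string());
        args.push(name.to_string());
    }
    args.push("--json".to_string());
    args
}
```

The resulting vector could be handed to `std::process::Command::new("agent-browser").args(...)`. Passing each argument as its own entry avoids shell quoting problems with selectors such as `[data-ref="42"]`.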
Directly uses the fantoccini WebDriver client when compiled with --features browser-native:
// Conceptual structure (actual implementation in browser_native.rs, not provided)
struct NativeBrowserState {
client: Option<fantoccini::Client>,
session_url: String,
}
impl NativeBrowserState {
async fn execute_action(&mut self, action: BrowserAction, ...) -> Result<Value> {
let client = self.ensure_connected(...).await?;
match action {
BrowserAction::Open { url } => client.goto(&url).await?,
BrowserAction::Click { selector } => {
let elem = client.find(Locator::Css(&selector)).await?;
elem.click().await?;
}
// ... other actions
}
}
}

Sources: src/tools/browser.rs:614-646, src/tools/browser.rs:251-267
Delegates to an external HTTP sidecar for OS-level automation:
sequenceDiagram
participant Tool as BrowserTool
participant Sidecar as Computer Use Sidecar
participant OS as Operating System
Tool->>Tool: validate_computer_use_action()
Tool->>Tool: Check coordinate limits
Tool->>Sidecar: POST {endpoint}/v1/actions
Note over Tool,Sidecar: JSON payload with action, params, policy, metadata
Sidecar->>OS: Execute mouse_move/mouse_click/type_text
Sidecar->>OS: Capture screenshot
OS-->>Sidecar: Result + image data
Sidecar-->>Tool: ComputerUseResponse {success, data, error}
Tool-->>Agent: ToolResult with screenshot base64
Payload structure:
{
"action": "mouse_click",
"params": {"x": 100, "y": 200},
"policy": {
"allowed_domains": ["example.com"],
"window_allowlist": ["Firefox"],
"max_coordinate_x": 1920,
"max_coordinate_y": 1080
},
"metadata": {
"session_name": "agent-1",
"source": "zeroclaw.browser",
"version": "0.x.x"
}
}

Sources: src/tools/browser.rs:21-45, src/tools/browser.rs:708-819
The BrowserAction enum defines all supported operations:
| Action | Backend Support | Description |
|---|---|---|
| `Open { url }` | All | Navigate to URL (allowlist-checked) |
| `GetUrl` | agent-browser, rust-native | Get current page URL |
| `GetTitle` | agent-browser, rust-native | Get page title |
| `Close` | agent-browser, rust-native | Close browser session |
| Action | Backend Support | Description |
|---|---|---|
| `Snapshot { interactive_only, compact, depth }` | agent-browser | Get accessibility tree with refs |
| `Click { selector }` | All (OS-level for computer-use) | Click element by CSS selector or ref |
| `Fill { selector, value }` | All | Fill form field |
| `Type { selector, text }` | All | Type text into element |
| `GetText { selector }` | agent-browser, rust-native | Extract text content |
| `Hover { selector }` | agent-browser, rust-native | Hover over element |
| `Press { key }` | All | Press keyboard key |
| Action | Parameters | Description |
|---|---|---|
| `Find { by, value, action, fill_value }` | `by`: role/text/label/placeholder/testid | Find element semantically and perform action |
Example: Find { by: "role", value: "button", action: "click", fill_value: None }
Sources: src/tools/browser.rs:119-184
| Action | Parameters | Description |
|---|---|---|
| `Screenshot { path, full_page }` | Optional path, full-page flag | Capture screenshot |
| `Wait { selector, ms, text }` | Selector, milliseconds, or text | Wait for element/time/text |
| `Scroll { direction, pixels }` | Direction (up/down), optional pixels | Scroll page |
| `IsVisible { selector }` | Selector | Check element visibility |
Sources: src/tools/browser.rs:119-184
The HttpRequestTool provides direct HTTP/HTTPS access without browser overhead:
flowchart LR
Agent["Agent"]
Tool["HttpRequestTool"]
Security["SecurityPolicy"]
Allowlist["Domain Allowlist"]
Proxy["Runtime Proxy"]
Target["Target Server"]
Agent -->|"execute({url, method, body})"| Tool
Tool -->|"enforce_tool_operation()"| Security
Tool -->|"validate_url()"| Allowlist
Tool -->|"build_runtime_proxy_client()"| Proxy
Proxy -->|"reqwest::Client::request()"| Target
Target -->|"Response"| Tool
Tool -->|"ToolResult"| Agent
[http_request]
enabled = true
allowed_domains = ["api.example.com", "*.github.com"]
max_response_size = 5242880 # 5MB
timeout_secs = 30

Sources: src/tools/mod.rs:187-194
- Domain Allowlisting: Validates `Host` header against `allowed_domains` patterns
- SSRF Protection: Blocks private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
- Response Size Limits: Enforces `max_response_size` to prevent memory exhaustion
- Timeout: Hard timeout via `timeout_secs` configuration
- Proxy Support: Respects `ProxyConfig` for corporate environments
The tool uses crate::config::build_runtime_proxy_client("tool.http_request") to automatically apply proxy configuration.
Sources: src/tools/mod.rs:187-194
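Wildcard entries such as `*.github.com` need more than a string comparison. The following is a minimal sketch of one way to match a host against `allowed_domains`-style patterns; the function names are hypothetical, and treating `*.github.com` as also covering the bare `github.com` is an assumption, not confirmed behaviour of the tool.

```rust
/// Match one host against one allowlist pattern.
/// Assumption: "*.example.com" covers subdomains and the apex "example.com".
fn host_matches_pattern(host: &str, pattern: &str) -> bool {
    // Normalise case and drop a trailing root dot ("example.com.").
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let pattern = pattern.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        Some(suffix) => {
            // Require a label boundary so "evilgithub.com" does not match "*.github.com".
            host == suffix
                || host
                    .strip_suffix(suffix)
                    .map_or(false, |rest| rest.ends_with('.'))
        }
        None => host == pattern,
    }
}

/// True when any pattern in the allowlist accepts the host.
fn host_allowed(host: &str, allowlist: &[&str]) -> bool {
    allowlist
        .iter()
        .any(|pattern| host_matches_pattern(host, pattern))
}
```

The label-boundary check matters: a plain `ends_with("github.com")` test would accept `evilgithub.com`, which defeats the allowlist.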
The WebSearchTool provides internet search capabilities via external APIs:
[web_search]
enabled = true
provider = "brave"
brave_api_key = "enc2:..." # Encrypted in secret store
max_results = 10
timeout_secs = 15

Request flow:
1. Agent calls the `web_search` tool with a query
2. Tool checks `can_act()` and `record_action()`
3. Makes a GET request to `https://api.search.brave.com/res/v1/web/search`
4. Parses the JSON response and extracts the `web.results` array
5. Returns formatted results with titles, URLs, and descriptions
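The request step can be illustrated by building the search URL. This is a sketch under stated assumptions: the `q` and `count` query parameter names come from Brave's public API documentation rather than from the source files cited here, and `percent_encode` and `brave_search_url` are hypothetical helpers, not functions of the tool.

```rust
/// Percent-encode a query value, leaving only RFC 3986 unreserved characters as-is.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            // Everything else (spaces, '&', non-ASCII bytes) becomes %XX.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Build the GET URL for a search request.
/// Assumption: the API accepts `q` (query) and `count` (result limit).
fn brave_search_url(query: &str, max_results: u32) -> String {
    format!(
        "https://api.search.brave.com/res/v1/web/search?q={}&count={}",
        percent_encode(query),
        max_results
    )
}
```

Encoding the query keeps characters such as `&` inside the `q` value, so a search for `a&b` cannot inject extra parameters into the request.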
Some models (e.g., GLM-4) have native web search:
[web_search]
provider = "glm"
# No API key needed - model has built-in capability

The tool passes the query to the provider's built-in search function instead of an external API.
Sources: src/tools/mod.rs:196-204
Browser and HTTP tools are conditionally registered based on configuration:
flowchart TD
Start["all_tools() function"]
BrowserCheck{"browser_config.enabled?"}
BrowserAdd["Add BrowserOpenTool<br/>Add BrowserTool"]
HttpCheck{"http_config.enabled?"}
HttpAdd["Add HttpRequestTool"]
SearchCheck{"web_search.enabled?"}
SearchAdd["Add WebSearchTool"]
Registry["Tool Registry<br/>Vec<Box<dyn Tool>>"]
Start --> BrowserCheck
BrowserCheck -->|"true"| BrowserAdd
BrowserCheck -->|"false"| HttpCheck
BrowserAdd --> HttpCheck
HttpCheck -->|"true"| HttpAdd
HttpCheck -->|"false"| SearchCheck
HttpAdd --> SearchCheck
SearchCheck -->|"true"| SearchAdd
SearchCheck -->|"false"| Registry
SearchAdd --> Registry
Code reference:
// src/tools/mod.rs:160-204
if browser_config.enabled {
tools.push(Box::new(BrowserOpenTool::new(...))); // Legacy simple URL opener
tools.push(Box::new(BrowserTool::new_with_backend(...))); // Full automation
}
if http_config.enabled {
tools.push(Box::new(HttpRequestTool::new(...)));
}
if root_config.web_search.enabled {
tools.push(Box::new(WebSearchTool::new(...)));
}

Sources: src/tools/mod.rs:160-204
flowchart TD
Input["URL from Agent"]
Empty{"Empty?"}
Protocol{"Starts with<br/>http:// or https://?"}
FileCheck{"file:// URL?"}
ExtractHost["extract_host()"]
PrivateCheck{"is_private_host()?"}
AllowlistCheck{"host_matches_allowlist()?"}
Valid["✓ Valid URL"]
Error1["✗ Empty URL"]
Error2["✗ Invalid protocol"]
Error3["✗ file:// blocked"]
Error4["✗ Private host"]
Error5["✗ Not in allowlist"]
Input --> Empty
Empty -->|"yes"| Error1
Empty -->|"no"| FileCheck
FileCheck -->|"yes"| Error3
FileCheck -->|"no"| Protocol
Protocol -->|"no"| Error2
Protocol -->|"yes"| ExtractHost
ExtractHost --> PrivateCheck
PrivateCheck -->|"yes"| Error4
PrivateCheck -->|"no"| AllowlistCheck
AllowlistCheck -->|"no"| Error5
AllowlistCheck -->|"yes"| Valid
Sources: src/tools/browser.rs:388-424
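The order of checks in the flowchart can be sketched as a single function. This is an illustration only: the `UrlError` enum (including its `MissingHost` variant), the simplified `extract_host` parser, and the exact-match allowlist test are assumptions made for the example and do not reproduce the code in `browser.rs`.

```rust
use std::net::IpAddr;

/// Hypothetical error type, one variant per rejection in the flowchart.
#[derive(Debug, PartialEq, Eq)]
enum UrlError {
    Empty,
    FileScheme,
    InvalidProtocol,
    MissingHost,
    PrivateHost,
    NotAllowlisted,
}

/// Simplified host extraction: strips scheme, path, userinfo, and port.
fn extract_host(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    let host_port = authority.rsplit('@').next()?;
    // Bracketed IPv6 literal such as "[::1]:8080".
    if let Some(stripped) = host_port.strip_prefix('[') {
        return stripped.split(']').next().filter(|h| !h.is_empty());
    }
    let host = host_port.split(':').next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Reduced stand-in for the private-host check.
fn is_private_host(host: &str) -> bool {
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => {
            ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
        }
        Ok(IpAddr::V6(ip)) => ip.is_loopback() || ip.is_unspecified(),
        Err(_) => host.eq_ignore_ascii_case("localhost"),
    }
}

/// Apply the checks in flowchart order and return the normalised host.
fn validate_url(url: &str, allowlist: &[&str]) -> Result<String, UrlError> {
    let url = url.trim();
    if url.is_empty() {
        return Err(UrlError::Empty);
    }
    if url.starts_with("file://") {
        return Err(UrlError::FileScheme);
    }
    if !(url.starts_with("http://") || url.starts_with("https://")) {
        return Err(UrlError::InvalidProtocol);
    }
    let host = extract_host(url).ok_or(UrlError::MissingHost)?;
    if is_private_host(host) {
        return Err(UrlError::PrivateHost);
    }
    // Exact, case-insensitive match only; wildcard patterns are left out of this sketch.
    if !allowlist.iter().any(|allowed| host.eq_ignore_ascii_case(allowed)) {
        return Err(UrlError::NotAllowlisted);
    }
    Ok(host.to_ascii_lowercase())
}
```

Running the private-host check before the allowlist check means a loopback or RFC 1918 address is rejected even if it was added to the allowlist by mistake.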
The is_private_host() function blocks:
- Loopback: `127.0.0.1`, `localhost`, `::1`
- RFC 1918 Private: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
- Link-local: `169.254.0.0/16`, `fe80::/10`
- Special addresses: `0.0.0.0`, `::`
Implementation:
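use std::net::IpAddr;

fn is_private_host(host: &str) -> bool {
    if let Ok(addr) = host.parse::<IpAddr>() {
        match addr {
            // Private, loopback, link-local, and unspecified (0.0.0.0) IPv4 addresses
            IpAddr::V4(ip) => {
                ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
            }
            // Loopback (::1), link-local (fe80::/10), and unspecified (::) IPv6 addresses
            IpAddr::V6(ip) => {
                ip.is_loopback() || ip.is_unicast_link_local() || ip.is_unspecified()
            }
        }
    } else {
        // A string pattern cannot expand "*.local", so test the suffix explicitly
        host == "localhost" || host.ends_with(".local")
    }
}

This prevents Server-Side Request Forgery (SSRF) attacks against internal services.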
Sources: src/tools/browser.rs:388-424
The computer-use backend enforces screen coordinate limits:
fn validate_coordinate(&self, key: &str, value: i64, max: Option<i64>) -> Result<()> {
if value < 0 {
bail!("'{key}' must be >= 0")
}
if let Some(limit) = max {
if limit < 0 {
bail!("Configured coordinate limit for '{key}' must be >= 0")
}
if value > limit {
bail!("'{key}'={value} exceeds configured limit {limit}")
}
}
Ok(())
}

Configuration example:
[browser.computer_use]
max_coordinate_x = 1920
max_coordinate_y = 1080

This prevents the agent from interacting with UI elements outside the intended screen region.
Sources: src/tools/browser.rs:648-661, src/tools/browser.rs:674-706
All action tools (browser, http_request, web_search, composio) enforce:
- Autonomy Check: `security.can_act()` returns false for `AutonomyLevel::ReadOnly`
- Rate Limiting: `security.record_action()` returns false when `max_actions_per_hour` is exceeded
Example from tool execution:
if !self.security.can_act() {
return Ok(ToolResult {
success: false,
error: Some("Action blocked: autonomy is read-only".into()),
..
});
}
if !self.security.record_action() {
return Ok(ToolResult {
success: false,
error: Some("Action blocked: rate limit exceeded".into()),
..
});
}

Sources: src/tools/pushover.rs:114-129
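The two checks can be modelled with a small policy type. The sketch below is a simplified assumption, not the project's `SecurityPolicy`: the `Full` autonomy variant, the sliding one-hour window, the `record_action_at` helper, and the use of `&mut self` (the real method is called through a shared reference) are all illustrative choices.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Only `ReadOnly` is named in the text; `Full` is a hypothetical counterpart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AutonomyLevel {
    ReadOnly,
    Full,
}

/// Simplified policy: an autonomy level plus a sliding one-hour action window.
struct SecurityPolicy {
    autonomy: AutonomyLevel,
    max_actions_per_hour: usize,
    recent: VecDeque<Instant>,
}

impl SecurityPolicy {
    fn new(autonomy: AutonomyLevel, max_actions_per_hour: usize) -> Self {
        Self { autonomy, max_actions_per_hour, recent: VecDeque::new() }
    }

    /// Read-only agents may never perform side-effecting actions.
    fn can_act(&self) -> bool {
        self.autonomy != AutonomyLevel::ReadOnly
    }

    /// Record an action at `now`; returns false when the hourly budget is spent.
    fn record_action_at(&mut self, now: Instant) -> bool {
        let window = Duration::from_secs(3600);
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) >= window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_actions_per_hour {
            return false;
        }
        self.recent.push_back(now);
        true
    }

    /// Convenience wrapper using the current time.
    fn record_action(&mut self) -> bool {
        self.record_action_at(Instant::now())
    }
}
```

Taking the timestamp as a parameter in `record_action_at` keeps the rate limiter testable without sleeping; a rejected call is not added to the window, so blocked attempts do not extend the lockout.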
[browser]
enabled = true
backend = "auto" # agent_browser | rust_native | computer_use | auto
allowed_domains = ["*.example.com", "github.com"]
session_name = "my-session" # Optional: persist browser state
# Rust-native backend settings (requires --features browser-native)
native_headless = true
native_webdriver_url = "http://127.0.0.1:9515"
native_chrome_path = "/usr/bin/google-chrome" # Optional
[browser.computer_use]
endpoint = "http://127.0.0.1:8787/v1/actions"
api_key = "enc2:..." # Optional bearer token
timeout_ms = 15000
allow_remote_endpoint = false # Enforce HTTPS for non-localhost
window_allowlist = ["Firefox", "Chrome"] # OS window title filter
max_coordinate_x = 1920
max_coordinate_y = 1080

[http_request]
enabled = true
allowed_domains = ["api.example.com", "*.github.com", "httpbin.org"]
max_response_size = 5242880 # 5MB in bytes
timeout_secs = 30

[web_search]
enabled = true
provider = "brave" # brave | glm
brave_api_key = "enc2:..." # Encrypted via SecretStore
max_results = 10
timeout_secs = 15

Sources: src/config/mod.rs:1-17, src/tools/mod.rs:160-204
API keys and tokens are encrypted using SecretStore with ChaCha20-Poly1305 AEAD:
[secrets]
encrypt = true # Default: true
[web_search]
brave_api_key = "enc2:a1b2c3..." # Encrypted format
[browser.computer_use]
api_key = "enc2:d4e5f6..." # Bearer token for sidecar

The `enc2:` prefix indicates ChaCha20-Poly1305 encryption. Legacy `enc:` (XOR cipher) values are automatically migrated. See Secret Management for details.
Sources: src/security/secrets.rs:53-93