
Commit 403a7e5

usage with agents (#15)
1 parent 277feba commit 403a7e5

3 files changed

Lines changed: 189 additions & 1 deletion


README.md

Lines changed: 44 additions & 0 deletions
@@ -403,6 +403,50 @@ The daemon starts automatically on first command and persists between commands f
| Linux x64 | ✅ Native Rust | Node.js |
| Windows | - | Node.js |

## Usage with AI Agents

### Just ask the agent

The simplest approach is to tell your agent to use it:

```
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
```

The `--help` output is comprehensive, and most agents can work out the rest from there.

### AGENTS.md / CLAUDE.md

For more consistent results, add the following to your project or global instructions file:

```markdown
## Browser Automation

Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.

Core workflow:
1. `agent-browser open <url>` - Navigate to page
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes
```

### Claude Code Skill

For Claude Code, a [skill](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices) provides richer context. Copy the one bundled with the installed package:

```bash
mkdir -p .claude/skills
cp -r node_modules/agent-browser/skills/browsing-web .claude/skills/
```

Or download it directly:

```bash
mkdir -p .claude/skills/browsing-web
curl -o .claude/skills/browsing-web/SKILL.md \
  https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/browsing-web/SKILL.md
```
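The two install routes above can be combined into one helper. This is a minimal sketch, not something shipped with agent-browser: the function name `install_agent_browser_skill` and its optional project-root argument are hypothetical, while the paths and URL are the ones shown above. It prefers the copy bundled in `node_modules` and falls back to downloading `SKILL.md`.

```bash
# Hypothetical helper (not part of agent-browser): installs the browsing-web
# skill into .claude/skills under the given project root (default: current dir).
SKILL_URL="https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/browsing-web/SKILL.md"

install_agent_browser_skill() {
  local root="${1:-.}"
  local bundled="$root/node_modules/agent-browser/skills/browsing-web"
  local dest="$root/.claude/skills"

  # Create the target first so cp places the skill at .claude/skills/browsing-web.
  mkdir -p "$dest" || return 1
  if [ -d "$bundled" ]; then
    # Preferred: copy the skill shipped inside the installed npm package.
    cp -r "$bundled" "$dest/" || return 1
    echo "copied"
  else
    # Fallback: download SKILL.md from the repository's main branch.
    mkdir -p "$dest/browsing-web" || return 1
    curl -fsSL -o "$dest/browsing-web/SKILL.md" "$SKILL_URL" || return 1
    echo "downloaded"
  fi
}
```

After sourcing the file, run `install_agent_browser_skill` from the project root. It prints `copied` or `downloaded` depending on which route it took.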

## License

Apache-2.0

package.json

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,8 @@
```diff
   "files": [
     "dist",
     "bin",
-    "scripts"
+    "scripts",
+    "skills"
   ],
   "bin": {
     "agent-browser": "./bin/agent-browser"
```

skills/browsing-web/SKILL.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
---
name: browsing-web
description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
---

# Browser Automation with agent-browser

## Quick start

```bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
```

## Core workflow

1. Navigate: `agent-browser open <url>`
2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
3. Interact using refs from the snapshot
4. Re-snapshot after navigation or significant DOM changes

## Commands

### Navigation
```bash
agent-browser open <url>   # Navigate to URL
agent-browser back         # Go back
agent-browser forward      # Go forward
agent-browser reload       # Reload page
agent-browser close        # Close browser
```

### Snapshot (page analysis)
```bash
agent-browser snapshot        # Full accessibility tree
agent-browser snapshot -i     # Interactive elements only (recommended)
agent-browser snapshot -c     # Compact output
agent-browser snapshot -d 3   # Limit depth to 3
```

### Interactions (use @refs from snapshot)
```bash
agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser fill @e2 "text"      # Clear and type
agent-browser type @e2 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser hover @e1            # Hover
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
```

### Get information
```bash
agent-browser get text @e1    # Get element text
agent-browser get value @e1   # Get input value
agent-browser get title       # Get page title
agent-browser get url         # Get current URL
```

### Screenshots
```bash
agent-browser screenshot            # Screenshot to stdout
agent-browser screenshot path.png   # Save to file
agent-browser screenshot --full     # Full page
```

### Wait
```bash
agent-browser wait @e1                  # Wait for element
agent-browser wait 2000                 # Wait milliseconds
agent-browser wait --text "Success"     # Wait for text
agent-browser wait --load networkidle   # Wait for network idle
```

### Semantic locators (alternative to refs)
```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
```

## Example: Form submission

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i   # Check result
```
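Refs can change after navigation or DOM updates (hence the re-snapshot step), so a script should not rely on `@e1` always meaning the email field. The following minimal sketch looks up a ref by role and name in `snapshot -i` text. It assumes entries are printed as `role "Name" [ref=eN]`, as in the output comment above; check this against real output, or use the semantic `find` commands instead. The helper name `ref_for` is hypothetical.

```bash
# Hypothetical helper (not part of agent-browser): reads snapshot text on stdin
# and prints "@eN" for the first element whose role and name match exactly.
# Assumes entries are formatted like: textbox "Email" [ref=e1]
ref_for() {
  local role="$1" name="$2" text
  text=$(cat)
  # Quoted parts of the pattern match literally; only (e[0-9]+) is a regex.
  if [[ $text =~ "$role \"$name\" [ref="(e[0-9]+)"]" ]]; then
    echo "@${BASH_REMATCH[1]}"
  else
    return 1   # no matching element in this snapshot
  fi
}

# Usage sketch: resolve refs from a fresh snapshot instead of hard-coding them.
# snap=$(agent-browser snapshot -i)
# email=$(ref_for textbox "Email" <<<"$snap")
# agent-browser fill "$email" "user@example.com"
```

Matching on the quoted name keeps `"Email"` from also matching a field named `"Email address"`.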

## Example: Authentication with saved state

```bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
```
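The two halves of this example can be folded into one function that reuses `auth.json` when it already exists and logs in otherwise. This is a sketch only: the function name `open_dashboard` is hypothetical, and the URLs, credentials, and refs are the placeholders from the example, so in practice take the refs from the `snapshot -i` output.

```bash
# Hypothetical wrapper around the commands above: reuse saved auth state if
# present, otherwise log in once and save it. URLs, credentials, and refs are
# the placeholders from the example; real refs come from `snapshot -i`.
STATE_FILE="auth.json"

open_dashboard() {
  if [ -f "$STATE_FILE" ]; then
    # Later runs: restore the saved session state.
    agent-browser state load "$STATE_FILE" || return 1
  else
    # First run: log in interactively and persist the resulting state.
    agent-browser open https://app.example.com/login || return 1
    agent-browser snapshot -i || return 1
    agent-browser fill @e1 "username" || return 1
    agent-browser fill @e2 "password" || return 1
    agent-browser click @e3 || return 1
    agent-browser wait --url "**/dashboard" || return 1
    agent-browser state save "$STATE_FILE" || return 1
  fi
  agent-browser open https://app.example.com/dashboard
}
```

Each step stops the function on failure, so a failed login never writes a half-valid `auth.json`.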

## Sessions (parallel browsers)

```bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
```
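Since `--session` goes before the command, a small wrapper avoids repeating it when a script drives several sessions. The names `in_session` and `check_sites` are hypothetical; the session names and sites are the placeholders from the example above.

```bash
# Hypothetical helper: run any agent-browser command inside a named session,
# so commands aimed at different sessions never share a browser.
in_session() {
  local session="$1"
  shift   # remaining arguments are the agent-browser command itself
  agent-browser --session "$session" "$@"
}

# Hypothetical usage: open two sites in separate sessions, then inspect each.
check_sites() {
  in_session test1 open site-a.com || return 1
  in_session test2 open site-b.com || return 1
  in_session test1 snapshot -i || return 1
  in_session test2 snapshot -i
}
```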

## JSON output (for parsing)

Add `--json` for machine-readable output:
```bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

## Debugging

```bash
agent-browser open example.com --headed   # Show browser window
agent-browser console                     # View console messages
agent-browser errors                      # View page errors
```
