Commit 7134306

Add agent-browser skills command with evals (#1225)
* Add `agent-browser skills` command

  Adds a `skills` CLI command that serves bundled skill content at runtime, always matching the installed CLI version. This solves the problem of agents relying on stale cached SKILL.md files after CLI upgrades.

  The `npx skills add vercel-labs/agent-browser` flow now installs a single thin discovery skill with trigger words for all use cases (browser automation, dogfooding, Electron apps, Slack, etc.) that directs agents to `agent-browser skills get <name>` for current instructions. The other five skills (dogfood, electron, slack, vercel-sandbox, agentcore) are marked `metadata.internal: true` so they are not installed by default but remain accessible via the CLI command.

  Subcommands:
    skills [list]                 List available skills
    skills get <name> [--full]    Get skill content (with optional references)
    skills get --all              Get all skill content
    skills path [name]            Print skill directory path

* Fix skills command robustness: UTF-8 safety, flag handling, path output

  - Make truncate_description UTF-8-safe using char_indices() instead of byte-indexed slicing that panics on multi-byte codepoints
  - Pass get_all as a bool parameter to run_get instead of embedding --all as a sentinel string in the names list
  - Canonicalize skills_dir path so `skills path` output is clean
  - Warn on unrecognized flags in `skills get` instead of silently ignoring them

* Add evals framework and strengthen SKILL.md for better agent compliance

  Strengthen SKILL.md loading instructions to require `skills get` before running commands, and trim skill descriptions to prevent agents from guessing at command syntax.

  Add TypeScript/Bun eval framework that tests skill-loading, skill-selection, and command-usage via Claude CLI with Vercel AI Gateway. Evals pass 20/20 (100%), up from 85% baseline.

* Fix formatting in skills.rs

* Add Codex provider to evals framework

  Add multi-provider support with a shared Provider interface. Codex provider spawns `codex exec --json`, parses JSONL output, and writes ~/.codex/config.toml for AI Gateway routing. Use `--provider codex` to run evals with Codex (default model: openai/o3). First run scores 19/20 (95%) with 100% on skill-loading and skill-selection.

* Use scoped temp dir for Codex config instead of overwriting ~/.codex
1 parent fa043a4 commit 7134306
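
The UTF-8 fix named in the commit message (switching `truncate_description` from byte-indexed slicing to `char_indices()`) can be illustrated with a minimal sketch. The body of the real function in `skills.rs` is not shown on this page, so the signature, the `max_chars` parameter, and the `...` suffix below are assumptions made for illustration only.

```rust
/// Hypothetical sketch, not the code from skills.rs: keep at most
/// `max_chars` characters of `s` and append "..." if anything was cut.
fn truncate_description(s: &str, max_chars: usize) -> String {
    // char_indices() yields the byte offset at which each char starts, so the
    // offset of the char at position `max_chars` is always a valid slice
    // boundary. Slicing at a raw byte count (e.g. `&s[..4]` on "café au lait")
    // can land inside a multi-byte codepoint and panic.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}...", &s[..byte_idx]),
        // Fewer than or exactly `max_chars` characters: nothing to cut.
        None => s.to_string(),
    }
}
```

Under these assumptions, `truncate_description("café au lait", 4)` cuts after the fourth character rather than after the fourth byte, so the two-byte `é` is kept whole.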

29 files changed

Lines changed: 2074 additions & 866 deletions

README.md

Lines changed: 13 additions & 0 deletions
@@ -371,6 +371,19 @@ agent-browser install --with-deps # Also install system deps (Linux)
 agent-browser upgrade # Upgrade agent-browser to the latest version
 ```
 
+### Skills
+
+```bash
+agent-browser skills                     # List available skills
+agent-browser skills list                # Same as above
+agent-browser skills get <name>          # Output a skill's full content
+agent-browser skills get <name> --full   # Include references and templates
+agent-browser skills get --all           # Output every skill
+agent-browser skills path [name]         # Print skill directory path
+```
+
+Serves bundled skill content that always matches the installed CLI version. AI agents use this to get current instructions rather than relying on cached copies. Set `AGENT_BROWSER_SKILLS_DIR` to override the skills directory path.
+
 ## Authentication
 
 agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.
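
The README text above says `AGENT_BROWSER_SKILLS_DIR` overrides the skills directory, and the commit message says the path is canonicalized before `skills path` prints it. A minimal sketch of how those two behaviors could combine follows. The function names, the `bundled_default` parameter, and the choice to treat an empty value as unset are all hypothetical; the actual lookup logic in `skills.rs` is not shown in this diff.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: pick the override when one is given, otherwise the
/// bundled default, then canonicalize so the printed path has no `..` or
/// symlink components.
fn resolve_skills_dir(env_override: Option<&str>, bundled_default: &Path) -> PathBuf {
    let chosen = match env_override {
        // Assumption: an empty value is treated the same as an unset variable.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => bundled_default.to_path_buf(),
    };
    // canonicalize() fails when the path does not exist; fall back to the
    // unresolved path instead of aborting.
    chosen.canonicalize().unwrap_or(chosen)
}

/// Hypothetical wrapper that reads the environment variable named in the README.
fn skills_dir_from_env(bundled_default: &Path) -> PathBuf {
    let value = env::var("AGENT_BROWSER_SKILLS_DIR").ok();
    resolve_skills_dir(value.as_deref(), bundled_default)
}
```

Keeping the environment read in a thin wrapper lets the selection logic be tested without mutating process-wide state.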

cli/Cargo.lock

Lines changed: 39 additions & 0 deletions
Some generated files are not rendered by default.

cli/Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,9 @@ libc = "0.2"
 [target.'cfg(windows)'.dependencies]
 windows-sys = { version = "0.52", features = ["Win32_System_Threading", "Win32_Foundation"] }
 
+[dev-dependencies]
+tempfile = "3"
+
 [build-dependencies]
 serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"

cli/src/main.rs

Lines changed: 7 additions & 0 deletions
@@ -6,6 +6,7 @@ mod flags;
 mod install;
 mod native;
 mod output;
+mod skills;
 #[cfg(test)]
 mod test_utils;
 mod upgrade;
@@ -685,6 +686,12 @@ fn main() {
         return;
     }
 
+    // Handle skills command (doesn't need daemon)
+    if clean.first().map(|s| s.as_str()) == Some("skills") {
+        skills::run_skills(&clean, flags.json);
+        return;
+    }
+
     // Handle session separately (doesn't need daemon)
     if clean.first().map(|s| s.as_str()) == Some("session") {
         run_session(&clean, &flags.session, flags.json);
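
The hunk above only shows the hand-off to `skills::run_skills`; its body lives in `skills.rs`, which is not rendered on this page. As a rough sketch of the subcommand handling the commit message describes (default `list`, `--all` carried as a bool rather than a sentinel name, a warning on unrecognized `get` flags), one possible shape is shown below. `SkillsCommand`, `parse_skills_args`, and the warning text are hypothetical names invented for illustration, not the project's API.

```rust
/// Hypothetical parsed form of `agent-browser skills ...`.
#[derive(Debug, PartialEq)]
enum SkillsCommand {
    List,
    // `all` is its own bool instead of a "--all" entry in `names`.
    Get { names: Vec<String>, full: bool, all: bool },
    Path { name: Option<String> },
    Unknown(String),
}

/// Hypothetical parser. `args[0]` is expected to be "skills", matching the
/// `clean` slice passed in main.rs. Returns the command plus any warnings.
fn parse_skills_args(args: &[String]) -> (SkillsCommand, Vec<String>) {
    let mut warnings = Vec::new();
    let cmd = match args.get(1).map(|s| s.as_str()) {
        // No subcommand behaves like `list`.
        None | Some("list") => SkillsCommand::List,
        Some("get") => {
            let (mut names, mut full, mut all) = (Vec::new(), false, false);
            for arg in &args[2..] {
                match arg.as_str() {
                    "--full" => full = true,
                    "--all" => all = true,
                    // Report unknown flags instead of silently dropping them.
                    flag if flag.starts_with("--") => {
                        warnings.push(format!("unrecognized flag: {flag}"))
                    }
                    name => names.push(name.to_string()),
                }
            }
            SkillsCommand::Get { names, full, all }
        }
        Some("path") => SkillsCommand::Path { name: args.get(2).cloned() },
        Some(other) => SkillsCommand::Unknown(other.to_string()),
    };
    (cmd, warnings)
}
```

Returning warnings as data rather than printing them keeps the parser free of side effects, so a caller can route them to stderr or into JSON output as it sees fit.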

cli/src/output.rs

Lines changed: 40 additions & 0 deletions
@@ -2712,6 +2712,40 @@ Examples:
 "##
         }
 
+        "skills" => {
+            r##"
+agent-browser skills - List and retrieve bundled skill content
+
+Usage: agent-browser skills [subcommand] [options]
+
+Subcommands:
+  list                   List all available skills (default)
+  get <name> [name...]   Output a skill's full content
+  get <name> --full      Include references and templates
+  get --all              Output every skill
+  path [name]            Print filesystem path to skill directory
+
+Options:
+  --json                 Output as JSON
+
+The skills command serves bundled skill content that always matches the
+installed CLI version. Agents should use this to get current instructions
+rather than relying on cached copies.
+
+Examples:
+  agent-browser skills
+  agent-browser skills list
+  agent-browser skills get agent-browser
+  agent-browser skills get electron --full
+  agent-browser skills get --all
+  agent-browser skills path agent-browser
+  agent-browser skills list --json
+
+Environment:
+  AGENT_BROWSER_SKILLS_DIR   Override the skills directory path
+"##
+        }
+
         _ => return false,
     };
     println!("{}", help.trim());
@@ -2844,6 +2878,12 @@ Setup:
   dashboard start              Start the observability dashboard
   profiles                     List available Chrome profiles
 
+Skills:
+  skills [list]                List available skills
+  skills get <name> [--full]   Get skill content (--full includes references)
+  skills get --all             Get all skill content
+  skills path [name]           Print skill directory path
+
 Snapshot Options:
   -i, --interactive            Only interactive elements
   -c, --compact                Remove empty structural elements
