Most browser automation tools launch a clean, isolated browser.
chrome-cdp-exconnects to your real browser session: tabs, logins, cookies, and current page state.
- Perceive-first workflow: one call gives structure, layout, styles, coordinates, and console health.
- CSS origin tracing:
cascadetells the agent exactly which file and line to edit — not just what the style is, but where it comes from. - Low round-trip cost: understand in 1 call, act in 1 call, verify automatically.
- Live prototyping:
injectCSS/JS into the page, test changes visually, remove when done — no dev server restart. - Real-session automation: no separate Chromium profile unless you want one.
- Production-ready ergonomics: daemon-per-tab, background event collection, WSL2-to-Windows support, Electron support.
- The Redesign Experiment
- Quick Start
- How It Works
- Commands (44 total)
- WSL2 -> Windows Browser Control
- Credits
- License
Same ugly page. Same prompt. 5 rounds. Three independent AI agents, each with a different browser observation tool. Only one variable changed: how much visual state each tool exposes.
| Before | chrome-cdp-ex |
Playwright | Other CDP |
|---|---|---|---|
![]() |
![]() |
![]() |
![]() |
The agent using perceive (layout + colors + spacing + coordinates) produced the most polished result because it could actually see what needed fixing, not just parse source code.
View the live comparison ->
chrome-cdp-ex |
Playwright | Other CDP tools | |
|---|---|---|---|
| Calls to fully understand a page | 1 (perceive) |
3+ (snapshot + console + viewport) | 2+ (snap + console) |
| Tokens per page snapshot | ~800 (with layout + styles) | ~3,500 (no layout, no styles) | ~400 (no layout, no styles) |
| Calls to act and verify | 1 (auto feedback) | 2+ (act + re-snapshot) | 2+ (act + re-snapshot) |
@ref with coordinates |
Yes - @3 (200,350 200x30) |
No - ref=e376 (ID only) |
No |
| Your real browser session | Yes - tabs, cookies, logins | No - isolated Chromium | Varies |
| CSS origin tracing | Yes - cascade shows file:line |
No | No |
| Live CSS/JS injection | Yes - inject with tracking + removal |
No (page.evaluate only) | No |
| Background event collection | Yes - console, errors, navigations | Only while connected | No |
| Electron app support | Yes - CDP_PORT=9222 |
No | No |
| WSL2 -> Windows | Yes - built-in | No | No |
| Dependencies | 0 | Playwright + Chromium binary | Varies |
| Commands | 44 | N/A (programmatic API) | ~14 |
Other tools either give a screenshot and say "figure it out" or dump an AX tree without context.
perceive gives agents everything needed in one call:
$ cdp perceive abc1
📍 My App (1280x720 scroll:0/2400) — https://app.example.com
[banner] ↕80px bg:rgb(26,26,46) ↑above fold
[nav] flex
@1 [link] "Home" (12,8 60x20)
@2 [link] "Settings" (80,8 70x20)
[main] ↕2920px
@3 [textbox] "Email" (200,350 200x30)
@4 [button] "Submit" (200,400 100x40)
[contentinfo] ↕160px ↓below fold
Console: 2 errors | Interactive: 12 a, 3 button, 2 input
Structure. Layout. Styles. Scroll position. Console health. Interactive counts.
Each @ref includes bounding coordinates, all in about ~800 tokens.
Other tools can tell an agent what the page looks like. Only cascade tells it why:
$ cdp cascade abc1 @4 background-color
background-color: #2563eb
✓ .btn-primary { background-color: #2563eb }
→ src/styles/components.css:142
✗ button { background-color: #e5e7eb } [overridden]
→ src/styles/base.css:28
One command. Source file. Line number. Full cascade. The agent can now go directly to components.css:142 and make the change — no guessing, no grepping through stylesheets.
Pair it with inject for live prototyping:
$ cdp inject abc1 --css ".btn-primary { background: #dc2626 }"
inject-1
$ cdp inject abc1 --remove inject-1 # undo when done
sequenceDiagram
participant Agent
participant Chrome
Agent->>Chrome: perceive
Chrome-->>Agent: AX tree + layout + @refs with coordinates<br/>+ console health + interactive counts
Agent->>Chrome: cascade @4 background-color
Chrome-->>Agent: ✓ .btn-primary → components.css:142<br/>✗ button → base.css:28 [overridden]
Agent->>Chrome: click @4
Chrome-->>Agent: △ [dialog] "Submitted successfully"<br/>△ @4 [button] → disabled
One call to understand. One call to trace CSS origin. One call to act. Zero extra calls to verify. Action feedback is automatic.
- Clone and enter the repo.
git clone https://github.com/EndeavorYen/chrome-cdp-ex.git
cd chrome-cdp-ex- Install in Claude Code (choose one option).
# Option A: load in Claude Code for the current project/session
claude --plugin-dir .
# Option B: install globally for all projects
mkdir -p ~/.claude/skills
cp -r skills/chrome-cdp-ex ~/.claude/skills/- Enable Chrome debugging at
chrome://inspect/#remote-debuggingand toggle it on. Do not restart Chrome with--remote-debugging-port.
Requires: Node.js 22+ (uses built-in WebSocket). Auto-detects Chrome, Chromium, Brave, Edge, and Vivaldi on macOS, Linux (including Flatpak), and Windows.
Electron App Support
Connect to Electron apps exactly like Chrome, as long as remote debugging is enabled.
Step 1: Enable CDP in your Electron app (dev mode only)
// In your main process (e.g. src/main/index.ts)
if (process.env.NODE_ENV === 'development') {
app.commandLine.appendSwitch('remote-debugging-port', '9222');
}Or launch with a flag:
# macOS/Linux
electron . --remote-debugging-port=9222
# Windows (PowerShell)
electron . --remote-debugging-port=9222Step 2: Connect
CDP_PORT=9222 cdp.mjs listOutput:
[Electron 33.4.11]
1ED3DBAA My App http://localhost:5173/#/menu
All 44 commands work: perceive, click, fill, cascade, inject, and more.
Advanced Configuration
CDP_PORT- connect to a specific port (Electron, Chrome with--remote-debugging-port, etc.)CDP_PORT_FILE- override theDevToolsActivePortfile pathCDP_HOST- override the target host (default:127.0.0.1)
graph TB
subgraph Agent["AI Agent (Claude Code / Cursor / Amp)"]
CLI["cdp.mjs CLI"]
end
subgraph Daemons["Background Daemons (one per tab)"]
D1["Daemon A<br/><small>RingBuffer: console, exceptions, navigations</small>"]
D2["Daemon B"]
end
subgraph Chrome["Chrome (user's browser)"]
T1["Tab A"]
T2["Tab B"]
T3["Tab C <small>(no daemon)</small>"]
end
CLI -- "list (direct CDP)" --> Chrome
CLI -- "Unix socket /<br/>named pipe" --> D1
CLI -- "Unix socket /<br/>named pipe" --> D2
D1 -- "WebSocket<br/>CDP session" --> T1
D2 -- "WebSocket<br/>CDP session" --> T2
Each tab gets its own daemon process that keeps the CDP session open. Chrome's "Allow debugging" dialog appears once per tab, not once per command. Daemons auto-exit after 20 minutes of inactivity and passively collect console/exception/navigation events into ring buffers.
Tip: start with perceive, then use click/fill/select; use status or console when you need debugging context.
Discovery & Lifecycle
list # list open tabs (shows targetId prefixes)
open [url] # open new tab (default: about:blank)
stop [target] # stop daemon(s)
closetab <target> # close a browser tabPerception - start here
perceive <target> [flags] # enriched AX tree with @ref indices + coordinates
# --diff: show only changes since last perceive
# -s <sel>: scope to CSS selector subtree
# -i: interactive elements only
# -d N: limit tree depth
# -C: include non-ARIA clickable elements
snap <target> [--full] # accessibility tree (compact by default)
summary <target> # token-efficient overview (~100 tokens)
status <target> # URL, title + new console/exception entries
console <target> [--all|--errors] # console buffer (default: unread only)
text <target> # clean text content (strips scripts/styles/SVG)
table <target> [selector] # full table data extraction (tab-separated)Visual Capture
shot <target> [file|--annotate] # viewport screenshot; --annotate overlays @ref labels
elshot <target> <sel|@ref> # element screenshot (auto scroll + clip, no DPR issues)
scanshot <target> # segmented full-page (readable viewport-sized images)
fullshot <target> [file] # single full-page image (may be tiny on long pages)Inspection
html <target> [selector] # full HTML or scoped to CSS selector
eval <target> <expr> # evaluate JS in page context
styles <target> <selector> # computed styles (meaningful props only)
net <target> # network performance entries
netlog <target> [--clear] # network request log (XHR/Fetch with status + timing)
cookies <target> # list cookies for current page
cookieset <target> <cookie> # set a cookie ("name=value; domain=...")
cookiedel <target> <name> # delete a cookie by nameInteraction
click <target> <sel|@ref> # click element (CDP mouse events, not el.click())
clickxy <target> <x> <y> # click at CSS pixel coordinates
type <target> <text> # type at focused element (cross-origin safe)
press <target> <key> # press key (Enter, Tab, Escape, etc.)
scroll <target> <dir|x,y> [px] # scroll (down/up/left/right; default 500px)
hover <target> <sel|@ref> # hover (triggers :hover, tooltips)
fill <target> <sel|@ref> <text> # clear field + type (form filling)
select <target> <selector> <val> # select dropdown option by value
waitfor <target> <selector> [ms] # wait for element to appear (default 10s)
loadall <target> <selector> [ms] # click "load more" until gone
upload <target> <selector> <paths> # upload file(s) to <input type="file">
dialog <target> [accept|dismiss] # dialog history; set auto-accept or auto-dismissNavigation & Viewport
nav <target> <url> # navigate to URL and wait for load
back <target> # navigate back in browser history
forward <target> # navigate forward
reload <target> # reload current page
viewport <target> [WxH] # show or set viewport size (e.g. 375x812)Frontend Development (v2.2.0)
inject <target> --css "<text>" # inject inline <style> with tracking
inject <target> --css-file <url> # inject <link rel="stylesheet">
inject <target> --js-file <url> # inject <script src> and wait for load
inject <target> --remove [id] # remove injected element(s) — all or by id
cascade <target> <sel|@ref> # CSS origin tracing: full cascade with source file + line
cascade <target> <sel|@ref> <prop> # filter to one property (e.g. "background-color")inject returns an ID (inject-1, inject-2...) for targeted removal. URLs are validated (blocks data:, file:, cloud metadata).
cascade shows which CSS rule won, which were overridden, inline styles, and inherited properties — with source locations. Answers "which file do I edit?" in one call.
Advanced
batch <target> <json> # execute multiple commands in one call
# [{"cmd":"click","args":["@1"]},{"cmd":"perceive","args":["--diff"]}]
evalraw <target> <method> [json] # raw CDP command passthrough
# e.g. evalraw <t> "DOM.getDocument" '{}'Action feedback: click, clickxy, press (Enter/Escape/Tab), and select automatically wait for DOM to settle and return a perceive diff showing what changed. You usually do not need to run perceive --diff manually after these actions.
<target> is a unique targetId prefix from list. See SKILL.md for detailed usage patterns and coordinate-system notes.
This tool works across the WSL2-to-Windows boundary, where many CDP tools fail.
graph LR
subgraph WSL2["WSL2 (Linux)"]
Agent["AI Agent<br/>(Claude Code)"]
Script["cdp.mjs"]
end
subgraph Windows["Windows"]
Node["node.exe"]
Chrome["Chrome<br/>(user's browser)"]
end
Agent -- "invokes" --> Script
Script -- "/mnt/c/.../node.exe" --> Node
Node -- "CDP WebSocket<br/>localhost:port" --> Chrome
The key insight: WSL2 cannot connect to Windows localhost directly, so the script runs Windows-side node.exe via /mnt/c/... and lets that process connect to Chrome natively.
Proven pattern:
- Start Chrome on Windows and enable debugging at
chrome://inspect/#remote-debugging. - Use Windows-side Node.js to run the CDP script.
- Locate Node.js:
powershell.exe -NoProfile -Command "(Get-Command node -ErrorAction SilentlyContinue).Source" - Convert to a WSL mount path and invoke:
"/mnt/c/.../node.exe" scripts/cdp.mjs list
See SKILL.md for full WSL2 setup instructions.
- Original: pasky/chrome-cdp-skill by Petr Baudis (daemon-per-tab architecture and core CDP client)
- Contributors: ynezz (Flatpak paths), Jah-yee, Rolf Fredheim
- This fork:
@refsystem, perceive-first workflow, action feedback, background observation, realistic input simulation, form automation, WSL2 support, and 28 additional commands



