chrome-cdp-ex

Most browser automation tools launch a clean, isolated browser. chrome-cdp-ex connects to your real browser session: tabs, logins, cookies, and current page state.

Why this exists

Perceive-first workflow: one call gives structure, layout, styles, coordinates, and console health.
CSS origin tracing: cascade tells the agent exactly which file and line to edit — not just what the style is, but where it comes from.
Low round-trip cost: understand in 1 call, act in 1 call, verify automatically.
Live prototyping: inject CSS/JS into the page, test changes visually, remove when done — no dev server restart.
Real-session automation: no separate Chromium profile unless you want one.
Production-ready ergonomics: daemon-per-tab, background event collection, WSL2-to-Windows support, Electron support.

The redesign experiment

Same ugly page. Same prompt. 5 rounds. Three independent AI agents, each with a different browser observation tool. Only one variable changed: how much visual state each tool exposes.

Before	`chrome-cdp-ex`	Playwright	Other CDP

The agent using perceive (layout + colors + spacing + coordinates) produced the most polished result because it could actually see what needed fixing, not just parse source code. View the live comparison ->

The numbers

	`chrome-cdp-ex`	Playwright	Other CDP tools
Calls to fully understand a page	1 (`perceive`)	3+ (snapshot + console + viewport)	2+ (snap + console)
Tokens per page snapshot	~800 (with layout + styles)	~3,500 (no layout, no styles)	~400 (no layout, no styles)
Calls to act and verify	1 (auto feedback)	2+ (act + re-snapshot)	2+ (act + re-snapshot)
`@ref` with coordinates	Yes - `@3 (200,350 200x30)`	No - `ref=e376` (ID only)	No
Your real browser session	Yes - tabs, cookies, logins	No - isolated Chromium	Varies
CSS origin tracing	Yes - `cascade` shows file:line	No	No
Live CSS/JS injection	Yes - `inject` with tracking + removal	No (page.evaluate only)	No
Background event collection	Yes - console, errors, navigations	Only while connected	No
Electron app support	Yes - `CDP_PORT=9222`	No	No
WSL2 -> Windows	Yes - built-in	No	No
Dependencies	0	Playwright + Chromium binary	Varies
Commands	44	N/A (programmatic API)	~14

One command, complete page understanding

Other tools either give a screenshot and say "figure it out" or dump an AX tree without context. perceive gives agents everything needed in one call:

$ cdp perceive abc1
📍 My App (1280x720 scroll:0/2400) — https://app.example.com
  [banner] ↕80px bg:rgb(26,26,46) ↑above fold
    [nav] flex
      @1 [link] "Home" (12,8 60x20)
      @2 [link] "Settings" (80,8 70x20)
  [main] ↕2920px
    @3 [textbox] "Email" (200,350 200x30)
    @4 [button] "Submit" (200,400 100x40)
  [contentinfo] ↕160px ↓below fold
Console: 2 errors | Interactive: 12 a, 3 button, 2 input

Structure. Layout. Styles. Scroll position. Console health. Interactive counts. Each @ref includes bounding coordinates, all in about ~800 tokens.

"Which file do I edit to change this blue?"

Other tools can tell an agent what the page looks like. Only cascade tells it why:

$ cdp cascade abc1 @4 background-color

background-color: #2563eb
  ✓ .btn-primary { background-color: #2563eb }
    → src/styles/components.css:142
  ✗ button { background-color: #e5e7eb }  [overridden]
    → src/styles/base.css:28

One command. Source file. Line number. Full cascade. The agent can now go directly to components.css:142 and make the change — no guessing, no grepping through stylesheets.

Pair it with inject for live prototyping:

$ cdp inject abc1 --css ".btn-primary { background: #dc2626 }"
inject-1

$ cdp inject abc1 --remove inject-1    # undo when done

Why agents choose this

sequenceDiagram
    participant Agent
    participant Chrome

    Agent->>Chrome: perceive
    Chrome-->>Agent: AX tree + layout + @refs with coordinates<br/>+ console health + interactive counts

    Agent->>Chrome: cascade @4 background-color
    Chrome-->>Agent: ✓ .btn-primary → components.css:142<br/>✗ button → base.css:28 [overridden]

    Agent->>Chrome: click @4
    Chrome-->>Agent: △ [dialog] "Submitted successfully"<br/>△ @4 [button] → disabled

One call to understand. One call to trace CSS origin. One call to act. Zero extra calls to verify. Action feedback is automatic.

Quick start

Clone and enter the repo.

git clone https://github.com/EndeavorYen/chrome-cdp-ex.git
cd chrome-cdp-ex

Install in Claude Code (choose one option).

# Option A: load in Claude Code for the current project/session
claude --plugin-dir .

# Option B: install globally for all projects
mkdir -p ~/.claude/skills
cp -r skills/chrome-cdp-ex ~/.claude/skills/

Enable Chrome debugging at chrome://inspect/#remote-debugging and toggle it on. Do not restart Chrome with --remote-debugging-port.

Requires: Node.js 22+ (uses built-in WebSocket). Auto-detects Chrome, Chromium, Brave, Edge, and Vivaldi on macOS, Linux (including Flatpak), and Windows.

Electron App Support

Connect to Electron apps exactly like Chrome, as long as remote debugging is enabled.

Step 1: Enable CDP in your Electron app (dev mode only)

// In your main process (e.g. src/main/index.ts)
if (process.env.NODE_ENV === 'development') {
  app.commandLine.appendSwitch('remote-debugging-port', '9222');
}

Or launch with a flag:

# macOS/Linux
electron . --remote-debugging-port=9222

# Windows (PowerShell)
electron . --remote-debugging-port=9222

Step 2: Connect

CDP_PORT=9222 cdp.mjs list

Output:

[Electron 33.4.11]
1ED3DBAA  My App                                                  http://localhost:5173/#/menu

All 44 commands work: perceive, click, fill, cascade, inject, and more.

Advanced Configuration

CDP_PORT - connect to a specific port (Electron, Chrome with --remote-debugging-port, etc.)
CDP_PORT_FILE - override the DevToolsActivePort file path
CDP_HOST - override the target host (default: 127.0.0.1)

How It Works

graph TB
    subgraph Agent["AI Agent (Claude Code / Cursor / Amp)"]
        CLI["cdp.mjs CLI"]
    end

    subgraph Daemons["Background Daemons (one per tab)"]
        D1["Daemon A<br/><small>RingBuffer: console, exceptions, navigations</small>"]
        D2["Daemon B"]
    end

    subgraph Chrome["Chrome (user's browser)"]
        T1["Tab A"]
        T2["Tab B"]
        T3["Tab C <small>(no daemon)</small>"]
    end

    CLI -- "list (direct CDP)" --> Chrome
    CLI -- "Unix socket /<br/>named pipe" --> D1
    CLI -- "Unix socket /<br/>named pipe" --> D2
    D1 -- "WebSocket<br/>CDP session" --> T1
    D2 -- "WebSocket<br/>CDP session" --> T2

Each tab gets its own daemon process that keeps the CDP session open. Chrome's "Allow debugging" dialog appears once per tab, not once per command. Daemons auto-exit after 20 minutes of inactivity and passively collect console/exception/navigation events into ring buffers.

Commands (44 total)

Tip: start with perceive, then use click/fill/select; use status or console when you need debugging context.

Discovery & Lifecycle

list                               # list open tabs (shows targetId prefixes)
open   [url]                       # open new tab (default: about:blank)
stop   [target]                    # stop daemon(s)
closetab <target>                  # close a browser tab

Perception - start here

perceive <target> [flags]          # enriched AX tree with @ref indices + coordinates
                                   #   --diff: show only changes since last perceive
                                   #   -s <sel>: scope to CSS selector subtree
                                   #   -i: interactive elements only
                                   #   -d N: limit tree depth
                                   #   -C: include non-ARIA clickable elements
snap     <target> [--full]         # accessibility tree (compact by default)
summary  <target>                  # token-efficient overview (~100 tokens)
status   <target>                  # URL, title + new console/exception entries
console  <target> [--all|--errors] # console buffer (default: unread only)
text     <target>                  # clean text content (strips scripts/styles/SVG)
table    <target> [selector]       # full table data extraction (tab-separated)

Visual Capture

shot     <target> [file|--annotate] # viewport screenshot; --annotate overlays @ref labels
elshot   <target> <sel|@ref>        # element screenshot (auto scroll + clip, no DPR issues)
scanshot <target>                   # segmented full-page (readable viewport-sized images)
fullshot <target> [file]            # single full-page image (may be tiny on long pages)

Inspection

html      <target> [selector]       # full HTML or scoped to CSS selector
eval      <target> <expr>           # evaluate JS in page context
styles    <target> <selector>       # computed styles (meaningful props only)
net       <target>                  # network performance entries
netlog    <target> [--clear]        # network request log (XHR/Fetch with status + timing)
cookies   <target>                  # list cookies for current page
cookieset <target> <cookie>         # set a cookie ("name=value; domain=...")
cookiedel <target> <name>           # delete a cookie by name

Interaction

click   <target> <sel|@ref>         # click element (CDP mouse events, not el.click())
clickxy <target> <x> <y>            # click at CSS pixel coordinates
type    <target> <text>             # type at focused element (cross-origin safe)
press   <target> <key>              # press key (Enter, Tab, Escape, etc.)
scroll  <target> <dir|x,y> [px]     # scroll (down/up/left/right; default 500px)
hover   <target> <sel|@ref>         # hover (triggers :hover, tooltips)
fill    <target> <sel|@ref> <text>  # clear field + type (form filling)
select  <target> <selector> <val>   # select dropdown option by value
waitfor <target> <selector> [ms]    # wait for element to appear (default 10s)
loadall <target> <selector> [ms]    # click "load more" until gone
upload  <target> <selector> <paths> # upload file(s) to <input type="file">
dialog  <target> [accept|dismiss]   # dialog history; set auto-accept or auto-dismiss

Navigation & Viewport

nav      <target> <url>             # navigate to URL and wait for load
back     <target>                   # navigate back in browser history
forward  <target>                   # navigate forward
reload   <target>                   # reload current page
viewport <target> [WxH]             # show or set viewport size (e.g. 375x812)

Frontend Development (v2.2.0)

inject  <target> --css "<text>"     # inject inline <style> with tracking
inject  <target> --css-file <url>   # inject <link rel="stylesheet">
inject  <target> --js-file <url>    # inject <script src> and wait for load
inject  <target> --remove [id]      # remove injected element(s) — all or by id
cascade <target> <sel|@ref>         # CSS origin tracing: full cascade with source file + line
cascade <target> <sel|@ref> <prop>  # filter to one property (e.g. "background-color")

inject returns an ID (inject-1, inject-2...) for targeted removal. URLs are validated (blocks data:, file:, cloud metadata).

cascade shows which CSS rule won, which were overridden, inline styles, and inherited properties — with source locations. Answers "which file do I edit?" in one call.

Advanced

batch   <target> <json>             # execute multiple commands in one call
                                    # [{"cmd":"click","args":["@1"]},{"cmd":"perceive","args":["--diff"]}]
evalraw <target> <method> [json]    # raw CDP command passthrough
                                    # e.g. evalraw <t> "DOM.getDocument" '{}'

Action feedback: click, clickxy, press (Enter/Escape/Tab), and select automatically wait for DOM to settle and return a perceive diff showing what changed. You usually do not need to run perceive --diff manually after these actions.

<target> is a unique targetId prefix from list. See SKILL.md for detailed usage patterns and coordinate-system notes.

WSL2 -> Windows Browser Control

This tool works across the WSL2-to-Windows boundary, where many CDP tools fail.

graph LR
    subgraph WSL2["WSL2 (Linux)"]
        Agent["AI Agent<br/>(Claude Code)"]
        Script["cdp.mjs"]
    end

    subgraph Windows["Windows"]
        Node["node.exe"]
        Chrome["Chrome<br/>(user's browser)"]
    end

    Agent -- "invokes" --> Script
    Script -- "/mnt/c/.../node.exe" --> Node
    Node -- "CDP WebSocket<br/>localhost:port" --> Chrome

The key insight: WSL2 cannot connect to Windows localhost directly, so the script runs Windows-side node.exe via /mnt/c/... and lets that process connect to Chrome natively.

Proven pattern:

Start Chrome on Windows and enable debugging at chrome://inspect/#remote-debugging.
Use Windows-side Node.js to run the CDP script.

Locate Node.js:

powershell.exe -NoProfile -Command "(Get-Command node -ErrorAction SilentlyContinue).Source"

Convert to a WSL mount path and invoke:

"/mnt/c/.../node.exe" scripts/cdp.mjs list

See SKILL.md for full WSL2 setup instructions.

Credits

Original: pasky/chrome-cdp-skill by Petr Baudis (daemon-per-tab architecture and core CDP client)
Contributors: ynezz (Flatpak paths), Jah-yee, Rolf Fredheim
This fork: @ref system, perceive-first workflow, action feedback, background observation, realistic input simulation, form automation, WSL2 support, and 28 additional commands

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
.claude-plugin		.claude-plugin
.github/workflows		.github/workflows
experiment		experiment
skills/chrome-cdp-ex		skills/chrome-cdp-ex
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
DESIGN.md		DESIGN.md
LICENSE		LICENSE
README.md		README.md
TODOS.md		TODOS.md
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json
vitest.config.js		vitest.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

chrome-cdp-ex

Why this exists

Contents

The redesign experiment

The numbers

One command, complete page understanding

"Which file do I edit to change this blue?"

Why agents choose this

Quick start

How It Works

Commands (44 total)

WSL2 -> Windows Browser Control

Credits

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

chrome-cdp-ex

Why this exists

Contents

The redesign experiment

The numbers

One command, complete page understanding

"Which file do I edit to change this blue?"

Why agents choose this

Quick start

How It Works

Commands (44 total)

WSL2 -> Windows Browser Control

Credits

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages