chrome-cdp-ex

44 commands · Zero dependencies · Node 22+ · MIT License

Most browser automation tools launch a clean, isolated browser. chrome-cdp-ex connects to your real browser session: tabs, logins, cookies, and current page state.

Why this exists

  • Perceive-first workflow: one call gives structure, layout, styles, coordinates, and console health.
  • CSS origin tracing: cascade tells the agent exactly which file and line to edit — not just what the style is, but where it comes from.
  • Low round-trip cost: understand in 1 call, act in 1 call, verify automatically.
  • Live prototyping: inject CSS/JS into the page, test changes visually, remove when done — no dev server restart.
  • Real-session automation: no separate Chromium profile unless you want one.
  • Production-ready ergonomics: daemon-per-tab, background event collection, WSL2-to-Windows support, Electron support.

The redesign experiment

Same ugly page. Same prompt. 5 rounds. Three independent AI agents, each with a different browser observation tool. Only one variable changed: how much visual state each tool exposes.

[Screenshots: Before | chrome-cdp-ex result | Playwright result | Other CDP result]

The agent using perceive (layout + colors + spacing + coordinates) produced the most polished result because it could actually see what needed fixing, not just parse source code. View the live comparison ->

The numbers

| | chrome-cdp-ex | Playwright | Other CDP tools |
|---|---|---|---|
| Calls to fully understand a page | 1 (perceive) | 3+ (snapshot + console + viewport) | 2+ (snap + console) |
| Tokens per page snapshot | ~800 (with layout + styles) | ~3,500 (no layout, no styles) | ~400 (no layout, no styles) |
| Calls to act and verify | 1 (auto feedback) | 2+ (act + re-snapshot) | 2+ (act + re-snapshot) |
| @ref with coordinates | Yes - @3 (200,350 200x30) | No - ref=e376 (ID only) | No |
| Your real browser session | Yes - tabs, cookies, logins | No - isolated Chromium | Varies |
| CSS origin tracing | Yes - cascade shows file:line | No | No |
| Live CSS/JS injection | Yes - inject with tracking + removal | No (page.evaluate only) | No |
| Background event collection | Yes - console, errors, navigations | Only while connected | No |
| Electron app support | Yes - CDP_PORT=9222 | No | No |
| WSL2 -> Windows | Yes - built-in | No | No |
| Dependencies | 0 | Playwright + Chromium binary | Varies |
| Commands | 44 | N/A (programmatic API) | ~14 |

One command, complete page understanding

Other tools either give a screenshot and say "figure it out" or dump an AX tree without context. perceive gives agents everything needed in one call:

$ cdp perceive abc1
📍 My App (1280x720 scroll:0/2400) — https://app.example.com
  [banner] ↕80px bg:rgb(26,26,46) ↑above fold
    [nav] flex
      @1 [link] "Home" (12,8 60x20)
      @2 [link] "Settings" (80,8 70x20)
  [main] ↕2920px
    @3 [textbox] "Email" (200,350 200x30)
    @4 [button] "Submit" (200,400 100x40)
  [contentinfo] ↕160px ↓below fold
Console: 2 errors | Interactive: 12 a, 3 button, 2 input

Structure. Layout. Styles. Scroll position. Console health. Interactive counts. Each @ref includes bounding coordinates, all in about 800 tokens.
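Because each @ref line carries its own geometry, an agent or wrapper script can turn perceive output into click targets without a second call. The following is a minimal sketch, not code from chrome-cdp-ex: `parseRefs` and `centerOf` are hypothetical helpers, the line format is inferred only from the sample output above, and it assumes the pair before the size is the top-left corner of the bounding box.

```javascript
// Hypothetical helpers, not part of chrome-cdp-ex.
// Matches lines shaped like: @3 [textbox] "Email" (200,350 200x30)
const REF_LINE = /@(\d+) \[([^\]]+)\] "([^"]*)" \((\d+),(\d+) (\d+)x(\d+)\)/;

// Extract every @ref row from a perceive dump into plain objects.
function parseRefs(perceiveOutput) {
  const refs = [];
  for (const line of perceiveOutput.split("\n")) {
    const m = REF_LINE.exec(line);
    if (!m) continue; // landmark rows and the summary line carry no @ref
    const [, id, role, name, x, y, w, h] = m;
    refs.push({ ref: `@${id}`, role, name, x: +x, y: +y, width: +w, height: +h });
  }
  return refs;
}

// Center of the bounding box, ASSUMING (x, y) is the top-left corner.
function centerOf(ref) {
  return { x: ref.x + ref.width / 2, y: ref.y + ref.height / 2 };
}

const sampleOutput = [
  '  [banner] ↕80px bg:rgb(26,26,46) ↑above fold',
  '      @1 [link] "Home" (12,8 60x20)',
  '      @2 [link] "Settings" (80,8 70x20)',
  '    @3 [textbox] "Email" (200,350 200x30)',
  '    @4 [button] "Submit" (200,400 100x40)',
  'Console: 2 errors | Interactive: 12 a, 3 button, 2 input',
].join("\n");

const submit = parseRefs(sampleOutput).find((r) => r.name === "Submit");
console.log(submit.ref, centerOf(submit)); // @4 { x: 250, y: 420 }
```

A point computed this way could be passed to clickxy, although clicking by @ref is simpler when the reference is still valid.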

"Which file do I edit to change this blue?"

Other tools can tell an agent what the page looks like. Only cascade tells it why:

$ cdp cascade abc1 @4 background-color

background-color: #2563eb
  ✓ .btn-primary { background-color: #2563eb }
    → src/styles/components.css:142
  ✗ button { background-color: #e5e7eb }  [overridden]
    → src/styles/base.css:28

One command. Source file. Line number. Full cascade. The agent can now go directly to components.css:142 and make the change — no guessing, no grepping through stylesheets.
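To act on that output programmatically, a script only needs the winning rule and its source location. The sketch below is an illustration under stated assumptions: `parseCascade` is a hypothetical helper, and the format (a ✓ or ✗ rule line followed by a → location line) is taken solely from the sample above, so real output with inline or inherited styles may need extra handling.

```javascript
// Hypothetical helper, not part of chrome-cdp-ex.
// Reads rule lines (✓ winner, ✗ overridden) and the "→ file:line" row under each.
function parseCascade(output) {
  const rules = [];
  for (const raw of output.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("✓") || line.startsWith("✗")) {
      const brace = line.indexOf("{");
      const selector = (brace === -1 ? line.slice(1) : line.slice(1, brace)).trim();
      rules.push({ selector, wins: line.startsWith("✓"), file: null, line: null });
    } else if (line.startsWith("→") && rules.length > 0) {
      const location = line.slice(1).trim();
      const cut = location.lastIndexOf(":"); // split on the last colon only
      const current = rules[rules.length - 1];
      current.file = location.slice(0, cut);
      current.line = Number(location.slice(cut + 1));
    }
  }
  return rules;
}

const cascadeSample = [
  "background-color: #2563eb",
  "  ✓ .btn-primary { background-color: #2563eb }",
  "    → src/styles/components.css:142",
  "  ✗ button { background-color: #e5e7eb }  [overridden]",
  "    → src/styles/base.css:28",
].join("\n");

const winner = parseCascade(cascadeSample).find((rule) => rule.wins);
console.log(`${winner.file}:${winner.line}`); // src/styles/components.css:142
```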

Pair it with inject for live prototyping:

$ cdp inject abc1 --css ".btn-primary { background: #dc2626 }"
inject-1

$ cdp inject abc1 --remove inject-1    # undo when done

Why agents choose this

sequenceDiagram
    participant Agent
    participant Chrome

    Agent->>Chrome: perceive
    Chrome-->>Agent: AX tree + layout + @refs with coordinates<br/>+ console health + interactive counts

    Agent->>Chrome: cascade @4 background-color
    Chrome-->>Agent: ✓ .btn-primary → components.css:142<br/>✗ button → base.css:28 [overridden]

    Agent->>Chrome: click @4
    Chrome-->>Agent: △ [dialog] "Submitted successfully"<br/>△ @4 [button] → disabled

One call to understand. One call to trace CSS origin. One call to act. Zero extra calls to verify. Action feedback is automatic.

Quick start

  1. Clone and enter the repo.
git clone https://github.com/EndeavorYen/chrome-cdp-ex.git
cd chrome-cdp-ex
  2. Install in Claude Code (choose one option).
# Option A: load in Claude Code for the current project/session
claude --plugin-dir .

# Option B: install globally for all projects
mkdir -p ~/.claude/skills
cp -r skills/chrome-cdp-ex ~/.claude/skills/
  3. Enable Chrome debugging at chrome://inspect/#remote-debugging and toggle it on. Do not restart Chrome with --remote-debugging-port.

Requires: Node.js 22+ (uses built-in WebSocket). Auto-detects Chrome, Chromium, Brave, Edge, and Vivaldi on macOS, Linux (including Flatpak), and Windows.

Electron App Support

Connect to Electron apps exactly like Chrome, as long as remote debugging is enabled.

Step 1: Enable CDP in your Electron app (dev mode only)

// In your main process (e.g. src/main/index.ts)
if (process.env.NODE_ENV === 'development') {
  app.commandLine.appendSwitch('remote-debugging-port', '9222');
}

Or launch with a flag:

# macOS/Linux
electron . --remote-debugging-port=9222

# Windows (PowerShell)
electron . --remote-debugging-port=9222

Step 2: Connect

CDP_PORT=9222 cdp.mjs list

Output:

[Electron 33.4.11]
1ED3DBAA  My App                                                  http://localhost:5173/#/menu

All 44 commands work: perceive, click, fill, cascade, inject, and more.

Advanced Configuration
  • CDP_PORT - connect to a specific port (Electron, Chrome with --remote-debugging-port, etc.)
  • CDP_PORT_FILE - override the DevToolsActivePort file path
  • CDP_HOST - override the target host (default: 127.0.0.1)
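One way these three variables could combine is sketched below. This is an assumption-laden illustration, not the logic inside cdp.mjs: `resolveEndpoint` and `versionUrl` are hypothetical names, and the precedence (CDP_PORT first, then the port file) is a guess. It relies on two things Chrome itself provides: the DevToolsActivePort file holds the port on its first line and the browser WebSocket path on its second, and a browser started with a debugging port serves an HTTP `/json/version` endpoint.

```javascript
// Hypothetical sketch; the real resolution order in cdp.mjs may differ.
function resolveEndpoint(env, portFileText) {
  const host = env.CDP_HOST || "127.0.0.1"; // documented default
  if (env.CDP_PORT) {
    const port = Number(env.CDP_PORT);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`CDP_PORT is not a valid port: ${env.CDP_PORT}`);
    }
    return { host, port, wsPath: null, source: "CDP_PORT" };
  }
  if (!portFileText) {
    throw new Error("no CDP_PORT set and no DevToolsActivePort contents available");
  }
  // DevToolsActivePort: line 1 is the port, line 2 the browser WebSocket path.
  const [portLine, wsPath = null] = portFileText.trim().split(/\r?\n/);
  return { host, port: Number(portLine), wsPath, source: "DevToolsActivePort" };
}

// HTTP discovery URL exposed by a browser started with a debugging port.
function versionUrl(endpoint) {
  return `http://${endpoint.host}:${endpoint.port}/json/version`;
}

const electron = resolveEndpoint({ CDP_PORT: "9222" });
console.log(versionUrl(electron)); // http://127.0.0.1:9222/json/version
```

Passing the environment and file contents in as arguments keeps the function pure, which makes the precedence easy to test without touching the filesystem.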

How It Works

graph TB
    subgraph Agent["AI Agent (Claude Code / Cursor / Amp)"]
        CLI["cdp.mjs CLI"]
    end

    subgraph Daemons["Background Daemons (one per tab)"]
        D1["Daemon A<br/><small>RingBuffer: console, exceptions, navigations</small>"]
        D2["Daemon B"]
    end

    subgraph Chrome["Chrome (user's browser)"]
        T1["Tab A"]
        T2["Tab B"]
        T3["Tab C <small>(no daemon)</small>"]
    end

    CLI -- "list (direct CDP)" --> Chrome
    CLI -- "Unix socket /<br/>named pipe" --> D1
    CLI -- "Unix socket /<br/>named pipe" --> D2
    D1 -- "WebSocket<br/>CDP session" --> T1
    D2 -- "WebSocket<br/>CDP session" --> T2

Each tab gets its own daemon process that keeps the CDP session open, so Chrome's "Allow debugging" dialog appears once per tab, not once per command. While running, each daemon passively collects console, exception, and navigation events into ring buffers. A daemon exits automatically after 20 minutes of inactivity.
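The ring-buffer idea can be sketched in a few lines. This is an illustrative data structure, not the daemon's actual implementation: the class shape, the capacity, and the `isIdle` helper are assumptions, and the unread cursor is modeled on the "unread only" default that the console command describes.

```javascript
// Illustrative sketch only; the daemon's real buffer may be built differently.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];   // oldest entry first
    this.total = 0;    // entries ever pushed, including evicted ones
    this.readUpTo = 0; // sequence number just past the last entry handed out
  }

  push(entry) {
    this.items.push(entry);
    if (this.items.length > this.capacity) this.items.shift(); // evict oldest
    this.total += 1;
  }

  all() {
    return [...this.items];
  }

  // Entries pushed since the previous unread() call that are still retained.
  unread() {
    const firstHeld = this.total - this.items.length;
    const start = Math.max(this.readUpTo, firstHeld) - firstHeld;
    this.readUpTo = this.total;
    return this.items.slice(start);
  }
}

// Idle check matching the documented 20-minute limit.
const IDLE_LIMIT_MS = 20 * 60 * 1000;
function isIdle(lastActivityMs, nowMs) {
  return nowMs - lastActivityMs >= IDLE_LIMIT_MS;
}

const consoleLog = new RingBuffer(3);
["a", "b", "c", "d"].forEach((msg) => consoleLog.push(msg));
console.log(consoleLog.unread()); // [ 'b', 'c', 'd' ]  ("a" was evicted)
consoleLog.push("e");
console.log(consoleLog.unread()); // [ 'e' ]
```

A fixed capacity bounds memory for long-lived tabs, at the cost of dropping the oldest events when nobody reads them in time.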

Commands (44 total)

Tip: start with perceive, then use click/fill/select; use status or console when you need debugging context.

Discovery & Lifecycle
list                               # list open tabs (shows targetId prefixes)
open   [url]                       # open new tab (default: about:blank)
stop   [target]                    # stop daemon(s)
closetab <target>                  # close a browser tab
Perception - start here
perceive <target> [flags]          # enriched AX tree with @ref indices + coordinates
                                   #   --diff: show only changes since last perceive
                                   #   -s <sel>: scope to CSS selector subtree
                                   #   -i: interactive elements only
                                   #   -d N: limit tree depth
                                   #   -C: include non-ARIA clickable elements
snap     <target> [--full]         # accessibility tree (compact by default)
summary  <target>                  # token-efficient overview (~100 tokens)
status   <target>                  # URL, title + new console/exception entries
console  <target> [--all|--errors] # console buffer (default: unread only)
text     <target>                  # clean text content (strips scripts/styles/SVG)
table    <target> [selector]       # full table data extraction (tab-separated)
Visual Capture
shot     <target> [file|--annotate] # viewport screenshot; --annotate overlays @ref labels
elshot   <target> <sel|@ref>        # element screenshot (auto scroll + clip, no DPR issues)
scanshot <target>                   # segmented full-page (readable viewport-sized images)
fullshot <target> [file]            # single full-page image (may be tiny on long pages)
Inspection
html      <target> [selector]       # full HTML or scoped to CSS selector
eval      <target> <expr>           # evaluate JS in page context
styles    <target> <selector>       # computed styles (meaningful props only)
net       <target>                  # network performance entries
netlog    <target> [--clear]        # network request log (XHR/Fetch with status + timing)
cookies   <target>                  # list cookies for current page
cookieset <target> <cookie>         # set a cookie ("name=value; domain=...")
cookiedel <target> <name>           # delete a cookie by name
Interaction
click   <target> <sel|@ref>         # click element (CDP mouse events, not el.click())
clickxy <target> <x> <y>            # click at CSS pixel coordinates
type    <target> <text>             # type at focused element (cross-origin safe)
press   <target> <key>              # press key (Enter, Tab, Escape, etc.)
scroll  <target> <dir|x,y> [px]     # scroll (down/up/left/right; default 500px)
hover   <target> <sel|@ref>         # hover (triggers :hover, tooltips)
fill    <target> <sel|@ref> <text>  # clear field + type (form filling)
select  <target> <selector> <val>   # select dropdown option by value
waitfor <target> <selector> [ms]    # wait for element to appear (default 10s)
loadall <target> <selector> [ms]    # click "load more" until gone
upload  <target> <selector> <paths> # upload file(s) to <input type="file">
dialog  <target> [accept|dismiss]   # dialog history; set auto-accept or auto-dismiss
Navigation & Viewport
nav      <target> <url>             # navigate to URL and wait for load
back     <target>                   # navigate back in browser history
forward  <target>                   # navigate forward
reload   <target>                   # reload current page
viewport <target> [WxH]             # show or set viewport size (e.g. 375x812)
Frontend Development (v2.2.0)
inject  <target> --css "<text>"     # inject inline <style> with tracking
inject  <target> --css-file <url>   # inject <link rel="stylesheet">
inject  <target> --js-file <url>    # inject <script src> and wait for load
inject  <target> --remove [id]      # remove injected element(s) — all or by id
cascade <target> <sel|@ref>         # CSS origin tracing: full cascade with source file + line
cascade <target> <sel|@ref> <prop>  # filter to one property (e.g. "background-color")

inject returns an ID (inject-1, inject-2...) for targeted removal. URLs are validated (blocks data:, file:, cloud metadata).
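The kind of URL check described here might look like the sketch below. The exact rules chrome-cdp-ex applies are not documented in this README, so treat everything in it as an assumption: `checkInjectUrl` is a hypothetical name, and the metadata host list is a guess at what "cloud metadata" covers (the link-local address 169.254.169.254 and Google's metadata hostname).

```javascript
// Hypothetical validator; the actual rules in chrome-cdp-ex may differ.
const BLOCKED_PROTOCOLS = new Set(["data:", "file:"]);
const METADATA_HOSTS = new Set(["169.254.169.254", "metadata.google.internal"]); // assumed list

function checkInjectUrl(raw) {
  let url;
  try {
    url = new URL(raw); // throws on relative or malformed input
  } catch {
    return { ok: false, reason: "not a valid absolute URL" };
  }
  if (BLOCKED_PROTOCOLS.has(url.protocol)) {
    return { ok: false, reason: `blocked scheme ${url.protocol}` };
  }
  if (METADATA_HOSTS.has(url.hostname)) {
    return { ok: false, reason: "cloud metadata host" };
  }
  return { ok: true, reason: null };
}

console.log(checkInjectUrl("https://cdn.example.com/theme.css").ok);       // true
console.log(checkInjectUrl("file:///etc/passwd").reason);                  // blocked scheme file:
console.log(checkInjectUrl("http://169.254.169.254/latest/meta-data/").ok); // false
```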

cascade shows which CSS rule won, which were overridden, inline styles, and inherited properties — with source locations. Answers "which file do I edit?" in one call.

Advanced
batch   <target> <json>             # execute multiple commands in one call
                                    # [{"cmd":"click","args":["@1"]},{"cmd":"perceive","args":["--diff"]}]
evalraw <target> <method> [json]    # raw CDP command passthrough
                                    # e.g. evalraw <t> "DOM.getDocument" '{}'
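Hand-writing the batch JSON is error-prone, so a wrapper could assemble it instead. The sketch below is a hypothetical convenience layer: `step`, `buildBatch`, and `shellQuote` are not part of chrome-cdp-ex, and the only thing taken from the source is the `{"cmd", "args"}` shape shown in the comment above.

```javascript
// Hypothetical helpers for assembling a batch argument.
function step(cmd, ...args) {
  return { cmd, args };
}

function buildBatch(steps) {
  for (const s of steps) {
    if (typeof s.cmd !== "string" || s.cmd.length === 0) {
      throw new Error("every batch step needs a command name");
    }
    if (!Array.isArray(s.args) || s.args.some((a) => typeof a !== "string")) {
      throw new Error(`arguments for "${s.cmd}" must be strings`);
    }
  }
  return JSON.stringify(steps);
}

// Wrap a value in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(text) {
  return `'${text.replace(/'/g, `'\\''`)}'`;
}

const payload = buildBatch([step("click", "@1"), step("perceive", "--diff")]);
console.log(payload);
// [{"cmd":"click","args":["@1"]},{"cmd":"perceive","args":["--diff"]}]
console.log(`cdp batch abc1 ${shellQuote(payload)}`);
```

Quoting matters because the payload contains double quotes and brackets that most shells would otherwise interpret.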

Action feedback: click, clickxy, press (Enter/Escape/Tab), and select automatically wait for DOM to settle and return a perceive diff showing what changed. You usually do not need to run perceive --diff manually after these actions.

<target> is a unique targetId prefix from list. See SKILL.md for detailed usage patterns and coordinate-system notes.

WSL2 -> Windows Browser Control

This tool works across the WSL2-to-Windows boundary, where many CDP tools fail.

graph LR
    subgraph WSL2["WSL2 (Linux)"]
        Agent["AI Agent<br/>(Claude Code)"]
        Script["cdp.mjs"]
    end

    subgraph Windows["Windows"]
        Node["node.exe"]
        Chrome["Chrome<br/>(user's browser)"]
    end

    Agent -- "invokes" --> Script
    Script -- "/mnt/c/.../node.exe" --> Node
    Node -- "CDP WebSocket<br/>localhost:port" --> Chrome

The key insight: in WSL2's default NAT networking mode, a Linux process cannot reach a port bound to Windows localhost, so the script runs Windows-side node.exe via /mnt/c/... and lets that process connect to Chrome natively.

Proven pattern:

  1. Start Chrome on Windows and enable debugging at chrome://inspect/#remote-debugging.
  2. Use Windows-side Node.js to run the CDP script.
  3. Locate Node.js:
    powershell.exe -NoProfile -Command "(Get-Command node -ErrorAction SilentlyContinue).Source"
  4. Convert to a WSL mount path and invoke:
    "/mnt/c/.../node.exe" scripts/cdp.mjs list
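Step 4 is a mechanical path rewrite, sketched below. `toWslPath` is a hypothetical helper, the sample install location is only an example, and the function assumes the default /mnt/<drive> automount layout. Inside WSL, the built-in `wslpath -u` utility performs the same conversion.

```javascript
// Hypothetical helper; assumes the default /mnt/<drive> automount layout.
function toWslPath(windowsPath) {
  // trim() drops the trailing \r\n that PowerShell output usually carries
  const match = /^([A-Za-z]):[\\/](.*)$/.exec(windowsPath.trim());
  if (!match) {
    throw new Error(`not an absolute Windows drive path: ${windowsPath}`);
  }
  const [, drive, rest] = match;
  return `/mnt/${drive.toLowerCase()}/${rest.replace(/\\/g, "/")}`;
}

// Example input only; the real location comes from the PowerShell lookup in step 3.
console.log(toWslPath("C:\\Program Files\\nodejs\\node.exe\r\n"));
// /mnt/c/Program Files/nodejs/node.exe
```

Paths with spaces, like the example, need to stay quoted when passed to the shell.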

See SKILL.md for full WSL2 setup instructions.

Credits

  • Original: pasky/chrome-cdp-skill by Petr Baudis (daemon-per-tab architecture and core CDP client)
  • Contributors: ynezz (Flatpak paths), Jah-yee, Rolf Fredheim
  • This fork: @ref system, perceive-first workflow, action feedback, background observation, realistic input simulation, form automation, WSL2 support, and 28 additional commands

License

MIT

About

Give your AI agent eyes and hands on your real Chrome browser: your tabs, your logins, your page state. 44 commands, zero dependencies. Extended fork of pasky/chrome-cdp-skill.
