| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Commands |
| 6 | + |
| 7 | +```bash |
| 8 | +npm run build # Compile TypeScript to build/ |
| 9 | +npm run watch # Watch mode compilation |
| 10 | +npm run lint # ESLint validation |
| 11 | +npm run test # Unit tests (Vitest) |
| 12 | +npm run test:e2e # E2E tests (requires Windows + Appium setup) |
| 13 | +npm run mcp:start # Launch MCP server |
| 14 | +``` |
| 15 | + |
| 16 | +Run a single test file: |
| 17 | +```bash |
| 18 | +npx vitest run test/path/to/file.test.ts |
| 19 | +``` |
| 20 | + |
| 21 | +## Architecture |
| 22 | + |
| 23 | +This is an **Appium driver** for Windows desktop UI automation. It exposes two interfaces: |
| 24 | + |
| 25 | +1. **Appium WebDriver API** — used by test frameworks (Selenium-style) |
| 26 | +2. **MCP Server** (`lib/mcp/`) — exposes 30+ tools over Model Context Protocol for AI agent use |
| 27 | + |
| 28 | +### Core driver flow |
| 29 | + |
| 30 | +`lib/driver.ts` — `NovaWindowsDriver` extends `BaseDriver`. On `createSession()`, it starts a persistent PowerShell process that remains open for the session lifetime. All UI Automation operations are executed by sending PowerShell commands through this process and reading stdout. |
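One common way to run many commands through a single long-lived shell is to print a marker line after each command and split stdout on it. The sketch below illustrates that idea only; `frameCommand`, `OutputCollector`, and the `__END__` marker are hypothetical names, and the driver's actual framing may differ.

```typescript
// Hypothetical sketch: these names are illustrative and not taken from the repository.

/** Append a marker line so the reader can tell where this command's output ends. */
function frameCommand(command: string, marker: string): string {
  return `${command}\nWrite-Output '${marker}'\n`;
}

/** Buffers stdout chunks and returns the output of every command that has finished. */
class OutputCollector {
  private buffer = "";
  private readonly marker: string;

  constructor(marker: string) {
    this.marker = marker;
  }

  /** Feed one stdout chunk; returns zero or more completed command outputs. */
  push(chunk: string): string[] {
    this.buffer += chunk;
    const completed: string[] = [];
    let index = this.buffer.indexOf(this.marker);
    while (index !== -1) {
      // Everything before the marker belongs to the command that just finished.
      completed.push(this.buffer.slice(0, index).trim());
      this.buffer = this.buffer.slice(index + this.marker.length);
      index = this.buffer.indexOf(this.marker);
    }
    return completed;
  }
}
```

In a real session the framed string would be written to the shell's stdin and each `data` event from its stdout passed to `push()`. Chunks can split a marker in half, which is why the collector searches the accumulated buffer rather than the latest chunk.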
| 31 | + |
| 32 | +### Element finding |
| 33 | + |
| 34 | +Element searches go through `lib/powershell/`, which builds PowerShell scripts using Windows UI Automation APIs. The driver converts Appium locator strategies (XPath, accessibility id, class name, etc.) into `UIA3` conditions via `lib/powershell/conditions.ts` and `converter.ts`. XPath is evaluated in `lib/xpath/` against the live UI Automation tree. |
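To make the locator-to-condition step concrete, here is a minimal sketch that turns three of the simpler strategies into a PowerShell expression. It assumes the .NET `System.Windows.Automation.PropertyCondition` class as the target; the helper names and the mapping table are hypothetical and are not the contents of `conditions.ts`.

```typescript
// Hypothetical sketch: the mapping and helper names are assumptions for illustration.
type SimpleStrategy = "accessibility id" | "name" | "class name";

// Static properties of System.Windows.Automation.AutomationElement.
const PROPERTY_BY_STRATEGY: Record<SimpleStrategy, string> = {
  "accessibility id": "AutomationIdProperty",
  "name": "NameProperty",
  "class name": "ClassNameProperty",
};

/** Quote a value as a PowerShell single-quoted string (embedded quotes are doubled). */
function psQuote(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

/** Build a PowerShell expression that constructs a PropertyCondition for the locator. */
function toPropertyCondition(strategy: SimpleStrategy, selector: string): string {
  const property = PROPERTY_BY_STRATEGY[strategy];
  return (
    `[System.Windows.Automation.PropertyCondition]::new(` +
    `[System.Windows.Automation.AutomationElement]::${property}, ${psQuote(selector)})`
  );
}
```

Quoting the selector before embedding it matters because locator values come from test code and may contain apostrophes.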
| 35 | + |
| 36 | +### Input simulation |
| 37 | + |
| 38 | +Low-level mouse and keyboard events use native Windows API bindings in `lib/winapi/user32.ts` via the `koffi` FFI library. Higher-level action sequences (W3C Actions) are handled in `lib/commands/actions.ts`, which translates WebDriver action chains into `user32` calls with optional easing/delay curves. |
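As an illustration of what an easing curve does to a pointer move, the sketch below interpolates intermediate cursor positions with a quadratic ease-in-out function. The function names and the choice of curve are assumptions; the curves actually offered by `actions.ts` are not described here.

```typescript
// Hypothetical sketch: names and the specific curve are assumptions for illustration.
type Point = { x: number; y: number };

/** Quadratic ease-in-out: slow start, fast middle, slow end. Maps [0, 1] to [0, 1]. */
function easeInOutQuad(t: number): number {
  return t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2;
}

/** Return `steps` cursor positions from `from` (exclusive) to `to` (inclusive). */
function easedPath(from: Point, to: Point, steps: number): Point[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  const points: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const progress = easeInOutQuad(i / steps);
    points.push({
      x: Math.round(from.x + (to.x - from.x) * progress),
      y: Math.round(from.y + (to.y - from.y) * progress),
    });
  }
  return points;
}
```

Each returned point would correspond to one low-level cursor move, with a delay between moves so the whole gesture takes the duration requested by the action.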
| 39 | + |
| 40 | +### Commands |
| 41 | + |
| 42 | +All driver commands live in `lib/commands/` and are mixed into the driver class via `lib/commands/index.ts`. Key files: |
| 43 | +- `actions.ts` — mouse, keyboard, wheel via W3C ActionSequence |
| 44 | +- `element.ts` — element finding and attribute retrieval |
| 45 | +- `app.ts` — app launch/close/window management |
| 46 | +- `extension.ts` — `executeScript()` platform-specific commands |
| 47 | +- `powershell.ts` — raw PowerShell execution |
| 48 | +- `screen-recorder.ts` — FFmpeg-based recording |
| 49 | + |
| 50 | +### MCP server |
| 51 | + |
| 52 | +`lib/mcp/` is an independent MCP server binary (`novawindows-mcp`). It auto-starts and manages an Appium server process, creates WebdriverIO sessions, and exposes tools grouped by domain in `lib/mcp/tools/`. The server communicates over stdio using `@modelcontextprotocol/sdk`. |
| 53 | + |
| 54 | +### TypeScript paths |
| 55 | + |
| 56 | +`@/` resolves to `lib/` (configured in both `tsconfig.json` and Vitest configs). |
| 57 | + |
| 58 | +## Key capabilities |
| 59 | + |
| 60 | +- `platformName`: `"Windows"`, `automationName`: `"NovaWindows"` |
| 61 | +- Supported locator strategies: `xpath`, `accessibility id`, `id`, `name`, `class name`, `tag name`, `-windows uiautomation` |
| 62 | +- Custom `executeScript()` commands listed in README.md |
| 63 | +- Prerun/postrun PowerShell scripts via session capabilities |
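For reference, W3C WebDriver only accepts a fixed set of unprefixed capability names, so Appium clients send everything else with the `appium:` vendor prefix. The helper below is a small sketch of that normalization applied to the two capabilities listed above; `toW3cCapabilities` is a hypothetical name and is not part of this repository.

```typescript
// Hypothetical helper for illustration; not part of this repository.
const W3C_STANDARD_CAPABILITIES = new Set([
  "browserName",
  "browserVersion",
  "platformName",
  "acceptInsecureCerts",
  "pageLoadStrategy",
  "proxy",
  "setWindowRect",
  "timeouts",
  "strictFileInteractability",
  "unhandledPromptBehavior",
]);

/** Add the `appium:` prefix to every capability that is neither standard nor already prefixed. */
function toW3cCapabilities(caps: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(caps)) {
    const needsPrefix = !W3C_STANDARD_CAPABILITIES.has(key) && !key.includes(":");
    result[needsPrefix ? `appium:${key}` : key] = value;
  }
  return result;
}

const capabilities = toW3cCapabilities({
  platformName: "Windows",
  automationName: "NovaWindows",
});
```

Further session capabilities, such as the prerun/postrun scripts, would be added to the same object; their exact names are documented in README.md and are not reproduced here.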