Closed
50 commits
6a98b5c
feat: custom env variables capabilities
y-schwab Mar 10, 2026
71b6545
docs: add documentation for appEnvironment capability
y-schwab Mar 10, 2026
cf3d464
feat(mcp): add MCP server with 39 tools and unit test suite
y-schwab Mar 14, 2026
5b35ff7
chore: prepare package for verisoft npm distribution
y-schwab Mar 16, 2026
4c1d3d7
"Claude PR Assistant workflow"
y-schwab Mar 16, 2026
6d62808
"Claude Code Review workflow"
y-schwab Mar 16, 2026
48ce54f
Merge pull request #2 from verisoft-ai/add-claude-github-actions-1773…
y-schwab Mar 16, 2026
e95c6b2
chore(claude): Add constraints and context for claude code review
y-schwab Mar 17, 2026
e940fd1
chore: remove auto publish on push to main
y-schwab Mar 17, 2026
3acee1c
chore(claude): change constraints for claude code review
y-schwab Mar 17, 2026
dd585b9
fix: replace NovaWindows automation name with DesktopDriver
y-schwab Mar 17, 2026
3083639
Merge pull request #6 from verisoft-ai/fix/replace-automation-name
y-schwab Mar 17, 2026
c4c18e9
fix: Add allowed tools to claude code reviewer
y-schwab Mar 17, 2026
88f921d
fix: Remove claude code review workflow
y-schwab Mar 17, 2026
d7bebd9
fix: fix code review comments
y-schwab Mar 17, 2026
8d80d52
Merge pull request #9 from verisoft-ai/feat/custom-env-vars
y-schwab Mar 17, 2026
8960843
fix: fix attaching to wrong application window
y-schwab Mar 17, 2026
92eedfa
fix: Remove outerloops
y-schwab Mar 17, 2026
18d4538
Merge pull request #10 from verisoft-ai/fix/classic-app-startup-fix
y-schwab Mar 17, 2026
fd73365
fix(mcp): resolve bugs, add tool annotations, and new UIA tools
y-schwab Mar 17, 2026
46592db
Merge branch 'main' of github.com:verisoft-ai/appium-desktop-driver i…
y-schwab Mar 17, 2026
8b76810
refactor(mcp): remove auto-start, require Appium to be running extern…
y-schwab Mar 18, 2026
393bdae
chore: bump version to 1.1.0
y-schwab Mar 18, 2026
5b72f12
fix(lint): resolve lint errors
y-schwab Mar 18, 2026
a88ff7b
Merge pull request #11 from verisoft-ai/feat/mcp-server
y-schwab Mar 18, 2026
c9d3529
chore(npm): Ignore build artifacts and local mcp/claude config
y-schwab Mar 18, 2026
eefb804
fix(window): window handles access capability added
y-schwab Mar 18, 2026
52cd5d8
docs: add returnAllWindowHandles capability to README
y-schwab Mar 18, 2026
d281444
fix(window): narrow appProcessIds to the attached window's PID
y-schwab Mar 18, 2026
a7ce03f
Merge pull request #12 from verisoft-ai/fix/window-handle-access
y-schwab Mar 18, 2026
a831b3a
feat(capability): Add capabilities windowSwitchRetries and windowSwit…
y-schwab Mar 19, 2026
f1e6bff
fix(window): track child processes spawned from launched apps.
y-schwab Mar 19, 2026
3eda6b2
Merge pull request #13 from verisoft-ai/fix/child-process-tracking
y-schwab Mar 19, 2026
b53a984
Merge pull request #14 from verisoft-ai/feature/configurable-window-r…
y-schwab Mar 19, 2026
4434b99
feat: Implemented missing commands
y-schwab Mar 19, 2026
94a4f04
chore: bump version to 1.2.0
y-schwab Mar 19, 2026
c0cb0e8
fix: add tabbing
y-schwab Mar 19, 2026
8dff1b1
Merge pull request #15 from verisoft-ai/feat/new-commands-stubs
y-schwab Mar 19, 2026
a372ea3
Initial plan
Copilot Mar 20, 2026
112e644
Rename NovaWindows to Appium Desktop in documentation
Copilot Mar 20, 2026
1029aec
feat(display): add support for multi monitor testing
y-schwab Mar 23, 2026
1151851
Merge pull request #16 from verisoft-ai/copilot/update-novawindows-to…
y-schwab Mar 23, 2026
810fc4a
Merge pull request #17 from verisoft-ai/feat/multi-monitor
y-schwab Mar 23, 2026
8555903
chore(release): bump version and re-added the auto release workflow
y-schwab Mar 23, 2026
538bc4c
Merge pull request #18 from verisoft-ai/chore/auto-deploy
y-schwab Mar 23, 2026
8a7a3cf
chore(release): 1.4.0 [skip ci]
semantic-release-bot Mar 23, 2026
e77a7b0
fix(mcp): change mcp naming to match new Desktop Driver convention
y-schwab Mar 23, 2026
0993aeb
Merge pull request #19 from verisoft-ai/fix/mcp-naming
y-schwab Mar 23, 2026
f3ae9a0
chore(release): 1.4.1 [skip ci]
semantic-release-bot Mar 23, 2026
5e8e910
chore(website): create website with mcp and driver demos
y-schwab Mar 24, 2026
2 changes: 2 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,3 +1,5 @@
#

## Proposed changes

Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.
78 changes: 78 additions & 0 deletions .github/claude-review-guide.md
@@ -0,0 +1,78 @@
# Claude Review Guide — appium-desktop-driver

## Project Context

This is a **Windows desktop UI automation Appium driver** (`Appium Desktop`). It bridges the WebDriver protocol to Windows UI Automation (UIA3) via a persistent PowerShell process. An MCP server layered on top exposes tools for AI agent use.

Key stack: TypeScript, Node.js, Appium BaseDriver, PowerShell, koffi FFI (user32.dll), WebdriverIO (MCP client).

---

## Severity Format

Use this format for findings:

```
[BLOCKER] — Must be fixed before merge. Security or correctness issue.
[HIGH] — Significant bug or reliability issue; should be fixed.
[MEDIUM] — Non-critical issue worth addressing.
[LOW] — Minor style, naming, or improvement suggestion.
[INFO] — Observation or question, no action required.
```

---

## Security Checklist

### PowerShell Injection
- [ ] User-supplied strings (capability values, element attributes, script arguments) are **never** interpolated raw into PowerShell strings
- [ ] `executeScript` payloads that build PS commands use proper escaping or parameterized construction
- [ ] Capability values used in pre/postrun scripts are validated and sanitized
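
One way to satisfy the first checklist item is to route every user-supplied value through a single quoting helper. The sketch below is illustrative only: `toPowerShellLiteral` and `buildWriteOutputCommand` are hypothetical names, not functions from this repository. It relies on the PowerShell rule that a single-quoted string performs no variable or backtick expansion and that an embedded quote is escaped by doubling it. PowerShell also treats the typographic quotes U+2018 to U+201B as single-quote characters, so they are doubled as well.

```typescript
// Hypothetical helper: wrap an untrusted value in a PowerShell single-quoted literal.
// Single-quoted strings do not expand $variables or backtick escapes, so the only
// characters that can terminate the literal are quote characters, doubled here.
const PS_SINGLE_QUOTES = /['\u2018\u2019\u201A\u201B]/g;

function toPowerShellLiteral(value: string): string {
  return `'${value.replace(PS_SINGLE_QUOTES, '$&$&')}'`;
}

// Hypothetical usage: interpolate only the quoted literal, never the raw value.
function buildWriteOutputCommand(untrusted: string): string {
  return `Write-Output ${toPowerShellLiteral(untrusted)}`;
}
```

A reviewer can then look for raw template interpolation of capability values or element attributes and ask for it to go through a helper of this kind instead.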

### FFI / native bindings
- [ ] `user32.dll` calls in `lib/winapi/user32.ts` validate coordinate ranges and handle types before passing to native
- [ ] koffi struct definitions match actual Windows API signatures
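
As a rough illustration of the coordinate check, the sketch below validates values before they would reach a native call. The names `assertInt32` and `validatePoint` are hypothetical, and the assumption is that the target `user32.dll` functions take 32-bit signed `int` coordinates. Negative values are accepted on purpose, because monitors placed left of or above the primary display have negative virtual-screen coordinates.

```typescript
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Hypothetical guard: reject anything that is not a 32-bit signed integer.
function assertInt32(name: string, value: unknown): number {
  if (
    typeof value !== 'number' ||
    !Number.isInteger(value) ||
    value < INT32_MIN ||
    value > INT32_MAX
  ) {
    throw new RangeError(`${name} must be a 32-bit signed integer, got ${String(value)}`);
  }
  return value;
}

// Hypothetical helper: validate both coordinates before any FFI call is made.
function validatePoint(x: unknown, y: unknown): { x: number; y: number } {
  return { x: assertInt32('x', x), y: assertInt32('y', y) };
}
```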

### Secrets & credentials
- [ ] No API keys, tokens, or passwords in source code or test fixtures
- [ ] Capability values for app launch do not log sensitive data

---

## Testing Standards

- Unit tests live in `test/` and use **Vitest**
- New utility functions in `lib/` should have corresponding unit tests
- PowerShell condition builders (`lib/powershell/conditions.ts`, `converter.ts`) and XPath evaluator (`lib/xpath/`) are well-covered — changes here need tests
- E2E tests require a real Windows environment; don't flag missing E2E coverage for pure logic changes

---

## Architecture Rules

### Session lifecycle
- `createSession()` must start the PowerShell process cleanly
- `deleteSession()` must kill the PS process and clear all session state (element cache, capabilities)
- Any async work initiated during a session must be awaited or cancelled on teardown

### Element handles
- Element IDs are ephemeral — they map to live UIA3 elements that can become stale
- Code that caches element references must handle `ElementNotFound` and stale-element errors gracefully

### Command routing
- All new driver commands must be exported from `lib/commands/index.ts` and follow the existing mixin pattern
- MCP tools in `lib/mcp/tools/` must map cleanly to existing driver commands — avoid duplicating logic

### Error handling
- Driver errors must be wrapped in Appium error classes (e.g., `NoSuchElementError`, `InvalidArgumentError`)
- Raw PowerShell stderr should not be surfaced verbatim to the WebDriver client
- MCP tool errors should return structured error responses, not throw
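
The last rule can be sketched as follows. This is a minimal illustration, not code from `lib/mcp/`: `ToolResult` is a local mirror of the MCP tool-result shape (text content plus the optional `isError` flag), and `toToolError`, `runTool`, and the tool name `click_element` are hypothetical.

```typescript
// Minimal local mirror of the MCP tool-result shape (text content only).
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Hypothetical helper: convert any thrown value into a structured error result.
function toToolError(toolName: string, err: unknown): ToolResult {
  const message = err instanceof Error ? err.message : String(err);
  return {
    content: [{ type: 'text', text: `${toolName} failed: ${message}` }],
    isError: true,
  };
}

// Hypothetical wrapper: run a tool body and never let an exception escape.
async function runTool(
  toolName: string,
  body: () => Promise<string>,
): Promise<ToolResult> {
  try {
    return { content: [{ type: 'text', text: await body() }] };
  } catch (err) {
    return toToolError(toolName, err);
  }
}
```

Wrapping every handler this way keeps the failure visible to the calling agent as data, while the driver-side rule about Appium error classes still applies underneath.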

---

## Code Style

- TypeScript strict mode is on — no `any` unless unavoidable and justified
- Prefer `async/await` over raw Promise chains
- `@/` path alias resolves to `lib/` — use it for imports within the library
- Avoid adding unnecessary abstraction layers for single-use logic
50 changes: 50 additions & 0 deletions .github/workflows/claude.yml
@@ -0,0 +1,50 @@
name: Claude Code

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]

jobs:
  claude:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
      issues: read
      id-token: write
      actions: read # Required for Claude to read CI results on PRs
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Claude Code
        id: claude
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}

          # This is an optional setting that allows Claude to read CI results on PRs
          additional_permissions: |
            actions: read

          # Optional: Give a custom prompt to Claude. If this is not specified, Claude will perform the instructions specified in the comment that tagged it.
          # prompt: 'Update the pull request description to include a summary of changes.'

          # Optional: Add claude_args to customize behavior and configuration
          # See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md
          # or https://code.claude.com/docs/en/cli-reference for available options
          # claude_args: '--allowed-tools Bash(gh pr:*)'

1 change: 0 additions & 1 deletion .github/workflows/release.yml
@@ -5,7 +5,6 @@ on:
  push:
    branches:
      - main
      - develop

permissions:
  contents: write
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# Claude Code local settings
.claude/

# Logs
logs
*.log
9 changes: 9 additions & 0 deletions .mcp.json
@@ -0,0 +1,9 @@
{
  "mcpServers": {
    "desktop-driver-mcp": {
      "command": "node",
      "args": ["./build/lib/mcp/index.js"],
      "env": {}
    }
  }
}
32 changes: 32 additions & 0 deletions .npmignore
@@ -0,0 +1,32 @@
# Source files
lib/
test/
examples/

# Config files
tsconfig.json
eslint.config.mjs
vitest.config.ts
vitest.e2e.config.ts
.releaserc
.npmrc

# CI/CD
.github/

# IDE
.vscode/
.claude/

# Dev tool config
.mcp.json

# Build artifacts
build/tsconfig.tsbuildinfo
build/eslint.config.*

# Misc
.gitignore
CHANGELOG.md
CLAUDE.md
MCP_README.md
46 changes: 46 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,49 @@
## [1.4.1](https://github.com/verisoft-ai/appium-desktop-driver/compare/v1.4.0...v1.4.1) (2026-03-23)

### Bug Fixes

* **mcp:** change mcp naming to match new Desktop Driver convention ([e77a7b0](https://github.com/verisoft-ai/appium-desktop-driver/commit/e77a7b02d4ed8769d61697c96c239bb03e3aef61))

## [1.4.0](https://github.com/verisoft-ai/appium-desktop-driver/compare/v1.3.1...v1.4.0) (2026-03-23)

### Features

* **capability:** Add capabilities windowSwitchRetries and windowSwitchInterval ([a831b3a](https://github.com/verisoft-ai/appium-desktop-driver/commit/a831b3af5d8d531a41483f36391d631cd787c371))
* custom env variables capabilities ([6a98b5c](https://github.com/verisoft-ai/appium-desktop-driver/commit/6a98b5c76bdfecf1ef6a893ea9c37abdd5e47c33))
* **display:** add support for multi monitor testing ([1029aec](https://github.com/verisoft-ai/appium-desktop-driver/commit/1029aec97adb420e9525e1f325c64870dc51585c))
* Implemented missing commands ([4434b99](https://github.com/verisoft-ai/appium-desktop-driver/commit/4434b996fa33cd0214c7cd44073f34806cd1c99f))
* **mcp:** add MCP server with 39 tools and unit test suite ([cf3d464](https://github.com/verisoft-ai/appium-desktop-driver/commit/cf3d464d1b74efbdddb04aa16ff3a19e19564934))

### Bug Fixes

* Add allowed tools to claude code reviewer ([c4c18e9](https://github.com/verisoft-ai/appium-desktop-driver/commit/c4c18e9f35915eb609e41572cb3c5ea15e3314a7))
* add tabbing ([c0cb0e8](https://github.com/verisoft-ai/appium-desktop-driver/commit/c0cb0e8bb4fb37c9f70b8e891c659c56142c1943))
* fix attaching to wrong application window ([8960843](https://github.com/verisoft-ai/appium-desktop-driver/commit/8960843d548c98728880c901b154215b6265b69e))
* fix code review comments ([d7bebd9](https://github.com/verisoft-ai/appium-desktop-driver/commit/d7bebd9ff1660fd065a92e7008344c3b1323bd27))
* **lint:** resolve lint errors ([5b72f12](https://github.com/verisoft-ai/appium-desktop-driver/commit/5b72f122472cdd4e563aa044b1baa2a52af7d10a))
* **mcp:** resolve bugs, add tool annotations, and new UIA tools ([fd73365](https://github.com/verisoft-ai/appium-desktop-driver/commit/fd7336552264a52ad2dda8c28bee2afcf44050a6))
* Remove claude code review workflow ([88f921d](https://github.com/verisoft-ai/appium-desktop-driver/commit/88f921d81ba9b81ff1578b2ac34c81d337670f30))
* Remove outerloops ([92eedfa](https://github.com/verisoft-ai/appium-desktop-driver/commit/92eedfa5decf6125c0f688da2d4c3bcf896491d5))
* replace NovaWindows automation name with DesktopDriver ([dd585b9](https://github.com/verisoft-ai/appium-desktop-driver/commit/dd585b9013fd5b128e10c42af86ca4e5f86f6934))
* **window:** narrow appProcessIds to the attached window's PID ([d281444](https://github.com/verisoft-ai/appium-desktop-driver/commit/d2814445ab5248483d451f1817fbff67d30d7654))
* **window:** track child processes spawned from launched apps. ([f1e6bff](https://github.com/verisoft-ai/appium-desktop-driver/commit/f1e6bfffe5bfcdebe884de207eabce8381c83a67))
* **window:** window handles access capability added ([eefb804](https://github.com/verisoft-ai/appium-desktop-driver/commit/eefb8040b2ec43795b4c985e42090dad2fa2e6ae))

### Miscellaneous Chores

* bump version to 1.1.0 ([393bdae](https://github.com/verisoft-ai/appium-desktop-driver/commit/393bdaeaa0070385a15a61affe88c059c8967c6a))
* bump version to 1.2.0 ([94a4f04](https://github.com/verisoft-ai/appium-desktop-driver/commit/94a4f046be53c3b804497ad4e1d2782860e25070))
* **claude:** Add constraints and context for claude code review ([e95c6b2](https://github.com/verisoft-ai/appium-desktop-driver/commit/e95c6b219f436dd030f4e7dd840713e651af9cb4))
* **claude:** change constraints for claude code review ([3acee1c](https://github.com/verisoft-ai/appium-desktop-driver/commit/3acee1c1565b575e84f3b087aff6085b6b524229))
* **npm:** Ignore build artifacts and local mcp/claude config ([c9d3529](https://github.com/verisoft-ai/appium-desktop-driver/commit/c9d3529cf1c259d36eaf82e64702fdd080463195))
* prepare package for verisoft npm distribution ([5b35ff7](https://github.com/verisoft-ai/appium-desktop-driver/commit/5b35ff722d8254f24fea2bfe5112f4e0bddfc1e7))
* **release:** bump version and re-added the auto release workflow ([8555903](https://github.com/verisoft-ai/appium-desktop-driver/commit/8555903c4b3038d11fe24c038ef83964f71a1710))
* remove auto publish on push to main ([e940fd1](https://github.com/verisoft-ai/appium-desktop-driver/commit/e940fd1e4f20a505ea81dd668b4240abd2053d7f))

### Code Refactoring

* **mcp:** remove auto-start, require Appium to be running externally ([8b76810](https://github.com/verisoft-ai/appium-desktop-driver/commit/8b76810041db68c960b3448173a8adca52679390))

## [1.3.1](https://github.com/AutomateThePlanet/appium-novawindows-driver/compare/v1.3.0...v1.3.1) (2026-03-09)

### Bug Fixes
63 changes: 63 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,63 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
npm run build # Compile TypeScript to build/
npm run watch # Watch mode compilation
npm run lint # ESLint validation
npm run test # Unit tests (Vitest)
npm run test:e2e # E2E tests (requires Windows + Appium setup)
npm run mcp:start # Launch MCP server
```

Run a single test file:
```bash
npx vitest run test/path/to/file.test.ts
```

## Architecture

This is an **Appium driver** for Windows desktop UI automation. It exposes two interfaces:

1. **Appium WebDriver API** — used by test frameworks (Selenium-style)
2. **MCP Server** (`lib/mcp/`) — exposes 30+ tools over Model Context Protocol for AI agent use

### Core driver flow

`lib/driver.ts` — `NovaWindowsDriver` extends `BaseDriver`. On `createSession()`, it starts a persistent PowerShell process that remains open for the session lifetime. All UI Automation operations are executed by sending PowerShell commands through this process and reading stdout.
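
Reading results from a long-lived process means deciding where one command's output ends and the next begins. The sketch below shows one common approach, a sentinel line printed after each command. It is an assumption about how such framing could work, not a description of `lib/driver.ts`; `SentinelReader` and the `<<END>>` marker are hypothetical.

```typescript
// Hypothetical sketch: split a continuous stdout stream into per-command results
// by having each command print a sentinel string when it finishes.
class SentinelReader {
  private buffer = '';
  private readonly sentinel: string;

  constructor(sentinel: string) {
    this.sentinel = sentinel;
  }

  // Feed one stdout chunk; returns the output of every command completed by it.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const completed: string[] = [];
    let index = this.buffer.indexOf(this.sentinel);
    while (index !== -1) {
      completed.push(this.buffer.slice(0, index).trim());
      this.buffer = this.buffer.slice(index + this.sentinel.length);
      index = this.buffer.indexOf(this.sentinel);
    }
    return completed;
  }
}
```

Chunks can split anywhere, including in the middle of the sentinel, which is why the reader buffers until the full marker is present. In practice the marker should be a random per-session token so that application output cannot contain it by accident.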

### Element finding

Element searches go through `lib/powershell/` which builds PowerShell scripts using Windows UI Automation APIs. The driver converts Appium locator strategies (XPath, accessibility id, class name, etc.) into `UIA3` conditions via `lib/powershell/conditions.ts` and `converter.ts`. XPath is evaluated in `lib/xpath/` against the live UI Automation tree.

### Input simulation

Low-level mouse and keyboard events use native Windows API bindings in `lib/winapi/user32.ts` via the `koffi` FFI library. Higher-level action sequences (W3C Actions) are handled in `lib/commands/actions.ts` which translates WebDriver action chains into `user32` calls with optional easing/delay curves.
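
To make the easing idea concrete, here is a minimal sketch of how intermediate cursor positions for a pointer move could be computed. It is not taken from `lib/commands/actions.ts`: `easedPath` is a hypothetical helper, and cubic ease-in-out is only one possible curve.

```typescript
interface Point {
  x: number;
  y: number;
}

// Cubic ease-in-out: slow start, fast middle, slow end. Maps [0, 1] onto [0, 1].
function easeInOutCubic(t: number): number {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Hypothetical helper: `steps` cursor positions from `from` to `to`, ending exactly on `to`.
function easedPath(from: Point, to: Point, steps: number): Point[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError('steps must be a positive integer');
  }
  const path: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const k = easeInOutCubic(i / steps);
    path.push({
      x: Math.round(from.x + (to.x - from.x) * k),
      y: Math.round(from.y + (to.y - from.y) * k),
    });
  }
  return path;
}
```

Each returned point would then be passed to the native cursor call, with a delay between points derived from the action's duration.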

### Commands

All driver commands live in `lib/commands/` and are mixed into the driver class via `lib/commands/index.ts`. Key files:
- `actions.ts` — mouse, keyboard, wheel via W3C ActionSequence
- `element.ts` — element finding and attribute retrieval
- `app.ts` — app launch/close/window management
- `extension.ts` — `executeScript()` platform-specific commands
- `powershell.ts` — raw PowerShell execution
- `screen-recorder.ts` — FFmpeg-based recording

### MCP server

`lib/mcp/` is an independent MCP server binary (`novawindows-mcp`). It requires an Appium server that is already running externally, creates WebdriverIO sessions against it, and exposes tools grouped by domain in `lib/mcp/tools/`. The server communicates over stdio using `@modelcontextprotocol/sdk`.

### TypeScript paths

`@/` resolves to `lib/` (configured in both `tsconfig.json` and Vitest configs).

## Key capabilities

- `platformName`: `"Windows"`, `automationName`: `"DesktopDriver"`
- Supported locator strategies: `xpath`, `accessibility id`, `id`, `name`, `class name`, `tag name`, `-windows uiautomation`
- Custom `executeScript()` commands listed in README.md
- Prerun/postrun PowerShell scripts via session capabilities
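
As an illustration of the first bullet, a session request might look like the sketch below. The `appium:app` value is a placeholder and the host is an assumption; only `platformName` and `automationName` come from the list above. Port 4723 is Appium's default.

```typescript
// W3C capabilities: vendor-specific keys carry the `appium:` prefix.
const capabilities = {
  platformName: 'Windows',
  'appium:automationName': 'DesktopDriver',
  // Assumption: path of the application under test; replace with your own.
  'appium:app': 'C:\\Windows\\System32\\notepad.exe',
};

// Connection options in the shape a WebdriverIO-style client expects.
const remoteOptions = {
  hostname: '127.0.0.1', // assumption: Appium runs on the local machine
  port: 4723, // Appium's default port
  capabilities,
};
```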