Codex Computer Run MCP Server gives Codex and other MCP-capable agents direct control over a signed-in desktop session. It exposes focused tools for screenshots, mouse movement, clicks, scrolling, keyboard shortcuts, Unicode paste, cursor position, and visible window discovery, plus a bundled Codex Skill for safe desktop-use workflows.
It is implemented in C# on net10.0 using ModelContextProtocol 1.3.0.
The current package and MCP manifest version is 1.1.0.
The package targets plain net10.0 so it can be distributed as a .NET tool. Windows uses native Win32 APIs; Linux and macOS use best-effort command-backed adapters.
Click to install in your preferred environment:
Note:
- These install links are prepared for the intended NuGet package identity `CP.CodexComputerRun.Mcp.Server`.
- If the latest package has not been published yet, use the manual source-build or published-executable configuration below.
- Run the server from the signed-in desktop session you want to control. Windows desktop automation must be launched from Windows, not WSL.
- Linux support expects `xdotool` for pointer and keyboard actions, `xrandr` as a display-geometry fallback, `wmctrl` or `xdotool` for window discovery, one of `gnome-screenshot`, `grim`, or ImageMagick `import` for screenshots, and one of `wl-copy`, `xclip`, or `xsel` for clipboard paste.
- macOS support uses `screencapture`, `pbcopy`, and `osascript`; pointer actions require `cliclick`. macOS may also require Screen Recording and Accessibility permissions.
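A wrapper script that wants to check the Linux clipboard prerequisites up front could probe `PATH` for the tools listed above and fail with an actionable message when none is present. The following is a hypothetical Python helper, not the server's C# adapter; the candidate order and the exact command-line flags are assumptions (all three tools accept the text to copy on standard input).

```python
import shutil
from typing import Callable, Optional

# Assumed candidate order and invocations; each command reads the clipboard text from stdin.
CLIPBOARD_CANDIDATES = [
    ["wl-copy"],
    ["xclip", "-selection", "clipboard"],
    ["xsel", "--clipboard", "--input"],
]


def pick_clipboard_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the first clipboard command found on PATH, or raise an actionable error."""
    for candidate in CLIPBOARD_CANDIDATES:
        if which(candidate[0]) is not None:
            return candidate
    names = ", ".join(candidate[0] for candidate in CLIPBOARD_CANDIDATES)
    raise RuntimeError(f"No clipboard command found; install one of: {names}")
```

A caller could then copy text with `subprocess.run(cmd, input=text.encode("utf-8"), check=True)` and send the paste shortcut, for example with `xdotool key ctrl+v`. Injecting `which` keeps the probe testable without touching the real `PATH`.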
Codex Computer Run gives an agent a minimal, fast desktop-control layer that lets it:
- Observe the full desktop via PNG screenshots.
- Point the cursor at absolute virtual-screen coordinates.
- Click the left, right, or middle mouse button where supported, including repeated clicks. The built-in macOS adapter supports only left and right clicks.
- Scroll the wheel at the current cursor position or at supplied coordinates.
- Press single keys and keyboard shortcuts such as `ctrl+l` or `ctrl+shift+escape`.
- Paste Unicode text through the platform clipboard paste path.
- Inspect the cursor position and visible top-level windows.
The server is designed for Codex computer-use workflows where the MCP client controls the active desktop.
Windows remains the primary implementation. Linux and macOS support keeps the same MCP tool surface but depends on external desktop commands that must be available inside the active graphical session.
| Area | Current behavior |
|---|---|
| Version | 1.1.0 |
| Target framework | net10.0 |
| Windows | Native Win32 implementation with virtual-screen capture, SendInput, clipboard paste, cursor position, and visible top-level window enumeration |
| Linux | Command-backed adapter using xdotool for pointer and keyboard input, xrandr for display-geometry fallback, wmctrl or xdotool for windows, screenshot command fallbacks, and clipboard command fallbacks |
| macOS | Command-backed adapter using screencapture, pbcopy, osascript, and cliclick; macOS middle-click automation is not supported by the built-in adapter |
| Unsupported OS | Deterministic unsupported-platform errors instead of silent no-ops |
| Session requirement | Signed-in interactive desktop session |
| Transport | MCP stdio |
Do not run this server from WSL to control a Windows desktop. Building from WSL through Windows dotnet.exe can work, but the MCP server itself must be launched by a Windows MCP client or Windows PowerShell session.
When this server is active, agents should follow this operating protocol:
- Call `screenshot` first when visual context matters.
- Use `cursor_position` before reasoning about positions relative to the current pointer location.
- Use `list_windows` to identify visible applications before focusing or interacting with them.
- Use `move_mouse`, `click`, `scroll`, `press_key`, `hotkey`, and `type_text` only when the intended foreground application is known.
- Prefer `type_text` for text entry because it uses Unicode clipboard paste and is faster and more reliable than simulated per-character typing.
- Keep screenshots small in conversation by setting `include_image` to `false` when only dimensions, platform metadata, or a saved path are needed.
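On the wire, each step of this protocol is an MCP `tools/call` JSON-RPC request. The hypothetical Python helper below serializes an observe, act, confirm sequence as newline-delimited messages for the stdio transport. It assumes the `initialize` handshake has already completed, and it only shows message order and argument shapes: a real agent would inspect each result before choosing the click coordinates.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a newline-terminated JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited; compact json.dumps output has no newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"


def observe_act_confirm(x: int, y: int, first_id: int = 1) -> list[str]:
    """Screenshot and window list first, then the click, then a confirming screenshot."""
    steps = [
        ("screenshot", {}),               # visual context; image data is included by default
        ("list_windows", {"limit": 10}),  # identify the intended foreground application
        ("click", {"x": x, "y": y}),      # act only once the target is known
        ("screenshot", {}),               # confirm what changed
    ]
    return [tool_call(first_id + i, name, args) for i, (name, args) in enumerate(steps)]
```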
The repository and NuGet package include a Codex Skill at skills/codex-computer-run. The skill teaches Codex the observation-first workflow, safety rules, and exact MCP tool names for this server.
When the packaged server starts, it tries to install the skill into the current Codex installation if CODEX_HOME is set or %USERPROFILE%\.codex already exists. Existing skill files are not overwritten during automatic install.
Manual install from a globally installed tool:

```powershell
dotnet tool install --global CP.CodexComputerRun.Mcp.Server --version 1.*
codex-computer-run-mcp-server --install-codex-skill
```

Manual install from source:

```powershell
dotnet run --project .\src\CodexComputerRunMCPServer\CodexComputerRunMCPServer.csproj -- --install-codex-skill
```

Set `CODEX_HOME` first if Codex uses a non-default location:

```powershell
$env:CODEX_HOME = "C:\Users\you\.codex"
codex-computer-run-mcp-server --install-codex-skill
```

To refresh an existing installed copy with the packaged skill files, add `--force`.
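The skill install rules can be expressed as a short Python sketch. It mirrors only the documented behavior: prefer `CODEX_HOME`, otherwise use `~/.codex` only if it already exists, and never overwrite existing files unless forced. The `skills/<skill-name>` target layout and the helper names are assumptions, not the server's implementation.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional


def resolve_codex_home(env: Mapping[str, str], user_home: Path) -> Optional[Path]:
    """Use CODEX_HOME if set; otherwise use <home>/.codex only if it already exists."""
    if env.get("CODEX_HOME"):
        return Path(env["CODEX_HOME"])
    default = user_home / ".codex"
    return default if default.is_dir() else None


def install_skill(source: Path, codex_home: Path, force: bool = False) -> list[Path]:
    """Copy skill files, skipping existing ones unless force is set; return files written."""
    # Assumption: skills live under <codex_home>/skills/<skill-name>.
    target_root = codex_home / "skills" / source.name
    written = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = target_root / src.relative_to(source)
        if dst.exists() and not force:
            continue  # existing files are left untouched without --force
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

Calling `resolve_codex_home(os.environ, Path.home())` covers the Windows case too, because `Path.home()` resolves to `%USERPROFILE%` there.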
Use the skill in Codex by asking for it explicitly, for example:
Use $codex-computer-run to list visible windows, take a screenshot, and confirm the active desktop state.
`screenshot` captures the current desktop as a PNG image.
Parameters:
- `path` (optional) - output PNG path. If omitted, the image is returned in memory and no temporary file is created.
- `include_image` (default: `true`) - include PNG image data in the MCP tool result.
Response: The first content block is JSON metadata with message, path, mimeType, platform, left, top, width, and height. When include_image is true, a PNG image block is also returned.
When to use: Use before interacting with the desktop, after UI changes, or when the agent needs visual confirmation.
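Because the metadata carries the capture's `left` and `top` origin, a client can convert a pixel it located in the PNG into the absolute coordinates that `move_mouse` and `click` expect. This Python sketch assumes the metadata arrives as a JSON text content block and that pixel (0, 0) of the capture maps to the virtual screen's top-left corner; both points, and the function names, are assumptions.

```python
import json


def screenshot_metadata(result: dict) -> dict:
    """Parse the JSON metadata from the first content block of a screenshot result."""
    # Assumption: the metadata block is an MCP text content block whose text is JSON.
    first = result["content"][0]
    if first.get("type") != "text":
        raise ValueError("expected a text metadata block first")
    return json.loads(first["text"])


def image_to_screen(px: int, py: int, meta: dict) -> tuple[int, int]:
    """Map a pixel in the captured PNG to absolute virtual-screen coordinates."""
    # Assumption: the capture starts at (left, top); a monitor placed left of or
    # above the primary display therefore produces negative screen coordinates.
    if not (0 <= px < meta["width"] and 0 <= py < meta["height"]):
        raise ValueError("pixel is outside the captured image")
    return meta["left"] + px, meta["top"] + py
```

For a two-monitor layout whose capture starts at `left = -1920`, pixel `(100, 50)` maps to screen coordinates `(-1820, 50)`.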
`move_mouse` moves the cursor to absolute desktop coordinates.
Parameters:
- `x` - absolute X coordinate.
- `y` - absolute Y coordinate.
- `delay` (optional) - seconds to wait after the action.
When to use: Use before a click or hover-sensitive action.
`click` clicks at the current cursor position or at supplied absolute coordinates.
Parameters:
- `x` (optional) - absolute X coordinate.
- `y` (optional) - absolute Y coordinate.
- `button` (default: `left`) - `left`, `right`, or `middle`; `middle` is not supported by the built-in macOS adapter.
- `clicks` (default: `1`) - number of clicks.
- `interval` (default: `0.08`) - seconds between repeated clicks.
- `delay` (optional) - seconds to wait after the action.
When to use: Use for buttons, menus, tabs, context menus, and desktop UI selection.
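A client can apply the documented defaults and the macOS middle-click restriction before sending a request. The helper below is hypothetical: the rule that `x` and `y` are supplied together, the `clicks >= 1` check, and the `platform` label are client-side assumptions rather than documented server behavior.

```python
from typing import Optional

BUTTONS = {"left", "right", "middle"}


def click_arguments(
    x: Optional[int] = None,
    y: Optional[int] = None,
    button: str = "left",
    clicks: int = 1,
    interval: float = 0.08,
    delay: Optional[float] = None,
    platform: str = "windows",  # hypothetical label for the target desktop OS
) -> dict:
    """Build click tool arguments with the documented defaults, validated client-side."""
    if button not in BUTTONS:
        raise ValueError(f"button must be one of {sorted(BUTTONS)}")
    if button == "middle" and platform == "macos":
        raise ValueError("the built-in macOS adapter does not support middle clicks")
    if clicks < 1:
        raise ValueError("clicks must be at least 1")
    # Assumption: omit both coordinates to click at the current cursor position.
    if (x is None) != (y is None):
        raise ValueError("supply both x and y, or neither")
    args = {"button": button, "clicks": clicks, "interval": interval}
    if x is not None:
        args.update(x=x, y=y)
    if delay is not None:
        args["delay"] = delay
    return args
```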
`scroll` scrolls the mouse wheel.
Parameters:
- `amount` (default: `-3`) - wheel notches; positive values scroll up, negative values scroll down.
- `x` (optional) - absolute X coordinate to move to before scrolling.
- `y` (optional) - absolute Y coordinate to move to before scrolling.
- `delay` (optional) - seconds to wait after the action.
When to use: Use for lists, pages, combo boxes, and scrollable application panes.
`press_key` presses one keyboard key.
Parameters:
- `key` - key name or single character, for example `enter`, `tab`, `escape`, `f5`, `a`, `A`, `?`, or `1`.
- `duration` (default: `0.03`) - seconds to hold the key down.
- `delay` (optional) - seconds to wait after the action.
When to use: Use for navigation keys, function keys, confirm/cancel actions, and single-character shortcuts.
`hotkey` presses a keyboard shortcut.
Parameters:
- `keys` - shortcut text using `+`, comma, or space separators, for example `ctrl+l`, `ctrl+shift+escape`, or `alt+tab`.
- `delay` (optional) - seconds to wait after the action.
When to use: Use for application shortcuts, browser address bar focus, task switching, command palettes, and system shortcuts.
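The separator rule means `ctrl+shift+escape`, `ctrl, shift, escape`, and `ctrl shift escape` all name the same chord. A minimal Python sketch of that splitting, assuming empty fragments are ignored and no case normalization happens; the server's actual parser may differ.

```python
import re


def parse_hotkey(keys: str) -> list[str]:
    """Split shortcut text on runs of '+', ',' or whitespace, e.g. 'ctrl+shift+escape'."""
    parts = [part for part in re.split(r"[+,\s]+", keys.strip()) if part]
    if not parts:
        raise ValueError("hotkey text contains no keys")
    return parts
```

One consequence of these separators is that a literal `+` or `,` cannot be named as a key inside `keys`; how the server handles that case is not covered here.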
`type_text` pastes Unicode text into the focused application using the platform clipboard paste path.
Parameters:
- `text` - text to paste.
- `delay` (optional) - seconds to wait after the action.
When to use: Use for text fields, editors, terminals, and any non-trivial text entry.
`cursor_position` returns the current desktop cursor position as JSON.
When to use: Use before or after mouse actions when the agent needs exact coordinates.
`list_windows` lists visible top-level desktop windows as JSON.
Parameters:
- `limit` (default: `50`) - maximum number of windows to return.
When to use: Use to identify visible applications and window titles before interacting with the desktop.
- Screenshot capture avoids temporary files when `path` is omitted.
- `include_image: false` avoids PNG encoding unless a `path` is supplied.
- Windows mouse and keyboard actions use batched `SendInput` calls instead of legacy per-event APIs.
- Windows `hotkey` presses all keys down and releases them in reverse order in one batch.
- Windows clipboard access retries briefly when another process has the clipboard open.
- Windows visible window enumeration caches process names by PID during each call.
- Startup enables per-monitor DPI awareness on Windows for correct coordinate and screenshot behavior on mixed-DPI displays.
- Linux and macOS adapters fail with actionable dependency messages when required desktop commands are missing.
- Release publishing enables single-file and ReadyToRun output for faster Codex startup.
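The Windows `hotkey` ordering can be illustrated without any Win32 calls. This Python sketch only computes the documented event order: every key goes down in sequence, then the keys are released in reverse, so modifiers stay held until the main key is up. Packing the events into a single `SendInput` batch belongs to the native layer and is not shown.

```python
def hotkey_events(keys: list[str]) -> list[tuple[str, str]]:
    """Return (key, 'down' | 'up') pairs: press keys in order, release them in reverse."""
    downs = [(key, "down") for key in keys]
    ups = [(key, "up") for key in reversed(keys)]
    return downs + ups
```

For `ctrl+shift+escape` this yields `ctrl`, `shift`, `escape` down, then `escape`, `shift`, `ctrl` up.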
```text
CodexComputerRunMCPServer.slnx          # Root solution wrapper for CI and local pack commands
src/
|-- CodexComputerRunMCPServer/          # MCP host, tools, service layer, and platform adapters
|-- CodexComputerRunMCPServer.Tests/    # TUnit unit and MCP integration tests
`-- CodexComputerRunMCPServer.slnx      # Source solution file
.mcp/
|-- server.json                         # MCP registry/package metadata
`-- install.md                          # Manual MCP install snippets
skills/
`-- codex-computer-run/                 # Codex Skill bundled into the NuGet package
```
The server allows each Codex session to start its own MCP server process, so tool discovery remains available in every session. Input-changing tools still coordinate desktop control by taking an exclusive, renewable lease on this lock file under the platform's local application-data folder:
CodexComputerRunMCPServer\control.lock
On Windows this is normally %LOCALAPPDATA%\CodexComputerRunMCPServer\control.lock. On Linux and macOS it follows .NET's local application-data location for the signed-in user, falling back to the temp directory if no local application-data path is available.
The control lease is acquired by move_mouse, click, scroll, press_key, hotkey, and type_text. If another Codex session currently owns the lease, the tool call fails with a busy message instead of allowing simultaneous mouse, keyboard, or clipboard input. Observation tools (screenshot, cursor_position, and list_windows) remain available from every session.
After the latest control action, the owning process keeps the lease for the configured lease duration (60 seconds by default, set by `CODEX_COMPUTER_RUN_CONTROL_LEASE_SECONDS`) so follow-up clicks or keystrokes from the same session are not interleaved with input from another session. The lease is also released immediately when the owning MCP process exits.
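The lease semantics can be modeled with a small in-memory Python class. This sketch covers only the documented behavior: the real server coordinates across processes through the `control.lock` file, and the `ControlLease` name, the `session` identifiers, and the explicit `now` timestamps are hypothetical. Observation tools would never call `try_acquire`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlLease:
    """In-memory model of the exclusive, renewable desktop-control lease."""
    lease_seconds: float = 60.0
    owner: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, session: str, now: float) -> bool:
        """Grant or renew control for an input-changing action; False maps to a busy error."""
        held_by_other = self.owner not in (None, session) and now < self.expires_at
        if held_by_other:
            return False
        # Each input-changing action restarts the lease window for its session.
        self.owner = session
        self.expires_at = now + self.lease_seconds
        return True

    def release(self, session: str) -> None:
        """Drop the lease when the owning process exits."""
        if self.owner == session:
            self.owner = None
            self.expires_at = 0.0
```

With `lease_seconds = 0` the lease expires at the moment it is granted, which matches releasing control immediately after each action.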
Idle shutdown is disabled by default so long-lived Codex sessions can call the MCP tools later without finding a closed stdio transport. If you explicitly enable idle shutdown, every tool call updates activity state and active calls are never stopped mid-invocation.
Optional environment overrides:
| Variable | Default | Detail |
|---|---|---|
| `CODEX_COMPUTER_RUN_CONTROL_LOCK` | `true` | Set `false` to disable cross-session desktop-control coordination. |
| `CODEX_COMPUTER_RUN_CONTROL_LEASE_SECONDS` | `60` | Seconds the owning session keeps desktop control after the latest input-changing action. Set `0` to release immediately after each action. |
| `CODEX_COMPUTER_RUN_IDLE_SHUTDOWN` | `false` | Set `true` to enable idle shutdown. |
| `CODEX_COMPUTER_RUN_IDLE_TIMEOUT_SECONDS` | `300` | Seconds without tool activity before shutdown when idle shutdown is enabled. Values of `0` or lower disable idle shutdown. |
| `CODEX_COMPUTER_RUN_IDLE_CHECK_INTERVAL_SECONDS` | `10` | Seconds between idle checks. |
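One way a wrapper could interpret these overrides in Python, following the defaults in the table. The case-insensitive `true`/`false` comparison and the fallback to the default for unparsable values are assumptions; the server's own parsing rules are not described here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "CODEX_COMPUTER_RUN_"


@dataclass(frozen=True)
class Settings:
    control_lock: bool
    lease_seconds: float
    idle_timeout_seconds: Optional[float]  # None means idle shutdown is disabled
    idle_check_interval_seconds: float


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Assumption: case-insensitive "true"/"false"; anything else keeps the default.
    value = env.get(PREFIX + name, "").strip().lower()
    return value == "true" if value in ("true", "false") else default


def _seconds(env: Mapping[str, str], name: str, default: float) -> float:
    # Assumption: missing or unparsable numbers fall back to the default.
    try:
        return float(env[PREFIX + name])
    except (KeyError, ValueError):
        return default


def load_settings(env: Mapping[str, str]) -> Settings:
    idle_enabled = _flag(env, "IDLE_SHUTDOWN", False)
    idle_timeout = _seconds(env, "IDLE_TIMEOUT_SECONDS", 300.0)
    return Settings(
        control_lock=_flag(env, "CONTROL_LOCK", True),
        lease_seconds=_seconds(env, "CONTROL_LEASE_SECONDS", 60.0),
        # Idle shutdown needs both the opt-in flag and a timeout above 0.
        idle_timeout_seconds=idle_timeout if idle_enabled and idle_timeout > 0 else None,
        idle_check_interval_seconds=_seconds(env, "IDLE_CHECK_INTERVAL_SECONDS", 10.0),
    )
```

Passing `os.environ` to `load_settings` reads the live process environment.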
After publishing, Codex can launch the optimized executable directly. Use the runtime identifier that matches the OS running the signed-in desktop session.
Windows:
```toml
[mcp_servers.codex-computer-run]
command = "PathTo\\CodexComputerRunMCPServer\\artifacts\\publish\\win-x64\\CodexComputerRunMCPServer.exe"
args = []
```

Linux or macOS:
```toml
[mcp_servers.codex-computer-run]
command = "/path/to/CodexComputerRunMCPServer/artifacts/publish/linux-x64/CodexComputerRunMCPServer"
args = []
```

The checked-in `.codex/config.toml` uses the fast Windows published-executable path for this workspace.
Published executable:
Windows:
```json
{
  "mcpServers": {
    "codex-computer-run": {
      "command": "PathTo\\CodexComputerRunMCPServer\\artifacts\\publish\\win-x64\\CodexComputerRunMCPServer.exe",
      "args": []
    }
  }
}
```

Linux or macOS:
```json
{
  "mcpServers": {
    "codex-computer-run": {
      "command": "/path/to/CodexComputerRunMCPServer/artifacts/publish/linux-x64/CodexComputerRunMCPServer",
      "args": []
    }
  }
}
```

NuGet package through `dnx`:
```json
{
  "mcpServers": {
    "codex-computer-run": {
      "command": "dnx",
      "args": [
        "CP.CodexComputerRun.Mcp.Server@1.*",
        "--yes"
      ]
    }
  }
}
```

Development source run:
```json
{
  "mcpServers": {
    "codex-computer-run": {
      "command": "dotnet",
      "args": [
        "run",
        "--project",
        "PathTo\\CodexComputerRunMCPServer\\src\\CodexComputerRunMCPServer\\CodexComputerRunMCPServer.csproj",
        "--configuration",
        "Release",
        "--no-launch-profile"
      ]
    }
  }
}
```

Use forward slashes in the project path on Linux and macOS.
The JSON configuration files are not required. They are optional convenience snippets for MCP clients that import JSON config files manually.
Required or primary MCP/Codex files are:
- `.mcp/server.json` for MCP package metadata.
- `.mcp/install.md` for install notes.
- `skills/codex-computer-run` for the bundled Codex Skill.
- `.codex/config.toml` for this local Codex workspace.
- `.mcp.json` only if your client reads repository-local MCP JSON configuration.
Windows PowerShell:
```powershell
dotnet restore .\CodexComputerRunMCPServer.slnx
dotnet build .\CodexComputerRunMCPServer.slnx --configuration Release
```

Linux or macOS:

```bash
dotnet restore ./CodexComputerRunMCPServer.slnx
dotnet build ./CodexComputerRunMCPServer.slnx --configuration Release
```

If a running MCP server locks the default `bin\Release` output, build to a verification output path:
```powershell
dotnet build .\CodexComputerRunMCPServer.slnx --configuration Release --no-restore /p:OutputPath=D:\Projects\Github\chrispulman\CodexComputerRunMCPServer\artifacts\verify\bin\
```

Windows PowerShell:

```powershell
dotnet test .\src\CodexComputerRunMCPServer.Tests\CodexComputerRunMCPServer.Tests.csproj --configuration Release
```

Linux or macOS:

```bash
dotnet test ./src/CodexComputerRunMCPServer.Tests/CodexComputerRunMCPServer.Tests.csproj --configuration Release
```

Coverage with TUnit/Microsoft Testing Platform:

```powershell
dotnet test .\src\CodexComputerRunMCPServer.Tests\CodexComputerRunMCPServer.Tests.csproj --configuration Release -- --coverage --coverage-output coverage.cobertura.xml --coverage-output-format cobertura --results-directory .\artifacts\test-results
```

Current verification:
- 61 TUnit tests passed.
- Coverage: 77.65% line coverage, 48.37% branch coverage for testable code.
- Repository and package verification confirm that `skills/codex-computer-run/SKILL.md` and `skills/codex-computer-run/agents/openai.yaml` are bundled.
- Native Win32 P/Invoke shims are excluded from coverage and verified through the service boundary plus live MCP tool discovery.
The helper script name is historical; it now accepts Windows, Linux, and macOS runtime identifiers.
```powershell
.\scripts\publish-windows.ps1 -Runtime win-x64
.\scripts\publish-windows.ps1 -Runtime linux-x64
.\scripts\publish-windows.ps1 -Runtime osx-arm64
```

Direct command:

```powershell
dotnet publish .\src\CodexComputerRunMCPServer\CodexComputerRunMCPServer.csproj --configuration Release --runtime win-x64 --self-contained false --output .\artifacts\publish\win-x64
```

The TUnit suite verifies MCP metadata, the bundled Codex Skill, platform adapters, lifecycle behavior, and the static tool facade. The published `win-x64` executable was also validated with an MCP stdio `initialize` and `tools/list` handshake. The server reported all 9 tools:
scroll, hotkey, type_text, screenshot, list_windows, click, move_mouse, press_key, cursor_position
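A similar stdio handshake check can be scripted. This Python sketch sends `initialize`, waits for its response, sends `notifications/initialized`, then requests `tools/list` and reports any of the 9 listed tools that are missing. The protocol version string and client name are assumptions, and the script does not follow paginated `tools/list` results.

```python
import json
import subprocess
import sys

EXPECTED_TOOLS = {
    "screenshot", "move_mouse", "click", "scroll", "press_key",
    "hotkey", "type_text", "cursor_position", "list_windows",
}
INITIALIZE = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",  # assumption: the server may negotiate another version
        "capabilities": {},
        "clientInfo": {"name": "handshake-check", "version": "0.1"},
    },
}
INITIALIZED = {"jsonrpc": "2.0", "method": "notifications/initialized"}
TOOLS_LIST = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}


def missing_tools(tools_list_result: dict) -> set[str]:
    """Return expected tool names that are absent from a tools/list result."""
    names = {tool["name"] for tool in tools_list_result.get("tools", [])}
    return EXPECTED_TOOLS - names


def send(proc: subprocess.Popen, message: dict) -> None:
    # One JSON object per line; json.dumps emits no newlines by default.
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()


def read_result(proc: subprocess.Popen, request_id: int) -> dict:
    """Read lines until the response with the given id, skipping notifications."""
    for line in proc.stdout:
        if not line.strip():
            continue
        reply = json.loads(line)
        if reply.get("id") == request_id:
            if "error" in reply:
                raise RuntimeError(f"request {request_id} failed: {reply['error']}")
            return reply["result"]
    raise RuntimeError("server closed stdout before replying")


def check(executable: str) -> set[str]:
    proc = subprocess.Popen([executable], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, encoding="utf-8")
    try:
        send(proc, INITIALIZE)
        read_result(proc, 1)  # wait for the initialize response before anything else
        send(proc, INITIALIZED)
        send(proc, TOOLS_LIST)
        return missing_tools(read_result(proc, 2))
    finally:
        proc.kill()
        proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = check(sys.argv[1])
    print("missing: " + ", ".join(sorted(missing)) if missing else "all 9 tools present")
```

Saved under a hypothetical name such as `check_tools.py`, it would be run as `python check_tools.py .\artifacts\publish\win-x64\CodexComputerRunMCPServer.exe`.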
Live Linux and macOS desktop behavior depends on the active graphical session, installed command dependencies, and OS-level permissions.
Once configured, you can ask things like:
- "Call `screenshot` and describe the active window."
- "List visible windows and tell me which browser tabs or apps are available."
- "Move the mouse to `x=400`, `y=300`, click, then take another screenshot."
- "Press `ctrl+l`, type `https://example.com`, then press `enter`."
- "Paste this text into the focused editor using `type_text`."
- "Scroll down 5 notches and confirm what changed on screen."
- "Get the cursor position before clicking."
This server controls the active desktop. Mouse, keyboard, and clipboard actions affect the currently focused application. Use it only in a trusted desktop session and pair destructive UI actions with screenshots or window checks first.