Support end-to-end reload workflow + codex configurator#2
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
- Replace polling loop in reload_plugin with event-driven wait_for_session (race-safe pre-check + Future-based wait instead of 50ms sleep loop) - Clean up session waiter futures on timeout in SessionRegistry - Remove [reload-smoke] debug tag from plugin log - Remove TOCTOU dir existence check in Codex configurator - Add handler unit tests for all domains (testing, client, session, scene, node, editor, project) — closes coverage from 85% to 97% - Add integration tests for client, session, testing tools and resource reads - Add ASGI error path tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR completes an end-to-end reload workflow for local development by introducing a shared runtime/handler layer, adding a reloadable ASGI entrypoint for HTTP transports (to enable real Uvicorn --reload), wiring robust reload_plugin behavior, and expanding the Godot plugin’s client configurator to support Codex.
Changes:
- Introduces shared
handlers/*plusruntime/*to unify tool/resource logic across editor/scene/node/project/session/testing flows. - Adds
godot_ai.asgiand updates CLI--reloadbehavior to use Uvicorn’s factory/import-string reload path for HTTP transports. - Adds plugin-side reload support + dev-server controls, and adds Codex client configuration support (
~/.codex/config.toml) with new regression tests.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_session_registry.py | Adds async coverage for SessionRegistry.wait_for_session() behavior. |
| tests/unit/test_runtime_handlers.py | Adds unit tests for shared handlers + DirectRuntime adapter (including reload behavior). |
| tests/unit/test_resources.py | Makes resource registration test compatible with multiple FastMCP APIs (list_resources vs get_resources). |
| tests/unit/test_cli_reload.py | Verifies ASGI factory + CLI reload wiring uses Uvicorn factory path and reload dirs. |
| tests/integration/test_mcp_tools.py | Adds integration coverage for reload_plugin reconnect cycle; minor formatting updates. |
| src/godot_ai/tools/testing.py | Switches tool implementations to shared testing handlers via DirectRuntime. |
| src/godot_ai/tools/session.py | Switches session tools to shared session handlers via DirectRuntime. |
| src/godot_ai/tools/scene.py | Switches scene tools to shared scene handlers via DirectRuntime. |
| src/godot_ai/tools/project.py | Switches project tools to shared project handlers via DirectRuntime. |
| src/godot_ai/tools/node.py | Switches node tools to shared node handlers via DirectRuntime. |
| src/godot_ai/tools/editor.py | Switches editor tools to shared editor handlers and adds reload_plugin tool. |
| src/godot_ai/tools/client.py | Switches client tools to shared client handlers; updates docs to include Codex. |
| src/godot_ai/sessions/registry.py | Adds session waiters + wait_for_session() for reconnect/reload workflows. |
| src/godot_ai/runtime/interface.py | Defines a Runtime protocol consumed by shared handlers. |
| src/godot_ai/runtime/direct.py | Adds in-process DirectRuntime adapter for FastMCP lifespan context. |
| src/godot_ai/runtime/init.py | Introduces runtime package. |
| src/godot_ai/resources/sessions.py | Routes sessions resource through shared session handlers/runtime adapter. |
| src/godot_ai/resources/scenes.py | Routes scene resources through shared scene handlers/runtime adapter. |
| src/godot_ai/resources/project.py | Routes project resources through shared project handlers/runtime adapter. |
| src/godot_ai/resources/editor.py | Routes editor resources through shared editor handlers/runtime adapter. |
| src/godot_ai/handlers/testing.py | Adds shared handler implementations for testing commands. |
| src/godot_ai/handlers/session.py | Adds shared handler implementations for session commands/resources. |
| src/godot_ai/handlers/scene.py | Adds shared handler implementations for scene commands/resources. |
| src/godot_ai/handlers/project.py | Adds shared handler implementations for project commands/resources. |
| src/godot_ai/handlers/node.py | Adds shared handler implementations for node commands. |
| src/godot_ai/handlers/editor.py | Adds shared handler implementations for editor commands/resources incl. reload workflow. |
| src/godot_ai/handlers/client.py | Adds shared handler implementations for client configuration/status commands. |
| src/godot_ai/handlers/init.py | Introduces handlers package. |
| src/godot_ai/asgi.py | Adds reloadable ASGI app factory + run_with_reload() Uvicorn runner. |
| src/godot_ai/init.py | Updates CLI entrypoint to support reload runner and injectable argv for testing. |
| plugin/addons/godot_ai/plugin.gd | Registers plugin reload command; adds dev-server start/stop helpers and passes plugin into dock. |
| plugin/addons/godot_ai/mcp_dock.gd | Adds dev-server start/stop UI button and wires it to plugin methods. |
| plugin/addons/godot_ai/handlers/editor_handler.gd | Adds plugin-side reload_plugin command handler. |
| plugin/addons/godot_ai/client_configurator.gd | Adds Codex client support with configure/status/remove flows via ~/.codex/config.toml. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if _plugin.is_dev_server_running(): | ||
| _plugin.stop_dev_server() | ||
| else: | ||
| _plugin.start_dev_server() |
There was a problem hiding this comment.
The dock calls _plugin.start_dev_server() directly, but start_dev_server() contains an await and therefore returns a coroutine/function state unless awaited. This likely prevents the dev server from actually starting. Either await the call here (and make this handler async) or refactor start_dev_server() to be non-yielding.
| _plugin.start_dev_server() | |
| await _plugin.start_dev_server() |
| _dock.setup(_connection, _log_buffer, self) | ||
| add_control_to_dock(DOCK_SLOT_RIGHT_BL, _dock) | ||
|
|
||
| _log_buffer.log("plugin loaded") | ||
|
|
There was a problem hiding this comment.
The log message "plugin loaded [reload-smoke]" looks like a temporary smoke-test marker, but it’s left in the committed changes and will permanently alter user-facing logs. If this was only for manual validation, it should be reverted back to the normal message before merge (also aligns with the PR description mentioning temporary markers were reverted).
| static func _get_codex_config_path() -> String: | ||
| return OS.get_environment("HOME").path_join(".codex/config.toml") | ||
|
|
There was a problem hiding this comment.
_get_codex_config_path() uses OS.get_environment("HOME") unconditionally. On Windows this env var is often unset, which would produce a relative path like .codex/config.toml and read/write the wrong location. Consider using a Windows fallback (e.g., USERPROFILE) and/or validating that the resolved home directory is non-empty before writing.
| def from_context(cls, ctx: Context) -> DirectRuntime: | ||
| app = ctx.fastmcp._lifespan_result |
There was a problem hiding this comment.
DirectRuntime.from_context() reaches into ctx.fastmcp._lifespan_result, which is a private attribute (leading underscore) and can break with FastMCP upgrades. Prefer an official Context API if available (e.g. ctx.lifespan_context) or add a compatibility fallback (try public attributes first, then private) to avoid a hard runtime failure for all tools/resources.
| def from_context(cls, ctx: Context) -> DirectRuntime: | |
| app = ctx.fastmcp._lifespan_result | |
| def _get_app_context(cls, ctx: Context) -> SupportsDirectRuntime | None: | |
| app = getattr(ctx, "lifespan_context", None) | |
| if app is not None: | |
| return app | |
| fastmcp = getattr(ctx, "fastmcp", None) | |
| if fastmcp is None: | |
| return None | |
| app = getattr(fastmcp, "lifespan_context", None) | |
| if app is not None: | |
| return app | |
| return getattr(fastmcp, "_lifespan_result", None) | |
| @classmethod | |
| def from_context(cls, ctx: Context) -> DirectRuntime: | |
| app = cls._get_app_context(ctx) |
| class TestWaitForSession: | ||
| async def test_resolves_immediately_if_session_already_registered(self): | ||
| """If a session registers before wait starts, it resolves on the next one.""" | ||
| reg = SessionRegistry() | ||
| s = _make_session("new-1") | ||
|
|
||
| async def register_soon(): | ||
| await asyncio.sleep(0.05) | ||
| reg.register(s) |
There was a problem hiding this comment.
The test docstring says the session registers "before wait starts", but the test schedules registration after a sleep while wait_for_session() is already pending. Consider updating the docstring to reflect the actual scenario being validated (race where the waiter is installed before registration).
- Remove await from start_dev_server, use timer callback instead so the dock call site does not silently drop the coroutine - Escape %a in Windows stop_dev_server cmd string to avoid GDScript format operator treating it as a specifier - Use USERPROFILE fallback for Codex config path on Windows - Switch get_event_loop() to get_running_loop() in wait_for_session and move waiter cleanup to finally block (covers cancellation too) - Fix misleading test docstring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs after the GDScript test suite on all three OSes. The script calls reload_plugin via MCP HTTP, waits for the new session, then calls editor_state to verify the plugin is alive post-reload. This exercises the real EditorInterface.set_plugin_enabled path that can't be tested from within the plugin's own test runner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nuity - Before reload: create a MeshInstance3D (ReloadTestCube) via old plugin - After reload: check logs_read starts with 'plugin loaded' (proves handlers/log buffer were fully rebuilt, not reused) - After reload: find ReloadTestCube via new plugin (proves the scene tree survived the plugin disable/enable cycle) - Extract mcp_call helper to reduce SSE parsing repetition Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ping The mcp_call helper was swallowing error output into the captured variable, then set -e killed the script before it could be printed. Now errors go to stderr and mcp_call_or_die prints diagnostics. Also handles isError tool responses and fixes default args brace issue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…st counts CLAUDE.md: - Add streamable-http transport to architecture - Document handler/runtime abstraction pattern - Update 'adding a new tool' steps for handler layer - Add Codex to client configuration list - Document reload_plugin tool and ASGI reload path - Update test count: 81 -> 140 implementation-plan.md: - Mark CI setup tasks as done (Tier 1+2, Codecov, badges) - Add reload workflow items as completed - Add Codex client and handler/runtime layer as completed - Update test count: 125 -> 184 - Add project_handler.gd to plugin file listing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uplicate-ID reject Three small audit-v2 fixes in src/godot_ai/transport/websocket.py covering #343 sub-findings #2, #4, and #5. - #4 (P1): replace hardcoded `e.errno == 48` (macOS-only) with `errno.EADDRINUSE` so Linux (98) and Windows (10048) also hit the friendly "port already in use" branch instead of crashing the WS lifespan with an unhandled OSError. - #5 (P2): wrap `send_command`'s `ws.send` + `wait_for` in a try/finally that always pops `_pending`. Pre-fix, a `ws.send` raise (ConnectionClosed mid-send, transport error) leaked the Future entry forever; under churn the dict grew unbounded. Happy path still pops via the response receiver, so the finally pop is a no-op there. - #2 (P1): reject a second handshake whose session_id is already registered (close code 4001), instead of silently overwriting the routing map. session_id format is `<slug>@<4hex>` — 16 bits of suffix is locally guessable, so without this any local peer could hijack an active session by impersonating its ID. Legitimate plugin reconnect after `editor_reload_plugin` first triggers ConnectionClosed -> unregister, so the new connect still lands cleanly. Tests: - tests/unit/test_websocket_server_lifecycle.py — parametrized errno coverage (macos 48, linux 98, windows 10048; one runs per CI host) plus a non-EADDRINUSE OSError propagation case. - tests/integration/test_websocket.py — TestDuplicateHandshake pins the reject + reconnect-after-clean-disconnect paths; TestPendingFutureCleanup pins the timeout-pop and send-failure-pop behaviors. Live-smoked against a real Godot 4.6 editor: full GDScript suite 1037/1040 passed, duplicate-handshake rejected with close 4001, original session unaffected. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
…ests Per #316 acceptance criterion #2: after 12+ fix attempts the Camera2D `make_current` → `Viewport.camera_2d` slot propagation still occasionally lags on macOS headless. The handler-side logical-current bookkeeping (#311) stays correct in that race; only the engine-side viewport slot lags. The test polls 200 ms (#341 widened diagnostics) but observed lag has been up to ~600 ms in the wild — beyond any reasonable wait budget for a unit test. Add `_skip_if_macos_engine_lag(cam, expected, report, label)`: - if the wait already settled → noop, caller continues - if not on macOS headless → noop, caller's hard assertion fails normally - if handler logical state disagrees with expectation → noop (real regression, caller's hard assertion fires with the diagnostic) - only when on macOS headless AND handler state agrees → skip() with the full diagnostic message so the next investigator still sees the divergence Wired into `test_create_with_make_current_unmarks_sibling` and the four poll sites in `test_configure_current_sibling_unmark_single_undo`. Linux / Windows behavior is unchanged — those platforms still hard-fail on a missed propagation. This intentionally does NOT close the underlying race: it converts the known upstream-flavor flake into a soft signal so CI stops reporting noise on it. The diagnostic-rich skip message keeps the failure visible enough that any NEW divergence (e.g. a real handler regression) still surfaces. Refs: #316
…ests (#372) * Skip persistent macOS-headless engine-state lag in #316 flake-prone tests Per #316 acceptance criterion #2: after 12+ fix attempts the Camera2D `make_current` → `Viewport.camera_2d` slot propagation still occasionally lags on macOS headless. The handler-side logical-current bookkeeping (#311) stays correct in that race; only the engine-side viewport slot lags. The test polls 200 ms (#341 widened diagnostics) but observed lag has been up to ~600 ms in the wild — beyond any reasonable wait budget for a unit test. Add `_skip_if_macos_engine_lag(cam, expected, report, label)`: - if the wait already settled → noop, caller continues - if not on macOS headless → noop, caller's hard assertion fails normally - if handler logical state disagrees with expectation → noop (real regression, caller's hard assertion fires with the diagnostic) - only when on macOS headless AND handler state agrees → skip() with the full diagnostic message so the next investigator still sees the divergence Wired into `test_create_with_make_current_unmarks_sibling` and the four poll sites in `test_configure_current_sibling_unmark_single_undo`. Linux / Windows behavior is unchanged — those platforms still hard-fail on a missed propagation. This intentionally does NOT close the underlying race: it converts the known upstream-flavor flake into a soft signal so CI stops reporting noise on it. The diagnostic-rich skip message keeps the failure visible enough that any NEW divergence (e.g. a real handler regression) still surfaces. Refs: #316 * ci: trigger workflow after PR base change to beta * Tighten #316 macOS gate to require explicit logical marker (PR review) Copilot review on PR #372: `_handler_logical_current_matches` was using `camera_get(path).current` which routes through `_resolve_current`, which falls back to `_is_effective_current` (raw engine state) when no logical marker is set. For `expected=false` that meant the gate could return "matches" when the handler had lost track entirely — masking a real regression as a documented engine-state lag. In practice the case is narrow (`is_current()` and `viewport.get_camera_2d() == cam` are equivalent absent a stale-viewport divergence, which #316 diagnostics ruled out via `cam_viewport_matches_scene=true`), but the gate must not rely on that abstractly. - `camera_handler.gd`: add `peek_logical_current(scene_root, type_str)` — public introspection that returns the logical-current camera (or null when no marker), bypassing the engine fallback. - `test_camera.gd`: rewrite `_handler_logical_current_matches` to consult `peek_logical_current` directly. Skip only fires when: - expected=true: marker points at this camera - expected=false: marker exists for the type and points elsewhere No marker → refuse to skip in either direction → real regression fails. Refs: #316 --------- Co-authored-by: Claude <noreply@anthropic.com>
…#375) * Reject non-loopback Host/Origin on WS + HTTP transports (audit-v2 #1) Closes #345 — refs #343 (audit-v2 umbrella). Both transports bound to 127.0.0.1, but a malicious browser tab could mount **DNS rebinding**: resolve attacker.example.com to 127.0.0.1, then `new WebSocket("ws://attacker.example.com:9500")`. The request lands on our loopback socket carrying a non-localhost Host (and Origin), the WS handshake registered the peer as a session, and any MCP tool became drivable from a foreign origin — including write tools. The streamable-HTTP transport on :8000 had the same gap, including `/godot-ai/status` (comment-marked "small unauthenticated probe"). Per the issue's "Fix shape": **strict Host/Origin allowlist in a Starlette middleware ahead of FastMCP** plus a matching `process_request` hook on the WebSocket server. `src/godot_ai/transport/origin_guard.py` (new): - `is_allowed_host` — accepts `127.0.0.1`, `localhost`, `[::1]` with optional `:port`; case-insensitive; bare `::1` rejected (RFC 7230 requires bracketed IPv6 in Host). - `is_allowed_origin` — None / empty / `null` accepted (native clients omit Origin; sandboxed/file:// emit `null`); otherwise must parse to a URL whose hostname is loopback. Schemes outside `http/https/ws/wss` rejected. - `LocalhostOnlyHTTPMiddleware` — ASGI middleware. Rejects on first failure with HTTP 403; passes lifespan scopes through; handles duplicate-Host smuggling shapes (fail closed when the same header appears more than once). - `make_websocket_request_guard()` — `process_request` hook for `websockets.asyncio.server.serve(...)` mirroring the same logic. Uses `headers.get_all` so a smuggled duplicate Host fails closed rather than tripping `MultipleValuesError` and surfacing as 500. Wiring: - `transport/websocket.py`: pass `process_request=` to `serve()`. - `server.py`: outermost wrap on the HTTP app, applied to every HTTP transport (`http`, `streamable-http`, `sse`) so `/godot-ai/status` and the FastMCP endpoints are guarded uniformly. Native clients keep working: the Godot plugin's WebSocketPeer sends `Host: 127.0.0.1:<port>` and no Origin. Verified by smoke against a real WebSocketServer (loopback connect → `handshake_ack`; rebound Origin → 403) and against the streamable-HTTP transport (`/godot-ai/status`: loopback → 200, `Host: attacker.example.com` → 403, loopback Host + browser Origin → 403). Tests: - `tests/unit/test_origin_guard.py` (44 tests): parametrized helper coverage (loopback / sneaky-substring / IP / malformed / multiple smuggled values), middleware behavior, lifespan passthrough, `state` `__getattr__` passthrough. - `tests/integration/test_websocket.py::TestDnsRebindingGuard` (5 tests): live `websockets` server + client. Loopback succeeds, non-loopback Host returns 403, non-loopback Origin returns 403, loopback-shaped Origin succeeds, rejected request never registers a session. - `tests/unit/test_asgi_session_diagnostics.py`: structural assertions updated for the new outer wrap; FastMCP `state` lookup now traverses one extra layer. - `tests/unit/test_server_status.py`: `TestClient(base_url=...)` set to a loopback host so the request passes the new guard. Test plan: - ruff check + format: clean - pytest -q: 825 passed (+45 new) - script/ci-check-gdscript: clean (no GDScript touched) - Live smoke: real `GodotWebSocketServer` and real streamable-HTTP app, loopback succeeds, non-loopback Host/Origin → 403. Note: PR #366 (audit-v2 #2, session-hijack) closes the duplicate-handshake takeover. This PR closes the rebinding-from-browser path. Together they harden the WebSocket transport at both layers (who can connect, and what they can do once they have). * Simplify: extract evaluate_loopback so WS + HTTP guards can't drift Both the WebSocket ``process_request`` hook and ``LocalhostOnlyHTTPMiddleware`` implemented the same 3-step rule (count duplicate headers → check ``is_allowed_host`` → check ``is_allowed_origin`` → 403) twice, with slightly different intermediate state. Funnel both through one ``evaluate_loopback(hosts, origins) -> bool`` so a regression in one transport cannot accidentally diverge from the other. Drive-by: - Decode FORBIDDEN_BODY once at module load (FORBIDDEN_BODY_TEXT) so the WS guard doesn't decode the same bytes per rejection. - Drop redundant ``parsed.scheme.lower()`` — ``urlsplit`` already normalises the scheme per RFC 3986. No behavior change. 825 tests pass. * Address Copilot review: reject Origin: null + Sec-Fetch-Site cross-origin Two browser-side liveness/bypass shapes that the first cut of the loopback guard let through: 1. **``Origin: null``** (Copilot, origin_guard.py:85). Sandboxed ``<iframe sandbox>`` and downloaded ``file://`` pages serialize their origin as ``null``. The original guard accepted this on the theory that file:// /sandboxed contexts are "legitimate"; in practice they're exactly the bypass an attacker would use to bridge a foreign origin onto our loopback socket. **Now rejected.** Native clients never produce ``null`` (they omit Origin entirely), so this tightening is invisible to the Godot plugin / FastMCP CLI / curl. 2. **Cross-origin no-cors subresource probes** (Copilot, server.py:79). `<img src="http://127.0.0.1:9500/godot-ai/status">` from ``https://attacker.example.com`` arrives with a loopback ``Host`` and *no* ``Origin`` (browsers omit Origin for ``no-cors`` GETs of ``<img>`` / ``<script>`` / ``<link>``), so the original Host/Origin gate let it through and the page could use the 200/403 outcome as a liveness oracle. Browsers stamp every HTTP request with ``Sec-Fetch-Site`` (``cross-site`` / ``same-site`` / ``same-origin`` / ``none``). Native non-browser clients never send it. **The guard now rejects ``cross-site`` and ``same-site``** while still allowing ``none`` (top-level navigation: user typed URL, bookmark) and ``same-origin`` (loopback page fetching its own server). Missing → allow (native client), preserving the existing flow. Drive-by: trailing-dot loopback (``Host: localhost.``) now accepted. RFC 1034 valid FQDN syntax that browsers and curl can preserve through to the Host header — friction trap if rejected, no security implication either way. `evaluate_loopback` now takes an optional ``sec_fetch_sites`` list so both transports run the new policy through the same single helper — preserves the audit-v2 invariant that the WS hook and HTTP middleware cannot drift. Tests: - New parametrized helper coverage for ``is_allowed_sec_fetch_site`` (friendly + foreign + case-insensitive + whitespace), trailing-dot hosts, and explicit ``Origin: null`` rejection. - Two new ``evaluate_loopback`` cases pinning the cross-site rejection with no-Origin (the exact Copilot shape) and ``Origin: null`` with any Sec-Fetch-Site value. - Three new ``LocalhostOnlyHTTPMiddleware`` middleware cases: ``Origin: null`` rejected, cross-site subresource rejected, top-level navigation accepted. - Three new live-WS integration cases pinning the same rules end-to-end through the websockets ``process_request`` hook, plus a bracketed-IPv6 ``http://[::1]:9500`` symmetry test for the WS path. Live smoke: native loopback connect → ``handshake_ack`` ✓; ``Origin: null`` WS connect → 403 ✓; ``Sec-Fetch-Site: cross-site`` WS connect → 403 ✓; ``Origin: null`` HTTP probe → 403 ✓; cross-site HTTP probe → 403 ✓. --------- Co-authored-by: Claude <noreply@anthropic.com>
…ake reject Two small audit-v2 fixes in src/godot_ai/transport/websocket.py covering #343 sub-findings #2 and #5. (Sub-finding #4 — errno.EADDRINUSE portability — landed separately via PR #373.) - #5 (P2): wrap `send_command`'s `ws.send` + `wait_for` in a try/finally that always pops `_pending`. Pre-fix, a `ws.send` raise (ConnectionClosed mid-send, transport error) leaked the Future entry forever; under churn the dict grew unbounded. Happy path still pops via the response receiver, so the finally pop is a no-op there. - #2 (P1): reject a second handshake whose session_id is already registered (close code 4001), instead of silently overwriting the routing map. session_id format is `<slug>@<4hex>` — 16 bits of suffix is locally guessable, so without this any local peer could hijack an active session by impersonating its ID. Legitimate plugin reconnect after `editor_reload_plugin` first triggers ConnectionClosed -> unregister, so the new connect still lands cleanly. Tests: - tests/integration/test_websocket.py — TestDuplicateHandshake pins the reject + reconnect-after-clean-disconnect paths; TestPendingFutureCleanup pins the timeout-pop and send-failure-pop behaviors. Live-smoked against a real Godot 4.6 editor: full GDScript suite 1037/1040 passed, duplicate-handshake rejected with close 4001, original session unaffected. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
…ake reject Two small audit-v2 fixes in src/godot_ai/transport/websocket.py covering #343 sub-findings #2 and #5. (Sub-finding #4 — errno.EADDRINUSE portability — landed separately via PR #373.) - #5 (P2): wrap `send_command`'s `ws.send` + `wait_for` in a try/finally that always pops `_pending`. Pre-fix, a `ws.send` raise (ConnectionClosed mid-send, transport error) leaked the Future entry forever; under churn the dict grew unbounded. Happy path still pops via the response receiver, so the finally pop is a no-op there. - #2 (P1): reject a second handshake whose session_id is already registered (close code 4001), instead of silently overwriting the routing map. session_id format is `<slug>@<4hex>` — 16 bits of suffix is locally guessable, so without this any local peer could hijack an active session by impersonating its ID. Legitimate plugin reconnect after `editor_reload_plugin` first triggers ConnectionClosed -> unregister, so the new connect still lands cleanly. Tests: - tests/integration/test_websocket.py — TestDuplicateHandshake pins the reject + reconnect-after-clean-disconnect paths; TestPendingFutureCleanup pins the timeout-pop and send-failure-pop behaviors. Live-smoked against a real Godot 4.6 editor: full GDScript suite 1037/1040 passed, duplicate-handshake rejected with close 4001, original session unaffected. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
…ake reject (#374) * Harden WebSocket transport: pending-Future leak + duplicate-ID handshake reject Two small audit-v2 fixes in src/godot_ai/transport/websocket.py covering #343 sub-findings #2 and #5. (Sub-finding #4 — errno.EADDRINUSE portability — landed separately via PR #373.) - #5 (P2): wrap `send_command`'s `ws.send` + `wait_for` in a try/finally that always pops `_pending`. Pre-fix, a `ws.send` raise (ConnectionClosed mid-send, transport error) leaked the Future entry forever; under churn the dict grew unbounded. Happy path still pops via the response receiver, so the finally pop is a no-op there. - #2 (P1): reject a second handshake whose session_id is already registered (close code 4001), instead of silently overwriting the routing map. session_id format is `<slug>@<4hex>` — 16 bits of suffix is locally guessable, so without this any local peer could hijack an active session by impersonating its ID. Legitimate plugin reconnect after `editor_reload_plugin` first triggers ConnectionClosed -> unregister, so the new connect still lands cleanly. Tests: - tests/integration/test_websocket.py — TestDuplicateHandshake pins the reject + reconnect-after-clean-disconnect paths; TestPendingFutureCleanup pins the timeout-pop and send-failure-pop behaviors. Live-smoked against a real Godot 4.6 editor: full GDScript suite 1037/1040 passed, duplicate-handshake rejected with close 4001, original session unaffected. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp * Apply pending ruff format to test_mcp_tools and test_gdscript_no_adjacent_string_concat Drive-by: `ruff format --check` was failing on these two test files on beta. Reformatting them keeps the format-check CI step green for the audit-v2 transport-hardening PR. No behavioral change — purely whitespace / line-wrapping deltas produced by `ruff format`. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp --------- Co-authored-by: Claude <noreply@anthropic.com>
…tree Two issues from Copilot's review on PR #381 + the second CI failure: (A) Cross-scan signal race [Copilot]: scan #1 watchdog'd, scan #2 then arms a fresh `filesystem_changed` listener. A delayed emission from scan #1 fires on whichever listener is currently connected to the shared signal — Godot can't tag emissions with their source scan — so scan #2's listener falsely settles before its actual filesystem scan completed, re-enabling the plugin against a potentially incomplete on-disk install. Generation-counter + Callable.bind doesn't help because a single emission still fires every connected listener (CONNECT_ONE_SHOT only auto-disconnects after firing). (B) Second CI failure: `Timer.start()` requires the Timer to be inside a SceneTree. The previous fix removed `add_child(runner)` from tests but didn't parent the runner anywhere, so when `_arm_scan_watchdog` called `add_child(timer)` and `timer.start()` inside the runner, the timer wasn't in any tree. Fix: - Add a sticky `_scan_timed_out` flag set by `_on_scan_watchdog_timeout`. - `_start_filesystem_scan` checks the flag at the top and bypasses the connect + `fs.scan()` path entirely, falling straight through to `call_deferred(deferred_step)`. With no listener armed, a delayed emission from the timed-out scan has nothing to satisfy. Godot's normal background scan catches up after the plugin re-enables. - Tests now parent the runner via `(Engine.get_main_loop() as SceneTree).root.add_child(runner)` so `Timer.start()` works. Cleanup via a `_free_runner` helper (remove_child + free). - New regression test `test_subsequent_scan_after_watchdog_bypasses_ listener_arm` explicitly drives scan #1 → watchdog → scan #2 and asserts scan #2 does NOT set `_waiting_for_scan = true` (i.e. no listener armed for a delayed scan-#1 emission to satisfy). - Existing watchdog test `test_watchdog_no_op_when_signal_already_settled` also asserts `_scan_timed_out` stays false on the late-Timer-after- successful-signal path — guards against the watchdog poisoning subsequent scans when no actual deadlock happened.
…ad (#381) * [Audit v2 #353] 30s watchdog on filesystem_changed during update reload Pre-fix, `_start_filesystem_scan` connected `filesystem_changed` (CONNECT_ONE_SHOT) and called `fs.scan()`, then sat in `_waiting_for_scan = true` forever if the signal never fired (slow disk, NFS, AV holding the just-extracted addon files open). The update completed mechanically — files on disk, plugin disabled — but the dock stayed disabled because `_finish_scan_wait` was the only path back to `_enable_new_plugin`. User had to kill and restart Godot. Add a 30-second `Timer` armed alongside the signal connect: - `_arm_scan_watchdog`: lazy-creates a one-shot `Timer` child node on first use, reuses it on subsequent arms (the runner does two filesystem scans per update — new files then existing files). - `_on_scan_watchdog_timeout`: if the wait is still open, push a warning, disconnect the `filesystem_changed` listener so a delayed signal can't double-call, then dispatch `_finish_scan_wait`. `_finish_scan_wait`'s existing `if not _waiting_for_scan: return` guard makes it idempotent against the signal-arrives-just-before-watchdog race. - `_finish_scan_wait` now also stops the still-running timer on the happy path, so a queued late `timeout` can't fire after a successful scan. Worst case after the watchdog fires: the new files aren't visible on the first frame the new plugin enables, but they get picked up on the next scan. That's strictly better than the prior dock- permanently-disabled state. Tests in `test_project/tests/test_update_reload_runner.gd`: - `test_watchdog_timeout_proceeds_when_signal_never_fires` — the deadlock case the issue describes. - `test_watchdog_no_op_when_signal_already_settled` — late-Timer race after the signal won. - `test_finish_scan_wait_stops_armed_watchdog` — happy path cancels the Timer. - `test_watchdog_timer_reused_across_arms` — second scan in the same update doesn't leak a second Timer child. Tests fire `_on_scan_watchdog_timeout` and `_finish_scan_wait` directly rather than waiting for the wall-clock Timer, keeping runs deterministic and sub-second. Closes #353 * Fix CI: remove add_child(runner) from watchdog tests McpTestSuite extends RefCounted (not Node), so add_child(runner) errors at test runtime — Linux/macOS/Windows all failed with the same root cause. The tests fire _on_scan_watchdog_timeout() directly instead of relying on the Timer counting down, so the runner doesn't actually need to be in a SceneTree; the Timer just needs to be a child of the runner (works regardless of where the runner sits). * Redesign per Copilot review: sticky bypass after watchdog + tests in tree Two issues from Copilot's review on PR #381 + the second CI failure: (A) Cross-scan signal race [Copilot]: scan #1 watchdog'd, scan #2 then arms a fresh `filesystem_changed` listener. A delayed emission from scan #1 fires on whichever listener is currently connected to the shared signal — Godot can't tag emissions with their source scan — so scan #2's listener falsely settles before its actual filesystem scan completed, re-enabling the plugin against a potentially incomplete on-disk install. Generation-counter + Callable.bind doesn't help because a single emission still fires every connected listener (CONNECT_ONE_SHOT only auto-disconnects after firing). (B) Second CI failure: `Timer.start()` requires the Timer to be inside a SceneTree. The previous fix removed `add_child(runner)` from tests but didn't parent the runner anywhere, so when `_arm_scan_watchdog` called `add_child(timer)` and `timer.start()` inside the runner, the timer wasn't in any tree. Fix: - Add a sticky `_scan_timed_out` flag set by `_on_scan_watchdog_timeout`. - `_start_filesystem_scan` checks the flag at the top and bypasses the connect + `fs.scan()` path entirely, falling straight through to `call_deferred(deferred_step)`. With no listener armed, a delayed emission from the timed-out scan has nothing to satisfy. Godot's normal background scan catches up after the plugin re-enables. - Tests now parent the runner via `(Engine.get_main_loop() as SceneTree).root.add_child(runner)` so `Timer.start()` works. Cleanup via a `_free_runner` helper (remove_child + free). - New regression test `test_subsequent_scan_after_watchdog_bypasses_ listener_arm` explicitly drives scan #1 → watchdog → scan #2 and asserts scan #2 does NOT set `_waiting_for_scan = true` (i.e. no listener armed for a delayed scan-#1 emission to satisfy). - Existing watchdog test `test_watchdog_no_op_when_signal_already_settled` also asserts `_scan_timed_out` stays false on the late-Timer-after- successful-signal path — guards against the watchdog poisoning subsequent scans when no actual deadlock happened. --------- Co-authored-by: Claude <noreply@anthropic.com>
…#518) (#519) The opaque INTERNAL_ERROR 'regression' on game_eval (1.42%->6.15%) was a thin-baseline cohort artifact plus #500 reclassifying ~15s TimeoutErrors into fast ~3s INTERNAL_ERRORs; the genuine ~10s hang is flat ~2.8%. Carve the play-session-up-but-capture-not-ready failure into its own EVAL_GAME_NOT_READY code (mirrors #491, and is #518's Suggested Action #2) so the opaque bucket again means 'the eval truly hung'. Relabel only; no timing machinery touched. Refs #518.
…dator The strong traversal validator from #347 was wired into only filesystem and script handlers; ~12 other path-taking handlers used a bare begins_with("res://") check (which does not reject "..") or no check at all (relying solely on ResourceLoader.exists/load). This unifies all of them on McpPathValidator and extends the validator with two checks. Validator (utils/path_validator.gd): - Reject embedded null bytes (truncation trap — the path written could differ from the one validated). - New for_write flag: write callers additionally refuse res://project.godot, the res://.godot/ metadata dir, and .import sidecars (overwriting these corrupts the project / import cache). Reads still permit inspecting them. Signature is backward-compatible (for_write defaults to false). Writers now validate with for_write=true (closes the traversal-write primitive): resource_io.save_to_disk (backs resource/environment/texture/curve saves), scene create_scene/save_scene_as, material/theme save helpers, filesystem write_text, script create/patch. Load/read sites now validate (closes the unvalidated-load surface, incl. the @tool-script-execution risk from open_scene/create_node): resource load/assign + nested object-property loads, scene open_scene, script attach_script, node create_node scene_path + set_property resource values, audio set_stream + list root, material assign/shader/list, ui theme + stylebox, autoload path, curve set_points, particle mesh/ material/texture, material_values.load_texture. Path-validation rejections report VALUE_OUT_OF_RANGE (the code these handlers already used for "must start with res://"), keeping the INVALID_PARAMS catch-all count under the audit-v2 #21 ceiling enforced by test_error_code_distribution. The four pre-existing validator sites that already wrapped errors as INVALID_PARAMS (filesystem, script create/read/patch/find, resource_io save, material _validate_material_path) are left unchanged. set_stream and the other load handlers now reject user:// (consistent with the validator's existing test_user_prefix_rejected policy); the audio test fixture moves from user:// to a res:// .tres registered via EditorFileSystem.update_file. Tests (run against live Godot 4.6.3): validator unit tests for null-byte + write blocklist + read-allows; scene traversal/manifest-overwrite rejection. All affected suites pass (path_validator/scene/resource/node/material/theme/ curve/particle/ui/audio/autoload/filesystem/script). test_error_code_distribution passes. Parse check clean. Addresses advisory GHSA-p5x8-v25q-qw69 (GH-1, GH-2, GH-3, GH-4). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…dator The strong traversal validator from #347 was wired into only filesystem and script handlers; ~12 other path-taking handlers used a bare begins_with("res://") check (which does not reject "..") or no check at all (relying solely on ResourceLoader.exists/load). This unifies all of them on McpPathValidator and extends the validator with two checks. Validator (utils/path_validator.gd): - Reject embedded null bytes (truncation trap — the path written could differ from the one validated). - New for_write flag: write callers additionally refuse res://project.godot, the res://.godot/ metadata dir, and .import sidecars (overwriting these corrupts the project / import cache). Reads still permit inspecting them. Signature is backward-compatible (for_write defaults to false). Writers now validate with for_write=true (closes the traversal-write primitive): resource_io.save_to_disk (backs resource/environment/texture/curve saves), scene create_scene/save_scene_as, material/theme save helpers, filesystem write_text, script create/patch. Load/read sites now validate (closes the unvalidated-load surface, incl. the @tool-script-execution risk from open_scene/create_node): resource load/assign + nested object-property loads, scene open_scene, script attach_script, node create_node scene_path + set_property resource values, audio set_stream + list root, material assign/shader/list, ui theme + stylebox, autoload path, curve set_points, particle mesh/ material/texture, material_values.load_texture. Path-validation rejections report VALUE_OUT_OF_RANGE (the code these handlers already used for "must start with res://"), keeping the INVALID_PARAMS catch-all count under the audit-v2 #21 ceiling enforced by test_error_code_distribution. The four pre-existing validator sites that already wrapped errors as INVALID_PARAMS (filesystem, script create/read/patch/find, resource_io save, material _validate_material_path) are left unchanged. set_stream and the other load handlers now reject user:// (consistent with the validator's existing test_user_prefix_rejected policy); the audio test fixture moves from user:// to a res:// .tres registered via EditorFileSystem.update_file. Tests (run against live Godot 4.6.3): validator unit tests for null-byte + write blocklist + read-allows; scene traversal/manifest-overwrite rejection. All affected suites pass (path_validator/scene/resource/node/material/theme/ curve/particle/ui/audio/autoload/filesystem/script). test_error_code_distribution passes. Parse check clean. Addresses advisory GHSA-p5x8-v25q-qw69 (GH-1, GH-2, GH-3, GH-4). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…dator (#546) * harden(handlers): route every path-taking handler through McpPathValidator The strong traversal validator from #347 was wired into only filesystem and script handlers; ~12 other path-taking handlers used a bare begins_with("res://") check (which does not reject "..") or no check at all (relying solely on ResourceLoader.exists/load). This unifies all of them on McpPathValidator and extends the validator with two checks. Validator (utils/path_validator.gd): - Reject embedded null bytes (truncation trap — the path written could differ from the one validated). - New for_write flag: write callers additionally refuse res://project.godot, the res://.godot/ metadata dir, and .import sidecars (overwriting these corrupts the project / import cache). Reads still permit inspecting them. Signature is backward-compatible (for_write defaults to false). Writers now validate with for_write=true (closes the traversal-write primitive): resource_io.save_to_disk (backs resource/environment/texture/curve saves), scene create_scene/save_scene_as, material/theme save helpers, filesystem write_text, script create/patch. Load/read sites now validate (closes the unvalidated-load surface, incl. the @tool-script-execution risk from open_scene/create_node): resource load/assign + nested object-property loads, scene open_scene, script attach_script, node create_node scene_path + set_property resource values, audio set_stream + list root, material assign/shader/list, ui theme + stylebox, autoload path, curve set_points, particle mesh/ material/texture, material_values.load_texture. Path-validation rejections report VALUE_OUT_OF_RANGE (the code these handlers already used for "must start with res://"), keeping the INVALID_PARAMS catch-all count under the audit-v2 #21 ceiling enforced by test_error_code_distribution. The four pre-existing validator sites that already wrapped errors as INVALID_PARAMS (filesystem, script create/read/patch/find, resource_io save, material _validate_material_path) are left unchanged. set_stream and the other load handlers now reject user:// (consistent with the validator's existing test_user_prefix_rejected policy); the audio test fixture moves from user:// to a res:// .tres registered via EditorFileSystem.update_file. Tests (run against live Godot 4.6.3): validator unit tests for null-byte + write blocklist + read-allows; scene traversal/manifest-overwrite rejection. All affected suites pass (path_validator/scene/resource/node/material/theme/ curve/particle/ui/audio/autoload/filesystem/script). test_error_code_distribution passes. Parse check clean. Addresses advisory GHSA-p5x8-v25q-qw69 (GH-1, GH-2, GH-3, GH-4). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * harden(handlers): address review — uid:// loads, case-fold blocklist, shared helper Follow-up to the McpPathValidator unification, fixing the code-review findings. Correctness: - Restore uid:// and user:// support on load handlers. ResourceLoader accepts both (uid:// is an opaque resource id that can't express traversal; user:// is the user data sandbox), but routing every load through the strict res:// validator rejected them — a regression for uid:// refs copied from .tscn/.uid and for user:// runtime assets. New validate_loadable_path() accepts res:// (confined), uid:// (as-is), and user:// (confined under the user root), and load sites now use it: set_property/assign_resource/resource property dicts, open_scene, create_node, attach_script, set_stream, particle mesh/material/ texture, material assign/shader, ui theme/stylebox, curve, material_values. - Case-fold the write blocklist. macOS (APFS) and Windows (NTFS) are case-insensitive by default, so res://Project.godot resolved to the real project.godot and slipped past a case-sensitive compare. - Add override.cfg to the write blocklist (applied over project.godot at startup — same takeover surface as the manifest). - Reads no longer run the write blocklist: _validate_material_path / _validate_res_path / _load_material_from_path / _load_theme_from_params take for_write, passed true only by the create/save callers. get_material and apply_theme (pure reads) no longer return a spurious "Refusing to write". - Stop blocking .import writes — editing import config then reimporting is a legitimate, recoverable workflow; the blocklist is the startup-execution surface (manifest, override.cfg, .godot/) only. Cleanup / altitude: - Single error-code choke point: McpPathValidator.path_error / loadable_error return a ready error dict (MISSING_REQUIRED_PARAM for empty, VALUE_OUT_OF_RANGE for invalid) so every handler reports the same code for the same failure. filesystem/script move off their old INVALID_PARAMS wrapping onto this. - curve set_points validates the load path (the save layer, resource_io.save_to_disk, remains the authoritative write-confinement check) instead of double-running the write validator. - Drop the redundant is_empty() pre-check in material_values.load_texture. - Document the lexical-containment (no symlink resolution) limitation in the validator — GDScript has no realpath; the loopback boundary is the control. Tests (live Godot 4.6.3, all suites 0 failures): uid:// / user:// acceptance, user:// traversal + unknown-scheme rejection, case-insensitive blocklist, override.cfg block, .import now allowed; audio fixture moved back to user:// (no repo/EFS pollution). test_path_traversal_guard counts any McpPathValidator delegation; missing-path tests now assert MISSING_REQUIRED_PARAM. Addresses review findings 1-10 on GHSA-p5x8-v25q-qw69 (path confinement). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * harden(handlers): address Copilot review on path validation - Guard the null-byte check against an empty String.chr(0) sentinel in both validate_resource_path and validate_loadable_path. On builds that normalize embedded nulls away (e.g. 4.3), contains("") would be true and reject every path; a String that can't hold a null can't smuggle one, so skip the check. - Update stale "res:// path" comments in ui_handler (stylebox override) and material_values.load_texture — both now accept uid:// / user:// via validate_loadable_path. - Tighten test_path_traversal_guard: assert attach_script is present and require >=5 McpPathValidator delegations in script_handler, so a regression where attach_script stops validating its path is caught. Live Godot 4.6.3: path_validator/filesystem/script/ui/material/resource/scene/ audio suites all pass; test_path_traversal_guard green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * style: wrap long func-name tuple in test_path_traversal_guard (ruff E501) The 5-entry func list exceeded the 100-char line limit after adding attach_script. Pure formatting; the assertion is unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(handlers): align remaining res:// messages with uid:// / user:// support Copilot re-review: now that the theme / stylebox / shader load paths accept uid:// and user:// via loadable_error, the surrounding comments, the build_layout docstring, the stylebox fallback error ("expects a res:// path"), and the shader_path missing-arg error still said res:// only. Text-only. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Summary
This PR wires the reload workflow all the way through for local development:
reload_pluginrobust against fast reconnect races by detecting the replacement session directlygodot_ai.asgipython -m godot_ai --transport streamable-http --port 8000 --reloadonto Uvicorn's supported import-string/factory reload pathclient_configurator.gd, including configure/status/remove handling for~/.codex/config.tomlRoot Cause
There were two separate failure modes behind the expected reload behavior:
reload=Truethrough FastMCP's in-memory app path, but Uvicorn only activates its reload supervisor when the app is provided as an import string/factory.reload_pluginwrapper could miss a very fast reconnect. If the replacement Godot session registered before the waiter was installed, the tool would hang until timeout even though the plugin had already reloaded.This PR also folds in the related Codex client configuration work that was needed to exercise the end-to-end workflow from the editor side.
Codex Configurator
codexto the supported client set alongside Claude Code and Antigravitygodot-aiMCP server entry in~/.codex/config.tomlImpact
--reloadmode.reload_pluginnow returns reliably after the plugin reconnects.Validation
Automated:
pytest -q tests/unit/test_cli_reload.py tests/unit/test_runtime_handlers.py tests/unit/test_session_registry.py tests/unit/test_resources.py tests/integration/test_mcp_tools.pyLive verification:
--reloadmode and confirmed Uvicorn launched a real WatchFiles reloadereditor_statereflected the new server-side markerreload_pluginreturned a new session id andlogs_readshowed the updated plugin-side marker