Harden WebSocket transport: errno portability, Future leak, duplicate-ID reject#366
Closed
dsarno wants to merge 2 commits into
Closed
Harden WebSocket transport: errno portability, Future leak, duplicate-ID reject#366dsarno wants to merge 2 commits into
dsarno wants to merge 2 commits into
Conversation
…uplicate-ID reject Three small audit-v2 fixes in src/godot_ai/transport/websocket.py covering #343 sub-findings #2, #4, and #5. - #4 (P1): replace hardcoded `e.errno == 48` (macOS-only) with `errno.EADDRINUSE` so Linux (98) and Windows (10048) also hit the friendly "port already in use" branch instead of crashing the WS lifespan with an unhandled OSError. - #5 (P2): wrap `send_command`'s `ws.send` + `wait_for` in a try/finally that always pops `_pending`. Pre-fix, a `ws.send` raise (ConnectionClosed mid-send, transport error) leaked the Future entry forever; under churn the dict grew unbounded. Happy path still pops via the response receiver, so the finally pop is a no-op there. - #2 (P1): reject a second handshake whose session_id is already registered (close code 4001), instead of silently overwriting the routing map. session_id format is `<slug>@<4hex>` — 16 bits of suffix is locally guessable, so without this any local peer could hijack an active session by impersonating its ID. Legitimate plugin reconnect after `editor_reload_plugin` first triggers ConnectionClosed -> unregister, so the new connect still lands cleanly. Tests: - tests/unit/test_websocket_server_lifecycle.py — parametrized errno coverage (macos 48, linux 98, windows 10048; one runs per CI host) plus a non-EADDRINUSE OSError propagation case. - tests/integration/test_websocket.py — TestDuplicateHandshake pins the reject + reconnect-after-clean-disconnect paths; TestPendingFutureCleanup pins the timeout-pop and send-failure-pop behaviors. Live-smoked against a real Godot 4.6 editor: full GDScript suite 1037/1040 passed, duplicate-handshake rejected with close 4001, original session unaffected. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
…cent_string_concat Drive-by: `ruff format --check` was failing on these two pre-existing test files. Reformatting them keeps the format-check CI step green for the audit-v2 transport-hardening PR. No behavioral change — purely whitespace / line-wrapping deltas produced by `ruff format`. https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Contributor
Author
|
Closing — landed on the wrong branch. Audit v2 (#343) is baselined on Generated by Claude Code |
This was referenced May 5, 2026
dsarno
added a commit
that referenced
this pull request
May 5, 2026
…#375) * Reject non-loopback Host/Origin on WS + HTTP transports (audit-v2 #1) Closes #345 — refs #343 (audit-v2 umbrella). Both transports bound to 127.0.0.1, but a malicious browser tab could mount **DNS rebinding**: resolve attacker.example.com to 127.0.0.1, then `new WebSocket("ws://attacker.example.com:9500")`. The request lands on our loopback socket carrying a non-localhost Host (and Origin), the WS handshake registered the peer as a session, and any MCP tool became drivable from a foreign origin — including write tools. The streamable-HTTP transport on :8000 had the same gap, including `/godot-ai/status` (comment-marked "small unauthenticated probe"). Per the issue's "Fix shape": **strict Host/Origin allowlist in a Starlette middleware ahead of FastMCP** plus a matching `process_request` hook on the WebSocket server. `src/godot_ai/transport/origin_guard.py` (new): - `is_allowed_host` — accepts `127.0.0.1`, `localhost`, `[::1]` with optional `:port`; case-insensitive; bare `::1` rejected (RFC 7230 requires bracketed IPv6 in Host). - `is_allowed_origin` — None / empty / `null` accepted (native clients omit Origin; sandboxed/file:// emit `null`); otherwise must parse to a URL whose hostname is loopback. Schemes outside `http/https/ws/wss` rejected. - `LocalhostOnlyHTTPMiddleware` — ASGI middleware. Rejects on first failure with HTTP 403; passes lifespan scopes through; handles duplicate-Host smuggling shapes (fail closed when the same header appears more than once). - `make_websocket_request_guard()` — `process_request` hook for `websockets.asyncio.server.serve(...)` mirroring the same logic. Uses `headers.get_all` so a smuggled duplicate Host fails closed rather than tripping `MultipleValuesError` and surfacing as 500. Wiring: - `transport/websocket.py`: pass `process_request=` to `serve()`. - `server.py`: outermost wrap on the HTTP app, applied to every HTTP transport (`http`, `streamable-http`, `sse`) so `/godot-ai/status` and the FastMCP endpoints are guarded uniformly. Native clients keep working: the Godot plugin's WebSocketPeer sends `Host: 127.0.0.1:<port>` and no Origin. Verified by smoke against a real WebSocketServer (loopback connect → `handshake_ack`; rebound Origin → 403) and against the streamable-HTTP transport (`/godot-ai/status`: loopback → 200, `Host: attacker.example.com` → 403, loopback Host + browser Origin → 403). Tests: - `tests/unit/test_origin_guard.py` (44 tests): parametrized helper coverage (loopback / sneaky-substring / IP / malformed / multiple smuggled values), middleware behavior, lifespan passthrough, `state` `__getattr__` passthrough. - `tests/integration/test_websocket.py::TestDnsRebindingGuard` (5 tests): live `websockets` server + client. Loopback succeeds, non-loopback Host returns 403, non-loopback Origin returns 403, loopback-shaped Origin succeeds, rejected request never registers a session. - `tests/unit/test_asgi_session_diagnostics.py`: structural assertions updated for the new outer wrap; FastMCP `state` lookup now traverses one extra layer. - `tests/unit/test_server_status.py`: `TestClient(base_url=...)` set to a loopback host so the request passes the new guard. Test plan: - ruff check + format: clean - pytest -q: 825 passed (+45 new) - script/ci-check-gdscript: clean (no GDScript touched) - Live smoke: real `GodotWebSocketServer` and real streamable-HTTP app, loopback succeeds, non-loopback Host/Origin → 403. Note: PR #366 (audit-v2 #2, session-hijack) closes the duplicate-handshake takeover. This PR closes the rebinding-from-browser path. Together they harden the WebSocket transport at both layers (who can connect, and what they can do once they have). * Simplify: extract evaluate_loopback so WS + HTTP guards can't drift Both the WebSocket ``process_request`` hook and ``LocalhostOnlyHTTPMiddleware`` implemented the same 3-step rule (count duplicate headers → check ``is_allowed_host`` → check ``is_allowed_origin`` → 403) twice, with slightly different intermediate state. Funnel both through one ``evaluate_loopback(hosts, origins) -> bool`` so a regression in one transport cannot accidentally diverge from the other. Drive-by: - Decode FORBIDDEN_BODY once at module load (FORBIDDEN_BODY_TEXT) so the WS guard doesn't decode the same bytes per rejection. - Drop redundant ``parsed.scheme.lower()`` — ``urlsplit`` already normalises the scheme per RFC 3986. No behavior change. 825 tests pass. * Address Copilot review: reject Origin: null + Sec-Fetch-Site cross-origin Two browser-side liveness/bypass shapes that the first cut of the loopback guard let through: 1. **``Origin: null``** (Copilot, origin_guard.py:85). Sandboxed ``<iframe sandbox>`` and downloaded ``file://`` pages serialize their origin as ``null``. The original guard accepted this on the theory that file:// /sandboxed contexts are "legitimate"; in practice they're exactly the bypass an attacker would use to bridge a foreign origin onto our loopback socket. **Now rejected.** Native clients never produce ``null`` (they omit Origin entirely), so this tightening is invisible to the Godot plugin / FastMCP CLI / curl. 2. **Cross-origin no-cors subresource probes** (Copilot, server.py:79). `<img src="http://127.0.0.1:9500/godot-ai/status">` from ``https://attacker.example.com`` arrives with a loopback ``Host`` and *no* ``Origin`` (browsers omit Origin for ``no-cors`` GETs of ``<img>`` / ``<script>`` / ``<link>``), so the original Host/Origin gate let it through and the page could use the 200/403 outcome as a liveness oracle. Browsers stamp every HTTP request with ``Sec-Fetch-Site`` (``cross-site`` / ``same-site`` / ``same-origin`` / ``none``). Native non-browser clients never send it. **The guard now rejects ``cross-site`` and ``same-site``** while still allowing ``none`` (top-level navigation: user typed URL, bookmark) and ``same-origin`` (loopback page fetching its own server). Missing → allow (native client), preserving the existing flow. Drive-by: trailing-dot loopback (``Host: localhost.``) now accepted. RFC 1034 valid FQDN syntax that browsers and curl can preserve through to the Host header — friction trap if rejected, no security implication either way. `evaluate_loopback` now takes an optional ``sec_fetch_sites`` list so both transports run the new policy through the same single helper — preserves the audit-v2 invariant that the WS hook and HTTP middleware cannot drift. Tests: - New parametrized helper coverage for ``is_allowed_sec_fetch_site`` (friendly + foreign + case-insensitive + whitespace), trailing-dot hosts, and explicit ``Origin: null`` rejection. - Two new ``evaluate_loopback`` cases pinning the cross-site rejection with no-Origin (the exact Copilot shape) and ``Origin: null`` with any Sec-Fetch-Site value. - Three new ``LocalhostOnlyHTTPMiddleware`` middleware cases: ``Origin: null`` rejected, cross-site subresource rejected, top-level navigation accepted. - Three new live-WS integration cases pinning the same rules end-to-end through the websockets ``process_request`` hook, plus a bracketed-IPv6 ``http://[::1]:9500`` symmetry test for the WS path. Live smoke: native loopback connect → ``handshake_ack`` ✓; ``Origin: null`` WS connect → 403 ✓; ``Sec-Fetch-Site: cross-site`` WS connect → 403 ✓; ``Origin: null`` HTTP probe → 403 ✓; cross-site HTTP probe → 403 ✓. --------- Co-authored-by: Claude <noreply@anthropic.com>
This was referenced May 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bundles three small, surgical hardening fixes from the audit-v2 umbrella (#343), all in
src/godot_ai/transport/websocket.py. Each fix is independently scoped and pinned by a regression test; bundled because they're tightly themed (transport-layer hardening) and trivially reviewable side-by-side.Closes findings #2, #4, and #5 of #343.
Fixes
#4 (P1) —
errno.EADDRINUSEportabilitysrc/godot_ai/transport/websocket.py:46previously checkede.errno == 48. That number is macOS-only; Linux uses 98 and Windows 10048, so on those platforms the friendly "port already in use" branch never fired and the OSError propagated, taking the WS server lifespan down with a generic traceback. Replaced witherrno.EADDRINUSE.#5 (P2) —
send_commandpending-Future leaksend_commandregisteredself._pending[request.request_id] = futurethen awaitedws.send(...). Ifws.sendraised (ConnectionClosedmid-send, transport error, cancellation), the pending entry was never popped and the dict grew unbounded under churn. Wrapped the send + wait intry/finally: self._pending.pop(...); the response receiver still pops on the happy path so the finally is a no-op there.#2 (P1) — Duplicate-ID handshake hijack
_handle_connectionregistered the inbound peer withsession_id = handshake.session_id, silently overwriting both the registry entry and_connections[session_id]when the same ID was already present. session_id format is<slug>@<4hex>— 16 bits of suffix is locally guessable, so any local peer could hijack an active session by sending a handshake with the same ID; subsequent MCP commands would route to the impostor.Now
_handle_connectionchecksregistry.get(handshake.session_id)and rejects with WebSocket close code4001(RFC 6455 application-defined range) plus a structured warning log naming the existing peer's PID and project. Legitimate plugin reconnect aftereditor_reload_pluginfirst triggersConnectionClosed→unregister, so the new connect still lands cleanly — pinned bytest_reconnect_after_clean_disconnect_succeeds.Tests
New:
tests/unit/test_websocket_server_lifecycle.py— parametrized errno coverage (macos 48, linux 98, windows 10048; one runs per CI host) plus a non-EADDRINUSE OSError propagation case.tests/integration/test_websocket.py::TestDuplicateHandshake— pins both the reject path and the legitimate-reconnect path.tests/integration/test_websocket.py::TestPendingFutureCleanup— pins both the timeout-pop and the send-raise pop behaviors (the latter monkey-patchesws.sendto force a syntheticConnectionErrorafter the pending entry is registered).Existing: 728 passed, 2 skipped (cross-platform errno parametrizations on the non-host OS).
Live smoke
Spawned a real Godot 4.6.2 editor against
test_project/, openedmain.tscn, ran via the streamable-HTTP MCP transport:editor_state→readiness=ready✓test_run→ 1037/1040 passed, 0 failed ✓'session id already registered'✓editor_stateafter the rejected hijack → stillreadiness=ready(original session unaffected) ✓Rejecting duplicate handshake for session test-project@307e (existing pid=21071, project=/home/user/godot-ai/test_project/)✓What's NOT closed
script_create/filesystem_write_text) — touches GDScript handlers; deserves its own PR with a regression test againstres://../etc/passwd.gd.Test plan
ruff check src/ tests/cleanruff format --check src/ tests/cleanpytest -q— 728 passedtest_runagainst Godot 4.6.2 editor — 1037/1040 passed, 0 failedhttps://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
Generated by Claude Code