Skip to content

Harden WebSocket transport: errno portability, Future leak, duplicate-ID reject#366

Closed
dsarno wants to merge 2 commits into
mainfrom
claude/laughing-mendel-eVqHJ
Closed

Harden WebSocket transport: errno portability, Future leak, duplicate-ID reject#366
dsarno wants to merge 2 commits into
mainfrom
claude/laughing-mendel-eVqHJ

Conversation

@dsarno

@dsarno dsarno commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Bundles three small, surgical hardening fixes from the audit-v2 umbrella (#343), all in src/godot_ai/transport/websocket.py. Each fix is independently scoped and pinned by a regression test; bundled because they're tightly themed (transport-layer hardening) and trivially reviewable side-by-side.

Closes findings #2, #4, and #5 of #343.

Fixes

#4 (P1) — errno.EADDRINUSE portability

src/godot_ai/transport/websocket.py:46 previously checked e.errno == 48. That number is macOS-only; Linux uses 98 and Windows 10048, so on those platforms the friendly "port already in use" branch never fired and the OSError propagated, taking the WS server lifespan down with a generic traceback. Replaced with errno.EADDRINUSE.

#5 (P2) — send_command pending-Future leak

send_command registered self._pending[request.request_id] = future then awaited ws.send(...). If ws.send raised (ConnectionClosed mid-send, transport error, cancellation), the pending entry was never popped and the dict grew unbounded under churn. Wrapped the send + wait in try/finally: self._pending.pop(...); the response receiver still pops on the happy path so the finally is a no-op there.

#2 (P1) — Duplicate-ID handshake hijack

_handle_connection registered the inbound peer with session_id = handshake.session_id, silently overwriting both the registry entry and _connections[session_id] when the same ID was already present. session_id format is <slug>@<4hex> — 16 bits of suffix is locally guessable, so any local peer could hijack an active session by sending a handshake with the same ID; subsequent MCP commands would route to the impostor.

Now _handle_connection checks registry.get(handshake.session_id) and rejects with WebSocket close code 4001 (RFC 6455 application-defined range) plus a structured warning log naming the existing peer's PID and project. Legitimate plugin reconnect after editor_reload_plugin first triggers ConnectionClosedunregister, so the new connect still lands cleanly — pinned by test_reconnect_after_clean_disconnect_succeeds.

Tests

New:

  • tests/unit/test_websocket_server_lifecycle.py — parametrized errno coverage (macos 48, linux 98, windows 10048; one runs per CI host) plus a non-EADDRINUSE OSError propagation case.
  • tests/integration/test_websocket.py::TestDuplicateHandshake — pins both the reject path and the legitimate-reconnect path.
  • tests/integration/test_websocket.py::TestPendingFutureCleanup — pins both the timeout-pop and the send-raise pop behaviors (the latter monkey-patches ws.send to force a synthetic ConnectionError after the pending entry is registered).

Existing: 728 passed, 2 skipped (cross-platform errno parametrizations on the non-host OS).

Live smoke

Spawned a real Godot 4.6.2 editor against test_project/, opened main.tscn, ran via the streamable-HTTP MCP transport:

  • editor_statereadiness=ready
  • test_run1037/1040 passed, 0 failed
  • Duplicate-ID handshake against the live session's own ID → server closed the WS with code 4001 and reason 'session id already registered'
  • editor_state after the rejected hijack → still readiness=ready (original session unaffected) ✓
  • Server log: Rejecting duplicate handshake for session test-project@307e (existing pid=21071, project=/home/user/godot-ai/test_project/)

What's NOT closed

Test plan

  • ruff check src/ tests/ clean
  • ruff format --check src/ tests/ clean
  • pytest -q — 728 passed
  • Live MCP test_run against Godot 4.6.2 editor — 1037/1040 passed, 0 failed
  • Live duplicate-ID handshake reject smoke — 4001 close received, original session unaffected
  • Logged server-side warning includes existing-peer pid + project_path

https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp


Generated by Claude Code

claude added 2 commits May 5, 2026 14:27
…uplicate-ID reject

Three small audit-v2 fixes in src/godot_ai/transport/websocket.py
covering #343 sub-findings #2, #4, and #5.

- #4 (P1): replace hardcoded `e.errno == 48` (macOS-only) with
  `errno.EADDRINUSE` so Linux (98) and Windows (10048) also hit
  the friendly "port already in use" branch instead of crashing
  the WS lifespan with an unhandled OSError.

- #5 (P2): wrap `send_command`'s `ws.send` + `wait_for` in a
  try/finally that always pops `_pending`. Pre-fix, a `ws.send`
  raise (ConnectionClosed mid-send, transport error) leaked the
  Future entry forever; under churn the dict grew unbounded.
  Happy path still pops via the response receiver, so the finally
  pop is a no-op there.

- #2 (P1): reject a second handshake whose session_id is already
  registered (close code 4001), instead of silently overwriting
  the routing map. session_id format is `<slug>@<4hex>` — 16 bits
  of suffix is locally guessable, so without this any local peer
  could hijack an active session by impersonating its ID.
  Legitimate plugin reconnect after `editor_reload_plugin` first
  triggers ConnectionClosed -> unregister, so the new connect
  still lands cleanly.

Tests:
- tests/unit/test_websocket_server_lifecycle.py — parametrized
  errno coverage (macos 48, linux 98, windows 10048; one runs per
  CI host) plus a non-EADDRINUSE OSError propagation case.
- tests/integration/test_websocket.py — TestDuplicateHandshake
  pins the reject + reconnect-after-clean-disconnect paths;
  TestPendingFutureCleanup pins the timeout-pop and
  send-failure-pop behaviors.

Live-smoked against a real Godot 4.6 editor: full GDScript suite
1037/1040 passed, duplicate-handshake rejected with close 4001,
original session unaffected.

https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
…cent_string_concat

Drive-by: `ruff format --check` was failing on these two pre-existing
test files. Reformatting them keeps the format-check CI step green for
the audit-v2 transport-hardening PR. No behavioral change — purely
whitespace / line-wrapping deltas produced by `ruff format`.

https://claude.ai/code/session_016ijmCD5S6QfwJGJcc5Wirp
@codecov

codecov Bot commented May 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

dsarno commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

Closing — landed on the wrong branch. Audit v2 (#343) is baselined on beta; opening a fresh PR against beta with the same three fixes (rebased clean onto aac5483).


Generated by Claude Code

@dsarno dsarno closed this May 5, 2026
dsarno added a commit that referenced this pull request May 5, 2026
…#375)

* Reject non-loopback Host/Origin on WS + HTTP transports (audit-v2 #1)

Closes #345 — refs #343 (audit-v2 umbrella).

Both transports bound to 127.0.0.1, but a malicious browser tab could
mount **DNS rebinding**: resolve attacker.example.com to 127.0.0.1,
then `new WebSocket("ws://attacker.example.com:9500")`. The request
lands on our loopback socket carrying a non-localhost Host (and
Origin), the WS handshake registered the peer as a session, and any
MCP tool became drivable from a foreign origin — including write
tools.

The streamable-HTTP transport on :8000 had the same gap, including
`/godot-ai/status` (comment-marked "small unauthenticated probe").

Per the issue's "Fix shape": **strict Host/Origin allowlist in a
Starlette middleware ahead of FastMCP** plus a matching `process_request`
hook on the WebSocket server.

`src/godot_ai/transport/origin_guard.py` (new):

- `is_allowed_host` — accepts `127.0.0.1`, `localhost`, `[::1]` with
  optional `:port`; case-insensitive; bare `::1` rejected (RFC 7230
  requires bracketed IPv6 in Host).
- `is_allowed_origin` — None / empty / `null` accepted (native clients
  omit Origin; sandboxed/file:// emit `null`); otherwise must parse to
  a URL whose hostname is loopback. Schemes outside `http/https/ws/wss`
  rejected.
- `LocalhostOnlyHTTPMiddleware` — ASGI middleware. Rejects on first
  failure with HTTP 403; passes lifespan scopes through; handles
  duplicate-Host smuggling shapes (fail closed when the same header
  appears more than once).
- `make_websocket_request_guard()` — `process_request` hook for
  `websockets.asyncio.server.serve(...)` mirroring the same logic.
  Uses `headers.get_all` so a smuggled duplicate Host fails closed
  rather than tripping `MultipleValuesError` and surfacing as 500.

Wiring:

- `transport/websocket.py`: pass `process_request=` to `serve()`.
- `server.py`: outermost wrap on the HTTP app, applied to every
  HTTP transport (`http`, `streamable-http`, `sse`) so `/godot-ai/status`
  and the FastMCP endpoints are guarded uniformly.

Native clients keep working: the Godot plugin's WebSocketPeer sends
`Host: 127.0.0.1:<port>` and no Origin. Verified by smoke against a
real WebSocketServer (loopback connect → `handshake_ack`; rebound
Origin → 403) and against the streamable-HTTP transport
(`/godot-ai/status`: loopback → 200, `Host: attacker.example.com` → 403,
loopback Host + browser Origin → 403).

Tests:

- `tests/unit/test_origin_guard.py` (44 tests):
  parametrized helper coverage (loopback / sneaky-substring / IP /
  malformed / multiple smuggled values), middleware behavior, lifespan
  passthrough, `state` `__getattr__` passthrough.
- `tests/integration/test_websocket.py::TestDnsRebindingGuard` (5
  tests): live `websockets` server + client. Loopback succeeds,
  non-loopback Host returns 403, non-loopback Origin returns 403,
  loopback-shaped Origin succeeds, rejected request never registers
  a session.
- `tests/unit/test_asgi_session_diagnostics.py`: structural assertions
  updated for the new outer wrap; FastMCP `state` lookup now traverses
  one extra layer.
- `tests/unit/test_server_status.py`: `TestClient(base_url=...)` set
  to a loopback host so the request passes the new guard.

Test plan:
- ruff check + format: clean
- pytest -q: 825 passed (+45 new)
- script/ci-check-gdscript: clean (no GDScript touched)
- Live smoke: real `GodotWebSocketServer` and real streamable-HTTP
  app, loopback succeeds, non-loopback Host/Origin → 403.

Note: PR #366 (audit-v2 #2, session-hijack) closes the
duplicate-handshake takeover. This PR closes the rebinding-from-browser
path. Together they harden the WebSocket transport at both layers
(who can connect, and what they can do once they have).

* Simplify: extract evaluate_loopback so WS + HTTP guards can't drift

Both the WebSocket ``process_request`` hook and ``LocalhostOnlyHTTPMiddleware``
implemented the same 3-step rule (count duplicate headers → check
``is_allowed_host`` → check ``is_allowed_origin`` → 403) twice, with
slightly different intermediate state. Funnel both through one
``evaluate_loopback(hosts, origins) -> bool`` so a regression in one
transport cannot accidentally diverge from the other.

Drive-by:

- Decode FORBIDDEN_BODY once at module load (FORBIDDEN_BODY_TEXT) so
  the WS guard doesn't decode the same bytes per rejection.
- Drop redundant ``parsed.scheme.lower()`` — ``urlsplit`` already
  normalises the scheme per RFC 3986.

No behavior change. 825 tests pass.

* Address Copilot review: reject Origin: null + Sec-Fetch-Site cross-origin

Two browser-side liveness/bypass shapes that the first cut of the
loopback guard let through:

1. **``Origin: null``** (Copilot, origin_guard.py:85). Sandboxed
   ``<iframe sandbox>`` and downloaded ``file://`` pages serialize
   their origin as ``null``. The original guard accepted this on the
   theory that file:// /sandboxed contexts are "legitimate"; in
   practice they're exactly the bypass an attacker would use to bridge
   a foreign origin onto our loopback socket. **Now rejected.** Native
   clients never produce ``null`` (they omit Origin entirely), so this
   tightening is invisible to the Godot plugin / FastMCP CLI / curl.

2. **Cross-origin no-cors subresource probes** (Copilot, server.py:79).
   `<img src="http://127.0.0.1:9500/godot-ai/status">` from
   ``https://attacker.example.com`` arrives with a loopback ``Host``
   and *no* ``Origin`` (browsers omit Origin for ``no-cors`` GETs of
   ``<img>`` / ``<script>`` / ``<link>``), so the original Host/Origin
   gate let it through and the page could use the 200/403 outcome as
   a liveness oracle. Browsers stamp every HTTP request with
   ``Sec-Fetch-Site`` (``cross-site`` / ``same-site`` / ``same-origin``
   / ``none``). Native non-browser clients never send it. **The guard
   now rejects ``cross-site`` and ``same-site``** while still allowing
   ``none`` (top-level navigation: user typed URL, bookmark) and
   ``same-origin`` (loopback page fetching its own server). Missing →
   allow (native client), preserving the existing flow.

Drive-by: trailing-dot loopback (``Host: localhost.``) now accepted.
RFC 1034 valid FQDN syntax that browsers and curl can preserve through
to the Host header — friction trap if rejected, no security
implication either way.

`evaluate_loopback` now takes an optional ``sec_fetch_sites`` list so
both transports run the new policy through the same single helper —
preserves the audit-v2 invariant that the WS hook and HTTP middleware
cannot drift.

Tests:

- New parametrized helper coverage for ``is_allowed_sec_fetch_site``
  (friendly + foreign + case-insensitive + whitespace), trailing-dot
  hosts, and explicit ``Origin: null`` rejection.
- Two new ``evaluate_loopback`` cases pinning the cross-site rejection
  with no-Origin (the exact Copilot shape) and ``Origin: null`` with
  any Sec-Fetch-Site value.
- Three new ``LocalhostOnlyHTTPMiddleware`` middleware cases:
  ``Origin: null`` rejected, cross-site subresource rejected, top-level
  navigation accepted.
- Three new live-WS integration cases pinning the same rules end-to-end
  through the websockets ``process_request`` hook, plus a
  bracketed-IPv6 ``http://[::1]:9500`` symmetry test for the WS path.

Live smoke: native loopback connect → ``handshake_ack`` ✓; ``Origin: null``
WS connect → 403 ✓; ``Sec-Fetch-Site: cross-site`` WS connect → 403 ✓;
``Origin: null`` HTTP probe → 403 ✓; cross-site HTTP probe → 403 ✓.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants