Add stormtest concurrency/reload stress harness + docs#506
Closed
dsarno wants to merge 21 commits into
Closed
Conversation
Surface that uv ships in distro repos (pacman/apt/dnf) and Homebrew, so users aren't required to pipe the astral.sh install script from the web. Addresses #464. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
…500) The game_eval not-ready path reused the 20s GAME_READY_WAIT_SEC tuned for screenshots. That wait exceeds the 15s server-side game_eval command timeout, so when a game isn't running/ready the editor polled for up to 20s while the server gave up first with an opaque ~15s TimeoutError — the editor's actionable "Is the game actually running?" error never reached the client. This is #500's residual ~15s TimeoutError bucket. Give eval its own EVAL_READY_WAIT_SEC (3s). Editor-side worst case is now 3s wait + 10s backstop = 13s, comfortably under the 15s server timeout, so the fast actionable error always wins the race. The 20s wait stays for screenshots (a freshly launched game needs longer to render a first frame). Refs #500. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
The actionable hint pointed at a nonexistent 'play_scene' op; the MCP tool that runs the game is project_run. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
…463) Claude Code installed only as a VS Code/Cursor extension exposes no 'claude' binary on PATH, so the CLI strategy reported it red and Configure failed — even though 'claude mcp add --scope user' just writes an mcpServers entry into ~/.claude.json, which the user can (and did) add by hand. Make claude_code a CLI-preferred client with a JSON fallback: it declares its config location (~/.claude.json, mcpServers, type:http) and, when the binary isn't resolvable, Configure/Remove/status route through the shared JSON read-merge-write strategy instead. The CLI remains the primary path whenever it resolves, so nothing changes for users who have it. The fallback only triggers on a missing binary and preserves all other content in ~/.claude.json. - _base.gd: has_json_fallback() capability; is_installed() honours it. - claude_code.gd: declare path_template / server_key_path / entry_extra_fields. - client_configurator.gd: route cli dispatch to JSON when the binary is absent. - test_clients.gd: cover the fallback dispatch, descriptor, and semantics. Closes #463. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Code-review follow-up: a user without the claude binary (VS Code/Cursor extension) who opens 'Run this manually' was shown only the unusable 'claude mcp add' line. For has_json_fallback clients, also surface the config-file edit that auto-configure falls back to writing. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Both transports are hard-locked to loopback with no escape hatch, so a remote agent on a trusted LAN / VPN can't reach an editor running on a workstation without an SSH tunnel or Tailscale. Add an explicit, scoped opt-in — not a blanket --insecure switch. --allow-host takes one or more CIDRs / bare IPs (repeatable or comma-separated). When present: - both the HTTP and WebSocket transports bind off loopback (0.0.0.0), and - the DNS-rebinding guard widens its Host allowlist to those networks ONLY. The Origin and Sec-Fetch-Site checks are deliberately left untouched, so rebinding defense survives the opt-in: a browser on the LAN still sends a non-loopback Origin (rejected) and a foreign Sec-Fetch-Site (rejected); a native remote agent sends neither and passes once its Host IP is allowed. A DNS name never parses to an IP, so it can't slip into an allowed network. Default (no flag) is byte-for-byte the original loopback-only behavior. A bad CIDR fails loudly at startup rather than silently exposing nothing (or worse). Scope: this is the server-side core. The developer-mode-gated dock field and the LAN-URL surfacing in the configurator's manual command (also in #421) are left as a follow-up; the CLI flag is the functional, security-critical gate. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Code-review follow-up: the bind was hardcoded 0.0.0.0 (IPv4-only), so an IPv6 --allow-host CIDR set the guard to accept IPv6 hosts but the socket never listened on IPv6. Add a shared bind_host_for_networks() helper (single source of truth for the HTTP, WebSocket, and reload binds, mirroring the shared evaluate_loopback) that binds '::' when any requested network is IPv6 and '0.0.0.0' for an IPv4-only allowlist. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Closes the one patch-coverage gap codecov flagged: the bracketed-IPv6 Host unwrap in _host_ip_in_networks was only exercised by IPv4 hosts. Add a test matching [fd00::1] against an IPv6 CIDR (in-range allowed, out-of-range rejected). https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
script/stormtest.py opens many concurrent MCP clients and fires rapid, randomized tool calls across every domain at a live editor, with periodic editor_reload_plugin churn mixed in. It's a robustness harness (not a correctness test): it answers whether the editor + plugin + WebSocket dispatcher + server survive sustained concurrent abuse and reload cycles without crashing, and surfaces per-tool latency/error hot-spots. - 8 workers x 5 waves x 25 calls/wave by default (~1000 calls); brutal mode via SS_WORKERS/SS_WAVES env (~9000). Reads-only smoke via SS_RELOAD=0. - Workers route to the active session and follow the session-id rotation a reload causes; writes are namespaced per-worker under a scratch scene so they never touch the project's real scene. - Per-call timeout so a reload that severs the response can't wedge the run; full JSON snapshot flushed every few seconds (survives crash/kill). - Docs in docs/STRESS_TESTING.md; pointer added to AGENTS.md Testing section. - _stormtest/ scratch added to .gitignore. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
5 tasks
added 3 commits
June 1, 2026 12:26
…ind (#421) Addresses Copilot review on #505: 1. The WebSocket server (port 9500) is the LOCAL editor<->server bridge — the editor connects via ws://127.0.0.1 and remote agents reach us over HTTP only. --allow-host must not widen the WS bind: doing so exposed the unauthenticated plugin WS to the LAN, and binding '::' (IPv6-only by default on Windows) would break the editor's IPv4 loopback connection. WS is now always 127.0.0.1, regardless of --allow-host. Only the HTTP transport widens. 2. bind_host_for_networks() no longer assumes '::' is dual-stack (it isn't on Windows). It now prioritizes IPv4 reachability: any IPv4 in the allowlist binds 0.0.0.0 (reachable on every platform); '::' only for an IPv6-only allowlist. A mixed allowlist no longer silently drops IPv4 reachability. https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
…d fix) into beta-clean
…ady error (#500) Addresses Copilot review on #503: - The EVAL_READY_WAIT_SEC doc referenced 'handlers/editor.py' and lumped dispatcher.gd's 15000ms budget as 'server-side'. Correct to the full path (src/godot_ai/handlers/editor.py) and note dispatcher.gd's budget is editor/plugin-side; the 15s ceiling is enforced at both layers. - The not-ready error was optimized for 'game not running' but the same symptom occurs when the _mcp_game_helper autoload is missing/disabled. Point users to Project Settings → Autoload too (mirrors the screenshot path). https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
The Python handler is particle_set_process(node_path, properties); the harness passed `path`, producing an unexpected-kwarg INVALID_PARAMS on every call. Validated: set_process now succeeds (auto-creates the ParticleProcessMaterial). No-reload pass at ~4.7% error, remaining INVALID_PARAMS are intermittent name collisions only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The harness logic is platform-agnostic (res:// virtual paths, tempfile-based report path), but the documented invocation is POSIX. Call out the venv interpreter, serve-this-worktree bash dependency, and $TMPDIR -> %TEMP% for agents running stormtest on Windows. Resilience tracked in #509.
Contributor
Author
|
Superseded. The four feature PRs this integration branch staged — #502 (#464), #503 (#500), #504 (#463), #505 (#421) — merged to main individually. The stormtest harness now has its own clean PR (#510). Combined smoke testing also surfaced an EditorFileSystem scan-stacking crash, fixed in #508 (engine record: dsarno/godot#6). |
This was referenced Jun 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the
stormteststress-testing harness and its docs. (This branch began as a staging integration for #464/#500/#463/#421, but those all merged independently via #502/#503/#504/#505 — so it has been refreshed againstmainand its only remaining content is the stress harness below.)What's here
script/stormtest.py— opens many concurrent MCP clients and fires rapid operations (optionally interleaved with plugin-reload churn) to shake out concurrency/reload races against a live editor.docs/STRESS_TESTING.md— how to run it and interpret results.AGENTS.md— pointer to the stress-testing docs from the Testing section..gitignore— ignores the_stormtest/scratch dir.A dev/test tool only — it does not touch the shipped plugin or server runtime.
Context
This harness is what surfaced #508 (
fix: use update_file() instead of full scan() on single-file writes), already merged tomain— the single-file-register change that avoids theWorkerThreadPoolSIGABRT under concurrent script/file creation.beta-cleanis now current withmain(carries #508 + the four merged features as no-ops), so this PR's diff is purely the four files above.Status
Unmerged — left for your call. No runtime risk (test tooling only).
https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw