Skip to content

Add stormtest concurrency/reload stress harness + docs#506

Closed
dsarno wants to merge 21 commits into
mainfrom
beta-clean
Closed

Add stormtest concurrency/reload stress harness + docs#506
dsarno wants to merge 21 commits into
mainfrom
beta-clean

Conversation

@dsarno

@dsarno dsarno commented May 31, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds the stormtest stress-testing harness and its docs. (This branch began as a staging integration for #464/#500/#463/#421, but those all merged independently via #502/#503/#504/#505 — so it has been refreshed against main and its only remaining content is the stress harness below.)

What's here

  • script/stormtest.py — opens many concurrent MCP clients and fires rapid operations (optionally interleaved with plugin-reload churn) to shake out concurrency/reload races against a live editor.
  • docs/STRESS_TESTING.md — how to run it and interpret results.
  • AGENTS.md — pointer to the stress-testing docs from the Testing section.
  • .gitignore — ignores the _stormtest/ scratch dir.

A dev/test tool only — it does not touch the shipped plugin or server runtime.

Context

This harness is what surfaced #508 (fix: use update_file() instead of full scan() on single-file writes), already merged to main — the single-file-register change that avoids the WorkerThreadPool SIGABRT under concurrent script/file creation. beta-clean is now current with main (carries #508 + the four merged features as no-ops), so this PR's diff is purely the four files above.

Status

Unmerged — left for your call. No runtime risk (test tooling only).

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw

claude and others added 12 commits May 31, 2026 17:39
Surface that uv ships in distro repos (pacman/apt/dnf) and Homebrew, so
users aren't required to pipe the astral.sh install script from the web.
Addresses #464.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
…500)

The game_eval not-ready path reused the 20s GAME_READY_WAIT_SEC tuned for
screenshots. That wait exceeds the 15s server-side game_eval command timeout,
so when a game isn't running/ready the editor polled for up to 20s while the
server gave up first with an opaque ~15s TimeoutError — the editor's
actionable "Is the game actually running?" error never reached the client.
This is #500's residual ~15s TimeoutError bucket.

Give eval its own EVAL_READY_WAIT_SEC (3s). Editor-side worst case is now
3s wait + 10s backstop = 13s, comfortably under the 15s server timeout, so the
fast actionable error always wins the race. The 20s wait stays for screenshots
(a freshly launched game needs longer to render a first frame).

Refs #500.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
The actionable hint pointed at a nonexistent 'play_scene' op; the MCP tool
that runs the game is project_run.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
…463)

Claude Code installed only as a VS Code/Cursor extension exposes no 'claude'
binary on PATH, so the CLI strategy reported it red and Configure failed —
even though 'claude mcp add --scope user' just writes an mcpServers entry into
~/.claude.json, which the user can (and did) add by hand.

Make claude_code a CLI-preferred client with a JSON fallback: it declares its
config location (~/.claude.json, mcpServers, type:http) and, when the binary
isn't resolvable, Configure/Remove/status route through the shared JSON
read-merge-write strategy instead. The CLI remains the primary path whenever it
resolves, so nothing changes for users who have it. The fallback only triggers
on a missing binary and preserves all other content in ~/.claude.json.

- _base.gd: has_json_fallback() capability; is_installed() honours it.
- claude_code.gd: declare path_template / server_key_path / entry_extra_fields.
- client_configurator.gd: route cli dispatch to JSON when the binary is absent.
- test_clients.gd: cover the fallback dispatch, descriptor, and semantics.

Closes #463.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Code-review follow-up: a user without the claude binary (VS Code/Cursor
extension) who opens 'Run this manually' was shown only the unusable
'claude mcp add' line. For has_json_fallback clients, also surface the
config-file edit that auto-configure falls back to writing.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Both transports are hard-locked to loopback with no escape hatch, so a remote
agent on a trusted LAN / VPN can't reach an editor running on a workstation
without an SSH tunnel or Tailscale. Add an explicit, scoped opt-in — not a
blanket --insecure switch.

--allow-host takes one or more CIDRs / bare IPs (repeatable or comma-separated).
When present:
  - both the HTTP and WebSocket transports bind off loopback (0.0.0.0), and
  - the DNS-rebinding guard widens its Host allowlist to those networks ONLY.

The Origin and Sec-Fetch-Site checks are deliberately left untouched, so
rebinding defense survives the opt-in: a browser on the LAN still sends a
non-loopback Origin (rejected) and a foreign Sec-Fetch-Site (rejected); a
native remote agent sends neither and passes once its Host IP is allowed. A
DNS name never parses to an IP, so it can't slip into an allowed network.

Default (no flag) is byte-for-byte the original loopback-only behavior. A bad
CIDR fails loudly at startup rather than silently exposing nothing (or worse).

Scope: this is the server-side core. The developer-mode-gated dock field and
the LAN-URL surfacing in the configurator's manual command (also in #421) are
left as a follow-up; the CLI flag is the functional, security-critical gate.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Code-review follow-up: the bind was hardcoded 0.0.0.0 (IPv4-only), so an IPv6
--allow-host CIDR set the guard to accept IPv6 hosts but the socket never
listened on IPv6. Add a shared bind_host_for_networks() helper (single source
of truth for the HTTP, WebSocket, and reload binds, mirroring the shared
evaluate_loopback) that binds '::' when any requested network is IPv6 and
'0.0.0.0' for an IPv4-only allowlist.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
Closes the one patch-coverage gap codecov flagged: the bracketed-IPv6 Host
unwrap in _host_ip_in_networks was only exercised by IPv4 hosts. Add a test
matching [fd00::1] against an IPv6 CIDR (in-range allowed, out-of-range
rejected).

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
@codecov

codecov Bot commented May 31, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

script/stormtest.py opens many concurrent MCP clients and fires rapid,
randomized tool calls across every domain at a live editor, with periodic
editor_reload_plugin churn mixed in. It's a robustness harness (not a
correctness test): it answers whether the editor + plugin + WebSocket
dispatcher + server survive sustained concurrent abuse and reload cycles
without crashing, and surfaces per-tool latency/error hot-spots.

- 8 workers x 5 waves x 25 calls/wave by default (~1000 calls); brutal mode
  via SS_WORKERS/SS_WAVES env (~9000). Reads-only smoke via SS_RELOAD=0.
- Workers route to the active session and follow the session-id rotation a
  reload causes; writes are namespaced per-worker under a scratch scene so
  they never touch the project's real scene.
- Per-call timeout so a reload that severs the response can't wedge the run;
  full JSON snapshot flushed every few seconds (survives crash/kill).
- Docs in docs/STRESS_TESTING.md; pointer added to AGENTS.md Testing section.
- _stormtest/ scratch added to .gitignore.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
trial added 3 commits June 1, 2026 12:26
…ind (#421)

Addresses Copilot review on #505:

1. The WebSocket server (port 9500) is the LOCAL editor<->server bridge — the
   editor connects via ws://127.0.0.1 and remote agents reach us over HTTP
   only. --allow-host must not widen the WS bind: doing so exposed the
   unauthenticated plugin WS to the LAN, and binding '::' (IPv6-only by default
   on Windows) would break the editor's IPv4 loopback connection. WS is now
   always 127.0.0.1, regardless of --allow-host. Only the HTTP transport widens.

2. bind_host_for_networks() no longer assumes '::' is dual-stack (it isn't on
   Windows). It now prioritizes IPv4 reachability: any IPv4 in the allowlist
   binds 0.0.0.0 (reachable on every platform); '::' only for an IPv6-only
   allowlist. A mixed allowlist no longer silently drops IPv4 reachability.

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
trial and others added 4 commits June 1, 2026 12:30
…ady error (#500)

Addresses Copilot review on #503:
- The EVAL_READY_WAIT_SEC doc referenced 'handlers/editor.py' and lumped
  dispatcher.gd's 15000ms budget as 'server-side'. Correct to the full path
  (src/godot_ai/handlers/editor.py) and note dispatcher.gd's budget is
  editor/plugin-side; the 15s ceiling is enforced at both layers.
- The not-ready error was optimized for 'game not running' but the same
  symptom occurs when the _mcp_game_helper autoload is missing/disabled.
  Point users to Project Settings → Autoload too (mirrors the screenshot path).

https://claude.ai/code/session_01Jq5X4ivngAf1N6r5UX2BVw
The Python handler is particle_set_process(node_path, properties); the
harness passed `path`, producing an unexpected-kwarg INVALID_PARAMS on
every call. Validated: set_process now succeeds (auto-creates the
ParticleProcessMaterial). No-reload pass at ~4.7% error, remaining
INVALID_PARAMS are intermittent name collisions only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dsarno dsarno changed the title Integrate #464 / #500 / #463 / #421 for staged smoke testing Add stormtest concurrency/reload stress harness + docs Jun 1, 2026
The harness logic is platform-agnostic (res:// virtual paths, tempfile-based
report path), but the documented invocation is POSIX. Call out the venv
interpreter, serve-this-worktree bash dependency, and $TMPDIR -> %TEMP% for
agents running stormtest on Windows. Resilience tracked in #509.
@dsarno

dsarno commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Superseded. The four feature PRs this integration branch staged — #502 (#464), #503 (#500), #504 (#463), #505 (#421) — merged to main individually. The stormtest harness now has its own clean PR (#510). Combined smoke testing also surfaced an EditorFileSystem scan-stacking crash, fixed in #508 (engine record: dsarno/godot#6).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants