19 changes: 11 additions & 8 deletions AGENTS.md
@@ -52,7 +52,7 @@ AI Client → MCP (stdio/sse/streamable-http) → Python FastMCP server → WebS
- `integration/` — WebSocket server + mock Godot plugin, MCP tools, rollups
- `script/` — dev and CI scripts
- `setup-dev` / `setup-dev.ps1` / `verify-worktree` — dev environment + worktree health
- `serve-this-worktree` / `open-godot-here` — point dev server / editor at the current worktree
- `serve-this-worktree` (bash) / `serve-this-worktree.py` (cross-platform) / `open-godot-here` — point dev server / editor at the current worktree
- `local-self-update-smoke` — interactive local fixture for self-update changes
- `ci-start-server`, `ci-godot-tests`, `ci-reload-test`, `ci-quit-test`, `ci-check-gdscript` — CI scripts
- `ci-find-regression-range` — helper for identifying CI regression windows
@@ -93,7 +93,7 @@ Assistant sessions may run in git worktrees. Claude Code commonly uses `.claude/

- **File paths**: Your working directory is the worktree, not the repo root. Files you create live in that worktree.
- **Godot editor**: The editor runs against a specific worktree's `test_project/`. The plugin is symlinked from that worktree's `plugin/` directory. Check `session_list` — the `project_path` field tells you which worktree the editor is using.
- **Dev server**: The plugin-managed server (auto-spawned on editor start, no `--reload`) uses the root repo's `.venv` and `src/`. Python code changes in a worktree won't take effect there unless the root repo also has them. Two ways to serve the worktree's own Python source: (a) click **Start Dev Server** in the dock — it walks up from `res://` to find a sibling `src/godot_ai/` and auto-sets `PYTHONPATH` to that tree's `src/` before spawning `--reload`; (b) run `script/serve-this-worktree` from a terminal for the same effect outside the editor.
- **Dev server**: The plugin-managed server (auto-spawned on editor start, no `--reload`) uses the root repo's `.venv` and `src/`. Python code changes in a worktree won't take effect there unless the root repo also has them. Two ways to serve the worktree's own Python source: (a) click **Start Dev Server** in the dock — it walks up from `res://` to find a sibling `src/godot_ai/` and auto-sets `PYTHONPATH` to that tree's `src/` before spawning `--reload`; (b) run `script/serve-this-worktree` (POSIX) or `python script/serve-this-worktree.py` (any OS) from a terminal for the same effect outside the editor.
- **Passing info between sessions**: When writing prompts, handoff notes, or file references intended for another session, **always include the full worktree path** or specify the worktree name. Relative paths like `docs/friction-log.md` are ambiguous — a different session may be in a different worktree or on `main`. Use the absolute path.
- **Merging**: Worktree branches must be merged to `main` and pulled into other worktrees for changes to propagate. The plugin symlink means GDScript changes propagate within the same worktree immediately, but not across worktrees.
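
The **Dev server** bullet above describes walking up from the project directory to find a `src/godot_ai/` tree and pointing `PYTHONPATH` at its `src/`. A minimal Python sketch of that lookup follows. It is an illustration only, not the plugin's actual GDScript; `find_worktree_src` and `build_server_env` are hypothetical names introduced here.

```python
import os
from pathlib import Path
from typing import Optional


def find_worktree_src(project_dir: Path) -> Optional[Path]:
    """Walk up from project_dir; return the first ancestor's src/ that holds godot_ai/."""
    start = project_dir.resolve()
    for candidate in (start, *start.parents):
        src = candidate / "src"
        if (src / "godot_ai").is_dir():
            return src
    return None


def build_server_env(project_dir: Path, base_env: Optional[dict] = None) -> dict:
    """Copy the environment and prepend the worktree's src/ to PYTHONPATH, if found."""
    env = dict(os.environ if base_env is None else base_env)
    src = find_worktree_src(project_dir)
    if src is not None:
        existing = env.get("PYTHONPATH")
        # Prepend so the worktree's code wins over any other copy on the path.
        env["PYTHONPATH"] = str(src) if not existing else str(src) + os.pathsep + existing
    return env
```

Under these assumptions, a server started with `build_server_env(Path("test_project"))` as its environment would import `godot_ai` from the worktree that contains `test_project/`.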

@@ -280,15 +280,18 @@ Test suites extend `McpTestSuite` (assertion methods: `assert_true`, `assert_eq`
`stormtest` opens many concurrent MCP clients and fires rapid, randomized tool calls across **every** domain at a live editor, with periodic `editor_reload_plugin` churn mixed in. It's a robustness test, not a correctness test: it answers "does the editor + plugin + WebSocket dispatcher + server survive sustained concurrent abuse and reload cycles without crashing?" and surfaces per-tool latency/error hot-spots. Use it after changes to the dispatcher, transport, readiness gating, session routing, or the reload/handoff path. Full reference: `docs/STRESS_TESTING.md`.

```bash
.venv/bin/python script/stormtest.py # ≈ 1000 calls, with reload churn
SS_WORKERS=12 SS_WAVES=30 .venv/bin/python script/stormtest.py # brutal ≈ 9000 calls
SS_RELOAD=0 .venv/bin/python script/stormtest.py # reads-only smoke, no reloads
SS_URL=http://127.0.0.1:8010/mcp .venv/bin/python script/stormtest.py # target another stack
python script/stormtest.py # ≈ 1000 calls, with reload churn
SS_WORKERS=12 SS_WAVES=30 python script/stormtest.py # brutal ≈ 9000 calls
SS_RELOAD=0 python script/stormtest.py # reads-only smoke, no reloads
SS_URL=http://127.0.0.1:8010/mcp python script/stormtest.py # target another stack
```

To stress a *branch's* code (plugin + server), point a Godot editor at that worktree's `test_project/` and serve its `src/` via `script/serve-this-worktree` (external server, so `editor_reload_plugin` exercises reload without killing the server), then run stormtest against it. A full JSON snapshot lands in `$TMPDIR/stormtest_report.json` (override with `SS_REPORT`), flushed every few seconds so a crash/kill still leaves data. A small `EDITOR_NOT_READY` / `NODE_NOT_FOUND` / `CONNECTION` error rate is expected noise under concurrency + reloads — watch instead for the process dying, a reload that never recovers, or one op with pathological error/latency.
(`python script/stormtest.py` re-execs into the project `.venv` on every OS;
`SS_NO_REEXEC=1` opts out.)

On Windows, a reads-dominant run (`SS_RELOAD=0`) works, but **reload churn (`SS_RELOAD=1`, the default) currently wedges the harness** and the external-server mitigation doesn't apply (`serve-this-worktree` is bash-only; a hand-started external `--reload` server gets killed by the reload). The editor itself survives reloads — only the harness hangs. Use `.venv\Scripts\python.exe` and note `$TMPDIR` is `%TEMP%`. See the "Windows / cross-platform notes" callout in `docs/STRESS_TESTING.md` (issues #513 / #514; resilience tracked in #509).
To stress a *branch's* code (plugin + server), point a Godot editor at that worktree's `test_project/` and serve its `src/` via `script/serve-this-worktree` (POSIX) or `python script/serve-this-worktree.py` (any OS) — an external server, so `editor_reload_plugin` exercises reload without killing the server — then run stormtest against it. A full JSON snapshot lands in `$TMPDIR/stormtest_report.json` (override with `SS_REPORT`), flushed every few seconds so a crash/kill still leaves data. A small `EDITOR_NOT_READY` / `NODE_NOT_FOUND` / `CONNECTION` error rate is expected noise under concurrency + reloads — watch instead for the process dying, a reload that never recovers, or one op with pathological error/latency.
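
The report behaviour described above (a default location in the platform temp dir, an `SS_REPORT` override, and periodic flushes that survive a crash) can be sketched as follows. This is a hedged illustration rather than the code in `script/stormtest.py`: `report_path` and `flush_report` are hypothetical names, and writing to a temporary file before swapping it into place is an assumption about how a flush could stay readable after a kill.

```python
import json
import os
import tempfile


def report_path(env=None) -> str:
    """Return SS_REPORT if set, else stormtest_report.json in the platform temp dir."""
    env = os.environ if env is None else env
    override = env.get("SS_REPORT")
    if override:
        return override
    return os.path.join(tempfile.gettempdir(), "stormtest_report.json")


def flush_report(snapshot: dict, path: str) -> None:
    """Write the snapshot to a sibling temp file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh)
    # os.replace overwrites the destination in one step, so a reader sees either
    # the previous snapshot or the new one, not a half-written file.
    os.replace(tmp_path, path)
```

Calling `flush_report(snapshot, report_path())` on a timer would leave the last complete snapshot on disk even if the process dies between flushes.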

On Windows, `python script/stormtest.py` re-execs into the `.venv` automatically (no `.venv\Scripts\python.exe` step) and `python script/serve-this-worktree.py` serves a worktree without bash. A reads-dominant run (`SS_RELOAD=0`) works, but **reload churn (`SS_RELOAD=1`, the default) still wedges the harness** and a hand-started external `--reload` server still gets killed by the reload. The editor itself survives reloads; only the harness hangs. The report lands in `%TEMP%` on Windows rather than `$TMPDIR`. See the "Windows / cross-platform notes" callout in `docs/STRESS_TESTING.md` (issues #513 / #514).

**Guardrails built into the test runner:**
- **Zero-assertion detection**: Tests that complete with 0 assertions are flagged as failures ("Test completed with 0 assertions — likely skipped its logic"). This catches tests that silently `return` before asserting anything.
60 changes: 32 additions & 28 deletions docs/STRESS_TESTING.md
@@ -44,61 +44,64 @@ under load and across the disable→extract→enable reload window.

## Running

The target editor's MCP server must be reachable (default `:8000`). For a true
test of a branch's code, point the editor at that branch's worktree and serve
that worktree's `src/` (see `script/serve-this-worktree`), so both the GDScript
plugin and the Python server are the code under test.
The target editor's MCP server must be reachable (default `:8000`). `python
script/stormtest.py` works on every OS — it re-execs into the project `.venv`
automatically, so there's no `.venv/bin/python` vs `.venv\Scripts\python.exe`
split (override with `SS_NO_REEXEC=1`). For a true test of a branch's code,
point the editor at that branch's worktree and serve that worktree's `src/`
(`python script/serve-this-worktree.py`, or `script/serve-this-worktree` on
POSIX), so both the GDScript plugin and the Python server are the code under
test.

```bash
# default ≈ 1000 calls, with reload churn, against localhost:8000
.venv/bin/python script/stormtest.py
python script/stormtest.py

# brutal ≈ 9000 calls
SS_WORKERS=12 SS_WAVES=30 .venv/bin/python script/stormtest.py
SS_WORKERS=12 SS_WAVES=30 python script/stormtest.py

# reads-only smoke, no reloads
SS_RELOAD=0 SS_WORKERS=4 SS_WAVES=3 .venv/bin/python script/stormtest.py
SS_RELOAD=0 SS_WORKERS=4 SS_WAVES=3 python script/stormtest.py

# target a server on another port / host
SS_URL=http://127.0.0.1:8010/mcp .venv/bin/python script/stormtest.py
SS_URL=http://127.0.0.1:8010/mcp python script/stormtest.py
```
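
The re-exec behaviour mentioned above (switching into the project `.venv` unless `SS_NO_REEXEC=1` is set) might look roughly like the sketch below. It is not the code in `script/stormtest.py`; `venv_python`, `reexec_target`, and `maybe_reexec` are hypothetical names, and the sketch launches a child process and exits with its return code instead of performing a true `exec`, which is an assumption made for portability.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Optional


def venv_python(repo_root: Path, platform: str = sys.platform) -> Path:
    """Return where a standard venv keeps its interpreter on the given platform."""
    if platform.startswith("win"):
        return repo_root / ".venv" / "Scripts" / "python.exe"
    return repo_root / ".venv" / "bin" / "python"


def reexec_target(repo_root: Path, current_prefix: str, env: dict) -> Optional[Path]:
    """Return the interpreter to hand off to, or None to keep the current one."""
    if env.get("SS_NO_REEXEC") == "1":
        return None  # explicit opt-out
    target = venv_python(repo_root)
    if not target.exists():
        return None  # no project venv to switch to
    if Path(current_prefix).resolve() == (repo_root / ".venv").resolve():
        return None  # already running inside the project venv
    return target


def maybe_reexec(repo_root: Path) -> None:
    """Run the same command line under the venv interpreter when warranted."""
    target = reexec_target(repo_root, sys.prefix, dict(os.environ))
    if target is not None:
        completed = subprocess.run([str(target), *sys.argv])
        sys.exit(completed.returncode)
```

The check against `sys.prefix` is what stops the child from handing off again: inside a venv, `sys.prefix` points at the venv directory, so the second invocation falls through and runs normally.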

### Windows / cross-platform notes

> ⚠️ **Running on Windows? Reads/writes work; reload churn does not (yet).**
>
> **Invocation is now identical on every OS.** `python script/stormtest.py`
> re-execs into the project `.venv` automatically (override with
> `SS_NO_REEXEC=1`), and `python script/serve-this-worktree.py` is a
> cross-platform (no-`bash`/no-`lsof`) way to serve a worktree's `src/` with
> `--reload` — extra args like `--ws-port` pass straight through. The report
> still lands in the platform temp dir (`%TEMP%` on Windows); pass an explicit
> `SS_REPORT=…` for a known location, and prefer forward slashes (Python accepts
> them on Windows and they dodge backslash-escaping surprises).
>
> **Concurrent reads/writes are fine.** The harness *logic* is platform-agnostic:
> the in-editor scratch paths (`res://_stormtest/…`) use Godot's virtual
> filesystem, and the report path goes through `tempfile.gettempdir()` /
> `os.path.join`, so both resolve correctly on every OS. A reads-dominant run
> (`SS_RELOAD=0`) is clean on Windows.
>
> **Reload churn (`SS_RELOAD=1`, the default) currently does NOT work on Windows**
> — two independent problems, both tracked:
> **Reload churn (`SS_RELOAD=1`, the default) still does NOT work on Windows** —
> two independent problems, both tracked, and *not* addressed by the pathing work
> above:
> - Against a plugin-managed server, the first `editor_reload_plugin` **wedges the
> harness**: the asyncio loop stalls past `CALL_TIMEOUT` when the server is
> killed mid-reload under concurrent load. The *editor* survives fine (it
> reloads and re-registers a new session) — only the harness hangs. See
> [#513](https://github.com/hi-godot/godot-ai/issues/513).
> - The "run the server externally" mitigation **also fails on Windows**:
> `script/serve-this-worktree` is bash-only (no PowerShell port, and it never
> passes `--ws-port`), and even a hand-started external `--reload` server gets
> **killed by the reload** (`_stop_server` takes down the port owner with no
> respawn). See [#514](https://github.com/hi-godot/godot-ai/issues/514).
> - The "run the server externally" mitigation still doesn't fully hold: even a
> correctly-launched external `--reload` server gets **killed by the reload**
> (`_stop_server` takes down the port owner with no respawn). See
> [#514](https://github.com/hi-godot/godot-ai/issues/514).
>
> Until those land, validate reload survival on Windows with a *single-threaded*
> reload loop (reload → reconnect → confirm a new `session_id`) rather than the
> concurrent churn mode.
>
> **Invocation also differs (POSIX → Windows):**
> - **venv interpreter** — use `.venv\Scripts\python.exe`, not `.venv/bin/python`.
> - **`$TMPDIR`** in the examples is POSIX; the report lands in the platform temp
> dir (`%TEMP%` on Windows). Pass an explicit `SS_REPORT=…` for a known
> location, and prefer forward slashes (Python accepts them on Windows and they
> dodge backslash-escaping surprises).
>
> Making the tooling resilient enough to drop this heads-up is tracked in
> [#509](https://github.com/hi-godot/godot-ai/issues/509).
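
The single-threaded reload loop suggested in the callout above (reload, reconnect, confirm a new `session_id`) can be sketched as below. The MCP client calls are deliberately left out: `get_session_id` and `trigger_reload` are hypothetical callables the caller would supply (for example, thin wrappers around `session_list` and `editor_reload_plugin`), and treating `ConnectionError` as "still reloading" is an assumption.

```python
import time
from typing import Callable, List, Optional


def reload_once(
    get_session_id: Callable[[], Optional[str]],
    trigger_reload: Callable[[], None],
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
) -> str:
    """Trigger one reload and block until a different session_id shows up."""
    before = get_session_id()
    trigger_reload()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            current = get_session_id()
        except ConnectionError:
            current = None  # assumed to mean the editor is mid-reload; keep polling
        if current is not None and current != before:
            return current
        time.sleep(poll_s)
    raise TimeoutError(f"no new session_id within {timeout_s}s (was {before!r})")


def reload_loop(
    get_session_id: Callable[[], Optional[str]],
    trigger_reload: Callable[[], None],
    cycles: int = 5,
    **kwargs: float,
) -> List[str]:
    """Run reloads strictly one after another; return each new session_id."""
    return [reload_once(get_session_id, trigger_reload, **kwargs) for _ in range(cycles)]
```

Because each cycle waits for the new session before starting the next, a `TimeoutError` here points at a reload that never recovered rather than at contention between concurrent workers.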

### Knobs (env)

@@ -146,6 +149,7 @@ comes back** (managed-server-killed; recovery time unbounded), or one op with a
**pathologically high error rate or latency** that points at a real regression.

> If the target server is plugin-managed (auto-spawned), a reload may kill it
> and not return — run the server **externally** (e.g. `serve-this-worktree`,
> which uses `--reload`) so `editor_reload_plugin` exercises the plugin reload
> without taking the server down with it.
> and not return — run the server **externally** (`python
> script/serve-this-worktree.py`, or `script/serve-this-worktree` on POSIX; both
> use `--reload`) so `editor_reload_plugin` exercises the plugin reload without
> taking the server down with it.