# Stress testing: `script/stormtest.py`

`stormtest` is a concurrency + reload stress harness. It opens many MCP client
connections at once, fires rapid, randomized tool calls across **every**
domain at a live Godot editor, and periodically triggers `editor_reload_plugin`
mid-run. It is not a correctness test; it answers two questions:

1. **Does the stack survive sustained concurrent abuse + reload churn without
   crashing?** (editor process, GDScript plugin, WebSocket dispatcher, server)
2. **Where are the latency / error hot-spots per tool?**

It complements the deterministic suites (`pytest`, `test_run`): those check
that each tool is *correct*; stormtest checks that the whole stack is *robust*
under load and across the disable→extract→enable reload window.

## What it does

- **N parallel workers**, each with its own `fastmcp.Client` connection
  (default 8).
- Workers route to the **active session** (empty `session_id`), so when a
  reload rotates the session id they automatically follow the new one.
- **Reads dominate** the op mix (as in real traffic); **writes exercise every
  domain**: node/scene/script/batch/material/theme/resource/camera/particle/
  audio/animation/input_map/signal/filesystem.
- Each worker namespaces its writes under `<scene_root>/wN/...`, so all workers
  hammer one shared edited scene without colliding on node paths.
- **Worker 0 is the "chaos" worker**: every `SS_RELOAD_EVERY` waves it fires
  `editor_reload_plugin` instead of a normal burst, then reconnects and
  reopens the scratch scene. The other workers keep hammering through the
  reload window and reconnect when their connection drops.
- All disk artifacts (scratch scripts, resources, and the scene) land under
  `res://_stormtest/` in whatever project the target editor has open. They are
  scratch material and safe to delete afterward.

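The scheduling described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in `stormtest.py`: the op names in `READ_OPS`/`WRITE_OPS`, the 80/20 read/write split, the `plan_wave` helper, and the assumption that waves are numbered from 1 are all made up for the example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical op names; the real tool names and weights live in stormtest.py.
READ_OPS = ["node_get", "scene_get_tree", "script_read", "filesystem_list"]
WRITE_OPS = ["node_create", "material_set", "theme_set", "signal_connect"]
READ_WEIGHT = 0.8  # assumed share of reads: "reads dominate"


@dataclass
class Call:
    op: str
    path: str  # node path the call targets


def plan_wave(worker_id: int, wave: int, calls: int, reload_every: int,
              reload_enabled: bool, scene_root: str,
              rng: random.Random) -> list[Call]:
    """Return the calls one worker makes in one wave (waves assumed 1-based)."""
    # Worker 0 is the chaos worker: on every reload_every-th wave it sends a
    # single plugin reload instead of a normal burst.
    if (reload_enabled and worker_id == 0 and reload_every > 0
            and wave % reload_every == 0):
        return [Call("editor_reload_plugin", "")]
    # Each worker touches only its own namespace, so node paths never collide.
    ns = f"{scene_root}/w{worker_id}"
    plan = []
    for i in range(calls):
        if rng.random() < READ_WEIGHT:
            plan.append(Call(rng.choice(READ_OPS), ns))
        else:
            plan.append(Call(rng.choice(WRITE_OPS), f"{ns}/n{i}"))
    return plan
```

Giving each worker its own seeded `random.Random` (an assumption, not necessarily what the script does) keeps a run reproducible while still varying the op mix across workers.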
## Safety

- Operates in a throwaway scratch scene (`res://_stormtest/storm.tscn`), **not**
  the project's real scene, and restores the originally open scene on teardown.
- Never calls `project_run`, so it cannot trigger an autosave that writes into
  the real scene.
- A full JSON snapshot is flushed to `stormtest_report.json` (in `$TMPDIR`,
  overridable via `SS_REPORT`) **every few seconds**, so a crash or kill mid-run
  still leaves analyzable data. This is deliberate: an earlier version lost its
  metrics to a `SIGKILL`.
- It does **not** clear logs (a diagnostic must not destroy its own evidence).

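A crash-tolerant periodic flush of the kind described above might look like this sketch. The `SnapshotWriter` class and its 5-second default interval are hypothetical; the point is the write-to-temp-then-`os.replace` pattern, which means a kill in the middle of a write leaves the previous complete snapshot in place rather than a truncated JSON file.

```python
from __future__ import annotations

import json
import os
import tempfile
import time
from typing import Optional


def default_report_path() -> str:
    # SS_REPORT overrides; otherwise use the temp dir (tempfile honours $TMPDIR).
    return os.environ.get("SS_REPORT") or os.path.join(
        tempfile.gettempdir(), "stormtest_report.json")


class SnapshotWriter:
    """Hypothetical helper: flush a JSON snapshot at most every `interval` s."""

    def __init__(self, path: str, interval: float = 5.0):
        self.path = path
        self.interval = interval
        self._last = float("-inf")  # guarantees the first call flushes

    def maybe_flush(self, data: dict, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self._last < self.interval:
            return False
        self.flush(data)
        self._last = now
        return True

    def flush(self, data: dict) -> None:
        # Write a temp file next to the target, then atomically swap it in.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as fh:
                json.dump(data, fh)
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)  # never leave half-written temp files behind
            raise
```

A final unconditional `flush()` from the exit or signal handler makes sure the last few seconds of data also land on disk.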
## Running

The target editor's MCP server must be reachable (default port `8000`). To
test a branch's code end to end, point the editor at that branch's worktree and
serve that worktree's `src/` (see `script/serve-this-worktree`), so that both
the GDScript plugin and the Python server are the code under test.

```bash
# default: ≈ 1000 calls, with reload churn, against localhost:8000
.venv/bin/python script/stormtest.py

# brutal: ≈ 9000 calls
SS_WORKERS=12 SS_WAVES=30 .venv/bin/python script/stormtest.py

# small smoke run (≈ 300 calls), no reload churn
SS_RELOAD=0 SS_WORKERS=4 SS_WAVES=3 .venv/bin/python script/stormtest.py

# target a server on another port / host
SS_URL=http://127.0.0.1:8010/mcp .venv/bin/python script/stormtest.py
```

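Before a long run it can be worth confirming that the endpoint is up at all. A minimal preflight sketch: `server_reachable` is a hypothetical helper, and it only checks that a TCP connection to the host and port in `SS_URL` succeeds, not that the MCP handshake does.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_URL = "http://127.0.0.1:8000/mcp"  # the documented SS_URL default


def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host:port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "127.0.0.1"
    # Fall back to the scheme's standard port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `server_reachable(os.environ.get("SS_URL", DEFAULT_URL))` returning `False` means nothing is listening on that port yet.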
### Knobs (env)

| Var | Default | Meaning |
|---|---|---|
| `SS_WORKERS` | 8 | parallel client connections |
| `SS_WAVES` | 5 | waves per worker |
| `SS_CALLS` | 25 | calls per worker per wave |
| `SS_RELOAD` | 1 | include `editor_reload_plugin` churn (`0` to skip) |
| `SS_RELOAD_EVERY` | 2 | the chaos worker reloads every N waves |
| `SS_RECONNECT_TIMEOUT` | 30 | seconds to wait for the server to return after a reload |
| `SS_URL` | `http://127.0.0.1:8000/mcp` | target MCP endpoint |
| `SS_REPORT` | `$TMPDIR/stormtest_report.json` | where to write the JSON snapshot |

Total calls ≈ `WORKERS × WAVES × CALLS`, minus `CALLS` for each wave the chaos
worker spends reloading instead of running a normal burst.

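Reading the knobs and estimating the call budget could be sketched like this. The `Knobs` dataclass and `expected_calls` helper are hypothetical, and the sketch assumes waves are numbered from 1 so the chaos worker reloads on waves `RELOAD_EVERY`, `2 × RELOAD_EVERY`, and so on; the script's actual indexing may differ by one reload.

```python
import os
from dataclasses import dataclass


@dataclass
class Knobs:
    workers: int = 8
    waves: int = 5
    calls: int = 25
    reload: bool = True
    reload_every: int = 2
    reconnect_timeout: float = 30.0
    url: str = "http://127.0.0.1:8000/mcp"

    @classmethod
    def from_env(cls, env=os.environ) -> "Knobs":
        d = cls()  # documented defaults
        return cls(
            workers=int(env.get("SS_WORKERS", d.workers)),
            waves=int(env.get("SS_WAVES", d.waves)),
            calls=int(env.get("SS_CALLS", d.calls)),
            reload=env.get("SS_RELOAD", "1") != "0",
            reload_every=int(env.get("SS_RELOAD_EVERY", d.reload_every)),
            reconnect_timeout=float(env.get("SS_RECONNECT_TIMEOUT",
                                            d.reconnect_timeout)),
            url=env.get("SS_URL", d.url),
        )


def expected_calls(k: Knobs) -> int:
    """Normal tool calls: all bursts minus the chaos worker's reload waves."""
    total = k.workers * k.waves * k.calls
    if k.reload and k.reload_every > 0:
        # Assumed 1-based waves: reloads happen where wave % reload_every == 0.
        reload_waves = k.waves // k.reload_every
        total -= reload_waves * k.calls
    return total
```

Under that assumption the defaults give `8 × 5 × 25 - 2 × 25 = 950` normal calls plus 2 reloads, in line with the "≈ 1000" figure for the default run.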
## Reading the result

On exit (including `Ctrl-C` / `SIGTERM`, which it handles gracefully), it prints:

- **final verdict**: `EDITOR ALIVE` vs `EDITOR DEAD/UNREACHABLE`
- throughput (calls/sec) and ok/err totals
- **reloads survived / attempted** and per-reload **recovery time** (wall-clock
  time to reconnect)
- overall **latency** p50 / p95 / max
- **error-code histogram** (e.g. `EDITOR_NOT_READY`, `NODE_NOT_FOUND`,
  `INVALID_PARAMS`, `CONNECTION`)
- **per-op table**: ok/err counts, p50/p95/max latency, and the error codes seen
  for that op

`stormtest_report.json` holds the same data, plus more.

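The per-op table and error-code histogram can be derived from raw per-call samples along these lines. This is a hypothetical sketch: the `(op, latency_ms, error_code)` sample shape and the `summarize` helper are assumptions, and it uses the nearest-rank percentile definition, which may differ from the one the script uses.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Optional, Tuple

Sample = Tuple[str, float, Optional[str]]  # (op, latency_ms, error_code or None)


def percentile(sorted_vals: list, q: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def summarize(samples: Iterable[Sample]) -> dict:
    by_op = defaultdict(list)
    errors = Counter()  # global error-code histogram
    for op, latency, code in samples:
        by_op[op].append((latency, code))
        if code is not None:
            errors[code] += 1
    per_op = {}
    for op, rows in by_op.items():
        lats = sorted(lat for lat, _ in rows)
        codes = Counter(c for _, c in rows if c is not None)
        per_op[op] = {
            "ok": len(rows) - sum(codes.values()),
            "err": sum(codes.values()),
            "p50": percentile(lats, 50),
            "p95": percentile(lats, 95),
            "max": lats[-1],
            "errors": dict(codes),  # error codes seen for this op
        }
    return {"errors": dict(errors), "per_op": per_op}
```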
### Expected (healthy) error noise

A small error rate is normal and *not* a failure:

- `EDITOR_NOT_READY`: transient, during reload windows or play-state changes.
- `NODE_NOT_FOUND`: concurrent-delete races (one worker deletes a node another
  was about to touch); expected under concurrency.
- `CONNECTION`: during the reload's disable→enable window, before workers
  reconnect.

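To keep that expected noise from masking real problems, a report can discount those codes before judging each op. A hypothetical sketch: `flag_hot_ops`, the 5% and 1000 ms thresholds, and the per-op dict shape (`ok`, `err`, `p95`, `errors`) are assumptions rather than stormtest's actual verdict logic. It deliberately ignores a `CONNECTION` flood, which shows up instead in the editor-alive verdict and the reload recovery times.

```python
EXPECTED_NOISE = {"EDITOR_NOT_READY", "NODE_NOT_FOUND", "CONNECTION"}


def flag_hot_ops(per_op: dict, max_err_rate: float = 0.05,
                 max_p95_ms: float = 1000.0) -> list:
    """Return (op, reason) pairs for ops that look like real regressions."""
    flagged = []
    for op, st in sorted(per_op.items()):
        total = st["ok"] + st["err"]
        if total == 0:
            continue
        # Count only errors whose codes are not in the expected-noise set.
        unexpected = sum(n for code, n in st["errors"].items()
                         if code not in EXPECTED_NOISE)
        rate = unexpected / total
        if rate > max_err_rate:
            flagged.append((op, f"unexpected error rate {rate:.0%}"))
        if st["p95"] > max_p95_ms:
            flagged.append((op, f"p95 {st['p95']:.0f} ms"))
    return flagged
```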
What you're watching for instead: the editor **process dying** (the verdict
flips to `DEAD`, with a flood of `CONNECTION` errors that never recovers), a
reload that **never comes back** (e.g. a plugin-managed server was killed, so
recovery time is unbounded), or a single op with a **pathologically high error
rate or latency**, which points at a real regression.

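A reload that never comes back surfaces at the reconnect step, so that wait should be bounded. A minimal sketch, assuming a caller-supplied `probe` callable (hypothetical; in practice it might try to open a fresh client connection) and using `SS_RECONNECT_TIMEOUT` as the `timeout`:

```python
import time
from typing import Callable, Optional


def wait_for_recovery(probe: Callable[[], bool], timeout: float = 30.0,
                      interval: float = 0.25,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll `probe` until it succeeds; return seconds taken, or None on timeout."""
    start = clock()
    deadline = start + timeout
    while True:
        try:
            if probe():
                return clock() - start  # recovery time for this reload
        except OSError:
            pass  # refused/reset while the server is still down
        if clock() >= deadline:
            return None  # the reload never came back within the timeout
        # Never sleep past the deadline.
        sleep(min(interval, max(0.0, deadline - clock())))
```

Returning `None` rather than raising lets the caller count the reload as attempted but not survived and still print the final report; injecting `clock` and `sleep` keeps the function testable without real waiting.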
> If the target server is plugin-managed (auto-spawned), a reload may kill it
> without bringing it back. Run the server **externally** instead (e.g. via
> `serve-this-worktree`, which uses `--reload`) so that `editor_reload_plugin`
> exercises the plugin reload without taking the server down with it.