fix(game_eval): carve the not-ready path out into EVAL_GAME_NOT_READY (#518)#519
Merged
Conversation
…#518) The opaque INTERNAL_ERROR rate on editor_manage(game_eval) looked like it regressed 1.42% (2.5.10) -> 6.15% (2.6.0). Splitting the bucket by duration_ms shows it is NOT a regression of the #487/#488 opaque hang: - The genuine ~10s editor-backstop hang is flat (~2.8%). No runtime code changed between 2.5.10 and 2.5.13 (only version strings), and the 2.5.10 "1.42%" baseline rested on a thin 9-install / 1.5k-call cohort. - The 2.6.0 jump is #500's not-ready path: capping the readiness wait at 3s converts what used to surface as a ~15s server TimeoutError into a fast ~3s INTERNAL_ERROR (TimeoutError collapsed 330 -> 3 in the data). Those share the INTERNAL_ERROR code, so they masquerade as the opaque hang. Carve that fast, caller-actionable failure into its own EVAL_GAME_NOT_READY code -- the same split #490/#491 made for compile/runtime errors, and the issue's own Suggested Action #2. The condition is specifically "the play session is up (editor_handler's is_playing_scene gate passed) but the game-side capture never registered within the wait" -- a boot-window race (worst on Windows) or a missing/disabled _mcp_game_helper autoload. This RELABELS; it does not change eval timing or reduce failures. The INTERNAL_ERROR telemetry bucket once again means "the eval truly hung" (~2.8%), and the not-ready volume becomes its own measurable, caller-side bucket. Reclassifies the two debugger-plugin not-ready sites (_wait_then_eval readiness-wait + _send_eval no-session); the higher-volume game_command / screenshot paths are intentionally left untouched. No timing machinery touched. Refs #518. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
- Move the no-session test's skip-guard BEFORE _send_eval. The old guard checked conn.captured AFTER the call, so if a session were ever present on the bare plugin, _send_eval would already have armed timers and sent a real mcp:eval into the running game before the test could bail. Gate on _first_active_session() up front instead (precondition, not post-hoc) so the test can never have side effects. - tools/editor.py: note EVAL_GAME_NOT_READY also covers a missing/disabled _mcp_game_helper autoload, not just "still launching" -- matches the fuller handlers/editor.py wording so an LLM caller doesn't retry forever when the real fix is enabling the autoload. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR introduces a dedicated EVAL_GAME_NOT_READY error code to reclassify a fast “play session is up, but game-side capture/session isn’t ready” failure mode in game_eval, preventing it from being miscounted as an opaque INTERNAL_ERROR hang in telemetry.
Changes:
- Add
EVAL_GAME_NOT_READYto the shared protocol/plugin error-code sets (Python + GDScript). - Reclassify two
game_evalnot-ready sites in the editor debugger bridge (_wait_then_evalexpiry and_send_evalno-session) to returnEVAL_GAME_NOT_READY. - Extend unit/test coverage (Python + GDScript) to assert the new error code and behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_game_eval.py | Asserts EVAL_GAME_NOT_READY exists in the Python ErrorCode enum. |
| test_project/tests/test_game_eval_errors.gd | Adds a behavioral test for _send_eval returning EVAL_GAME_NOT_READY when no debugger session exists. |
| src/godot_ai/tools/editor.py | Updates tool help text to document the new error code and its meaning. |
| src/godot_ai/protocol/errors.py | Adds ErrorCode.EVAL_GAME_NOT_READY to the protocol enum. |
| src/godot_ai/handlers/editor.py | Updates game_eval handler docstring to document EVAL_GAME_NOT_READY. |
| plugin/addons/godot_ai/utils/error_codes.gd | Adds EVAL_GAME_NOT_READY to the shared GDScript error-code constants. |
| plugin/addons/godot_ai/debugger/mcp_debugger_plugin.gd | Returns EVAL_GAME_NOT_READY instead of INTERNAL_ERROR for the two confirmed not-ready branches. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+436
to
442
| ## #518: EVAL_GAME_NOT_READY (not INTERNAL_ERROR) — the play session is up | ||
| ## but the game-side capture didn't register within the short wait. Fast | ||
| ## and caller-actionable; classifying it apart from the opaque 10s hang | ||
| ## keeps the INTERNAL_ERROR telemetry bucket meaning "the eval truly hung". | ||
| _send_error(connection, request_id, ErrorCodes.EVAL_GAME_NOT_READY, | ||
| "Game-side autoload never registered its debugger capture within %ds. Is the game actually running? Start it with project_run / the editor's Play button, then retry. If it IS running, check Project Settings → Autoload for _mcp_game_helper (added automatically when the plugin is enabled)." % int(EVAL_READY_WAIT_SEC)) | ||
| return |
Comment on lines
455
to
459
| ## #518: same not-ready condition as _wait_then_eval (capture reported | ||
| ## ready but no live debugger session), so the same caller-actionable code. | ||
| _send_error(connection, request_id, ErrorCodes.EVAL_GAME_NOT_READY, | ||
| "No active debugger session — is the game actually running?") | ||
| return |
Both not-ready paths are reached only after editor_handler already gated game_eval on is_playing_scene(), so "is the game running? start it with the Play button" was misleading — the session is up. Reword each to match what it actually means: - _wait_then_eval (capture never registered within the wait): the play session is up, so the game is most likely still booting → wait and retry; if it persists, the _mcp_game_helper autoload is missing/disabled or the game uses a custom main loop. - _send_eval (capture was ready but no live session): the game just stopped or is restarting → confirm it's running and retry. No autoload hint here — capture had already registered, so the autoload is present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this is
#518 reports the opaque
INTERNAL_ERRORoneditor_manage(game_eval)"regressing" 1.42% (2.5.10) →6.15% (2.6.0). I split the bucket by
duration_ms(telemetry stores latency,not the message) and it is not a regression of the #487/#488 opaque hang —
it's a thin-baseline cohort artifact plus a #500 reclassification. This PR fixes
the one real, high-confidence problem: the not-ready failure is mis-coded as
INTERNAL_ERROR, so it masquerades as the opaque hang.The data (INTERNAL_ERROR split by latency)
pathological install (
2db6f78c, 15% hang rate, 42% of all 2.6.0 hangs);excluding it, 2.78% → 2.70%.
between those tags; the "1.42%" baseline was 9 installs / 1,561 calls.
TimeoutErrors into fast ~3sINTERNAL_ERRORs (TimeoutErrorcollapsed330 → 3). Same code, different (faster, actionable) failure.
The change
A dedicated
EVAL_GAME_NOT_READYcode — the same split #490/#491 made forcompile/runtime errors, and the issue's own Suggested Action #2. It reclassifies
the two debugger-plugin not-ready sites (
_wait_then_evalreadiness-wait expiryand
_send_eval's no-session branch). The condition is specifically "the playsession is up (
editor_handler'sis_playing_scene()gate already passed) butthe game-side capture never registered within the wait" — a boot-window race
(worst on Windows) or a missing/disabled
_mcp_game_helperautoload.This relabels; it does not change eval timing or reduce failures. Opaque
INTERNAL_ERRORdrops ~6.3% → ~2.8% by definition and once again means "the evaltruly hung"; the not-ready volume becomes its own measurable, caller-side bucket.
No timing machinery touched — the
game_command/ screenshot not-ready paths areintentionally left as-is.
What I deliberately did not do
Chase the residual ~2.8% hang floor or widen #500's 3s wait. The floor is the
inherent backgrounded-game idle-freeze limit, and the 3s wait is bounded by the
15s server ceiling — moving it would mean raising the whole timeout ladder for
partial benefit. That's the half-measure to avoid.
Validation
ruffcleanEVAL_GAME_NOT_READYassertion)--importparse-check: 0 errorstest_send_eval_without_active_session_replies_game_not_ready) — validated by CI'sci-godot-testsjob (needs the editor harness; couldn't run locally)Refs #518. After this ships, mark finding F-009 verifying and confirm the
opaque-
INTERNAL_ERRORrate drops in 2.6.x telemetry.🤖 Generated with Claude Code