fix(game_eval): carve the not-ready path out into EVAL_GAME_NOT_READY (#518) by dsarno · Pull Request #519 · hi-godot/godot-ai

dsarno · 2026-06-03T18:41:28Z

What this is

#518 reports the opaque
INTERNAL_ERROR on editor_manage(game_eval) "regressing" 1.42% (2.5.10) →
6.15% (2.6.0). I split the bucket by duration_ms (telemetry stores latency,
not the message) and it is not a regression of the #487/#488 opaque hang —
it's a thin-baseline cohort artifact plus a #500 reclassification. This PR fixes
the one real, high-confidence problem: the not-ready failure is mis-coded as
INTERNAL_ERROR, so it masquerades as the opaque hang.

The data (INTERNAL_ERROR split by latency)

ver	game_eval	INTERNAL_ERROR	~3s (#500 not-ready)	~10s hang (#487 kind)	TimeoutError
2.5.10	1,561	1.41%	0	21 (1.35%)	46
2.5.13	11,142	3.23%	5	325 (2.92%)	330
2.6.0	4,659	6.29%	151 (3.24%)	131 (2.81%)	3

Genuine ~10s hang is flat (2.92% → 2.81%). The within-install rise was one
pathological install (2db6f78c, 15% hang rate, 42% of all 2.6.0 hangs);
excluding it, 2.78% → 2.70%.
2.5.10→2.5.13 step is a cohort artifact — zero runtime code changed
between those tags; the "1.42%" baseline was 9 installs / 1,561 calls.
2.6.0 jump is game_eval: residual ~10–15s timeout failures not caught by the 8s eval bound (#488 follow-up) #500: capping the readiness wait at 3s converted ~15s
TimeoutErrors into fast ~3s INTERNAL_ERRORs (TimeoutError collapsed
330 → 3). Same code, different (faster, actionable) failure.

The change

A dedicated EVAL_GAME_NOT_READY code — the same split #490/#491 made for
compile/runtime errors, and the issue's own Suggested Action #2. It reclassifies
the two debugger-plugin not-ready sites (_wait_then_eval readiness-wait expiry
and _send_eval's no-session branch). The condition is specifically "the play
session is up (editor_handler's is_playing_scene() gate already passed) but
the game-side capture never registered within the wait" — a boot-window race
(worst on Windows) or a missing/disabled _mcp_game_helper autoload.

This relabels; it does not change eval timing or reduce failures. Opaque
INTERNAL_ERROR drops ~6.3% → ~2.8% by definition and once again means "the eval
truly hung"; the not-ready volume becomes its own measurable, caller-side bucket.
No timing machinery touched — the game_command / screenshot not-ready paths are
intentionally left as-is.

What I deliberately did not do

Chase the residual ~2.8% hang floor or widen #500's 3s wait. The floor is the
inherent backgrounded-game idle-freeze limit, and the 3s wait is bounded by the
15s server ceiling — moving it would mean raising the whole timeout ladder for
partial benefit. That's the half-measure to avoid.

Validation

✅ ruff clean
✅ Python unit suite: 854 passed, 2 skipped (incl. new EVAL_GAME_NOT_READY assertion)
✅ GDScript --import parse-check: 0 errors
⏳ GDScript behavioral test (test_send_eval_without_active_session_replies_game_not_ready) — validated by CI's ci-godot-tests job (needs the editor harness; couldn't run locally)

Refs #518. After this ships, mark finding F-009 verifying and confirm the
opaque-INTERNAL_ERROR rate drops in 2.6.x telemetry.

🤖 Generated with Claude Code

…#518) The opaque INTERNAL_ERROR rate on editor_manage(game_eval) looked like it regressed 1.42% (2.5.10) -> 6.15% (2.6.0). Splitting the bucket by duration_ms shows it is NOT a regression of the #487/#488 opaque hang: - The genuine ~10s editor-backstop hang is flat (~2.8%). No runtime code changed between 2.5.10 and 2.5.13 (only version strings), and the 2.5.10 "1.42%" baseline rested on a thin 9-install / 1.5k-call cohort. - The 2.6.0 jump is #500's not-ready path: capping the readiness wait at 3s converts what used to surface as a ~15s server TimeoutError into a fast ~3s INTERNAL_ERROR (TimeoutError collapsed 330 -> 3 in the data). Those share the INTERNAL_ERROR code, so they masquerade as the opaque hang. Carve that fast, caller-actionable failure into its own EVAL_GAME_NOT_READY code -- the same split #490/#491 made for compile/runtime errors, and the issue's own Suggested Action #2. The condition is specifically "the play session is up (editor_handler's is_playing_scene gate passed) but the game-side capture never registered within the wait" -- a boot-window race (worst on Windows) or a missing/disabled _mcp_game_helper autoload. This RELABELS; it does not change eval timing or reduce failures. The INTERNAL_ERROR telemetry bucket once again means "the eval truly hung" (~2.8%), and the not-ready volume becomes its own measurable, caller-side bucket. Reclassifies the two debugger-plugin not-ready sites (_wait_then_eval readiness-wait + _send_eval no-session); the higher-volume game_command / screenshot paths are intentionally left untouched. No timing machinery touched. Refs #518. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov · 2026-06-03T18:46:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Move the no-session test's skip-guard BEFORE _send_eval. The old guard checked conn.captured AFTER the call, so if a session were ever present on the bare plugin, _send_eval would already have armed timers and sent a real mcp:eval into the running game before the test could bail. Gate on _first_active_session() up front instead (precondition, not post-hoc) so the test can never have side effects. - tools/editor.py: note EVAL_GAME_NOT_READY also covers a missing/disabled _mcp_game_helper autoload, not just "still launching" -- matches the fuller handlers/editor.py wording so an LLM caller doesn't retry forever when the real fix is enabling the autoload. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

This PR introduces a dedicated EVAL_GAME_NOT_READY error code to reclassify a fast “play session is up, but game-side capture/session isn’t ready” failure mode in game_eval, preventing it from being miscounted as an opaque INTERNAL_ERROR hang in telemetry.

Changes:

Add EVAL_GAME_NOT_READY to the shared protocol/plugin error-code sets (Python + GDScript).
Reclassify two game_eval not-ready sites in the editor debugger bridge (_wait_then_eval expiry and _send_eval no-session) to return EVAL_GAME_NOT_READY.
Extend unit/test coverage (Python + GDScript) to assert the new error code and behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/unit/test_game_eval.py	Asserts `EVAL_GAME_NOT_READY` exists in the Python `ErrorCode` enum.
test_project/tests/test_game_eval_errors.gd	Adds a behavioral test for `_send_eval` returning `EVAL_GAME_NOT_READY` when no debugger session exists.
src/godot_ai/tools/editor.py	Updates tool help text to document the new error code and its meaning.
src/godot_ai/protocol/errors.py	Adds `ErrorCode.EVAL_GAME_NOT_READY` to the protocol enum.
src/godot_ai/handlers/editor.py	Updates `game_eval` handler docstring to document `EVAL_GAME_NOT_READY`.
plugin/addons/godot_ai/utils/error_codes.gd	Adds `EVAL_GAME_NOT_READY` to the shared GDScript error-code constants.
plugin/addons/godot_ai/debugger/mcp_debugger_plugin.gd	Returns `EVAL_GAME_NOT_READY` instead of `INTERNAL_ERROR` for the two confirmed not-ready branches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+		## #518: EVAL_GAME_NOT_READY (not INTERNAL_ERROR) — the play session is up
+		## but the game-side capture didn't register within the short wait. Fast
+		## and caller-actionable; classifying it apart from the opaque 10s hang
+		## keeps the INTERNAL_ERROR telemetry bucket meaning "the eval truly hung".
+		_send_error(connection, request_id, ErrorCodes.EVAL_GAME_NOT_READY,
 			"Game-side autoload never registered its debugger capture within %ds. Is the game actually running? Start it with project_run / the editor's Play button, then retry. If it IS running, check Project Settings → Autoload for _mcp_game_helper (added automatically when the plugin is enabled)." % int(EVAL_READY_WAIT_SEC))
 		return


+		## #518: same not-ready condition as _wait_then_eval (capture reported
+		## ready but no live debugger session), so the same caller-actionable code.
+		_send_error(connection, request_id, ErrorCodes.EVAL_GAME_NOT_READY,
 			"No active debugger session — is the game actually running?")
 		return


Both not-ready paths are reached only after editor_handler already gated game_eval on is_playing_scene(), so "is the game running? start it with the Play button" was misleading — the session is up. Reword each to match what it actually means: - _wait_then_eval (capture never registered within the wait): the play session is up, so the game is most likely still booting → wait and retry; if it persists, the _mcp_game_helper autoload is missing/disabled or the game uses a custom main loop. - _send_eval (capture was ready but no live session): the game just stopped or is restarting → confirm it's running and retry. No autoload hint here — capture had already registered, so the autoload is present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dsarno requested a review from Copilot June 3, 2026 18:56

Copilot started reviewing on behalf of dsarno June 3, 2026 18:57 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

dsarno merged commit d9bf2c5 into main Jun 3, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(game_eval): carve the not-ready path out into EVAL_GAME_NOT_READY (#518)#519

fix(game_eval): carve the not-ready path out into EVAL_GAME_NOT_READY (#518)#519
dsarno merged 3 commits into
mainfrom
fix/game-eval-not-ready-518

dsarno commented Jun 3, 2026

Uh oh!

codecov Bot commented Jun 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dsarno commented Jun 3, 2026

What this is

The data (INTERNAL_ERROR split by latency)

The change

What I deliberately did not do

Validation

Uh oh!

codecov Bot commented Jun 3, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants