fix(game_eval): carve the not-ready path out into EVAL_GAME_NOT_READY (#518) #519

Merged
dsarno merged 3 commits into main from fix/game-eval-not-ready-518 on Jun 3, 2026
Conversation

@dsarno dsarno commented Jun 3, 2026

What this is

#518 reports the opaque INTERNAL_ERROR on editor_manage(game_eval)
"regressing" 1.42% (2.5.10) → 6.15% (2.6.0). I split the bucket by duration_ms
(telemetry stores latency, not the message), and it is not a regression of the
#487/#488 opaque hang: it's a thin-baseline cohort artifact plus a #500
reclassification. This PR fixes
the one real, high-confidence problem: the not-ready failure is mis-coded as
INTERNAL_ERROR, so it masquerades as the opaque hang.

The data (INTERNAL_ERROR split by latency)

| ver | game_eval calls | INTERNAL_ERROR | ~3s (#500 not-ready) | ~10s hang (#487 kind) | TimeoutError |
|---|---|---|---|---|---|
| 2.5.10 | 1,561 | 1.41% | 0 | 21 (1.35%) | 46 |
| 2.5.13 | 11,142 | 3.23% | 5 | 325 (2.92%) | 330 |
| 2.6.0 | 4,659 | 6.29% | 151 (3.24%) | 131 (2.81%) | 3 |

  • Genuine ~10s hang is flat (2.92% → 2.81%). The within-install rise was one
    pathological install (2db6f78c, 15% hang rate, 42% of all 2.6.0 hangs);
    excluding it, 2.78% → 2.70%.
  • 2.5.10→2.5.13 step is a cohort artifact — zero runtime code changed
    between those tags; the "1.42%" baseline was 9 installs / 1,561 calls.
  • 2.6.0 jump is #500 (game_eval: residual ~10–15s timeout failures not caught
    by the 8s eval bound, the #488 follow-up): capping the readiness wait at 3s
    converted ~15s TimeoutErrors into fast ~3s INTERNAL_ERRORs (TimeoutError
    collapsed 330 → 3). Same code, different (faster, actionable) failure.
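
The latency split described above can be sketched in a few lines of Python. This is only an illustration, not the query that produced the table: the 5-second cutoff, the `bucket` and `rate` helpers, and the bucket labels are assumptions, since the PR only states "~3s" and "~10s". The counts passed to `rate` are taken from the table.

```python
# Assumed cutoff (ms) separating the ~3s not-ready failures from the ~10s hangs.
# The PR gives only approximate latencies, so this threshold is hypothetical.
NOT_READY_MAX_MS = 5_000


def bucket(error_code: str, duration_ms: int) -> str:
    """Label one failed game_eval call by error code and latency."""
    if error_code == "TimeoutError":
        return "timeout"
    if error_code != "INTERNAL_ERROR":
        return "other"
    # Telemetry stores latency, not the message, so latency is the only signal.
    return "not_ready_3s" if duration_ms < NOT_READY_MAX_MS else "hang_10s"


def rate(count: int, calls: int) -> float:
    """Failure rate as a percentage of all game_eval calls, two decimals."""
    return round(100 * count / calls, 2)


# Recompute the per-version rates from the raw counts in the table.
not_ready_260 = rate(151, 4_659)   # ~3s bucket, 2.6.0
hang_260 = rate(131, 4_659)        # ~10s bucket, 2.6.0
hang_2513 = rate(325, 11_142)      # ~10s bucket, 2.5.13
```

Splitting on latency rather than on the error code is what makes the flat hang rate visible: both populations carry the same INTERNAL_ERROR code, so only `duration_ms` separates them.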

The change

A dedicated EVAL_GAME_NOT_READY code — the same split #490/#491 made for
compile/runtime errors, and the issue's own Suggested Action #2. It reclassifies
the two debugger-plugin not-ready sites (_wait_then_eval readiness-wait expiry
and _send_eval's no-session branch). The condition is specifically "the play
session is up (editor_handler's is_playing_scene() gate already passed) but
the game-side capture never registered within the wait": a boot-window race
(worst on Windows) or a missing/disabled _mcp_game_helper autoload.
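
To illustrate why a dedicated code is caller-actionable, here is a minimal sketch of a caller that retries only on EVAL_GAME_NOT_READY and gives up after a bounded number of attempts. The reply shape (a dict with an `error_code` key), the `eval_with_retry` helper, and its defaults are assumptions made for the example; they are not part of this PR or of the server's API.

```python
import time
from typing import Callable

# Only the not-ready code is worth retrying; INTERNAL_ERROR now means a real hang.
RETRYABLE = {"EVAL_GAME_NOT_READY"}


def eval_with_retry(
    call: Callable[[], dict],
    attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run `call`, retrying a bounded number of times while the game boots.

    `call` is a hypothetical zero-argument function that performs one
    game_eval request and returns the reply as a dict.
    """
    reply = call()
    for _ in range(attempts - 1):
        if reply.get("error_code") not in RETRYABLE:
            break  # success, or an error that retrying will not fix
        sleep(delay_s)
        reply = call()
    return reply
```

The retry count is bounded on purpose: if the failure persists, the likelier cause is a missing or disabled _mcp_game_helper autoload, which no amount of waiting resolves.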

This relabels; it does not change eval timing or reduce failures. Opaque
INTERNAL_ERROR drops ~6.3% → ~2.8% by definition and once again means "the eval
truly hung"; the not-ready volume becomes its own measurable, caller-side bucket.
No timing machinery touched — the game_command / screenshot not-ready paths are
intentionally left as-is.

What I deliberately did not do

Chase the residual ~2.8% hang floor or widen #500's 3s wait. The floor is the
inherent backgrounded-game idle-freeze limit, and the 3s wait is bounded by the
15s server ceiling — moving it would mean raising the whole timeout ladder for
partial benefit. That's the half-measure to avoid.

Validation

  • ✅ ruff clean
  • ✅ Python unit suite: 854 passed, 2 skipped (incl. new EVAL_GAME_NOT_READY assertion)
  • ✅ GDScript --import parse-check: 0 errors
  • ⏳ GDScript behavioral test (test_send_eval_without_active_session_replies_game_not_ready) — validated by CI's ci-godot-tests job (needs the editor harness; couldn't run locally)

Refs #518. After this ships, mark finding F-009 verifying and confirm the
opaque-INTERNAL_ERROR rate drops in 2.6.x telemetry.

🤖 Generated with Claude Code

…#518)

The opaque INTERNAL_ERROR rate on editor_manage(game_eval) looked like it
regressed 1.42% (2.5.10) -> 6.15% (2.6.0). Splitting the bucket by
duration_ms shows it is NOT a regression of the #487/#488 opaque hang:

- The genuine ~10s editor-backstop hang is flat (~2.8%). No runtime code
  changed between 2.5.10 and 2.5.13 (only version strings), and the 2.5.10
  "1.42%" baseline rested on a thin 9-install / 1.5k-call cohort.
- The 2.6.0 jump is #500's not-ready path: capping the readiness wait at 3s
  converts what used to surface as a ~15s server TimeoutError into a fast
  ~3s INTERNAL_ERROR (TimeoutError collapsed 330 -> 3 in the data). Those
  share the INTERNAL_ERROR code, so they masquerade as the opaque hang.

Carve that fast, caller-actionable failure into its own EVAL_GAME_NOT_READY
code -- the same split #490/#491 made for compile/runtime errors, and the
issue's own Suggested Action #2. The condition is specifically "the play
session is up (editor_handler's is_playing_scene gate passed) but the
game-side capture never registered within the wait" -- a boot-window race
(worst on Windows) or a missing/disabled _mcp_game_helper autoload.

This RELABELS; it does not change eval timing or reduce failures. The
INTERNAL_ERROR telemetry bucket once again means "the eval truly hung"
(~2.8%), and the not-ready volume becomes its own measurable, caller-side
bucket. Reclassifies the two debugger-plugin not-ready sites
(_wait_then_eval readiness-wait + _send_eval no-session); the higher-volume
game_command / screenshot paths are intentionally left untouched. No timing
machinery touched.

Refs #518.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

- Move the no-session test's skip-guard BEFORE _send_eval. The old guard
  checked conn.captured AFTER the call, so if a session were ever present
  on the bare plugin, _send_eval would already have armed timers and sent
  a real mcp:eval into the running game before the test could bail.
  Gate on _first_active_session() up front instead (precondition, not
  post-hoc) so the test can never have side effects.
- tools/editor.py: note EVAL_GAME_NOT_READY also covers a missing/disabled
  _mcp_game_helper autoload, not just "still launching" -- matches the
  fuller handlers/editor.py wording so an LLM caller doesn't retry forever
  when the real fix is enabling the autoload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
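
The skip-guard ordering from the commit above can be sketched in Python. The real test is GDScript and runs against the editor harness; `FakePlugin`, its methods, and `check_no_session_reply` are hypothetical stand-ins invented for this illustration, and only the ordering (precondition first, call second) reflects the commit.

```python
import unittest


class FakePlugin:
    """Hypothetical stand-in for the debugger plugin; not the real API."""

    def __init__(self, session=None):
        self.session = session
        self.sent = []  # messages that would have reached the running game

    def first_active_session(self):
        return self.session

    def send_eval(self, code):
        if self.session is None:
            return {"error_code": "EVAL_GAME_NOT_READY"}
        self.sent.append(("mcp:eval", code))  # the side effect to avoid
        return {"ok": True}


def check_no_session_reply(plugin):
    # Guard BEFORE the call: if a session exists, the no-session branch is
    # unreachable, so skip without ever sending anything to the game.
    if plugin.first_active_session() is not None:
        raise unittest.SkipTest("session present; no-session branch unreachable")
    reply = plugin.send_eval("1 + 1")
    assert reply["error_code"] == "EVAL_GAME_NOT_READY"
```

A guard placed after `send_eval` would still skip in the same situation, but only once the eval had already been sent.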

Copilot AI left a comment

Pull request overview

This PR introduces a dedicated EVAL_GAME_NOT_READY error code to reclassify a fast “play session is up, but game-side capture/session isn’t ready” failure mode in game_eval, preventing it from being miscounted as an opaque INTERNAL_ERROR hang in telemetry.

Changes:

  • Add EVAL_GAME_NOT_READY to the shared protocol/plugin error-code sets (Python + GDScript).
  • Reclassify two game_eval not-ready sites in the editor debugger bridge (_wait_then_eval expiry and _send_eval no-session) to return EVAL_GAME_NOT_READY.
  • Extend unit/test coverage (Python + GDScript) to assert the new error code and behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_game_eval.py | Asserts EVAL_GAME_NOT_READY exists in the Python ErrorCode enum. |
| test_project/tests/test_game_eval_errors.gd | Adds a behavioral test for _send_eval returning EVAL_GAME_NOT_READY when no debugger session exists. |
| src/godot_ai/tools/editor.py | Updates tool help text to document the new error code and its meaning. |
| src/godot_ai/protocol/errors.py | Adds ErrorCode.EVAL_GAME_NOT_READY to the protocol enum. |
| src/godot_ai/handlers/editor.py | Updates game_eval handler docstring to document EVAL_GAME_NOT_READY. |
| plugin/addons/godot_ai/utils/error_codes.gd | Adds EVAL_GAME_NOT_READY to the shared GDScript error-code constants. |
| plugin/addons/godot_ai/debugger/mcp_debugger_plugin.gd | Returns EVAL_GAME_NOT_READY instead of INTERNAL_ERROR for the two confirmed not-ready branches. |

Comment on lines +436 to 442

## #518: EVAL_GAME_NOT_READY (not INTERNAL_ERROR) — the play session is up
## but the game-side capture didn't register within the short wait. Fast
## and caller-actionable; classifying it apart from the opaque 10s hang
## keeps the INTERNAL_ERROR telemetry bucket meaning "the eval truly hung".
_send_error(connection, request_id, ErrorCodes.EVAL_GAME_NOT_READY,
	"Game-side autoload never registered its debugger capture within %ds. Is the game actually running? Start it with project_run / the editor's Play button, then retry. If it IS running, check Project Settings → Autoload for _mcp_game_helper (added automatically when the plugin is enabled)." % int(EVAL_READY_WAIT_SEC))
return

Comment on lines 455 to 459

## #518: same not-ready condition as _wait_then_eval (capture reported
## ready but no live debugger session), so the same caller-actionable code.
_send_error(connection, request_id, ErrorCodes.EVAL_GAME_NOT_READY,
	"No active debugger session — is the game actually running?")
return

Both not-ready paths are reached only after editor_handler already gated
game_eval on is_playing_scene(), so "is the game running? start it with the
Play button" was misleading — the session is up. Reword each to match what
it actually means:

- _wait_then_eval (capture never registered within the wait): the play
  session is up, so the game is most likely still booting → wait and retry;
  if it persists, the _mcp_game_helper autoload is missing/disabled or the
  game uses a custom main loop.
- _send_eval (capture was ready but no live session): the game just stopped
  or is restarting → confirm it's running and retry. No autoload hint here —
  capture had already registered, so the autoload is present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dsarno dsarno merged commit d9bf2c5 into main Jun 3, 2026
15 checks passed