Skip to content

Commit 1109caf

Browse files
committed
Redesign per Copilot review: sticky bypass after watchdog + tests in tree
Two issues from Copilot's review on PR #381 + the second CI failure: (A) Cross-scan signal race [Copilot]: scan #1 watchdog'd, scan #2 then arms a fresh `filesystem_changed` listener. A delayed emission from scan #1 fires on whichever listener is currently connected to the shared signal — Godot can't tag emissions with their source scan — so scan #2's listener falsely settles before its actual filesystem scan completed, re-enabling the plugin against a potentially incomplete on-disk install. Generation-counter + Callable.bind doesn't help because a single emission still fires every connected listener (CONNECT_ONE_SHOT only auto-disconnects after firing). (B) Second CI failure: `Timer.start()` requires the Timer to be inside a SceneTree. The previous fix removed `add_child(runner)` from tests but didn't parent the runner anywhere, so when `_arm_scan_watchdog` called `add_child(timer)` and `timer.start()` inside the runner, the timer wasn't in any tree. Fix: - Add a sticky `_scan_timed_out` flag set by `_on_scan_watchdog_timeout`. - `_start_filesystem_scan` checks the flag at the top and bypasses the connect + `fs.scan()` path entirely, falling straight through to `call_deferred(deferred_step)`. With no listener armed, a delayed emission from the timed-out scan has nothing to satisfy. Godot's normal background scan catches up after the plugin re-enables. - Tests now parent the runner via `(Engine.get_main_loop() as SceneTree).root.add_child(runner)` so `Timer.start()` works. Cleanup via a `_free_runner` helper (remove_child + free). - New regression test `test_subsequent_scan_after_watchdog_bypasses_ listener_arm` explicitly drives scan #1 → watchdog → scan #2 and asserts scan #2 does NOT set `_waiting_for_scan = true` (i.e. no listener armed for a delayed scan-#1 emission to satisfy). - Existing watchdog test `test_watchdog_no_op_when_signal_already_settled` also asserts `_scan_timed_out` stays false on the late-Timer-after- successful-signal path — guards against the watchdog poisoning subsequent scans when no actual deadlock happened.
1 parent c58590f commit 1109caf

2 files changed

Lines changed: 112 additions & 18 deletions

File tree

plugin/addons/godot_ai/update_reload_runner.gd

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,13 @@ var _scan_next_step := ""
4545
## defensive pattern for state that survives `fs.scan()` during update.
4646
const SCAN_WATCHDOG_SECS := 30.0
4747
var _scan_watchdog_timer = null
48+
## Sticky flag set by `_on_scan_watchdog_timeout`. Subsequent
49+
## `_start_filesystem_scan` calls in the same update bypass connect+scan
50+
## so a delayed `filesystem_changed` emission from the timed-out scan
51+
## can't fire on a freshly-armed listener for the next scan and falsely
52+
## settle it before that scan actually completed. See PR #381 review for
53+
## the cross-scan race this guards against.
54+
var _scan_timed_out := false
4855
## Keep Array fields untyped: this runner survives fs.scan() during update,
4956
## and typed Variant storage is part of the hot-reload crash class.
5057
var _new_file_paths = []
@@ -131,6 +138,21 @@ func _start_filesystem_scan(next_step: String = "_enable_new_plugin") -> void:
131138
call_deferred(deferred_step)
132139
return
133140

141+
## Bypass: a previous scan in this update already watchdog'd, so the
142+
## editor's filesystem is unresponsive. Re-arming a `filesystem_changed`
143+
## listener now would race with a delayed emission from the timed-out
144+
## scan: that single emission would fire whichever listener is currently
145+
## connected to the shared signal, falsely settling this scan before it
146+
## actually completed. Skip the wait; Godot's normal background scan
147+
## catches up after the plugin re-enables. See PR #381 review.
148+
if _scan_timed_out:
149+
push_warning(
150+
"MCP | skipping filesystem_changed wait after previous timeout (next_step=%s)"
151+
% deferred_step
152+
)
153+
call_deferred(deferred_step)
154+
return
155+
134156
_waiting_for_scan = true
135157
_scan_next_step = deferred_step
136158
if not fs.filesystem_changed.is_connected(_on_filesystem_changed):
@@ -156,15 +178,22 @@ func _stop_scan_watchdog() -> void:
156178
func _on_scan_watchdog_timeout() -> void:
157179
## Signal didn't fire within SCAN_WATCHDOG_SECS — most likely the
158180
## filesystem scan is blocked behind a slow disk / NFS / AV scanner
159-
## still reading the just-extracted addon files. Drop the listener so
160-
## a delayed signal can't double-call `_finish_scan_wait`, then proceed.
161-
## `_finish_scan_wait` is idempotent on `_waiting_for_scan == false`.
181+
## still reading the just-extracted addon files.
182+
## Set the sticky `_scan_timed_out` flag so any subsequent
183+
## `_start_filesystem_scan` in this update skips its connect+scan
184+
## (otherwise a delayed emission from this scan would falsely settle
185+
## the next scan's listener — see PR #381 review).
186+
## Disconnect the current listener too, so this scan's listener can't
187+
## double-call `_finish_scan_wait` if the signal arrives quickly after
188+
## the timeout fires. `_finish_scan_wait` is idempotent on
189+
## `_waiting_for_scan == false`.
162190
if not _waiting_for_scan:
163191
return
164192
push_warning(
165193
"MCP | filesystem_changed didn't fire within %ds; proceeding without scan confirmation"
166194
% int(SCAN_WATCHDOG_SECS)
167195
)
196+
_scan_timed_out = true
168197
var fs := EditorInterface.get_resource_filesystem()
169198
if fs != null and fs.filesystem_changed.is_connected(_on_filesystem_changed):
170199
fs.filesystem_changed.disconnect(_on_filesystem_changed)

test_project/tests/test_update_reload_runner.gd

Lines changed: 80 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -483,6 +483,29 @@ func test_install_zip_paths_rolls_back_when_mid_loop_write_fails() -> void:
483483
# ----- _arm_scan_watchdog / _on_scan_watchdog_timeout (audit-v2 #9) -----
484484

485485

486+
func _scene_root() -> Node:
487+
# `Timer.start()` requires the timer (and thus its parent) to be
488+
# inside a SceneTree. `McpTestSuite` extends `RefCounted` so it can't
489+
# act as a parent itself — parent the runner under the editor's
490+
# SceneTree root for the duration of the test, then remove + free.
491+
var tree := Engine.get_main_loop() as SceneTree
492+
return tree.root if tree != null else null
493+
494+
495+
func _new_runner_in_tree():
496+
var runner = _new_runner()
497+
var root := _scene_root()
498+
assert_true(root != null, "test setup: SceneTree root must exist")
499+
root.add_child(runner)
500+
return runner
501+
502+
503+
func _free_runner(runner) -> void:
504+
if runner.get_parent() != null:
505+
runner.get_parent().remove_child(runner)
506+
runner.free()
507+
508+
486509
func _arm_scan_state(runner) -> void:
487510
# Mirror what `_start_filesystem_scan` would do, minus the actual
488511
# `EditorInterface.get_resource_filesystem().scan()` call. We can't
@@ -498,11 +521,7 @@ func test_watchdog_timeout_proceeds_when_signal_never_fires() -> void:
498521
## `_waiting_for_scan = true` forever. The watchdog must clear that
499522
## flag and dispatch `_scan_next_step` so the rest of the update
500523
## sequence can finish.
501-
## Test fires `_on_scan_watchdog_timeout` directly — the runner does
502-
## not need to be in a SceneTree because we don't depend on the Timer
503-
## actually counting down. (`McpTestSuite` extends `RefCounted`, so
504-
## `add_child(runner)` is not available here.)
505-
var runner = _new_runner()
524+
var runner = _new_runner_in_tree()
506525
_arm_scan_state(runner)
507526
assert_true(runner._waiting_for_scan, "scan wait armed by precondition")
508527
assert_true(runner._scan_watchdog_timer != null, "watchdog timer node exists once armed")
@@ -511,33 +530,39 @@ func test_watchdog_timeout_proceeds_when_signal_never_fires() -> void:
511530

512531
assert_false(runner._waiting_for_scan, "watchdog cleared the wait flag")
513532
assert_eq(runner._scan_next_step, "", "next-step token consumed by _finish_scan_wait")
514-
runner.free()
533+
assert_true(runner._scan_timed_out, "watchdog must set the sticky bypass flag")
534+
_free_runner(runner)
515535

516536

517537
func test_watchdog_no_op_when_signal_already_settled() -> void:
518538
## Race: signal fires, `_finish_scan_wait` cleans up, then the watchdog
519539
## Timer fires anyway (Godot's Timer can have queued timeout). Calling
520540
## `_on_scan_watchdog_timeout` after a settled scan must be a no-op —
521-
## otherwise it would double-dispatch `_scan_next_step`.
522-
var runner = _new_runner()
541+
## otherwise it would double-dispatch `_scan_next_step` AND it must
542+
## NOT poison subsequent scans by setting `_scan_timed_out = true`.
543+
var runner = _new_runner_in_tree()
523544
_arm_scan_state(runner)
524545

525546
# Simulate the happy path: signal arrived, _finish_scan_wait ran.
526547
runner._finish_scan_wait()
527548
assert_false(runner._waiting_for_scan, "wait already cleared by signal handler")
528549

529-
# Now the late watchdog timeout fires. It must not crash, must not
530-
# re-dispatch, must not flip _waiting_for_scan back to true.
550+
# Now the late watchdog timeout fires. Must not flip _waiting_for_scan
551+
# back to true, must not set _scan_timed_out (the scan succeeded).
531552
runner._on_scan_watchdog_timeout()
532553
assert_false(runner._waiting_for_scan, "watchdog stays no-op after settled wait")
533-
runner.free()
554+
assert_false(
555+
runner._scan_timed_out,
556+
"watchdog no-op must not poison _scan_timed_out — the scan actually succeeded",
557+
)
558+
_free_runner(runner)
534559

535560

536561
func test_finish_scan_wait_stops_armed_watchdog() -> void:
537562
## Happy path: filesystem_changed signal arrives; `_finish_scan_wait`
538563
## must stop the still-running Timer so it doesn't fire later and
539564
## attempt a second cleanup. Verify by inspecting the Timer state.
540-
var runner = _new_runner()
565+
var runner = _new_runner_in_tree()
541566
_arm_scan_state(runner)
542567
assert_false(runner._scan_watchdog_timer.is_stopped(), "timer running after arm")
543568

@@ -547,15 +572,15 @@ func test_finish_scan_wait_stops_armed_watchdog() -> void:
547572
runner._scan_watchdog_timer.is_stopped(),
548573
"finish_scan_wait must stop the watchdog so it can't fire later",
549574
)
550-
runner.free()
575+
_free_runner(runner)
551576

552577

553578
func test_watchdog_timer_reused_across_arms() -> void:
554579
## `_arm_scan_watchdog` lazy-creates the Timer on first use and reuses
555580
## it on subsequent arms. The runner makes two filesystem scans during
556581
## a single update (new files, then existing files), so the second arm
557582
## must not leak a second Timer child.
558-
var runner = _new_runner()
583+
var runner = _new_runner_in_tree()
559584
_arm_scan_state(runner)
560585
var first_timer = runner._scan_watchdog_timer
561586
runner._finish_scan_wait()
@@ -565,4 +590,44 @@ func test_watchdog_timer_reused_across_arms() -> void:
565590

566591
assert_true(first_timer == second_timer, "timer reused, not recreated")
567592
runner._finish_scan_wait()
568-
runner.free()
593+
_free_runner(runner)
594+
595+
596+
func test_subsequent_scan_after_watchdog_bypasses_listener_arm() -> void:
597+
## Cross-scan race regression (PR #381 review): if scan #1 watchdog'd,
598+
## scan #2 must NOT arm a fresh `filesystem_changed` listener.
599+
##
600+
## Why: a delayed `filesystem_changed` emission from scan #1 fires on
601+
## any listener currently connected to the shared signal — Godot can't
602+
## tag emissions with their source scan. So if scan #2 has armed a
603+
## fresh listener by the time scan #1's emission finally arrives, that
604+
## listener fires and falsely settles scan #2 before its actual
605+
## filesystem scan completed — re-enabling the plugin against a
606+
## potentially-incomplete on-disk install.
607+
##
608+
## Fix: the watchdog sets the sticky `_scan_timed_out` flag, and
609+
## `_start_filesystem_scan` checks it at the top and bypasses the
610+
## connect+scan path entirely. The pending plugin re-enable still
611+
## happens via `call_deferred(next_step)` — Godot's normal background
612+
## scan catches up after the plugin re-enables.
613+
var runner = _new_runner_in_tree()
614+
615+
# Scan #1: arm + watchdog timeout.
616+
_arm_scan_state(runner)
617+
runner._on_scan_watchdog_timeout()
618+
assert_true(runner._scan_timed_out, "watchdog set the sticky flag")
619+
assert_false(runner._waiting_for_scan, "watchdog cleared the wait")
620+
621+
# Scan #2 attempts to start. Pre-fix, this re-armed the listener. Now,
622+
# the bypass path must short-circuit before _waiting_for_scan flips to
623+
# true. We pass a benign deferred step (`set_process`) so the
624+
# `call_deferred` in the bypass branch doesn't re-enable the plugin
625+
# in the test environment.
626+
runner._start_filesystem_scan("set_process")
627+
assert_false(
628+
runner._waiting_for_scan,
629+
"post-watchdog _start_filesystem_scan must NOT arm a new listener — "
630+
+ "no listener means no false-settle from a delayed scan-#1 emission",
631+
)
632+
633+
_free_runner(runner)

0 commit comments

Comments
 (0)