Implement #399 Step 1: deny-by-default self-update parse-hazard lint by dsarno · Pull Request #411 · hi-godot/godot-ai

dsarno · 2026-05-07T21:11:33Z

Summary

Implements Step 1 of #399: invert tests/unit/test_plugin_self_update_safety.py from a 13-file targeted allowlist to a deny-by-default ratcheting lint.

Scans every .gd under plugin/addons/godot_ai/, minus a hardcoded 5-file OFF_LOAD_SURFACE set (3 game-side runtime helpers, 2 test-only utilities -- documented in-place with the rationale that each is provably outside the editor-process load graph).
Detects all four syntactic forms of the parse-hazard: typed-var, typed-param, Mcp*.new(...), and Mcp*.<identifier> member access.
Single test asserts total <= BASELINE_VIOLATION_COUNT. On regression, prints a paste-over-ready offender list. On improvement, demands the developer lower the baseline -- this is the ratchet, locking fixes in.

Recorded baseline

BASELINE_VIOLATION_COUNT = 1258, recorded against origin/beta at PR open. End state is 0, driven down by Steps 2-3 of #399.

This is higher than the ~600 figure cited in #399 because the original audit script's STATIC_ACCESS_RE only matched Mcp*.[A-Z_]+ (uppercase constants). A Mcp*.method() invocation hits the exact same parse-time class_name registry lookup as Mcp*.CONST, so the test extends the regex to all identifiers after Mcp*.. The recorded baseline reflects the true hazard surface.

Judgment calls

Deleted the existing targeted-allowlist tests (test_targeted_load_scripts_have_no_typed_fields_against_plugin_class_names, test_plugin_gd_does_not_construct_via_class_name, test_targeted_load_scripts_do_not_construct_via_class_name, test_targeted_load_scripts_do_not_access_members_via_class_name). Each was a strict subset of the new deny-by-default scan over a hand-maintained 13-file list. With the closure-wide scan in place, those files are still covered -- just as part of the full surface, not a parallel zero-tolerance bucket. Keeping both would have created two failure messages for the same offender, with the targeted version going stale as the allowlist diverged from reality.
Kept the two non-scan tests (test_update_backup_suffix_stays_in_sync and test_plugin_gd_documents_the_untyped_policy). Neither overlaps with the parse-hazard scan; they enforce the literal-suffix anti-drift guard from Self-update 2.3.2 -> 2.4.0 prints ~30 transient Parse Errors from handlers, dock panels, and update_mixed_state #398 and the in-source policy comment that prevents future contributors from "fixing" the apparent oversight.
Extended the member-access regex to all identifiers (not just uppercase). This raised the baseline from the audit's reported ~600 to 1258, but the under-counted method calls were a real coverage gap -- shipping the deny-by-default lint with the same blind spot would have left a class of regressions invisible to the ratchet.

Closure of #410

script/audit-self-update-load-surface is deleted in this PR -- its logic now lives inside the test. PR #410 should be closed as absorbed.

After this PR merges, run:

gh pr comment 410 --repo hi-godot/godot-ai --body "Absorbed into #<this-PR-number>; the audit logic now lives in tests/unit/test_plugin_self_update_safety.py with extended coverage."
gh pr close 410 --repo hi-godot/godot-ai

(I cannot close #410 from inside this PR's body / from this branch; leaving as a manual follow-up.)

How to verify locally

Clean checkout of this branch, then pytest tests/unit/test_plugin_self_update_safety.py -v -- 3 tests pass with baseline 1258.
Add a synthetic var x: McpErrorCodes to any handler not already at the limit (e.g. signal_handler.gd); rerun the test. Should fail with regressed: 1260 sites (baseline 1258) and the new line listed.
Lower BASELINE_VIOLATION_COUNT by 1 in the test file. Should fail with the "lower the baseline" message asking you to set it to 1258.
Revert both, confirm 3 tests pass.

Test plan

pytest tests/unit/test_plugin_self_update_safety.py -v passes with baseline 1258
Full pytest -q passes (915 tests, no regressions)
ruff check tests/ clean
ruff format tests/ clean
Synthetic-violation injection raises the count and fails with offender listing
Lowered-baseline experiment fails with the ratchet-down message

Out of scope (and intentionally not done): fixing any of the 1258 violations. Step 1 is observation-only; Steps 2-3 of #399 do the fixing.

Refs: #242, #244, #398, #399. Absorbs #410.

Inverts test_plugin_self_update_safety.py from a 13-file targeted allowlist to a deny-by-default scan of the entire addon tree (minus 5 files genuinely off the load surface: 3 game-side runtime helpers and 2 test-only utilities). Absorbs the logic from script/audit-self-update-load-surface (PR #410) into the test, with one extension: member-access detection now matches any identifier after Mcp*., not just uppercase constants. The audit regex matched only [A-Z_]+ and under-counted method calls -- a Mcp*.method() invocation hits the same parse-time class_name registry lookup as Mcp*.CONST. That fix raises the recorded baseline from ~600 sites to 1258. The single test asserts total violations <= BASELINE_VIOLATION_COUNT (1258 at PR open). On regression, prints a paste-over-ready offender list. On improvement, demands the developer lower the baseline -- the ratchet that locks fixes in. End state is 0, driven down by Steps 2-3 of #399. The previous targeted-allowlist tests (typed_fields / constructors / member_access on 13 files) are removed: the deny-by-default scan strictly subsumes them. script/audit-self-update-load-surface is deleted -- the test is now the single source of truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-07T21:14:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

This PR implements Step 1 of issue #399 by converting tests/unit/test_plugin_self_update_safety.py from a targeted allowlist into a deny-by-default, baseline-ratcheted lint that scans (nearly) the entire Godot addon tree for self-update parse-hazard patterns involving Mcp* class_name references.

Changes:

Replaces the prior targeted-load-surface allowlist tests with a full-tree scan (minus a fixed 5-file OFF_LOAD_SURFACE opt-out set).
Expands detection to four parse-hazard syntactic forms (typed fields/params, Mcp*.new(...), and Mcp*.<identifier> member access).
Adds a ratcheting baseline (BASELINE_VIOLATION_COUNT) that fails on both regressions and unrecorded improvements.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

dsarno · 2026-05-07T21:23:03Z

+    the lint.
    """
-
    return re.sub(r"#.*$", "", source, flags=re.MULTILINE)


Fixed in 616232f — replaced the naive regex stripper with a character-walk tokenizer that handles plain, triple-quoted, and prefixed (r, &, ^) GDScript string forms. Verified against the 5 .gd files in the tree that contain # inside string literals. Newlines preserved so offender line numbers stay accurate.

dsarno · 2026-05-07T21:23:05Z

+_CLASS_NAME_RE = re.compile(r"^class_name\s+(Mcp\w+)\s*$", re.MULTILINE)
+# Top-level typed-var fields. Anchored to start-of-line; locals inside
+# functions are resolved lazily and don't participate in the parse-time
+# hazard.
+_TYPED_FIELD_RE = re.compile(r"^\s*(?:static\s+)?var\s+\w+\s*:\s*(Mcp\w+)\b", re.MULTILINE)
+# Any `name: McpFoo` typed annotation (parameters, locals with type, etc.).
+# Note this overlaps with TYPED_FIELD_RE for top-level vars -- both
+# patterns count, mirroring the audit script's accounting so the baseline
+# is a stable reproducible number.
+_TYPED_PARAM_RE = re.compile(r"\b\w+\s*:\s*(Mcp\w+)\b")


Fixed in 616232f — tightened _TYPED_FIELD_RE from ^\s*(?:static\s+)?var to ^(?:static\s+)?var so it only matches truly top-level field declarations. Indented vars inside function bodies are still counted (via _TYPED_ANNOTATION_RE) — they're just no longer triple-counted by both regexes. Baseline drops 1258 → 1254 to reflect the more accurate count.

Aggregated /simplify findings on PR #411: 1. Merge three file-walk passes (_registered_mcp_class_names, _load_surface_files, _scan_file) into a single _scan_load_surface that reads each .gd once. Drops 2x redundant disk I/O on every CI run. 2. Table-drive the four regex passes via _PASSES + _format_ident, collapsing four near-duplicate finditer blocks into one loop. Removes the `if member == "new"` outlier branch by routing it through the shared formatter. 3. Add Kind = Literal[...] for the four-valued offender kind. Documents the contract for free; no runtime cost. 4. Rename _TYPED_PARAM_RE -> _TYPED_ANNOTATION_RE and clarify the docstring around its intentional overlap with _TYPED_FIELD_RE so the double-counting reads as design rather than a bug. Kind value updated to typed_annotation to match. 5. Trim the module docstring's "## Layered enforcement" section from a PR-transition narrative to a 4-line statement of current behavior. The history belongs in #399, not in this file in perpetuity. Verified post-refactor: - BASELINE_VIOLATION_COUNT == count (1258, unchanged). - Both ratchet directions still fire (synthetic +1 and -1 baselines fail with the right error messages). - ruff check + ruff format clean. No behavior change beyond the rename of one offender-kind string in failure messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1. Comment-stripper now respects string literals. Naive `re.sub(r"#.*$", ...)` was truncating valid lines like `remainder.begins_with("#")` (real example in clients/_toml_strategy.gd; 5 .gd files in the addon tree contain `#` inside string literals). On any future PR where an offender landed later on such a line, the count would silently shift. Replaced with a small character-walk tokenizer that handles plain, triple-quoted, and prefixed (raw r, StringName &, NodePath ^) GDScript string forms, matching the existing tokenizer in test_gdscript_no_adjacent_string_concat.py. Newlines preserved so line numbers stay accurate. 2. Tightened _TYPED_FIELD_RE from `^\s*(?:static\s+)?var` to `^(?:static\s+)?var`. The old `\s*` was matching any indentation, so indented `var` decls inside function bodies were counted both here and in _TYPED_ANNOTATION_RE -- triple-counting against _TYPED_ANNOTATION_RE's coverage. Tightening makes _TYPED_FIELD_RE the dedicated counter for top-level fields (the worst case for parse-time hazard, since they execute at script-load), while _TYPED_ANNOTATION_RE continues to sweep up all annotations including indented locals. Net effect: baseline drops 1258 -> 1254. The four removed sites are indented `var x: McpFoo` declarations inside function bodies that were being over-counted by the loose field regex. The new count is more accurate; the ratchet semantics are unchanged. Verified: - pytest passes with new baseline. - Both ratchet directions still fire (synthetic +/- 1). - ruff check + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 7, 2026 21:11

Copilot started reviewing on behalf of dsarno May 7, 2026 21:12 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

dsarno and others added 2 commits May 7, 2026 14:19

dsarno merged commit ae08083 into beta May 7, 2026
11 checks passed

dsarno deleted the dlight/399-step1-deny-by-default branch May 7, 2026 21:26

This was referenced May 7, 2026

Add audit script for #399 self-update load surface analysis #410

Closed

Self-update parse-hazard lint: drop class_name from heavy internals (McpErrorCodes first) #399

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement #399 Step 1: deny-by-default self-update parse-hazard lint#411

Implement #399 Step 1: deny-by-default self-update parse-hazard lint#411
dsarno merged 3 commits into
betafrom
dlight/399-step1-deny-by-default

dsarno commented May 7, 2026

Uh oh!

codecov Bot commented May 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

dsarno May 7, 2026

Uh oh!

dsarno May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dsarno commented May 7, 2026

Summary

Recorded baseline

Judgment calls

Closure of #410

How to verify locally

Test plan

Uh oh!

codecov Bot commented May 7, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

dsarno May 7, 2026

Choose a reason for hiding this comment

Uh oh!

dsarno May 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants