Skip to content

Implement #399 Step 1: deny-by-default self-update parse-hazard lint#411

Merged
dsarno merged 3 commits into
betafrom
dlight/399-step1-deny-by-default
May 7, 2026
Merged

Implement #399 Step 1: deny-by-default self-update parse-hazard lint#411
dsarno merged 3 commits into
betafrom
dlight/399-step1-deny-by-default

Conversation

@dsarno

@dsarno dsarno commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements Step 1 of #399: invert tests/unit/test_plugin_self_update_safety.py from a 13-file targeted allowlist to a deny-by-default ratcheting lint.

  • Scans every .gd under plugin/addons/godot_ai/, minus a hardcoded 5-file OFF_LOAD_SURFACE set (3 game-side runtime helpers, 2 test-only utilities -- documented in-place with the rationale that each is provably outside the editor-process load graph).
  • Detects all four syntactic forms of the parse-hazard: typed-var, typed-param, Mcp*.new(...), and Mcp*.<identifier> member access.
  • Single test asserts total <= BASELINE_VIOLATION_COUNT. On regression, prints a paste-over-ready offender list. On improvement, demands the developer lower the baseline -- this is the ratchet, locking fixes in.

Recorded baseline

BASELINE_VIOLATION_COUNT = 1258, recorded against origin/beta at PR open. End state is 0, driven down by Steps 2-3 of #399.

This is higher than the ~600 figure cited in #399 because the original audit script's STATIC_ACCESS_RE only matched Mcp*.[A-Z_]+ (uppercase constants). A Mcp*.method() invocation hits the exact same parse-time class_name registry lookup as Mcp*.CONST, so the test extends the regex to all identifiers after Mcp*.. The recorded baseline reflects the true hazard surface.

Judgment calls

  • Deleted the existing targeted-allowlist tests (test_targeted_load_scripts_have_no_typed_fields_against_plugin_class_names, test_plugin_gd_does_not_construct_via_class_name, test_targeted_load_scripts_do_not_construct_via_class_name, test_targeted_load_scripts_do_not_access_members_via_class_name). Each was a strict subset of the new deny-by-default scan over a hand-maintained 13-file list. With the closure-wide scan in place, those files are still covered -- just as part of the full surface, not a parallel zero-tolerance bucket. Keeping both would have created two failure messages for the same offender, with the targeted version going stale as the allowlist diverged from reality.
  • Kept the two non-scan tests (test_update_backup_suffix_stays_in_sync and test_plugin_gd_documents_the_untyped_policy). Neither overlaps with the parse-hazard scan; they enforce the literal-suffix anti-drift guard from Self-update 2.3.2 -> 2.4.0 prints ~30 transient Parse Errors from handlers, dock panels, and update_mixed_state #398 and the in-source policy comment that prevents future contributors from "fixing" the apparent oversight.
  • Extended the member-access regex to all identifiers (not just uppercase). This raised the baseline from the audit's reported ~600 to 1258, but the under-counted method calls were a real coverage gap -- shipping the deny-by-default lint with the same blind spot would have left a class of regressions invisible to the ratchet.

Closure of #410

script/audit-self-update-load-surface is deleted in this PR -- its logic now lives inside the test. PR #410 should be closed as absorbed.

After this PR merges, run:

gh pr comment 410 --repo hi-godot/godot-ai --body "Absorbed into #<this-PR-number>; the audit logic now lives in tests/unit/test_plugin_self_update_safety.py with extended coverage."
gh pr close 410 --repo hi-godot/godot-ai

(I cannot close #410 from inside this PR's body / from this branch; leaving as a manual follow-up.)

How to verify locally

  1. Clean checkout of this branch, then pytest tests/unit/test_plugin_self_update_safety.py -v -- 3 tests pass with baseline 1258.
  2. Add a synthetic var x: McpErrorCodes to any handler not already at the limit (e.g. signal_handler.gd); rerun the test. Should fail with regressed: 1260 sites (baseline 1258) and the new line listed.
  3. Lower BASELINE_VIOLATION_COUNT by 1 in the test file. Should fail with the "lower the baseline" message asking you to set it to 1258.
  4. Revert both, confirm 3 tests pass.

Test plan

  • pytest tests/unit/test_plugin_self_update_safety.py -v passes with baseline 1258
  • Full pytest -q passes (915 tests, no regressions)
  • ruff check tests/ clean
  • ruff format tests/ clean
  • Synthetic-violation injection raises the count and fails with offender listing
  • Lowered-baseline experiment fails with the ratchet-down message

Out of scope (and intentionally not done): fixing any of the 1258 violations. Step 1 is observation-only; Steps 2-3 of #399 do the fixing.

Refs: #242, #244, #398, #399. Absorbs #410.

Inverts test_plugin_self_update_safety.py from a 13-file targeted
allowlist to a deny-by-default scan of the entire addon tree (minus 5
files genuinely off the load surface: 3 game-side runtime helpers and
2 test-only utilities).

Absorbs the logic from script/audit-self-update-load-surface (PR #410)
into the test, with one extension: member-access detection now matches
any identifier after Mcp*., not just uppercase constants. The audit
regex matched only [A-Z_]+ and under-counted method calls -- a
Mcp*.method() invocation hits the same parse-time class_name registry
lookup as Mcp*.CONST. That fix raises the recorded baseline from ~600
sites to 1258.

The single test asserts total violations <= BASELINE_VIOLATION_COUNT
(1258 at PR open). On regression, prints a paste-over-ready offender
list. On improvement, demands the developer lower the baseline -- the
ratchet that locks fixes in. End state is 0, driven down by Steps 2-3
of #399.

The previous targeted-allowlist tests (typed_fields / constructors /
member_access on 13 files) are removed: the deny-by-default scan
strictly subsumes them.

script/audit-self-update-load-surface is deleted -- the test is now the
single source of truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 21:11
@codecov

codecov Bot commented May 7, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Step 1 of issue #399 by converting tests/unit/test_plugin_self_update_safety.py from a targeted allowlist into a deny-by-default, baseline-ratcheted lint that scans (nearly) the entire Godot addon tree for self-update parse-hazard patterns involving Mcp* class_name references.

Changes:

  • Replaces the prior targeted-load-surface allowlist tests with a full-tree scan (minus a fixed 5-file OFF_LOAD_SURFACE opt-out set).
  • Expands detection to four parse-hazard syntactic forms (typed fields/params, Mcp*.new(...), and Mcp*.<identifier> member access).
  • Adds a ratcheting baseline (BASELINE_VIOLATION_COUNT) that fails on both regressions and unrecorded improvements.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 134 to 142
the lint.
"""

return re.sub(r"#.*$", "", source, flags=re.MULTILINE)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 616232f — replaced the naive regex stripper with a character-walk tokenizer that handles plain, triple-quoted, and prefixed (r, &, ^) GDScript string forms. Verified against the 5 .gd files in the tree that contain # inside string literals. Newlines preserved so offender line numbers stay accurate.

Comment on lines +107 to +116
_CLASS_NAME_RE = re.compile(r"^class_name\s+(Mcp\w+)\s*$", re.MULTILINE)
# Top-level typed-var fields. Anchored to start-of-line; locals inside
# functions are resolved lazily and don't participate in the parse-time
# hazard.
_TYPED_FIELD_RE = re.compile(r"^\s*(?:static\s+)?var\s+\w+\s*:\s*(Mcp\w+)\b", re.MULTILINE)
# Any `name: McpFoo` typed annotation (parameters, locals with type, etc.).
# Note this overlaps with TYPED_FIELD_RE for top-level vars -- both
# patterns count, mirroring the audit script's accounting so the baseline
# is a stable reproducible number.
_TYPED_PARAM_RE = re.compile(r"\b\w+\s*:\s*(Mcp\w+)\b")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 616232f — tightened _TYPED_FIELD_RE from ^\s*(?:static\s+)?var to ^(?:static\s+)?var so it only matches truly top-level field declarations. Indented vars inside function bodies are still counted (via _TYPED_ANNOTATION_RE) — they're just no longer triple-counted by both regexes. Baseline drops 1258 → 1254 to reflect the more accurate count.

dsarno and others added 2 commits May 7, 2026 14:19
Aggregated /simplify findings on PR #411:

1. Merge three file-walk passes (_registered_mcp_class_names,
   _load_surface_files, _scan_file) into a single _scan_load_surface
   that reads each .gd once. Drops 2x redundant disk I/O on every CI
   run.

2. Table-drive the four regex passes via _PASSES + _format_ident,
   collapsing four near-duplicate finditer blocks into one loop.
   Removes the `if member == "new"` outlier branch by routing it
   through the shared formatter.

3. Add Kind = Literal[...] for the four-valued offender kind. Documents
   the contract for free; no runtime cost.

4. Rename _TYPED_PARAM_RE -> _TYPED_ANNOTATION_RE and clarify the
   docstring around its intentional overlap with _TYPED_FIELD_RE so the
   double-counting reads as design rather than a bug. Kind value
   updated to typed_annotation to match.

5. Trim the module docstring's "## Layered enforcement" section from a
   PR-transition narrative to a 4-line statement of current behavior.
   The history belongs in #399, not in this file in perpetuity.

Verified post-refactor:
- BASELINE_VIOLATION_COUNT == count (1258, unchanged).
- Both ratchet directions still fire (synthetic +1 and -1 baselines
  fail with the right error messages).
- ruff check + ruff format clean.

No behavior change beyond the rename of one offender-kind string in
failure messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Comment-stripper now respects string literals.
   Naive `re.sub(r"#.*$", ...)` was truncating valid lines like
   `remainder.begins_with("#")` (real example in
   clients/_toml_strategy.gd; 5 .gd files in the addon tree contain
   `#` inside string literals). On any future PR where an offender
   landed later on such a line, the count would silently shift.
   Replaced with a small character-walk tokenizer that handles plain,
   triple-quoted, and prefixed (raw r, StringName &, NodePath ^)
   GDScript string forms, matching the existing tokenizer in
   test_gdscript_no_adjacent_string_concat.py. Newlines preserved so
   line numbers stay accurate.

2. Tightened _TYPED_FIELD_RE from `^\s*(?:static\s+)?var` to
   `^(?:static\s+)?var`. The old `\s*` was matching any indentation,
   so indented `var` decls inside function bodies were counted both
   here and in _TYPED_ANNOTATION_RE -- triple-counting against
   _TYPED_ANNOTATION_RE's coverage. Tightening makes _TYPED_FIELD_RE
   the dedicated counter for top-level fields (the worst case for
   parse-time hazard, since they execute at script-load), while
   _TYPED_ANNOTATION_RE continues to sweep up all annotations
   including indented locals.

Net effect: baseline drops 1258 -> 1254. The four removed sites are
indented `var x: McpFoo` declarations inside function bodies that were
being over-counted by the loose field regex. The new count is more
accurate; the ratchet semantics are unchanged.

Verified:
- pytest passes with new baseline.
- Both ratchet directions still fire (synthetic +/- 1).
- ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dsarno dsarno merged commit ae08083 into beta May 7, 2026
11 checks passed
@dsarno dsarno deleted the dlight/399-step1-deny-by-default branch May 7, 2026 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants