Implement #399 Step 1: deny-by-default self-update parse-hazard lint#411
Conversation
Inverts test_plugin_self_update_safety.py from a 13-file targeted allowlist to a deny-by-default scan of the entire addon tree (minus 5 files genuinely off the load surface: 3 game-side runtime helpers and 2 test-only utilities). Absorbs the logic from script/audit-self-update-load-surface (PR #410) into the test, with one extension: member-access detection now matches any identifier after Mcp*., not just uppercase constants. The audit regex matched only [A-Z_]+ and under-counted method calls -- a Mcp*.method() invocation hits the same parse-time class_name registry lookup as Mcp*.CONST. That fix raises the recorded baseline from ~600 sites to 1258. The single test asserts total violations <= BASELINE_VIOLATION_COUNT (1258 at PR open). On regression, prints a paste-over-ready offender list. On improvement, demands the developer lower the baseline -- the ratchet that locks fixes in. End state is 0, driven down by Steps 2-3 of #399. The previous targeted-allowlist tests (typed_fields / constructors / member_access on 13 files) are removed: the deny-by-default scan strictly subsumes them. script/audit-self-update-load-surface is deleted -- the test is now the single source of truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR implements Step 1 of issue #399 by converting tests/unit/test_plugin_self_update_safety.py from a targeted allowlist into a deny-by-default, baseline-ratcheted lint that scans (nearly) the entire Godot addon tree for self-update parse-hazard patterns involving Mcp* class_name references.
Changes:
- Replaces the prior targeted-load-surface allowlist tests with a full-tree scan (minus a fixed 5-file
OFF_LOAD_SURFACEopt-out set). - Expands detection to four parse-hazard syntactic forms (typed fields/params,
Mcp*.new(...), andMcp*.<identifier>member access). - Adds a ratcheting baseline (
BASELINE_VIOLATION_COUNT) that fails on both regressions and unrecorded improvements.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| the lint. | ||
| """ | ||
|
|
||
| return re.sub(r"#.*$", "", source, flags=re.MULTILINE) |
There was a problem hiding this comment.
Fixed in 616232f — replaced the naive regex stripper with a character-walk tokenizer that handles plain, triple-quoted, and prefixed (r, &, ^) GDScript string forms. Verified against the 5 .gd files in the tree that contain # inside string literals. Newlines preserved so offender line numbers stay accurate.
| _CLASS_NAME_RE = re.compile(r"^class_name\s+(Mcp\w+)\s*$", re.MULTILINE) | ||
| # Top-level typed-var fields. Anchored to start-of-line; locals inside | ||
| # functions are resolved lazily and don't participate in the parse-time | ||
| # hazard. | ||
| _TYPED_FIELD_RE = re.compile(r"^\s*(?:static\s+)?var\s+\w+\s*:\s*(Mcp\w+)\b", re.MULTILINE) | ||
| # Any `name: McpFoo` typed annotation (parameters, locals with type, etc.). | ||
| # Note this overlaps with TYPED_FIELD_RE for top-level vars -- both | ||
| # patterns count, mirroring the audit script's accounting so the baseline | ||
| # is a stable reproducible number. | ||
| _TYPED_PARAM_RE = re.compile(r"\b\w+\s*:\s*(Mcp\w+)\b") |
There was a problem hiding this comment.
Fixed in 616232f — tightened _TYPED_FIELD_RE from ^\s*(?:static\s+)?var to ^(?:static\s+)?var so it only matches truly top-level field declarations. Indented vars inside function bodies are still counted (via _TYPED_ANNOTATION_RE) — they're just no longer triple-counted by both regexes. Baseline drops 1258 → 1254 to reflect the more accurate count.
Aggregated /simplify findings on PR #411: 1. Merge three file-walk passes (_registered_mcp_class_names, _load_surface_files, _scan_file) into a single _scan_load_surface that reads each .gd once. Drops 2x redundant disk I/O on every CI run. 2. Table-drive the four regex passes via _PASSES + _format_ident, collapsing four near-duplicate finditer blocks into one loop. Removes the `if member == "new"` outlier branch by routing it through the shared formatter. 3. Add Kind = Literal[...] for the four-valued offender kind. Documents the contract for free; no runtime cost. 4. Rename _TYPED_PARAM_RE -> _TYPED_ANNOTATION_RE and clarify the docstring around its intentional overlap with _TYPED_FIELD_RE so the double-counting reads as design rather than a bug. Kind value updated to typed_annotation to match. 5. Trim the module docstring's "## Layered enforcement" section from a PR-transition narrative to a 4-line statement of current behavior. The history belongs in #399, not in this file in perpetuity. Verified post-refactor: - BASELINE_VIOLATION_COUNT == count (1258, unchanged). - Both ratchet directions still fire (synthetic +1 and -1 baselines fail with the right error messages). - ruff check + ruff format clean. No behavior change beyond the rename of one offender-kind string in failure messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Comment-stripper now respects string literals.
Naive `re.sub(r"#.*$", ...)` was truncating valid lines like
`remainder.begins_with("#")` (real example in
clients/_toml_strategy.gd; 5 .gd files in the addon tree contain
`#` inside string literals). On any future PR where an offender
landed later on such a line, the count would silently shift.
Replaced with a small character-walk tokenizer that handles plain,
triple-quoted, and prefixed (raw r, StringName &, NodePath ^)
GDScript string forms, matching the existing tokenizer in
test_gdscript_no_adjacent_string_concat.py. Newlines preserved so
line numbers stay accurate.
2. Tightened _TYPED_FIELD_RE from `^\s*(?:static\s+)?var` to
`^(?:static\s+)?var`. The old `\s*` was matching any indentation,
so indented `var` decls inside function bodies were counted both
here and in _TYPED_ANNOTATION_RE -- triple-counting against
_TYPED_ANNOTATION_RE's coverage. Tightening makes _TYPED_FIELD_RE
the dedicated counter for top-level fields (the worst case for
parse-time hazard, since they execute at script-load), while
_TYPED_ANNOTATION_RE continues to sweep up all annotations
including indented locals.
Net effect: baseline drops 1258 -> 1254. The four removed sites are
indented `var x: McpFoo` declarations inside function bodies that were
being over-counted by the loose field regex. The new count is more
accurate; the ratchet semantics are unchanged.
Verified:
- pytest passes with new baseline.
- Both ratchet directions still fire (synthetic +/- 1).
- ruff check + format clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Implements Step 1 of #399: invert
tests/unit/test_plugin_self_update_safety.pyfrom a 13-file targeted allowlist to a deny-by-default ratcheting lint..gdunderplugin/addons/godot_ai/, minus a hardcoded 5-fileOFF_LOAD_SURFACEset (3 game-side runtime helpers, 2 test-only utilities -- documented in-place with the rationale that each is provably outside the editor-process load graph).Mcp*.new(...), andMcp*.<identifier>member access.total <= BASELINE_VIOLATION_COUNT. On regression, prints a paste-over-ready offender list. On improvement, demands the developer lower the baseline -- this is the ratchet, locking fixes in.Recorded baseline
BASELINE_VIOLATION_COUNT = 1258, recorded againstorigin/betaat PR open. End state is 0, driven down by Steps 2-3 of #399.This is higher than the ~600 figure cited in #399 because the original audit script's
STATIC_ACCESS_REonly matchedMcp*.[A-Z_]+(uppercase constants). AMcp*.method()invocation hits the exact same parse-time class_name registry lookup asMcp*.CONST, so the test extends the regex to all identifiers afterMcp*.. The recorded baseline reflects the true hazard surface.Judgment calls
test_targeted_load_scripts_have_no_typed_fields_against_plugin_class_names,test_plugin_gd_does_not_construct_via_class_name,test_targeted_load_scripts_do_not_construct_via_class_name,test_targeted_load_scripts_do_not_access_members_via_class_name). Each was a strict subset of the new deny-by-default scan over a hand-maintained 13-file list. With the closure-wide scan in place, those files are still covered -- just as part of the full surface, not a parallel zero-tolerance bucket. Keeping both would have created two failure messages for the same offender, with the targeted version going stale as the allowlist diverged from reality.test_update_backup_suffix_stays_in_syncandtest_plugin_gd_documents_the_untyped_policy). Neither overlaps with the parse-hazard scan; they enforce the literal-suffix anti-drift guard from Self-update 2.3.2 -> 2.4.0 prints ~30 transient Parse Errors from handlers, dock panels, and update_mixed_state #398 and the in-source policy comment that prevents future contributors from "fixing" the apparent oversight.Closure of #410
script/audit-self-update-load-surfaceis deleted in this PR -- its logic now lives inside the test. PR #410 should be closed as absorbed.After this PR merges, run:
gh pr comment 410 --repo hi-godot/godot-ai --body "Absorbed into #<this-PR-number>; the audit logic now lives in tests/unit/test_plugin_self_update_safety.py with extended coverage." gh pr close 410 --repo hi-godot/godot-ai(I cannot close #410 from inside this PR's body / from this branch; leaving as a manual follow-up.)
How to verify locally
pytest tests/unit/test_plugin_self_update_safety.py -v-- 3 tests pass with baseline 1258.var x: McpErrorCodesto any handler not already at the limit (e.g.signal_handler.gd); rerun the test. Should fail withregressed: 1260 sites (baseline 1258)and the new line listed.BASELINE_VIOLATION_COUNTby 1 in the test file. Should fail with the "lower the baseline" message asking you to set it to 1258.Test plan
pytest tests/unit/test_plugin_self_update_safety.py -vpasses with baseline 1258pytest -qpasses (915 tests, no regressions)ruff check tests/cleanruff format tests/cleanOut of scope (and intentionally not done): fixing any of the 1258 violations. Step 1 is observation-only; Steps 2-3 of #399 do the fixing.
Refs: #242, #244, #398, #399. Absorbs #410.