Skip to content

Fix #397: pass encoding="utf-8" to read_text() in tests/#402

Merged
dsarno merged 3 commits into
betafrom
claude/issue-397-utf8-read-text
May 7, 2026
Merged

Fix #397: pass encoding="utf-8" to read_text() in tests/#402
dsarno merged 3 commits into
betafrom
claude/issue-397-utf8-read-text

Conversation

@dsarno

@dsarno dsarno commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Path.read_text() with no arguments uses locale.getpreferredencoding() — cp1252 on Windows. test_plugin_self_update_safety.py rglob-walks the plugin tree and chokes on the verbatim Korean string in plugin/addons/godot_ai/utils/uv_cache_cleanup.gd, failing 4 tests on Windows. CI is Linux-only (#35) so the hazard was silent until a developer ran pytest on a Windows box.

This patch passes encoding="utf-8" to all 87 bare .read_text() calls across tests/unit/, and adds a new tests/unit/test_read_text_encoding.py AST lint that fails on any future bare call so the regression can't reappear.

Closes #397

Test plan

  • ruff check src/ tests/ + ruff format --check src/ tests/ — clean
  • pytest -q — 894 passed (was 893; +1 from the new lint)
  • Verified the new lint fails when bare read_text() is reintroduced (stashed the fix on test_plugin_self_update_safety.py, ran the new test, got the expected offender list).
  • Confirmed the underlying hazard: Path('plugin/addons/godot_ai/utils/uv_cache_cleanup.gd').read_text(encoding='cp1252') raises UnicodeDecodeError on the Korean line, while encoding='utf-8' reads cleanly.

Generated by Claude Code

Path.read_text() with no arguments uses locale.getpreferredencoding().
On Windows that's cp1252, which raises UnicodeDecodeError on the
verbatim Korean string in plugin/addons/godot_ai/utils/uv_cache_cleanup.gd.
test_plugin_self_update_safety.py rglob-walks the plugin tree and
crashed 4 tests on Windows; CI is Linux-only (issue #35) so the hazard
was silent until a developer ran pytest on Windows.

Adds encoding="utf-8" to all 87 bare .read_text() calls across
tests/unit/, plus a new test_read_text_encoding.py lint that fails
when any future bare call lands.

Closes #397
Copilot AI review requested due to automatic review settings May 7, 2026 03:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Python test suite against Windows’ non-UTF-8 default locale encoding by explicitly using UTF-8 for Path.read_text() and adding a regression test to prevent reintroducing bare read_text() calls.

Changes:

  • Updated tests/unit/ to pass encoding="utf-8" to Path.read_text() calls to avoid UnicodeDecodeError on Windows.
  • Added an AST-based lint test that fails if any test file calls .read_text() without an explicit encoding=.
  • Minor formatting cleanups in a few touched tests while making the encoding changes.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/test_tool_domains.py Adds encoding="utf-8" to file reads used by tool-domain and catalog parsing tests.
tests/unit/test_self_update_smoke_harness.py Ensures fixture validation reads plugin files as UTF-8.
tests/unit/test_self_update_orphan_recovery.py Reads plugin/lifecycle sources with UTF-8 for Windows safety.
tests/unit/test_script_create_import_settle.py Reads GDScript sources with UTF-8 for contract assertions.
tests/unit/test_resource_save_pause_processing.py Reads multiple handler/plugin sources with UTF-8.
tests/unit/test_read_text_encoding.py New AST lint test preventing future bare .read_text() calls in tests/.
tests/unit/test_plugin_self_update_safety.py Fixes the Windows failure by reading .gd files with UTF-8 during rglob.
tests/unit/test_path_traversal_guard.py Reads validator/handler sources with UTF-8.
tests/unit/test_error_code_parity.py Reads error_codes.gd with UTF-8 for parity checks.
tests/unit/test_error_code_distribution.py Reads handler .gd files with UTF-8 while scanning for code uses; formatting tweak.
tests/unit/test_editor_focus_refocus.py Reads multiple plugin sources with UTF-8; minor assert formatting tweaks.
tests/unit/test_dock_cli_timeout.py Reads dock/CLI GDScript sources with UTF-8 for timeout-contract checks.
tests/unit/test_deferred_timeout_contract.py Reads dispatcher/error-codes sources with UTF-8.
tests/unit/test_camera_current_contract.py Reads camera handler source with UTF-8 for contract checks.
tests/unit/test_audit_data_loss_safeguards.py Reads runner/atomic-write sources with UTF-8; minor assert formatting tweaks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/unit/test_read_text_encoding.py Outdated
Comment on lines +17 to +20
This test enforces the hygiene rule across `tests/`: every bare
`.read_text()` call (no kwargs) is an offender. CI is Linux-only
(issue #35) so this is the only line of defense before a Windows
developer hits the same trap.
Comment thread tests/unit/test_read_text_encoding.py Outdated
"Without it, Python falls back to locale.getpreferredencoding(), "
"which is cp1252 on Windows and raises UnicodeDecodeError on any "
"non-ASCII byte in the file (issue #397). Add encoding='utf-8'. "
f"Bare calls: {offenders}"
@codecov

codecov Bot commented May 7, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

claude added 2 commits May 7, 2026 03:13
- Trim oversized module docstring to its WHY (cp1252 trap on Windows)
- Trim assert message to one line; the offender list is the actionable part
- Fast-path skip files where "read_text" doesn't appear before AST parse
  (~50% lint runtime saved on this tree per /simplify efficiency review)
- Add four detector self-tests (positive, negative, unrelated-name,
  multi-offender) so a future refactor can't silently turn the lint into a
  no-op — mirrors pattern in test_no_direct_undo_redo_in_gdscript_tests.py
  and test_gdscript_no_adjacent_string_concat.py
Note in the helper docstring that the matcher catches only the keyword
form. Positional `read_text("utf-8")` and `**kwargs` splat are opaque to
AST and would falsely trip the lint — codebase convention is the keyword
form, so the match stays exact.
@dsarno dsarno merged commit 274870a into beta May 7, 2026
11 checks passed
@dsarno dsarno deleted the claude/issue-397-utf8-read-text branch May 7, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants