Fix #397: pass encoding="utf-8" to read_text() in tests/ by dsarno · Pull Request #402 · hi-godot/godot-ai

dsarno · 2026-05-07T03:10:09Z

Summary

Path.read_text() with no arguments uses locale.getpreferredencoding() — cp1252 on Windows. test_plugin_self_update_safety.py rglob-walks the plugin tree and chokes on the verbatim Korean string in plugin/addons/godot_ai/utils/uv_cache_cleanup.gd, failing 4 tests on Windows. CI is Linux-only (#35) so the hazard was silent until a developer ran pytest on a Windows box.

This patch passes encoding="utf-8" to all 87 bare .read_text() calls across tests/unit/, and adds a new tests/unit/test_read_text_encoding.py AST lint that fails on any future bare call so the regression can't reappear.

Closes #397

Test plan

ruff check src/ tests/ + ruff format --check src/ tests/ — clean
pytest -q — 894 passed (was 893; +1 from the new lint)
Verified the new lint fails when bare read_text() is reintroduced (stashed the fix on test_plugin_self_update_safety.py, ran the new test, got the expected offender list).
Confirmed the underlying hazard: Path('plugin/addons/godot_ai/utils/uv_cache_cleanup.gd').read_text(encoding='cp1252') raises UnicodeDecodeError on the Korean line, while encoding='utf-8' reads cleanly.

Generated by Claude Code

Path.read_text() with no arguments uses locale.getpreferredencoding(). On Windows that's cp1252, which raises UnicodeDecodeError on the verbatim Korean string in plugin/addons/godot_ai/utils/uv_cache_cleanup.gd. test_plugin_self_update_safety.py rglob-walks the plugin tree and crashed 4 tests on Windows; CI is Linux-only (issue #35) so the hazard was silent until a developer ran pytest on Windows. Adds encoding="utf-8" to all 87 bare .read_text() calls across tests/unit/, plus a new test_read_text_encoding.py lint that fails when any future bare call lands. Closes #397

Copilot

Pull request overview

This PR hardens the Python test suite against Windows’ non-UTF-8 default locale encoding by explicitly using UTF-8 for Path.read_text() and adding a regression test to prevent reintroducing bare read_text() calls.

Changes:

Updated tests/unit/ to pass encoding="utf-8" to Path.read_text() calls to avoid UnicodeDecodeError on Windows.
Added an AST-based lint test that fails if any test file calls .read_text() without an explicit encoding=.
Minor formatting cleanups in a few touched tests while making the encoding changes.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/unit/test_tool_domains.py	Adds `encoding="utf-8"` to file reads used by tool-domain and catalog parsing tests.
tests/unit/test_self_update_smoke_harness.py	Ensures fixture validation reads plugin files as UTF-8.
tests/unit/test_self_update_orphan_recovery.py	Reads plugin/lifecycle sources with UTF-8 for Windows safety.
tests/unit/test_script_create_import_settle.py	Reads GDScript sources with UTF-8 for contract assertions.
tests/unit/test_resource_save_pause_processing.py	Reads multiple handler/plugin sources with UTF-8.
tests/unit/test_read_text_encoding.py	New AST lint test preventing future bare `.read_text()` calls in `tests/`.
tests/unit/test_plugin_self_update_safety.py	Fixes the Windows failure by reading `.gd` files with UTF-8 during `rglob`.
tests/unit/test_path_traversal_guard.py	Reads validator/handler sources with UTF-8.
tests/unit/test_error_code_parity.py	Reads `error_codes.gd` with UTF-8 for parity checks.
tests/unit/test_error_code_distribution.py	Reads handler `.gd` files with UTF-8 while scanning for code uses; formatting tweak.
tests/unit/test_editor_focus_refocus.py	Reads multiple plugin sources with UTF-8; minor assert formatting tweaks.
tests/unit/test_dock_cli_timeout.py	Reads dock/CLI GDScript sources with UTF-8 for timeout-contract checks.
tests/unit/test_deferred_timeout_contract.py	Reads dispatcher/error-codes sources with UTF-8.
tests/unit/test_camera_current_contract.py	Reads camera handler source with UTF-8 for contract checks.
tests/unit/test_audit_data_loss_safeguards.py	Reads runner/atomic-write sources with UTF-8; minor assert formatting tweaks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+This test enforces the hygiene rule across `tests/`: every bare
+`.read_text()` call (no kwargs) is an offender. CI is Linux-only
+(issue #35) so this is the only line of defense before a Windows
+developer hits the same trap.


+        "Without it, Python falls back to locale.getpreferredencoding(), "
+        "which is cp1252 on Windows and raises UnicodeDecodeError on any "
+        "non-ASCII byte in the file (issue #397). Add encoding='utf-8'. "
+        f"Bare calls: {offenders}"


codecov · 2026-05-07T03:13:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Trim oversized module docstring to its WHY (cp1252 trap on Windows) - Trim assert message to one line; the offender list is the actionable part - Fast-path skip files where "read_text" doesn't appear before AST parse (~50% lint runtime saved on this tree per /simplify efficiency review) - Add four detector self-tests (positive, negative, unrelated-name, multi-offender) so a future refactor can't silently turn the lint into a no-op — mirrors pattern in test_no_direct_undo_redo_in_gdscript_tests.py and test_gdscript_no_adjacent_string_concat.py

Note in the helper docstring that the matcher catches only the keyword form. Positional `read_text("utf-8")` and `**kwargs` splat are opaque to AST and would falsely trip the lint — codebase convention is the keyword form, so the match stays exact.

Copilot AI review requested due to automatic review settings May 7, 2026 03:10

Copilot started reviewing on behalf of dsarno May 7, 2026 03:10 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

claude added 2 commits May 7, 2026 03:13

dsarno merged commit 274870a into beta May 7, 2026
11 checks passed

github-actions Bot mentioned this pull request May 7, 2026

Windows: pytest fails 4 tests in test_plugin_self_update_safety.py — read_text() defaults to cp1252 #397

Closed

dsarno deleted the claude/issue-397-utf8-read-text branch May 7, 2026 14:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #397: pass encoding="utf-8" to read_text() in tests/#402

Fix #397: pass encoding="utf-8" to read_text() in tests/#402
dsarno merged 3 commits into
betafrom
claude/issue-397-utf8-read-text

dsarno commented May 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

codecov Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dsarno commented May 7, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

codecov Bot commented May 7, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants