Fix #397: pass encoding="utf-8" to read_text() in tests/#402
Merged
Conversation
Path.read_text() with no arguments uses locale.getpreferredencoding(). On Windows that's cp1252, which raises UnicodeDecodeError on the verbatim Korean string in plugin/addons/godot_ai/utils/uv_cache_cleanup.gd. test_plugin_self_update_safety.py rglob-walks the plugin tree and crashed 4 tests on Windows; CI is Linux-only (issue #35) so the hazard was silent until a developer ran pytest on Windows. Adds encoding="utf-8" to all 87 bare .read_text() calls across tests/unit/, plus a new test_read_text_encoding.py lint that fails when any future bare call lands. Closes #397
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the Python test suite against Windows’ non-UTF-8 default locale encoding by explicitly using UTF-8 for Path.read_text() and adding a regression test to prevent reintroducing bare read_text() calls.
Changes:
- Updated
tests/unit/to passencoding="utf-8"toPath.read_text()calls to avoidUnicodeDecodeErroron Windows. - Added an AST-based lint test that fails if any test file calls
.read_text()without an explicitencoding=. - Minor formatting cleanups in a few touched tests while making the encoding changes.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_tool_domains.py | Adds encoding="utf-8" to file reads used by tool-domain and catalog parsing tests. |
| tests/unit/test_self_update_smoke_harness.py | Ensures fixture validation reads plugin files as UTF-8. |
| tests/unit/test_self_update_orphan_recovery.py | Reads plugin/lifecycle sources with UTF-8 for Windows safety. |
| tests/unit/test_script_create_import_settle.py | Reads GDScript sources with UTF-8 for contract assertions. |
| tests/unit/test_resource_save_pause_processing.py | Reads multiple handler/plugin sources with UTF-8. |
| tests/unit/test_read_text_encoding.py | New AST lint test preventing future bare .read_text() calls in tests/. |
| tests/unit/test_plugin_self_update_safety.py | Fixes the Windows failure by reading .gd files with UTF-8 during rglob. |
| tests/unit/test_path_traversal_guard.py | Reads validator/handler sources with UTF-8. |
| tests/unit/test_error_code_parity.py | Reads error_codes.gd with UTF-8 for parity checks. |
| tests/unit/test_error_code_distribution.py | Reads handler .gd files with UTF-8 while scanning for code uses; formatting tweak. |
| tests/unit/test_editor_focus_refocus.py | Reads multiple plugin sources with UTF-8; minor assert formatting tweaks. |
| tests/unit/test_dock_cli_timeout.py | Reads dock/CLI GDScript sources with UTF-8 for timeout-contract checks. |
| tests/unit/test_deferred_timeout_contract.py | Reads dispatcher/error-codes sources with UTF-8. |
| tests/unit/test_camera_current_contract.py | Reads camera handler source with UTF-8 for contract checks. |
| tests/unit/test_audit_data_loss_safeguards.py | Reads runner/atomic-write sources with UTF-8; minor assert formatting tweaks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+17
to
+20
| This test enforces the hygiene rule across `tests/`: every bare | ||
| `.read_text()` call (no kwargs) is an offender. CI is Linux-only | ||
| (issue #35) so this is the only line of defense before a Windows | ||
| developer hits the same trap. |
| "Without it, Python falls back to locale.getpreferredencoding(), " | ||
| "which is cp1252 on Windows and raises UnicodeDecodeError on any " | ||
| "non-ASCII byte in the file (issue #397). Add encoding='utf-8'. " | ||
| f"Bare calls: {offenders}" |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
- Trim oversized module docstring to its WHY (cp1252 trap on Windows) - Trim assert message to one line; the offender list is the actionable part - Fast-path skip files where "read_text" doesn't appear before AST parse (~50% lint runtime saved on this tree per /simplify efficiency review) - Add four detector self-tests (positive, negative, unrelated-name, multi-offender) so a future refactor can't silently turn the lint into a no-op — mirrors pattern in test_no_direct_undo_redo_in_gdscript_tests.py and test_gdscript_no_adjacent_string_concat.py
Note in the helper docstring that the matcher catches only the keyword
form. Positional `read_text("utf-8")` and `**kwargs` splat are opaque to
AST and would falsely trip the lint — codebase convention is the keyword
form, so the match stays exact.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Path.read_text()with no arguments useslocale.getpreferredencoding()— cp1252 on Windows.test_plugin_self_update_safety.pyrglob-walks the plugin tree and chokes on the verbatim Korean string inplugin/addons/godot_ai/utils/uv_cache_cleanup.gd, failing 4 tests on Windows. CI is Linux-only (#35) so the hazard was silent until a developer ran pytest on a Windows box.This patch passes
encoding="utf-8"to all 87 bare.read_text()calls acrosstests/unit/, and adds a newtests/unit/test_read_text_encoding.pyAST lint that fails on any future bare call so the regression can't reappear.Closes #397
Test plan
ruff check src/ tests/+ruff format --check src/ tests/— cleanpytest -q— 894 passed (was 893; +1 from the new lint)read_text()is reintroduced (stashed the fix ontest_plugin_self_update_safety.py, ran the new test, got the expected offender list).Path('plugin/addons/godot_ai/utils/uv_cache_cleanup.gd').read_text(encoding='cp1252')raisesUnicodeDecodeErroron the Korean line, whileencoding='utf-8'reads cleanly.Generated by Claude Code