Commit e35dcd2
fix(core): handle multibyte UTF-8 characters in socket message consumption (#34151)
## Current Behavior
When a socket data chunk boundary falls inside a multibyte UTF-8
character (e.g., Korean, Chinese, or Japanese text), calling
`Buffer.toString()` on each chunk decodes the incomplete byte sequences
as replacement characters (�, U+FFFD), causing message corruption.
This can occur when:
- File paths contain non-ASCII characters
- Project names include multibyte characters
- Any JSON message contains international text
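The corruption described above can be reproduced in isolation. The following is a minimal sketch, not code from this commit: the JSON payload and the split position are made up for illustration, and the two-element `chunks` array stands in for two socket `data` events.

```javascript
// Illustrative only: simulate a socket delivering one message in two chunks.
const message = Buffer.from('{"path":"/프로젝트"}', 'utf8');

// Split one byte into the first Korean character (3 bytes in UTF-8).
const splitAt = message.indexOf(Buffer.from('프', 'utf8')) + 1;
const chunks = [message.subarray(0, splitAt), message.subarray(splitAt)];

// Decoding each chunk on its own turns the split bytes into U+FFFD.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

console.log(naive.includes('\uFFFD')); // true
console.log(naive === message.toString('utf8')); // false
```

Concatenating the raw buffers before decoding would also avoid the problem, but that requires knowing where a message ends before any of it has been decoded.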
## Expected Behavior
Multibyte UTF-8 characters should be decoded correctly even when split
across multiple socket data chunks. The fix uses the Node.js
`StringDecoder`, which buffers incomplete multibyte sequences until the
remaining bytes arrive.
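As a rough sketch of this approach (not the code changed in this commit), a chunk consumer built on `StringDecoder` might look like the following. The helper name `createMessageConsumer` and the newline message delimiter are assumptions made for the example; the real message framing may differ.

```javascript
const { StringDecoder } = require('node:string_decoder');

// Hypothetical helper, not taken from the commit: decodes chunks and calls
// onMessage once per newline-terminated message (delimiter is an assumption).
function createMessageConsumer(onMessage) {
  const decoder = new StringDecoder('utf8');
  let pending = '';
  return (chunk) => {
    // write() returns only complete characters; a trailing partial
    // sequence stays buffered inside the decoder until the next chunk.
    pending += decoder.write(chunk);
    let end;
    while ((end = pending.indexOf('\n')) !== -1) {
      onMessage(pending.slice(0, end));
      pending = pending.slice(end + 1);
    }
  };
}

const received = [];
const consume = createMessageConsumer((text) => received.push(JSON.parse(text)));

// Deliver one message in two chunks, cutting inside a 3-byte character.
const payload = Buffer.from('{"path":"/프로젝트"}\n', 'utf8');
const cut = payload.indexOf(Buffer.from('프', 'utf8')) + 1;
consume(payload.subarray(0, cut));
consume(payload.subarray(cut));

console.log(received[0].path); // "/프로젝트"
```

One decoder instance is kept per consumer because the buffered partial bytes are connection state; sharing a decoder across sockets would mix bytes from unrelated streams.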
## Related Issue(s)
Fixes socket message corruption for paths/names containing multibyte
characters.
1 parent 2511215 commit e35dcd2
2 files changed
Lines changed: 22 additions & 1 deletion
Lines changed: 18 additions & 0 deletions
(Diff content not captured. First file: 18 lines added as new lines 49-66, after original line 48.)
(Diff content not captured. Second file: 2 lines added at the top, 1 line added after original line 6, and original line 8 replaced by one new line.)