
Commit e35dcd2

fix(core): handle multibyte UTF-8 characters in socket message consumption (#34151)
## Current Behavior

When socket data chunks split a multibyte UTF-8 character (e.g., CJK characters such as Korean, Chinese, or Japanese) at an arbitrary byte boundary, `Buffer.toString()` decodes the incomplete byte sequences as replacement characters (�), corrupting the message. This can occur when:

- File paths contain non-ASCII characters
- Project names include multibyte characters
- Any JSON message contains international text

## Expected Behavior

Multibyte UTF-8 characters should be decoded correctly even when split across multiple socket data chunks. The fix uses Node.js `StringDecoder`, which buffers incomplete multibyte sequences until the remaining bytes arrive.

## Related Issue(s)

Fixes socket message corruption for paths/names containing multibyte characters.
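A minimal sketch of the failure mode and the fix, independent of Nx: two chunks split the first Hangul syllable of `'한글'` after its second byte. Decoding each chunk with `toString()` on its own loses the character, while a single `StringDecoder` reassembles it. The helper names `decodeEachChunk` and `decodeWithStringDecoder` are illustrative only.

```typescript
import { StringDecoder } from 'string_decoder';

// '한글' is two Hangul syllables, 3 bytes each in UTF-8: ED 95 9C EA B8 80.
const bytes = Buffer.from('한글', 'utf8');

// Simulate two socket chunks that split the first character after 2 bytes.
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];

// Old approach: decode each chunk independently. The dangling bytes on
// both sides of the split are turned into U+FFFD replacement characters.
function decodeEachChunk(parts: Buffer[]): string {
  return parts.map((part) => part.toString('utf8')).join('');
}

// New approach: StringDecoder holds back an incomplete trailing sequence
// and combines it with the bytes from the next write().
function decodeWithStringDecoder(parts: Buffer[]): string {
  const decoder = new StringDecoder('utf8');
  let out = '';
  for (const part of parts) {
    out += decoder.write(part);
  }
  // end() flushes anything still buffered (incomplete bytes become U+FFFD).
  return out + decoder.end();
}

console.log(decodeEachChunk(chunks)); // '한' is lost; U+FFFD appears instead
console.log(decodeWithStringDecoder(chunks)); // 한글
```

The key design point is that the decoder is stateful: it must be created once per stream and reused for every chunk, which is why the commit creates it alongside the accumulated `message` rather than inside the data handler.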
1 parent 2511215 commit e35dcd2

2 files changed

Lines changed: 22 additions & 1 deletion


packages/nx/src/utils/consume-messages-from-socket.spec.ts

Lines changed: 18 additions & 0 deletions
```diff
@@ -46,4 +46,22 @@ describe('consumeMessagesFromSocket', () => {
 
     expect(messages).toEqual([{ one: 1 }, { two: 2 }, { three: 3 }]);
   });
+
+  it('should handle multibyte UTF-8 characters split across chunks', () => {
+    const messages = [] as any[];
+    const r = consumeMessagesFromSocket((message) =>
+      messages.push(JSON.parse(message))
+    );
+
+    // "한글테스트" path included in JSON
+    const json = JSON.stringify({ path: '/test/한글테스트.tsx' });
+    const buffer = Buffer.from(json + MESSAGE_END_SEQ, 'utf8');
+
+    // Split in the middle of a multibyte character
+    const mid = Math.floor(buffer.length / 2);
+    r(buffer.subarray(0, mid));
+    r(buffer.subarray(mid));
+
+    expect(messages).toEqual([{ path: '/test/한글테스트.tsx' }]);
+  });
 });
```
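The test relies on `Math.floor(buffer.length / 2)` landing inside a character. A byte count confirms it does: the JSON is 36 bytes (21 ASCII bytes plus five 3-byte Hangul syllables), and `MESSAGE_END_SEQ` adds 11 more, for 47 in total. So `mid` is 23. The Hangul bytes occupy offsets 15 to 29, and 테 starts at offset 21, so offset 23 is its third byte. The sketch below checks this; `isContinuation` is an illustrative helper, not part of Nx.

```typescript
// Mirrors MESSAGE_END_SEQ as defined in consume-messages-from-socket.ts.
const VERY_END_CODE = 4;
const MESSAGE_END_SEQ = 'NX_MSG_END' + String.fromCharCode(VERY_END_CODE);

// Same payload as the new spec.
const json = JSON.stringify({ path: '/test/한글테스트.tsx' });
const buffer = Buffer.from(json + MESSAGE_END_SEQ, 'utf8');
const mid = Math.floor(buffer.length / 2);

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
const isContinuation = (byte: number): boolean => (byte & 0xc0) === 0x80;

// Decoding the first half on its own (the old code path) leaves a
// truncated sequence at the end, which becomes U+FFFD.
const firstHalfNaive = buffer.subarray(0, mid).toString('utf8');

console.log(buffer.length); // 47
console.log(mid); // 23
console.log(isContinuation(buffer[mid])); // true
console.log(firstHalfNaive.endsWith('\uFFFD')); // true
```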

packages/nx/src/utils/consume-messages-from-socket.ts

Lines changed: 4 additions & 1 deletion
```diff
@@ -1,11 +1,14 @@
+import { StringDecoder } from 'string_decoder';
+
 const VERY_END_CODE = 4;
 export const MESSAGE_END_SEQ =
   'NX_MSG_END' + String.fromCharCode(VERY_END_CODE);
 
 export function consumeMessagesFromSocket(callback: (message: string) => void) {
   let message = '';
+  const decoder = new StringDecoder('utf8');
   return (data) => {
-    const chunk = data.toString();
+    const chunk = decoder.write(data);
     message += chunk;
 
     // Check if accumulated message ends with MESSAGE_END_SEQ (not just the chunk)
```
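The diff stops at the end-of-message check, so the rest of the function is not shown. The sketch below keeps the decoding step exactly as in the commit and fills in message splitting with a hypothetical `indexOf` loop, which may differ from Nx's actual logic. `consumeMessagesSketch` and `everySplitDecodes` are illustrative names. The usage check generalizes the new spec by splitting the payload at every byte offset rather than only the midpoint.

```typescript
import { StringDecoder } from 'string_decoder';

const VERY_END_CODE = 4;
const MESSAGE_END_SEQ = 'NX_MSG_END' + String.fromCharCode(VERY_END_CODE);

// The decoder setup matches the commit. The splitting loop below is a
// hypothetical stand-in: the diff does not show Nx's real implementation.
function consumeMessagesSketch(callback: (message: string) => void) {
  let message = '';
  const decoder = new StringDecoder('utf8');
  return (data: Buffer) => {
    // An incomplete trailing UTF-8 sequence stays buffered in the decoder,
    // so `chunk` never ends in the middle of a character.
    const chunk = decoder.write(data);
    message += chunk;

    // Hypothetical: emit every complete message accumulated so far.
    let end = message.indexOf(MESSAGE_END_SEQ);
    while (end !== -1) {
      callback(message.slice(0, end));
      message = message.slice(end + MESSAGE_END_SEQ.length);
      end = message.indexOf(MESSAGE_END_SEQ);
    }
  };
}

// Split the spec's payload at every byte offset, not just the midpoint,
// and check that each split still yields exactly one intact message.
function everySplitDecodes(): boolean {
  const json = JSON.stringify({ path: '/test/한글테스트.tsx' });
  const buffer = Buffer.from(json + MESSAGE_END_SEQ, 'utf8');
  for (let i = 0; i <= buffer.length; i++) {
    const received: string[] = [];
    const consume = consumeMessagesSketch((m) => received.push(m));
    consume(buffer.subarray(0, i));
    consume(buffer.subarray(i));
    if (received.length !== 1 || received[0] !== json) return false;
  }
  return true;
}

console.log(everySplitDecodes()); // true
```

Splitting at every offset covers both cases the fix has to handle: a cut inside a Hangul syllable, which the decoder absorbs, and a cut inside the ASCII terminator, which the accumulated `message` absorbs.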
