Describe the bug
The MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words. café becomes caf退.
src/core/operations/MIMEDecoding.mjs, inside the =?charset?B?...?= handling branch:
text = (0, _Base.fromBase64)(text);
// ...
return _codepage.default.utils.decode(65001, encodedText);
fromBase64 defaults its returnType parameter to "string", which calls byteArrayToUtf8 on the decoded bytes before returning. The result is a UTF-8 decoded string (e.g. "café" — 4 characters). This string is then passed to codepage.utils.decode(65001, ...), which splits it into characters and maps each via charCodeAt(0), treating each code point as a raw byte value. For multi-byte UTF-8 characters, the code point no longer matches the original bytes, so the second UTF-8 decode produces garbage.
Concretely for =?UTF-8?B?Y2Fmw6k=?=:
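- Base64 decodes to bytes [99, 97, 102, 195, 169] (UTF-8 for "café")
- fromBase64 returns the string "café" (5 bytes → 4 chars)
- codepage.decode(65001, "café") splits into char codes [99, 97, 102, 233]
- 233 (0xE9) is treated as a UTF-8 lead byte → decoded as part of a 3-byte sequence → produces U+9000 (退). No continuation bytes follow it, and the result is consistent with only the lead byte's low four bits contributing: (0xE9 & 0x0F) << 12 = 0x9000.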
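The steps above can be reproduced outside CyberChef with a minimal sketch. Note the assumptions: `lenientUtf8Decode` is a hypothetical stand-in written for this illustration, not the codepage library's actual code. It only models a decoder that does not validate continuation bytes. `TextDecoder` stands in for `byteArrayToUtf8`, and `Buffer` assumes a Node.js environment.

```javascript
// Hypothetical, non-validating UTF-8 decoder (an illustration only, NOT the
// codepage library's implementation). Missing continuation bytes read as
// undefined, and (undefined & 0x3F) evaluates to 0.
// 4-byte sequences are omitted for brevity.
function lenientUtf8Decode(bytes) {
    let out = "";
    for (let i = 0; i < bytes.length;) {
        const b = bytes[i];
        let cp, len;
        if (b < 0x80) {
            cp = b;
            len = 1;
        } else if (b < 0xE0) {
            cp = ((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F);
            len = 2;
        } else {
            cp = ((b & 0x0F) << 12) | ((bytes[i + 1] & 0x3F) << 6) | (bytes[i + 2] & 0x3F);
            len = 3;
        }
        out += String.fromCodePoint(cp);
        i += len;
    }
    return out;
}

// Step 1: Base64 payload of =?UTF-8?B?Y2Fmw6k=?= as raw bytes.
const rawBytes = Array.from(Buffer.from("Y2Fmw6k=", "base64")); // [99, 97, 102, 195, 169]

// Step 2: first UTF-8 decode (what the "string" return type does).
const firstDecode = new TextDecoder("utf-8").decode(Uint8Array.from(rawBytes)); // "café"

// Step 3: the string path maps each character back through charCodeAt(0).
const charCodes = firstDecode.split("").map((ch) => ch.charCodeAt(0)); // [99, 97, 102, 233]

// Step 4: second UTF-8 decode over code points that are no longer UTF-8 bytes.
const secondDecode = lenientUtf8Decode(charCodes); // "caf退" (U+9000)

console.log(rawBytes, firstDecode, charCodes, secondDecode);
```

A strict decoder would likely reject or replace the lone 0xE9 instead of emitting U+9000, so the exact garbage character depends on how lenient the second decoder is; the double decode is the problem either way.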
To Reproduce
https://gchq.github.io/CyberChef/#recipe=MIME_Decoding()&input=U3ViamVjdDogPT9VVEYtOD9CP1kyRm13Nms9Pz0
Input (Base64-encoded in the URL above): Subject: =?UTF-8?B?Y2Fmw6k=?=. Expected: Subject: café. Actual: Subject: caf退.
Additional context
Suggested fix — pass "byteArray" as the returnType so fromBase64 returns raw bytes instead of a decoded string:
- text = (0, _Base.fromBase64)(text);
+ text = (0, _Base.fromBase64)(text, undefined, "byteArray");
This makes codepage.decode(65001, ...) receive the raw byte array directly, bypassing the charCodeAt string path. The UTF-8 decoding then happens exactly once.
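As a rough check of the intended behaviour after the fix, the sketch below decodes the raw bytes exactly once. It is an illustration under assumptions, not CyberChef code: `TextDecoder` stands in for codepage.decode(65001, ...), `Buffer` assumes Node.js, and the variable names are made up for this example.

```javascript
// Raw bytes of the Base64 payload, kept as bytes (the "byteArray" idea).
const payloadBytes = Array.from(Buffer.from("Y2Fmw6k=", "base64")); // [99, 97, 102, 195, 169]

// Single UTF-8 decode over the original bytes; no intermediate string.
const decodedOnce = new TextDecoder("utf-8").decode(Uint8Array.from(payloadBytes));

console.log(`Subject: ${decodedOnce}`); // Subject: café
```

A regression test for the operation could assert the same thing end to end: the input Subject: =?UTF-8?B?Y2Fmw6k=?= should yield Subject: café.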