Bug report: MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words #2280

@williballenthin

Describe the bug
The MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words. café becomes caf退.

src/core/operations/MIMEDecoding.mjs, inside the =?charset?B?...?= handling branch:

text = (0, _Base.fromBase64)(text);
// ...
return _codepage.default.utils.decode(65001, encodedText);

fromBase64 defaults its returnType parameter to "string", which calls byteArrayToUtf8 on the decoded bytes before returning. The result is a UTF-8 decoded string (e.g. "café" — 4 characters). This string is then passed to codepage.utils.decode(65001, ...), which splits it into characters and maps each via charCodeAt(0), treating each code point as a raw byte value. For multi-byte UTF-8 characters, the code point no longer matches the original bytes, so the second UTF-8 decode produces garbage.

Concretely for =?UTF-8?B?Y2Fmw6k=?=:

  1. Base64 decodes to bytes [99, 97, 102, 195, 169] (UTF-8 for "café")
  2. fromBase64 returns the string "café" (5 bytes → 4 chars)
  3. codepage.decode(65001, "café") splits into char codes [99, 97, 102, 233]
  4. 233 (0xE9) is treated as the lead byte of a 3-byte UTF-8 sequence; no continuation bytes follow it, so only its low four bits (0x9) contribute and the result is U+9000 (退)
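
The four steps above can be reproduced outside CyberChef with a short Node.js sketch. It does not call CyberChef or the codepage library. `lenientUtf8Decode` is a hypothetical stand-in for the string path of `codepage.utils.decode(65001, ...)`, written on the assumption that missing continuation bytes contribute zero bits, which is consistent with the reported U+9000.

```javascript
// Step 1: Base64 -> raw bytes (UTF-8 encoding of "café").
const bytes = Uint8Array.from(Buffer.from("Y2Fmw6k=", "base64"));
// bytes is [99, 97, 102, 195, 169]

// Step 2: first UTF-8 decode, which is what a "string" return type amounts to.
const decodedOnce = new TextDecoder("utf-8").decode(bytes); // "café", 4 characters

// Step 3: the string path reduces each character to charCodeAt(0).
const charCodes = Array.from(decodedOnce, (ch) => ch.charCodeAt(0));
// charCodes is [99, 97, 102, 233]

// Step 4: hypothetical lenient UTF-8 decoder (an assumption, not the library's code).
// Missing continuation bytes are treated as zero bits.
function lenientUtf8Decode(codes) {
    let out = "";
    for (let i = 0; i < codes.length; i++) {
        const b = codes[i];
        if (b >= 0xC0 && b < 0xE0) {
            // 2-byte sequence: 5 bits from the lead byte, 6 from the continuation.
            const c1 = (codes[++i] ?? 0) & 0x3F;
            out += String.fromCharCode(((b & 0x1F) << 6) | c1);
        } else if (b >= 0xE0 && b < 0xF0) {
            // 3-byte sequence: 4 bits from the lead byte, 6 from each continuation.
            const c1 = (codes[++i] ?? 0) & 0x3F;
            const c2 = (codes[++i] ?? 0) & 0x3F;
            out += String.fromCharCode(((b & 0x0F) << 12) | (c1 << 6) | c2);
        } else {
            // ASCII and anything else passes through unchanged in this sketch.
            out += String.fromCharCode(b);
        }
    }
    return out;
}

const garbled = lenientUtf8Decode(charCodes);
console.log(garbled); // "caf退" (U+9000 in place of é)
```

The key observation is that 233 is the code point of é, not one of its UTF-8 bytes (195, 169), so feeding it to a second UTF-8 decode cannot recover the original character.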

To Reproduce
https://gchq.github.io/CyberChef/#recipe=MIME_Decoding()&input=U3ViamVjdDogPT9VVEYtOD9CP1kyRm13Nms9Pz0

Expected: Subject: café. Actual: Subject: caf退.

Additional context
Suggested fix — pass "byteArray" as the returnType so fromBase64 returns raw bytes instead of a decoded string:

- text = (0, _Base.fromBase64)(text);
+ text = (0, _Base.fromBase64)(text, undefined, "byteArray");

This makes codepage.decode(65001, ...) receive a Uint8Array directly, bypassing the charCodeAt string path. The UTF-8 decoding then happens exactly once.
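
As a rough illustration of the "decode exactly once" idea, the sketch below keeps the Base64 payload as raw bytes and runs a single charset decode on them. `decodeBase64Words` is a hypothetical helper, not CyberChef's implementation: it handles only B-encoded words, uses Node's `Buffer` and `TextDecoder` instead of `fromBase64` and `codepage`, and assumes the charset label is one that `TextDecoder` recognises.

```javascript
// Hypothetical helper for illustration only; handles =?charset?B?payload?= words.
function decodeBase64Words(header) {
    return header.replace(/=\?([^?]+)\?[Bb]\?([^?]*)\?=/g, (match, charset, payload) => {
        // Keep the payload as raw bytes; do not turn it into a string first.
        const raw = Buffer.from(payload, "base64");
        // Single charset decode, applied directly to the bytes.
        return new TextDecoder(charset.toLowerCase()).decode(raw);
    });
}

console.log(decodeBase64Words("Subject: =?UTF-8?B?Y2Fmw6k=?="));
// Subject: café
```

The input here is the same header used in the reproduction link, so the output can be compared directly with the expected result.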
