Bug report: MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words #2280

**Describe the bug**
The MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words. `café` becomes `caf退`.

The problem is in `src/core/operations/MIMEDecoding.mjs`, inside the branch that handles `=?charset?B?...?=` encoded words:

```javascript
text = (0, _Base.fromBase64)(text);
// ...
return _codepage.default.utils.decode(65001, encodedText);
```

`fromBase64` defaults its `returnType` parameter to `"string"`, which calls `byteArrayToUtf8` on the decoded bytes before returning. The result is therefore already a UTF-8-decoded string (e.g. `"café"`, 4 characters). That string is then passed to `codepage.utils.decode(65001, ...)`, which splits it into characters and maps each one via `charCodeAt(0)`, treating each character code as a raw byte value. For characters that were multi-byte in UTF-8, the character code no longer matches the original bytes, so the second UTF-8 decode produces garbage.

Concretely for `=?UTF-8?B?Y2Fmw6k=?=`:

1. Base64 decodes to bytes `[99, 97, 102, 195, 169]` (UTF-8 for "café")
2. `fromBase64` returns the string `"café"` (5 bytes → 4 chars)
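3. `codepage.utils.decode(65001, "café")` splits the string into char codes `[99, 97, 102, 233]`
4. `233` (`0xE9`) is treated as the lead byte of a 3-byte UTF-8 sequence. No continuation bytes follow, and the output is `U+9000` (退), which is exactly the lead byte's low four bits (`0x9`) shifted left by 12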
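
The steps above can be traced in a standalone Node.js sketch. It does not import CyberChef or `codepage`; `Buffer` and `TextDecoder` stand in for `fromBase64` and `byteArrayToUtf8`, and the final step only reproduces the bit arithmetic. That the second decoder treats the missing continuation bytes as zero is an assumption inferred from the observed `退`, not something confirmed from the `codepage` source.

```javascript
// Step 1: Base64 payload of the encoded word -> raw bytes.
const base64Bytes = Buffer.from("Y2Fmw6k=", "base64");
console.log([...base64Bytes]); // [ 99, 97, 102, 195, 169 ]

// Step 2: first UTF-8 decode (stand-in for fromBase64's default "string" return).
const firstDecode = new TextDecoder("utf-8").decode(base64Bytes);
console.log(firstDecode.length); // 4

// Step 3: the string path maps each character back to a "byte" via charCodeAt(0).
const charCodes = Array.from(firstDecode, (ch) => ch.charCodeAt(0));
console.log(charCodes); // [ 99, 97, 102, 233 ]

// Step 4: 0xE9 has the bit pattern 1110xxxx, i.e. a 3-byte UTF-8 lead byte.
const leadByte = charCodes[3];
const isThreeByteLead = (leadByte & 0xf0) === 0xe0;

// Assumption: absent continuation bytes contribute zero bits,
// so only the lead byte's low nibble survives, shifted left by 12.
const codePoint = (leadByte & 0x0f) << 12;
const garbled = String.fromCodePoint(codePoint);
console.log(isThreeByteLead, codePoint.toString(16), garbled); // true 9000 退
```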

**To Reproduce**
https://gchq.github.io/CyberChef/#recipe=MIME_Decoding()&input=U3ViamVjdDogPT9VVEYtOD9CP1kyRm13Nms9Pz0

Expected: `Subject: café`. Actual: `Subject: caf退`.
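
For checking the expected value outside CyberChef, here is a minimal Node.js sketch that decodes the `input` parameter from the link above and then decodes the encoded word with a single UTF-8 pass. `decodeBWords` is a hypothetical helper written for this illustration; it handles only UTF-8 `B` encoded words and is not CyberChef's implementation.

```javascript
// The recipe URL carries the input as Base64 without padding.
const inputParam = "U3ViamVjdDogPT9VVEYtOD9CP1kyRm13Nms9Pz0";
const header = Buffer.from(inputParam, "base64").toString("utf8");
console.log(header); // Subject: =?UTF-8?B?Y2Fmw6k=?=

// Hypothetical helper: replaces each =?UTF-8?B?...?= word with its decoded text.
// The payload bytes are UTF-8 decoded exactly once.
function decodeBWords(line) {
  return line.replace(
    /=\?UTF-8\?B\?([A-Za-z0-9+\/=]*)\?=/gi,
    (match, payload) =>
      new TextDecoder("utf-8").decode(Buffer.from(payload, "base64"))
  );
}

const expectedOutput = decodeBWords(header);
console.log(expectedOutput); // Subject: café
```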

**Additional context**
Suggested fix — pass `"byteArray"` as the `returnType` so `fromBase64` returns raw bytes instead of a decoded string:

```diff
- text = (0, _Base.fromBase64)(text);
+ text = (0, _Base.fromBase64)(text, undefined, "byteArray");
```

This makes `codepage.utils.decode(65001, ...)` receive the raw decoded bytes directly, bypassing the `charCodeAt` string path. The UTF-8 decoding then happens exactly once.
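
The effect of decoding once versus twice can be compared in a small standalone sketch. It again uses Node's `Buffer` and `TextDecoder` as stand-ins rather than the real `fromBase64` and `codepage` calls, so it illustrates the principle only. Note that `TextDecoder` replaces the invalid lone `0xE9` with `U+FFFD` rather than producing `退`; the exact garbage depends on the decoder, but the result is wrong either way.

```javascript
const utf8 = new TextDecoder("utf-8");
const rawBytes = Buffer.from("Y2Fmw6k=", "base64");

// Fixed path: hand the raw bytes to the UTF-8 decoder exactly once.
const decodedOnce = utf8.decode(rawBytes);
console.log(decodedOnce); // café

// Buggy path: decode to a string, turn char codes back into "bytes", decode again.
const rebuiltBytes = Uint8Array.from(decodedOnce, (ch) => ch.charCodeAt(0));
const decodedTwice = utf8.decode(rebuiltBytes);
console.log(decodedTwice === decodedOnce); // false

// ASCII-only input survives both paths, which is why the bug is easy to miss.
const asciiBytes = Buffer.from("Y2Fm", "base64");
const asciiOnce = utf8.decode(asciiBytes);
const asciiTwice = utf8.decode(
  Uint8Array.from(asciiOnce, (ch) => ch.charCodeAt(0))
);
console.log(asciiOnce === asciiTwice); // true
```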

