Describe the bug
The Text Encoding Brute Force operation declares inputType = "string", which means CyberChef's framework passes it a UTF-8 decoded string. The operation then feeds this string to codepage.utils.decode(charset, input), which interprets each character's charCodeAt(0) as a raw byte value. For non-ASCII input, the char codes no longer correspond to the original bytes, producing incorrect decoding for all code pages.
In src/operations/TextEncodingBruteForce.mjs:
this.inputType = "string"; // BUG: should be "byteArray"
When inputType is "string", the CyberChef framework calls byteArrayToUtf8 on the input bytes before passing them to the operation's run method. The run method then passes this UTF-8 string to codepage.decode for each code page.
codepage.decode(cp, data) has a string branch:
if (typeof data === "string") return decode(cp, data.split("").map(cca));
// cca = x => x.charCodeAt(0)
This treats each UTF-16 code unit of the string as a byte value. For a proper UTF-8 string like "café", the é character has code point 233 (0xE9), but the original byte sequence was [0xC3, 0xA9] (two bytes). The single value 233 is then decoded under the target code page as if it were a single byte, which is wrong for every code page whenever the input contains non-ASCII bytes.
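The mismatch can be shown without CyberChef or the codepage library. The following is a minimal sketch: TextDecoder is used here only as a stand-in for the framework's byteArrayToUtf8 step (an assumption; the two should agree for valid UTF-8), and the split/charCodeAt mapping mirrors the string branch quoted above.

```javascript
// Original input bytes: "café" encoded as UTF-8.
const originalBytes = [0x63, 0x61, 0x66, 0xc3, 0xa9];

// Stand-in for the framework's bytes-to-string conversion (assumption:
// TextDecoder behaves like byteArrayToUtf8 for valid UTF-8 input).
const asString = new TextDecoder("utf-8").decode(Uint8Array.from(originalBytes));

// What the string branch of the decoder sees: one value per UTF-16 code unit.
const codeUnits = asString.split("").map((ch) => ch.charCodeAt(0));

// Helper for printing values as two-digit hex.
const hex = (values) =>
  values.map((v) => v.toString(16).padStart(2, "0")).join(" ");

console.log("original bytes:", hex(originalBytes)); // 63 61 66 c3 a9
console.log("code units:    ", hex(codeUnits));     // 63 61 66 e9
```

Five input bytes become four values, and the last one (0xe9) never appeared in the input, so any code page decode that starts from these values is working on different data than the user supplied.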
To Reproduce
https://gchq.github.io/CyberChef/#recipe=From_Hex('Auto')Text_Encoding_Brute_Force('Decode')&input=NjMgNjEgNjYgYzMgYTk&oenc=65001
// Assumes the CyberChef Node API package is installed (npm install cyberchef).
const chef = require("cyberchef");

// "café" encoded as UTF-8: the final two bytes are the é character.
const input = Buffer.from([0x63, 0x61, 0x66, 0xc3, 0xa9]);
const out = chef.textEncodingBruteForce(input, { Mode: "Decode" });
console.log(
  "input bytes:",
  [...input].map((b) => b.toString(16).padStart(2, "0")).join(" ")
);
console.log("dish type:", out.type);
console.log("utf8:", JSON.stringify(out.value["UTF-8 (65001)"]));
console.log(
  "cp500:",
  JSON.stringify(out.value["IBM EBCDIC International (500)"])
);
Output I got:
input bytes: 63 61 66 c3 a9
dish type: 6
utf8: "caf退"
cp500: "Ä/ÃZ"
The raw-byte CP500 decode can be checked independently with Python:
python3 - <<'PY'
print(bytes([0x63, 0x61, 0x66, 0xc3, 0xa9]).decode("cp500"))
PY
Expected output:
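As a cross-check that does not depend on the codepage library, the sketch below decodes both the raw bytes and the code-unit values through a small hand-copied excerpt of the CP500 table. The excerpt covers only the six byte values needed here, and CP500_EXCERPT and decodeWithExcerpt are hypothetical names introduced for illustration, not part of CyberChef or codepage.

```javascript
// Partial CP500 (IBM EBCDIC International) table: only the byte values used
// in this report. Hand-copied excerpt, not a complete code page.
const CP500_EXCERPT = new Map([
  [0x61, "/"],
  [0x63, "Ä"],
  [0x66, "Ã"],
  [0xa9, "z"],
  [0xc3, "C"],
  [0xe9, "Z"],
]);

// Hypothetical helper: decode byte values via the excerpt, failing loudly
// on any value the excerpt does not cover.
function decodeWithExcerpt(byteValues) {
  return byteValues
    .map((b) => {
      if (!CP500_EXCERPT.has(b)) {
        throw new Error(`byte 0x${b.toString(16)} is not in the excerpt`);
      }
      return CP500_EXCERPT.get(b);
    })
    .join("");
}

const rawBytes = [0x63, 0x61, 0x66, 0xc3, 0xa9]; // what the user supplied
const codeUnits = [0x63, 0x61, 0x66, 0xe9];      // what the string branch sees

console.log("from raw bytes: ", JSON.stringify(decodeWithExcerpt(rawBytes)));  // "Ä/ÃCz"
console.log("from code units:", JSON.stringify(decodeWithExcerpt(codeUnits))); // "Ä/ÃZ"
```

Under these assumptions, the code-unit path reproduces the cp500 value reported above, while the raw-byte path yields a five-character result with "Cz" in place of "Z".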