Describe the bug
The Unescape Unicode Characters operation fails to decode valid code points with more than 4 hex digits when using the U+ prefix, breaking support for astral plane characters like emoji.
src/core/operations/UnescapeUnicodeCharacters.mjs, run() method, line 55
run(input, args) {
const prefix = prefixToRegex[args[0]],
regex = new RegExp(prefix+"([a-f\\d]{4})", "ig");
// ...
}
The regex is hardcoded to exactly 4 hex digits for all prefixes. This rejects notation like U+1F600 (😀) and U+000041 (zero-padded A). It also breaks round-trips. Escape Unicode Characters can emit 6-digit output like U+000041 when configured with Padding: 6, but Unescape cannot decode it.
To Reproduce
add Unescape Unicode Characters with prefix U+, input U+1F600. Expected: 😀. Actual: no match.
Screenshots
Additional context
Proposed fix widens the quantifier for U+ only:
run(input, args) {
const prefix = prefixToRegex[args[0]],
regex = args[0] === "U+"
? new RegExp(prefix+"([a-f\\d]{4,6})", "ig")
: new RegExp(prefix+"([a-f\\d]{4})", "ig");
// ...
}
Standard U+ notation allows variable-length hex sequences from 4 to 6 digits. The \u and %u forms are legacy and expect exactly 4 digits (or surrogate pairs), so they retain the fixed-length requirement.
Describe the bug
The
Unescape Unicode Charactersoperation fails to decode valid code points with more than 4 hex digits when using theU+prefix, breaking support for astral plane characters like emoji.src/core/operations/UnescapeUnicodeCharacters.mjs,run()method, line 55The regex is hardcoded to exactly 4 hex digits for all prefixes. This rejects notation like
U+1F600(😀) andU+000041(zero-padded A). It also breaks round-trips.Escape Unicode Characterscan emit 6-digit output likeU+000041when configured withPadding: 6, butUnescapecannot decode it.To Reproduce
add
Unescape Unicode Characterswith prefixU+, inputU+1F600. Expected:😀. Actual: no match.Screenshots
Additional context
Proposed fix widens the quantifier for
U+only:Standard
U+notation allows variable-length hex sequences from 4 to 6 digits. The\uand%uforms are legacy and expect exactly 4 digits (or surrogate pairs), so they retain the fixed-length requirement.