Utf16 bom support by joeleonjr · Pull Request #4326 · trufflesecurity/trufflehog

joeleonjr · 2025-07-21T18:28:57Z

Description:

Secrets in UTF-16-encoded files are not always detected because the UTF-8 decoder's extractSubstrings() function modifies the chunk data.

In engine.go, TruffleHog (TH) loops over each decoder, passing the chunk's Data in for processing. The UTF-8 decoder runs first. If the chunk data is not valid UTF-8, the UTF-8 decoder executes extractSubstrings(). The result of that function is written back to the chunk's Data field, which is then passed to all subsequent decoders. Part of that function alters the structure of valid UTF-16 data, which makes some secrets impossible to detect.
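
The interaction described above can be illustrated with a minimal Go sketch. Everything in it is hypothetical: Chunk, lossyUTF8Decode, and printableASCII are simplified stand-ins rather than TruffleHog's actual code, and the idea that the substring extraction keeps only printable ASCII bytes is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Chunk is a hypothetical stand-in for the engine's chunk type.
type Chunk struct {
	Data []byte
}

// printableASCII is an assumed, simplified stand-in for extractSubstrings:
// it keeps only printable ASCII bytes, which drops the BOM and the 0x00
// bytes that make up half of ASCII-range UTF-16 text.
func printableASCII(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for _, c := range b {
		if c >= 0x20 && c <= 0x7E {
			out = append(out, c)
		}
	}
	return out
}

// lossyUTF8Decode mimics a decoder that overwrites chunk.Data when the
// input is not valid UTF-8.
func lossyUTF8Decode(c *Chunk) {
	if !utf8.Valid(c.Data) {
		c.Data = printableASCII(c.Data)
	}
}

// secondDecoderInput returns the bytes a later decoder would be handed
// when every decoder shares the same chunk.
func secondDecoderInput(data []byte) []byte {
	c := &Chunk{Data: data}
	lossyUTF8Decode(c) // the first decoder rewrites the shared chunk
	return c.Data      // a UTF-16 decoder would now receive this
}

func main() {
	utf16le := []byte{0xFF, 0xFE, 'k', 0x00, 'e', 0x00, 'y', 0x00}
	fmt.Printf("original:               % x\n", utf16le)
	fmt.Printf("handed to next decoder: % x\n", secondDecoderInput(utf16le))
}
```

In this toy case the BOM and the two-byte code units are gone by the time a UTF-16 decoder would run, so it no longer has UTF-16 input to work with.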

Here's an example to test out:

echo <VALID_DETECTABLE_SECRET> > secret.txt
printf '\xFF\xFE' > utf16le.txt && iconv -f UTF-8 -t UTF-16LE secret.txt >> utf16le.txt
trufflehog filesystem utf*

Originally, I thought the problem was that we did not handle the UTF-16 Byte Order Marks (BOMs) #FEFF and #FFFE. However, the existing logic in the utf16ToUTF8 function in utf16.go already takes care of those. I added two test cases to prove that.
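
For readers unfamiliar with BOM handling, here is a rough Go sketch of what BOM-aware UTF-16 decoding looks like using only the standard library. It is not the utf16ToUTF8 function from utf16.go; decodeUTF16WithBOM is a hypothetical helper written for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16WithBOM is a hypothetical helper. It inspects the first two
// bytes for a BOM (FF FE = little-endian, FE FF = big-endian), strips it,
// and decodes the remaining bytes. It reports false if no BOM is present.
func decodeUTF16WithBOM(b []byte) (string, bool) {
	if len(b) < 2 {
		return "", false
	}
	var order binary.ByteOrder
	switch {
	case b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	case b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	default:
		return "", false
	}
	b = b[2:] // drop the BOM
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:i+2]))
	}
	return string(utf16.Decode(units)), true
}

func main() {
	le := []byte{0xFF, 0xFE, 'h', 0x00, 'i', 0x00}
	be := []byte{0xFE, 0xFF, 0x00, 'h', 0x00, 'i'}
	fmt.Println(decodeUTF16WithBOM(le))
	fmt.Println(decodeUTF16WithBOM(be))
}
```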

The only change needed is creating a copy of the chunk prior to processing each decoder.
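
A minimal sketch of that idea, assuming a simplified Chunk type and decoder signature (Chunk, Decoder, copyChunk, lossyDecoder, and runDecoders are all hypothetical names, not the engine's real API): each decoder receives its own copy, so one decoder's rewrite cannot reach the next.

```go
package main

import "fmt"

// Chunk and Decoder are hypothetical stand-ins for the engine's types.
type Chunk struct{ Data []byte }

type Decoder func(c *Chunk)

// copyChunk duplicates the struct and its Data slice, so writes to the
// copy (including in-place writes to the bytes) never reach the original.
func copyChunk(c *Chunk) *Chunk {
	dup := *c
	dup.Data = append([]byte(nil), c.Data...)
	return &dup
}

// lossyDecoder is an assumed stand-in for a decoder that rewrites its
// input: it filters Data down to printable ASCII, reusing the backing array.
func lossyDecoder(c *Chunk) {
	kept := c.Data[:0]
	for _, b := range c.Data {
		if b >= 0x20 && b <= 0x7E {
			kept = append(kept, b)
		}
	}
	c.Data = kept
}

// runDecoders hands every decoder a fresh copy of the original chunk and
// returns a snapshot of the input each decoder received.
func runDecoders(orig *Chunk, decoders []Decoder) [][]byte {
	inputs := make([][]byte, 0, len(decoders))
	for _, decode := range decoders {
		c := copyChunk(orig)
		inputs = append(inputs, append([]byte(nil), c.Data...))
		decode(c)
	}
	return inputs
}

func main() {
	orig := &Chunk{Data: []byte{0xFF, 0xFE, 'k', 0x00, 'e', 0x00, 'y', 0x00}}
	for i, in := range runDecoders(orig, []Decoder{lossyDecoder, lossyDecoder}) {
		fmt.Printf("decoder %d input: % x\n", i, in)
	}
}
```

The cost is one extra allocation and copy of the chunk data per decoder, which is the expense the alternatives below try to avoid.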

If that change is too expensive, I have 2 other ideas:

1. Move extractSubstrings out of the UTF-8 decoder and invoke it directly in engine.go, prior to running FindDetectorMatches, when the UTF-8 decode fails.
2. Store the results of that function in a separate variable for later processing in FindDetectorMatches.
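
The second idea could look roughly like the sketch below. It is only one possible shape: the Extracted field, decodeUTF8, printableASCII, and matchInputs are hypothetical names, and the printable-ASCII filter is again an assumed stand-in for extractSubstrings.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Chunk is a hypothetical stand-in; Extracted is an assumed extra field
// that holds the substring-extraction result without touching Data.
type Chunk struct {
	Data      []byte
	Extracted []byte
}

// printableASCII is an assumed, simplified stand-in for extractSubstrings.
func printableASCII(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for _, c := range b {
		if c >= 0x20 && c <= 0x7E {
			out = append(out, c)
		}
	}
	return out
}

// decodeUTF8 leaves Data alone and stores the lossy result separately.
func decodeUTF8(c *Chunk) {
	if !utf8.Valid(c.Data) {
		c.Extracted = printableASCII(c.Data)
	}
}

// matchInputs lists every byte slice a later matching step could scan.
func matchInputs(c *Chunk) [][]byte {
	inputs := [][]byte{c.Data}
	if c.Extracted != nil {
		inputs = append(inputs, c.Extracted)
	}
	return inputs
}

func main() {
	c := &Chunk{Data: []byte{0xFF, 0xFE, 'k', 0x00, 'e', 0x00, 'y', 0x00}}
	decodeUTF8(c)
	for _, in := range matchInputs(c) {
		fmt.Printf("% x\n", in)
	}
}
```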

zricethezav · 2025-08-08T13:51:36Z

@joeleonjr lgtm

* added UTF-16 BOM support
* removed BOM removal; doesn't make a difference

joeleonjr added 2 commits July 21, 2025 13:58

added UTF-16 BOM support

dc23e71

removed BOM removal; doesn't make a difference

3e14971

zricethezav approved these changes Aug 8, 2025

Merge branch 'main' into utf16-BOM

537b3d4

joeleonjr marked this pull request as ready for review August 8, 2025 14:15

joeleonjr requested a review from a team as a code owner August 8, 2025 14:16

joeleonjr requested a review from a team August 8, 2025 14:16

shahzadhaider1 approved these changes Aug 8, 2025

amanfcp approved these changes Aug 11, 2025

zricethezav merged commit c319bb8 into trufflesecurity:main Aug 14, 2025
13 checks passed

blsaccess mentioned this pull request Aug 15, 2025

Update trufflehog to 3.90.5 blacklanternsecurity/bbot#2590

Merged

peterfraedrich pushed a commit to peterfraedrich/trufflehog that referenced this pull request Mar 15, 2026

Utf16 bom support (trufflesecurity#4326)

8698a22

* added UTF-16 BOM support
* removed BOM removal; doesn't make a difference

zricethezav merged 3 commits into trufflesecurity:main from joeleonjr:utf16-BOM
