Expanded test coverage for binary content #4332

Merged
kashifkhan0771 merged 14 commits into trufflesecurity:main from kashifkhan0771:tests/oss-268
Aug 5, 2025

Conversation

@kashifkhan0771 kashifkhan0771 commented Jul 24, 2025

Contributor

Description:

This PR adds some test cases for git diff with binary content of different types.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 requested a review from a team as a code owner July 24, 2025 08:19
@kashifkhan0771 kashifkhan0771 requested a review from a team July 24, 2025 08:19
Comment thread pkg/gitparse/gitparse_test.go Outdated
// TODO - Add test coverage for binary diffs (if it isn't already elsewhere)

if actual.IsBinary {
if actual.contentWriter != nil {

Contributor Author

It appears that for binary content, the actual content is always nil by default. 😕

@ahrav ahrav Jul 24, 2025

Contributor

Yes, that's correct. A git diff shows line-by-line changes, a concept that doesn't apply to binary files (like images, compiled code, or PDFs).

Instead of generating a meaningless content diff, Git simply records that the binary file has been modified. To access the actual raw content of the file, a different approach is needed. The standard command is git cat-file, which lets us stream the data directly for scanning here.
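
To make the point above concrete, here is a minimal sketch of spotting the marker line Git emits for a binary change ("Binary files a/x and b/x differ") in diff text. The function name binaryPaths is hypothetical and is not taken from the gitparse package; it only illustrates that the diff carries a path and no content.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// binaryPaths is a hypothetical helper (not part of gitparse). It scans diff
// text and returns the paths of files that Git reported as binary via a
// "Binary files <old> and <new> differ" line.
func binaryPaths(diff string) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(diff))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "Binary files ") || !strings.HasSuffix(line, " differ") {
			continue
		}
		body := strings.TrimSuffix(strings.TrimPrefix(line, "Binary files "), " differ")
		// Split on the last " and " so the new-side path is isolated.
		i := strings.LastIndex(body, " and ")
		if i < 0 {
			continue
		}
		oldSide, newSide := body[:i], body[i+len(" and "):]
		path := strings.TrimPrefix(newSide, "b/")
		if newSide == "/dev/null" {
			// Deleted file: fall back to the old-side path.
			path = strings.TrimPrefix(oldSide, "a/")
		}
		paths = append(paths, path)
	}
	return paths
}

func main() {
	diff := "diff --git a/logo.png b/logo.png\n" +
		"index 1234567..89abcde 100644\n" +
		"Binary files a/logo.png and b/logo.png differ\n"
	fmt.Println(binaryPaths(diff))
}
```

Once a path or blob hash is known, the raw bytes would have to come from something like git cat-file blob <object>, since the diff itself contains none of them.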

Contributor Author

Yeah, I read a bit about this: by default, git diff doesn’t render the content of binary files unless some configuration tweaks are made.

In that case I think these test cases are solid. We’re generating binary content and parsing it without any errors, which is essentially the goal here, right?

Contributor Author

❓ Does the git cat-file command only return binary gibberish, or does it provide human-readable output?
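
As far as I understand it, git cat-file writes a blob's bytes unchanged, so the output is only human-readable when the blob itself is text. A rough sketch of telling the two apart is below; looksBinary and the 8000-byte sniff window are assumptions for illustration (a NUL-byte check over a leading window, similar in spirit to the heuristic Git is commonly described as using), not code from this repository.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffLen is an assumed window size for this sketch.
const sniffLen = 8000

// looksBinary is a hypothetical helper: it reports whether a NUL byte
// appears within the first sniffLen bytes of data.
func looksBinary(data []byte) bool {
	if len(data) > sniffLen {
		data = data[:sniffLen]
	}
	return bytes.IndexByte(data, 0) >= 0
}

func main() {
	// The first bytes of a PNG file: signature followed by a chunk length.
	png := []byte{0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d}
	fmt.Println(looksBinary(png))
	fmt.Println(looksBinary([]byte("plain text config\n")))
}
```

A NUL check is only a heuristic: some binary formats contain no NUL byte early on, and UTF-16 text does, so a scanner should still be prepared to search either kind of content.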

@rosecodym

Contributor

Correct me if I'm wrong: The test you modified doesn't actually parse the binary data, right? I can't find anywhere that your new, synthetic binary data is actually read in the test code, so I'm concerned that it might mislead readers. (I ran the updated test locally after fiddling with the test data and everything still passed.)

@kashifkhan0771

Contributor Author

> Correct me if I'm wrong: The test you modified doesn't actually parse the binary data, right? I can't find anywhere that your new, synthetic binary data is actually read in the test code, so I'm concerned that it might mislead readers. (I ran the updated test locally after fiddling with the test data and everything still passed.)

Discussed locally. We need to check if we can find somewhere in the codebase that makes sense to test actual binary content.

@kashifkhan0771 kashifkhan0771 marked this pull request as draft July 25, 2025 11:40
@kashifkhan0771 kashifkhan0771 marked this pull request as ready for review August 4, 2025 09:46
@kashifkhan0771

Contributor Author

> Discussed locally. We need to check if we can find somewhere in the codebase that makes sense to test actual binary content.

I reverted the previous changes in the gitparse package and added binary content tests to both chunker_test.go and the filesystem scan tests. These should serve as a solid foundation for expanding test coverage of binary content scanning.
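
For readers following the earlier concern about tests that never read their binary fixture, here is a minimal sketch of a check that does consume the bytes: it chunks binary input and verifies that the chunks reassemble to the original. chunkReader and rechunk are hypothetical stand-ins written for this illustration; they are not the API exercised in chunker_test.go.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// chunkReader is a hypothetical chunker: it reads r to the end and returns
// the content split into pieces of at most size bytes.
func chunkReader(r io.Reader, size int) ([][]byte, error) {
	if size <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	var chunks [][]byte
	for {
		buf := make([]byte, size)
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			chunks = append(chunks, buf[:n])
		}
		// io.EOF means nothing was left; io.ErrUnexpectedEOF means the final
		// chunk was short. Both signal a normal end of input here.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return chunks, nil
		}
		if err != nil {
			return chunks, err
		}
	}
}

// rechunk splits data into chunks and joins them again, returning the
// reassembled bytes and the number of chunks produced.
func rechunk(data []byte, size int) ([]byte, int, error) {
	chunks, err := chunkReader(bytes.NewReader(data), size)
	if err != nil {
		return nil, 0, err
	}
	return bytes.Join(chunks, nil), len(chunks), nil
}

func main() {
	// Synthetic binary content: NUL and high bytes around an ASCII marker.
	data := []byte{0x00, 0x01, 0xff, 'k', 'e', 'y', 0x00, 0xfe, 0x10}
	got, n, err := rechunk(data, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, bytes.Equal(got, data))
}
```

Because the assertion compares the reassembled bytes with the input, editing the fixture or breaking the chunker changes the outcome, which is the property the earlier review comment was asking for.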

@kashifkhan0771 kashifkhan0771 changed the title from "Expanded test coverage for binary git diffs" to "Expanded test coverage for binary content" Aug 4, 2025
@kashifkhan0771

Contributor Author

I am planning to add some test cases to git_test.go as well.

@camgunz camgunz left a comment

Contributor

LG! Love this approach 👍🏻

@kashifkhan0771 kashifkhan0771 merged commit 9d7c0af into trufflesecurity:main Aug 5, 2025
13 checks passed
@kashifkhan0771 kashifkhan0771 deleted the tests/oss-268 branch August 5, 2025 07:40
peterfraedrich pushed a commit to peterfraedrich/trufflehog that referenced this pull request Mar 15, 2026
* Expanded test coverage for binary git diffs

* reverted old changes and added binary test in chunker

* revert more

* added filesystem binary file scan test

4 participants