fix: handle UnicodeDecodeError on usernames with special characters #2853
Open
salmanrajz wants to merge 2 commits into sherlock-project:master from
Fixes sherlock-project#2730. Usernames containing non-ASCII characters (e.g. 'Émile') can trigger a UnicodeDecodeError inside the requests library during redirect handling. This exception is not a subclass of requests.exceptions.RequestException, so it escaped all existing except blocks in get_response() and crashed the program. Added a catch for UnicodeError (parent of both UnicodeDecodeError and UnicodeEncodeError) so these sites are gracefully skipped instead of crashing the entire scan. Added regression tests in tests/test_unicode.py.
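The handler ordering described above can be illustrated with a minimal sketch. It is not the actual body of get_response(): the function name get_response_sketch, the returned tuple, and the RequestException stand-in class are assumptions made so the example runs without the requests package installed. The real requests.exceptions.RequestException likewise derives from IOError, which is why a UnicodeError (a ValueError subclass) slips past it.

```python
from concurrent.futures import Future


class RequestException(IOError):
    """Stand-in for requests.exceptions.RequestException (assumption for this sketch)."""


def get_response_sketch(request_future):
    """Hypothetical helper: return (response, error_context, exception_text)."""
    response = None
    error_context = "General Unknown Error"
    exception_text = None
    try:
        response = request_future.result()
        error_context = None
    except RequestException as err:
        error_context = "Unknown Error"
        exception_text = str(err)
    except UnicodeError as err:
        # Matches UnicodeDecodeError and UnicodeEncodeError, neither of
        # which is caught by the RequestException handler above.
        error_context = "Encoding Error"
        exception_text = str(err)
    return response, error_context, exception_text


# Simulate a redirect whose Location header is not valid UTF-8.
failed = Future()
failed.set_exception(
    UnicodeDecodeError("utf-8", b"\xc9mile", 0, 1, "invalid continuation byte")
)
result = get_response_sketch(failed)
print(result[1])
```

Catching the parent class UnicodeError keeps the handler to a single clause while still covering both the decode and encode variants.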
Force-pushed from 7adf61b to 4656d95
CI Note: The ... Our new tests in ...
Pornhub was added to the remote false_positive_exclusions.txt, causing test_remove_nsfw and test_nsfw_explicit_selection to fail since the site gets filtered out before the test runs. Replaced with Xvideos and Erome which are NSFW-flagged but not excluded.
Fixes #2730
Problem
Usernames containing non-ASCII characters (e.g. Émile) crash Sherlock with a UnicodeDecodeError. The exception is raised inside the requests library during redirect handling when a server returns a non-UTF-8 encoded Location header. UnicodeDecodeError is not a subclass of requests.exceptions.RequestException, so it escapes all existing except blocks in get_response() and propagates up as an unhandled crash.
Fix
Added a catch for UnicodeError (parent of both UnicodeDecodeError and UnicodeEncodeError) in get_response(). Sites that trigger encoding errors are now gracefully reported as Encoding Error instead of crashing the entire scan.
Changes
sherlock_project/sherlock.py: Added except UnicodeError handler in get_response()
tests/test_unicode.py: Added regression tests for both UnicodeDecodeError and UnicodeEncodeError
Testing
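As a rough illustration of what regression tests for both error types might look like, the sketch below uses unittest with a mocked future. It does not reproduce the contents of tests/test_unicode.py: the fetch_status helper is a hypothetical stand-in for get_response(), and the test names are assumptions.

```python
import unittest
from unittest import mock


def fetch_status(request_future):
    """Hypothetical stand-in for get_response(): return an error label or None."""
    try:
        request_future.result()
    except UnicodeError:
        return "Encoding Error"
    return None


class UnicodeHandlingTest(unittest.TestCase):
    def _future_raising(self, exc):
        # A mocked future whose result() raises the given exception.
        future = mock.Mock()
        future.result.side_effect = exc
        return future

    def test_decode_error_is_reported(self):
        exc = UnicodeDecodeError(
            "utf-8", b"\xc9mile", 0, 1, "invalid continuation byte"
        )
        self.assertEqual(fetch_status(self._future_raising(exc)), "Encoding Error")

    def test_encode_error_is_reported(self):
        exc = UnicodeEncodeError(
            "ascii", "Émile", 0, 1, "ordinal not in range(128)"
        )
        self.assertEqual(fetch_status(self._future_raising(exc)), "Encoding Error")

    def test_success_has_no_error(self):
        # A plain Mock returns normally from result(), so no label is set.
        self.assertIsNone(fetch_status(mock.Mock()))
```

Mocking the future keeps the tests offline and deterministic, since no real site has to return a malformed Location header.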