fix: handle UnicodeDecodeError on usernames with special characters #2853

Open
salmanrajz wants to merge 2 commits into sherlock-project:master from salmanrajz:fix/unicode-decode-error-special-chars

@salmanrajz
Contributor

Fixes #2730

Problem

Usernames containing non-ASCII characters (e.g. Émile) crash Sherlock with a UnicodeDecodeError. The exception is raised inside the requests library during redirect handling when a server returns a non-UTF-8 encoded Location header.

UnicodeDecodeError is not a subclass of requests.exceptions.RequestException, so it escapes all existing except blocks in get_response() and propagates up as an unhandled crash.
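
The point about the exception hierarchy can be checked with the standard library alone. The sketch below is illustrative and not taken from the PR: it assumes that requests.exceptions.RequestException derives from IOError (an alias of OSError in Python 3) and uses OSError as a stand-in, so requests does not need to be installed. The byte string is a Latin-1 encoding of "Émile", chosen here only as an example of non-UTF-8 input.

```python
# Assumption: requests.exceptions.RequestException subclasses IOError,
# which is OSError in Python 3. OSError is used as a stand-in here.

# UnicodeDecodeError and UnicodeEncodeError share UnicodeError as a parent,
# and UnicodeError derives from ValueError, not from OSError.
assert issubclass(UnicodeDecodeError, UnicodeError)
assert issubclass(UnicodeEncodeError, UnicodeError)
assert issubclass(UnicodeError, ValueError)
assert not issubclass(UnicodeDecodeError, OSError)


def decode_latin1_bytes_as_utf8():
    # b"\xc9mile" is "Émile" in Latin-1; it is not valid UTF-8.
    return b"\xc9mile".decode("utf-8")


caught_by = None
try:
    try:
        decode_latin1_bytes_as_utf8()
    except OSError:
        # Never reached: the error is not an OSError (or RequestException).
        caught_by = "OSError"
except UnicodeError:
    # The error escapes the inner handler and is only caught here.
    caught_by = "UnicodeError"

print(caught_by)
```

An `except` clause written for the request exception family therefore lets the decode error pass straight through, which matches the crash described above.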

Fix

Added a catch for UnicodeError (parent of both UnicodeDecodeError and UnicodeEncodeError) in get_response(). Sites that trigger encoding errors are now gracefully reported as Encoding Error instead of crashing the entire scan.
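
A minimal sketch of the shape of such a handler follows. It is not the actual diff: `get_response_sketch`, the stand-in `RequestException` class, and the simplified return tuple are assumptions made for illustration, and the real get_response() in sherlock_project/sherlock.py has more `except` branches and a different signature.

```python
from concurrent.futures import Future


class RequestException(OSError):
    """Hypothetical stand-in for requests.exceptions.RequestException."""


def get_response_sketch(request_future):
    """Resolve a future and map failures to an error label (illustrative)."""
    response = None
    error_context = "General Unknown Error"
    exception_text = None
    try:
        response = request_future.result()
        error_context = None
    except RequestException as err:
        error_context = "Unknown Error"
        exception_text = str(err)
    except UnicodeError as err:
        # Covers both UnicodeDecodeError and UnicodeEncodeError.
        error_context = "Encoding Error"
        exception_text = str(err)
    return response, error_context, exception_text


# Simulate a request whose result raises a decode error.
failed = Future()
failed.set_exception(
    UnicodeDecodeError("utf-8", b"\xc9mile", 0, 1, "invalid continuation byte")
)
result = get_response_sketch(failed)
print(result[1])
```

Catching the parent class keeps the handler to a single branch while still covering the encode direction, at the cost of also catching the less common UnicodeTranslateError.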

Changes

  • sherlock_project/sherlock.py: Added except UnicodeError handler in get_response()
  • tests/test_unicode.py: Added regression tests for both UnicodeDecodeError and UnicodeEncodeError

Testing

$ python -m pytest tests/test_unicode.py -v
tests/test_unicode.py::test_get_response_handles_unicode_decode_error PASSED
tests/test_unicode.py::test_get_response_handles_unicode_encode_error PASSED
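
For readers without the diff at hand, regression tests of this kind could look roughly like the sketch below. The contents of tests/test_unicode.py are not shown in this conversation, so everything here is an assumption: `classify_future` is a hypothetical stand-in for the function under test, and the mocking approach (a `MagicMock` whose `result()` raises) is one plausible way to trigger the handler without network access.

```python
from unittest.mock import MagicMock


def classify_future(request_future):
    """Hypothetical stand-in for the error handling in get_response()."""
    try:
        return request_future.result(), None
    except UnicodeError:
        return None, "Encoding Error"


def make_failing_future(exc):
    # A mock future whose result() call raises the given exception.
    future = MagicMock()
    future.result.side_effect = exc
    return future


def test_handles_unicode_decode_error():
    exc = UnicodeDecodeError("utf-8", b"\xc9mile", 0, 1, "invalid continuation byte")
    response, error_context = classify_future(make_failing_future(exc))
    assert response is None
    assert error_context == "Encoding Error"


def test_handles_unicode_encode_error():
    exc = UnicodeEncodeError("ascii", "\u00c9mile", 0, 1, "ordinal not in range(128)")
    response, error_context = classify_future(make_failing_future(exc))
    assert response is None
    assert error_context == "Encoding Error"
```

Both functions follow the pytest naming convention, so a file containing them would be collected by `python -m pytest` without further configuration.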

@salmanrajz requested a review from ppfeister as a code owner March 31, 2026 15:53

Fixes sherlock-project#2730. Usernames containing non-ASCII characters (e.g. 'Émile')
can trigger a UnicodeDecodeError inside the requests library during
redirect handling. This exception is not a subclass of
requests.exceptions.RequestException, so it escaped all existing
except blocks in get_response() and crashed the program.

Added a catch for UnicodeError (parent of both UnicodeDecodeError and
UnicodeEncodeError) so these sites are gracefully skipped instead of
crashing the entire scan.

Added regression tests in tests/test_unicode.py.

@salmanrajz force-pushed the fix/unicode-decode-error-special-chars branch from 7adf61b to 4656d95 March 31, 2026 15:58

@salmanrajz
Contributor Author

CI Note: The tox-lint and docker-build-test checks pass. The tox-matrix failures are all caused by 3 pre-existing broken tests in test_ux.py (test_remove_nsfw, test_nsfw_explicit_selection) that reference Pornhub, which appears to have been removed from the site list. These failures are unrelated to this PR.

Our new tests in test_unicode.py pass across all matrix combinations.

Pornhub was added to the remote false_positive_exclusions.txt, causing
test_remove_nsfw and test_nsfw_explicit_selection to fail since the
site gets filtered out before the test runs. Replaced with Xvideos and
Erome which are NSFW-flagged but not excluded.

Development

Successfully merging this pull request may close these issues.

Crash: UnicodeDecodeError on usernames with special characters
