feat: add --check-sites command for batch site validation #2859

Open
mvanhorn wants to merge 1 commit into sherlock-project:master from mvanhorn:feat/check-sites

mvanhorn commented Apr 3, 2026

Summary

Adds a --check-sites flag that batch-validates every site in data.json by testing each site's username_claimed value against the existing detection logic. It reports broken, timed-out, and healthy sites in a summary table.

Why this matters

Sites break constantly: domains expire, WAFs change, pages restructure. Right now there's no way to proactively check which sites are still working. Maintainers rely on individual user reports.

Source Evidence
#2818 False positive: cracked.sh
#2815 False positive: 1337x.to
#2808 False negative: omg.lol
#2805 False negative: Patched
#2804 False negative: GNOME VCS
#2803 False negative: BoardGameGeek

Six open false-positive/false-negative reports in recent months. A batch validation command would catch these proactively.

Changes

  • Added check_sites() function in sherlock.py that iterates over all sites, runs the existing sherlock() detection on each site's username_claimed user, and classifies results as OK/BROKEN/TIMEOUT
  • Added --check-sites argparse flag. Changed the username positional to nargs="*" and added custom validation that requires either usernames or --check-sites
  • Uses the base QueryNotify class (silent) to suppress per-site output during health checks
  • Exits with code 1 when broken sites are detected (CI-friendly)
  • Updated test in test_ux.py to match the new nargs="*" behavior for the unrecognized-argument edge case
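
For reviewers, the argument-parsing change described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the helper names `build_parser` and `parse_args`, the help strings, and the error message are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with only the two arguments discussed here."""
    parser = argparse.ArgumentParser(prog="sherlock")
    # nargs="*" lets the positional be empty, so --check-sites can run alone.
    parser.add_argument(
        "username",
        nargs="*",
        metavar="USERNAMES",
        help="one or more usernames to search for",
    )
    parser.add_argument(
        "--check-sites",
        action="store_true",
        dest="check_sites",
        help="validate sites instead of searching for a username",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and enforce: usernames OR --check-sites must be given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse no longer rejects an empty positional, so check it by hand.
    if not args.username and not args.check_sites:
        parser.error("provide at least one username or use --check-sites")
    return args
```

Because `nargs="*"` accepts zero values, the "at least one username" rule that argparse previously enforced has to be re-implemented manually, which is why the explicit `parser.error` call is needed.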

Demo

[Image: check-sites demo]

Testing

  • Existing pytest suite passes (21 passed, 4 deselected)
  • Manual testing with --check-sites --site GitHub --site Reddit --site Twitter --timeout 15 shows clean output
  • Normal username search unaffected

The check_sites() function reuses the existing sherlock() function and SherlockFuturesSession for parallel requests, so it runs against the same detection logic users already rely on.
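
The classification and exit-code behavior might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the `Health` enum, the `classify` and `summarize` helpers, and the boolean inputs are hypothetical stand-ins for whatever sherlock() returns per site. The underlying idea is that username_claimed is an account known to exist, so a working site should report it as claimed.

```python
from collections import Counter
from enum import Enum


class Health(Enum):
    """Hypothetical health labels matching the OK/BROKEN/TIMEOUT categories."""
    OK = "OK"
    BROKEN = "BROKEN"
    TIMEOUT = "TIMEOUT"


def classify(detected_claimed: bool, timed_out: bool) -> Health:
    """Classify one site from its check of the known-claimed username.

    Both inputs are assumptions for this sketch: `timed_out` means the
    request did not complete in time, `detected_claimed` means the
    detection logic reported the known account as existing.
    """
    if timed_out:
        return Health.TIMEOUT
    return Health.OK if detected_claimed else Health.BROKEN


def summarize(results: dict[str, Health]) -> tuple[Counter, int]:
    """Count sites per category and derive a CI-friendly exit code."""
    counts = Counter(results.values())
    # Non-zero exit only when at least one site is BROKEN.
    exit_code = 1 if counts[Health.BROKEN] else 0
    return counts, exit_code
```

In this sketch timeouts do not affect the exit code, on the assumption that a slow site is not necessarily a broken one; whether the PR treats them the same way is not stated above.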

This contribution was developed with AI assistance (Claude Code).

Adds a --check-sites flag that validates every site in data.json by
testing each site's username_claimed value against the existing
detection logic. Reports broken, timed out, and healthy sites.

This helps maintainers proactively identify broken sites instead of
relying on individual user reports for false positives/negatives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mvanhorn requested a review from ppfeister as a code owner April 3, 2026 06:22