
feat: add duplicate entry detection to site validation CI #2851

Open
mvanhorn wants to merge 1 commit into sherlock-project:master from mvanhorn:osc/2653-duplicate-site-detection

@mvanhorn

Summary

Adds duplicate entry detection to validate_modified_targets.yml. When new sites are added to data.json, the workflow now checks for:

  1. Case-insensitive name duplicates - e.g., "GitHub" vs "github"
  2. URL-based duplicates - new entries sharing the same urlMain domain as an existing entry
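The two checks above can be sketched as a single comparison function. This is a minimal illustration, not the code in this PR: the names `domain_of` and `find_duplicates` are hypothetical, and stripping a leading `www.` before comparing `urlMain` hosts is an assumption made here so that `https://www.github.com/` and `https://github.com/` count as the same domain.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Hostname of a URL (urlparse lowercases it), minus a leading 'www.'.

    Stripping 'www.' is an assumption of this sketch, not confirmed by the PR.
    """
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_duplicates(existing, new_entries):
    """Return (name_dups, url_dups) as lists of (new_name, existing_name) pairs.

    Both arguments map site names to data.json-style entry dicts; values that
    are not dicts are skipped for URL matching.
    """
    # Index existing entries by lowercased name and by urlMain domain.
    by_name = {name.lower(): name for name in existing}
    by_domain = {}
    for name, entry in existing.items():
        dom = domain_of(entry.get("urlMain") or "") if isinstance(entry, dict) else ""
        if dom:
            by_domain.setdefault(dom, name)  # keep the first entry seen per domain

    name_dups, url_dups = [], []
    for name, entry in new_entries.items():
        # Case-insensitive name match that is not an exact match.
        match = by_name.get(name.lower())
        if match is not None and match != name:
            name_dups.append((name, match))
        # Shared urlMain domain with an existing entry.
        dom = domain_of(entry.get("urlMain") or "") if isinstance(entry, dict) else ""
        if dom and dom in by_domain:
            url_dups.append((name, by_domain[dom]))
    return name_dups, url_dups
```

The sketch assumes every `urlMain` includes a scheme such as `https://`; without one, `urlparse` reports no hostname and the entry is silently skipped for the domain check.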

Changes

A new "Check for duplicate entries" step runs inline Python between the "Discover modified targets" and "Validate remote manifest" steps. It compares the keys added by the PR against every entry in the existing manifest. Findings are written to duplicate_findings.md and prepended to the validation summary comment using the existing #### heading style.
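The flow described above (find the added keys, write the findings as markdown, put them in front of the summary) might look roughly like the sketch below. All function names and the heading text are hypothetical; the PR does not show how the workflow actually performs the prepend, so this only illustrates the string handling under that assumption.

```python
from pathlib import Path


def added_keys(base, modified):
    """Site names present in the modified manifest but absent from the base one."""
    return sorted(set(modified) - set(base))


def render_findings(name_dups, url_dups):
    """Render (new, existing) pairs as markdown under #### headings; "" if none."""
    sections = []
    if name_dups:
        items = "\n".join(f"- `{new}` matches existing `{old}` (case-insensitive)"
                          for new, old in name_dups)
        sections.append(f"#### Possible duplicate names\n{items}")
    if url_dups:
        items = "\n".join(f"- `{new}` shares its urlMain domain with `{old}`"
                          for new, old in url_dups)
        sections.append(f"#### Possible duplicate URLs\n{items}")
    return "\n\n".join(sections)


def load_findings(path):
    """Read a findings file such as duplicate_findings.md; "" if it does not exist."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.is_file() else ""


def prepend_findings(findings, summary):
    """Put non-empty findings above the existing summary text."""
    findings = findings.strip()
    return f"{findings}\n\n{summary}" if findings else summary
```

Returning an empty string when nothing is found lets the later step treat a missing or empty findings file the same way and leave the summary comment unchanged.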

Improvements over the previous attempt (#2658):

  • Integrated into existing summary comment (no separate bot comment)
  • Uses #### headings matching existing format
  • Errors in the check exit cleanly with code 0, so it is non-blocking
  • No template literal injection risk
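One way to get the "exit cleanly with code 0" behavior listed above is a fail-open wrapper around the check. This is a sketch under that assumption; `run_nonblocking` is a hypothetical name and the PR may structure its error handling differently.

```python
import sys
import traceback


def run_nonblocking(check):
    """Run `check`; on any exception, log it and still return exit code 0."""
    try:
        check()
    except Exception:
        # Surface the failure in the job log without failing the workflow step.
        print("Duplicate check failed; skipping.", file=sys.stderr)
        traceback.print_exc()
    return 0
```

A script would call it as `sys.exit(run_nonblocking(main))`, so a malformed data.json or an unexpected entry shape is reported in the log but never turns the PR check red.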

Testing

The duplicate check is pure Python and operates only on two JSON files. It runs independently of the pytest validation steps and won't block PRs if it fails. The workflow only triggers on PRs that modify data.json.

Fixes #2653

This contribution was developed with AI assistance (Claude Code).

Adds a new step to the modified target validation workflow that checks
for duplicate entries when new sites are added to data.json. Detects
both case-insensitive name matches and shared urlMain domains against
existing entries. Findings are prepended to the validation summary
comment using the existing #### heading style.

Fixes sherlock-project#2653

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Check if new entries duplicate existing in site validation ci