Link checker: HEAD-only requests cause false positives on some sites (e.g. crates.io) #45

@marc0olo

Problem

The link checker in links/check.go uses HEAD requests exclusively. Some sites don't handle HEAD correctly and return 404 or 405 even though the page exists:

  • crates.io: https://crates.io/crates/ic-vetkeys returns 404 for HEAD (SPA that requires Accept: text/html), but 200 for GET. The crate exists (v0.6.0, 93k downloads).
  • This is a known issue across link checkers — tools like lychee and markdown-link-check fall back to GET when HEAD fails.

Reproduction

# HEAD returns 404
curl -sI "https://crates.io/crates/ic-vetkeys" | head -1
# HTTP/2 404

# GET returns 200
curl -s -o /dev/null -w "%{http_code}" "https://crates.io/crates/ic-vetkeys"
# 200

Question

Is the HEAD-only approach intentional (e.g. to avoid bandwidth/rate-limiting concerns), or would a GET fallback when HEAD returns 404/405 be welcome? Happy to submit a PR if the fallback approach seems reasonable.

Context

Discovered while running skill-validator check on dfinity/icskills. The false positive blocks CI deployment.
