504 Gateway Time-out When Querying Recent CDX Entries for Large Websites #285

@ixalodecte

I am encountering a 504 Gateway Time-out error when attempting to retrieve recent CDX entries for a large website using the following URL:

http://web.archive.org/cdx/search/cdx?url=https://www.nih.gov/&from=20250301&matchType=prefix&limit=10

However, when I search using an older date (e.g., http://web.archive.org/cdx/search/cdx?url=https://www.nih.gov/&from=20240301&matchType=prefix&limit=10), or when I simply remove the from parameter, I receive results relatively quickly.
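For anyone who wants to reproduce the comparison, here is a minimal Python sketch using only the standard library. The helper names `build_cdx_url` and `time_cdx_query` are hypothetical (they are not part of any Wayback Machine client), and the 60-second client timeout is an arbitrary assumption; the query parameters are the ones from the URLs above.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, from_ts=None, match_type="prefix", limit=10):
    """Build a CDX query URL (hypothetical helper); 'from' is omitted when from_ts is None."""
    params = {"url": url}
    if from_ts is not None:
        params["from"] = from_ts
    params["matchType"] = match_type
    params["limit"] = limit
    # safe=":/" leaves the target URL unescaped, matching the URLs quoted in this issue.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=":/")


def time_cdx_query(query_url, timeout=60):
    """Send one request and return (HTTP status, elapsed seconds).

    HTTP error responses such as 504 are reported as a status code.
    A client-side timeout or connection failure is not caught and
    propagates to the caller.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(query_url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start
```

Calling `time_cdx_query(build_cdx_url("https://www.nih.gov/", from_ts="20250301"))` and then repeating the call with `from_ts="20240301"` and with `from_ts=None` should make the difference between the three cases visible as a status code and an elapsed time.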

Note that the problem does not occur when querying a smaller website, or when setting matchType=exact. That makes sense, because there are fewer entries to process. The dependence on the date is what I find strange: a recent from value times out, while an older one, or none at all, does not. Is there a known issue with querying recent dates, or am I doing something wrong?

Thank you in advance!
