CDX API doesn't return latest results with fastLatest=true when using matchType=prefix #299

@onnimonni

Hey!

I noticed that fastLatest=true returns the latest results only when the url parameter doesn't contain a path (so possibly only when it's using domain matching?).

How to reproduce

According to the docs, this request with url=archive.org/web/* should return the newest results, but it doesn't (the most recent capture returned is from 2023):

$ curl 'http://web.archive.org/cdx/search/cdx?url=archive.org/web/*&fastLatest=true&limit=-5&fl=timestamp,urlkey,original'
20200122042803 org,archive)/webzine/taxonomy/term/144/0/feed http://www.archive.org:80/webzine/taxonomy/term/144/0/feed
20200221171811 org,archive)/webzine/taxonomy/term/144/0/feed http://archive.org:80/webzine/taxonomy/term/144/0/feed
20200221171811 org,archive)/webzine/taxonomy/term/144/0/feed http://www.archive.org:80/webzine/taxonomy/term/144/0/feed
20231123153412 org,archive)/webzon-livetv https://archive.org/webzon-livetv
20231123174327 org,archive)/webzon-livetv https://archive.org/webzon-livetv

When I use url=archive.org instead, it works as expected:

$ curl 'http://web.archive.org/cdx/search/cdx?url=archive.org&fastLatest=true&limit=-5&fl=timestamp,urlkey,original'
20251128081228 org,archive)/ https://archive.org/
20251202025429 org,archive)/ https://archive.org/
20251206001055 org,archive)/ https://archive.org/
20251206180030 org,archive)/ https://archive.org/
20251208072934 org,archive)/ https://archive.org/
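
To compare the two responses without reading them line by line, a small helper along these lines could print the newest timestamp from each. This is only a sketch: `newest_timestamp` is a hypothetical name, not part of the CDX API, and it assumes the `fl=timestamp,urlkey,original` field order used in the requests above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CDX API).
# Reads CDX lines of the form "timestamp urlkey original" on stdin and
# prints the largest timestamp. The 14-digit timestamps (YYYYMMDDhhmmss)
# sort chronologically as plain strings, so a plain sort is enough.
newest_timestamp() {
  awk 'NF { print $1 }' | sort | tail -n 1
}

# Usage sketch (needs network access, so it is left commented out):
#   base='http://web.archive.org/cdx/search/cdx'
#   opts='fastLatest=true&limit=-5&fl=timestamp,urlkey,original'
#   curl -s "$base?url=archive.org/web/*&$opts" | newest_timestamp
#   curl -s "$base?url=archive.org&$opts"       | newest_timestamp
```

Fed the two outputs shown above, the first pipeline prints a 2023 timestamp and the second a 2025 one. The two queries cover different sets of URLs, so this only makes the gap visible; it does not by itself show which captures the prefix query is missing.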
