Improving perceived DOI registration/publicizing speed. #7393

@qqmyers

With file PIDs enabled and large datasets (hundreds to thousands of files), the time needed to register file DOIs at creation/upload (implemented in #7334) and to publicize those DOIs during publication is large enough to be noticeable. Simple tests on test servers show something like 2 DOI changes per second, so ~10 minutes for 1K files. Given realistic file sizes, it isn't clear whether this is significant during uploads (presumably 1000 'average' files take longer than 10 minutes to upload for most people?), but it is probably a big part of the wait at publication.

I'm opening this issue to consider options for improving the real or perceived performance. One option would be to make the changes asynchronously, e.g. the way that archiving can be done in a post-publication workflow. The user would then see file DOIs appear, or become public, over time rather than being assigned/public when the save/publish operation completes. (Since it takes significant time for DataCite to push newly publicized DOIs out to its index servers, there is already a delay between publish and the DOIs being resolvable.)
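To make the asynchronous option concrete, here is a minimal Python sketch (Dataverse itself is Java, so this is illustrative only). `publish_dataset` and `publicize` are hypothetical names, not Dataverse or DataCite APIs; the point is only that the publish call returns right away while file DOIs are processed in the background.

```python
import threading

def publish_dataset(file_ids, publicize):
    """Start publicizing file DOIs in the background and return immediately.

    `publicize` is a hypothetical callable that makes one file DOI public.
    Returns the worker thread and a list that fills in as DOIs complete.
    """
    done = []

    def worker():
        for file_id in file_ids:
            publicize(file_id)    # a slow remote call in a real system
            done.append(file_id)  # progress a UI could poll

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, done

# Usage: the caller gets control back at once and can check progress later.
thread, done = publish_dataset(["f1", "f2", "f3"], lambda file_id: None)
thread.join()  # only needed here so the example finishes deterministically
```

A real implementation would also need to record failures and retry them, since the user is no longer waiting on the result.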

It may also be possible to speed up submission by parallelizing the calls to DataCite. If those calls need to be throttled and part of the delay is Dataverse preparing the metadata to send, we could instead parallelize just the prep steps and put the DataCite calls in a pool.
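A rough Python sketch of that second variant, assuming the prep step is independent per file: metadata is prepared in one thread pool, and the registry calls go through a separate, smaller pool with a cap on concurrent requests. `prepare_metadata`, `ThrottledRegistry`, and `register_files` are hypothetical stand-ins, and the pool sizes are arbitrary; nothing here reflects actual DataCite limits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def prepare_metadata(file_id):
    # Hypothetical stand-in for building the DOI metadata for one file.
    return {"file_id": file_id, "xml": f"<resource id='{file_id}'/>"}

class ThrottledRegistry:
    """Hypothetical client that caps the number of concurrent registry calls."""

    def __init__(self, max_concurrent):
        self._slots = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak_in_flight = 0

    def register(self, record):
        with self._slots:  # blocks while max_concurrent calls are running
            with self._lock:
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                # A real client would send the HTTP request here.
                return f"doi-for-{record['file_id']}"
            finally:
                with self._lock:
                    self.in_flight -= 1

def register_files(file_ids, registry, prep_workers=8, call_workers=4):
    # Step 1: prepare all metadata records in parallel.
    with ThreadPoolExecutor(max_workers=prep_workers) as prep_pool:
        records = list(prep_pool.map(prepare_metadata, file_ids))
    # Step 2: send the records through a bounded pool of registry calls.
    with ThreadPoolExecutor(max_workers=call_workers) as call_pool:
        return list(call_pool.map(registry.register, records))

registry = ThrottledRegistry(max_concurrent=2)
dois = register_files(range(10), registry)
```

The semaphore enforces the throttle independently of the pool size, so the cap can be tuned without changing how work is scheduled.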

Some of the things that would help prioritize and refine this issue: knowing how much of a concern this is to installations, whether others have experience sending large numbers of DOI requests to DataCite, what performance DataCite supports, and whether it offers bulk/asynchronous calls we could use.

I may investigate some, but this issue may sit for a while without community input.
