Switching to the DataCite REST API for retrieving registration metadata #12270

Merged
stevenwinship merged 7 commits into develop from 12070-datacite-getmetadata on Apr 6, 2026

@landreev landreev commented Mar 31, 2026

What this PR does / why we need it:

This is to address an issue with UTF8 characters when relying on the traditionally used MDS API, which results in unnecessary registration updates. See #12070.

This is a very minimal, proof-of-concept implementation. It appears to work for its intended purpose.

Which issue(s) this PR closes:

Special notes for your reviewer:

As discussed during review, a release note has been added telling instances that a valid REST API URL must now be configured. As an extra fail-safe, getMetadata() will fall back to MDS if it cannot obtain the metadata via the REST API.
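
The fail-safe described above can be sketched roughly as follows. This is a minimal Python illustration of the fallback pattern only, not the actual Java code in DataCiteRESTfullClient; the function name and the two fetcher callables are hypothetical stand-ins.

```python
from typing import Callable


def get_metadata_with_fallback(
    doi: str,
    fetch_via_rest: Callable[[str], str],  # hypothetical REST API fetcher
    fetch_via_mds: Callable[[str], str],   # hypothetical MDS API fetcher
) -> str:
    """Try the REST API first; fall back to MDS if the REST call fails."""
    try:
        return fetch_via_rest(doi)
    except Exception as rest_error:
        # Broad on purpose: in this sketch, any REST failure triggers the fallback.
        print(f"REST lookup failed for {doi} ({rest_error}); falling back to MDS")
        return fetch_via_mds(doi)
```

Note that if the fallback is taken, the caller is back on the MDS path, so the UTF8 issue from #12070 could presumably reappear for that call.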

We do not have any tests covering DataCiteRESTfullClient, and none are added in this PR. The only practical way of testing this functionality that I can think of is to make the RestAssured tests rely on a "real" (non-"fake") DataCite authority and test registering real DOIs. I am hesitant, however, to introduce another dependency on an external service in that suite.

Suggestions on how to test this:

Can be easily tested on any dev. instance. But a real test DataCite authority must be used ("fake" will not do, in other words), as in:

        <jvm-options>-Ddataverse.pid.testdatacite.label=DataCite</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.type=datacite</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.authority=10.70122</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.shoulder=FK2/</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.datacite.mds-api-url=https://mds.test.datacite.org</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.datacite.rest-api-url=https://api.test.datacite.org</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.datacite.username=[REDACTED]</jvm-options>
        <jvm-options>-Ddataverse.pid.testdatacite.datacite.password=[REDACTED]</jvm-options>

Reach out directly on Slack if you don't have the username/password.

Create a dataset; put something in the description that has UTF8 characters in it, such as Universidade de Brasília and the other examples in the issue description.
Publish the dataset. Check on https://doi.test.datacite.org/repositories/gdcc.harvard-test and confirm that the DataCite registration has worked. (Keep in mind that these test DOIs do not redirect using the normal DOI resolver shown on the dataset page.)
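
As an alternative to the web UI, the registered record can also be inspected through the DataCite REST API. The sketch below assumes the standard `/dois/{doi}` endpoint and the JSON:API response layout (`data.attributes.descriptions`); the DOI suffix `EXAMPLE` and the sample payload are made up for illustration, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote

# Same value as the rest-api-url JVM option shown above.
REST_API_URL = "https://api.test.datacite.org"


def doi_record_url(doi: str, base_url: str = REST_API_URL) -> str:
    """Build the REST API URL for a single DOI record."""
    return f"{base_url.rstrip('/')}/dois/{quote(doi, safe='/')}"


def descriptions_from_response(body: bytes) -> list[str]:
    """Extract description strings from a /dois/{doi} JSON response body."""
    record = json.loads(body.decode("utf-8"))
    attributes = record["data"]["attributes"]
    return [d["description"] for d in attributes.get("descriptions", [])]


# Made-up payload that mimics the assumed response shape.
sample = json.dumps(
    {"data": {"attributes": {"descriptions": [{"description": "Universidade de Brasília"}]}}}
).encode("utf-8")

print(doi_record_url("10.70122/FK2/EXAMPLE"))
print(descriptions_from_response(sample))
```

Against a real test record, the body would come from an HTTP GET on the URL returned by `doi_record_url`, using the test credentials if the DOI is not yet publicly findable.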

Enable FINE logging with:
edu.harvard.iq.dataverse.pidproviders.doi.datacite.level=FINE
Set <jvm-options>-Ddataverse.feature.only-update-datacite-when-needed=true</jvm-options> if it is not already present, then restart Payara.

Testing the "before" case, i.e. the develop branch or 6.10:

Run the /modifyRegistrationMetadata API on the dataset.
You will see messages in the log indicating that the metadata needed to be updated (not true!) and that the DOI has been re-registered. There should also be messages indicating that the differences between the local metadata and what the application thinks is registered with DataCite are due to the UTF8 characters in the fields.
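
To illustrate why garbled characters lead to a spurious "needs update" result, here is a small Python sketch. It assumes the garbling is the common case of UTF-8 bytes being decoded as ISO-8859-1; the issue does not confirm the exact mechanism, so treat that as an assumption. The helper name is hypothetical.

```python
def garble_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being decoded with the wrong charset (ISO-8859-1)."""
    return text.encode("utf-8").decode("latin-1")


local = "Universidade de Brasília"
as_retrieved = garble_as_latin1(local)

# The accented character becomes two characters, so a plain comparison
# reports a difference even though the registered record is identical.
print(len(local), len(as_retrieved))
print(local == as_retrieved)
```

Any comparison of the local metadata against a copy damaged this way will always report a difference, which is what triggers the unnecessary re-registration.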

Testing "after", w/ this PR deployed:

Run /modifyRegistrationMetadata.
You will see a confirmation in the log that the metadata registered with DataCite is already up-to-date, so there was no need to re-register.
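
For reference, the API call used in both the "before" and "after" cases can be scripted. This sketch assumes the endpoint lives at `/api/datasets/{id}/modifyRegistrationMetadata`, is invoked with POST, and is authenticated through the `X-Dataverse-key` header; check the API guide for your version before relying on it. The server URL, dataset id, and token below are placeholders.

```python
import urllib.request


def build_modify_request(server_url: str, dataset_id: int, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for modifyRegistrationMetadata."""
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/modifyRegistrationMetadata"
    return urllib.request.Request(url, method="POST", headers={"X-Dataverse-key": api_token})


# Placeholder values for a local dev instance.
request = build_modify_request("http://localhost:8080", 42, "your-api-token")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` sends it; the outcome is then visible in the server log once FINE logging is enabled.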

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

DOI metadata from DataCite. This is to address an apparent issue
with UTF8 characters when relying on the MDS API used traditionally.
(#12070)

landreev added 2 commits April 3, 2026 12:16
Removing the old-style constructors that are no longer needed. #12070

landreev commented Apr 3, 2026

Added a detailed "how to test" in the PR description.

@landreev landreev marked this pull request as ready for review April 6, 2026 14:45
@landreev landreev moved this to In Review 🔎 in IQSS Dataverse Project Apr 6, 2026
@github-project-automation github-project-automation Bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project Apr 6, 2026

landreev commented Apr 6, 2026

(Note that it says the last Jenkins run failed, as of writing this on Apr. 6, 12:18PM, but that's because I killed that build as unnecessary, since it was triggered by a cosmetic comment change. The last Jenkins test that actually ran, number 3, did pass.)

coveralls commented Apr 6, 2026

Coverage Status

Coverage is 24.839% for 12070-datacite-getmetadata into develop. No base build found for develop.

github-actions Bot commented Apr 6, 2026

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:12070-datacite-getmetadata
ghcr.io/gdcc/configbaker:12070-datacite-getmetadata

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@stevenwinship stevenwinship self-assigned this Apr 6, 2026
@stevenwinship stevenwinship moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project Apr 6, 2026
@stevenwinship stevenwinship merged commit 6658f45 into develop Apr 6, 2026
19 of 20 checks passed
@github-project-automation github-project-automation Bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Apr 6, 2026
@stevenwinship stevenwinship deleted the 12070-datacite-getmetadata branch April 6, 2026 19:29
@stevenwinship stevenwinship removed their assignment Apr 6, 2026
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Apr 7, 2026
@pdurbin pdurbin added this to the 6.11 milestone Apr 7, 2026
@landreev landreev restored the 12070-datacite-getmetadata branch April 14, 2026 18:07

Development

Successfully merging this pull request may close these issues.

DataCite implementation: DataCiteRESTfullClient.getMetadata() relies on the MDS API which appears to garble UTF8 characters
