IQSS/7068 Reserve File Pids#7334
Conversation
(analogous to dataset pid register if needed) Conflicts: src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java src/main/java/propertyFiles/Bundle.properties
|
Adding this to code review, because I think it is desirable and non-controversial functionality for those who have turned on File PIDs. I'd be interested in thoughts from the Harvard Dataverse team as we've seen some issues with long running jobs around PID registration, and I wonder if this would potentially make it worse since there's now two steps (reservation and flipping to public) instead of one. |
pdurbin
left a comment
There was a problem hiding this comment.
Overall, this looks fine. I'm only choosing "request changes" because I think we need a release note. I'm also suggesting a couple code changes but I don't see either one as showstoppers.
| String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, ""); | ||
| String currentGlobalAuthority = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Authority, ""); | ||
| String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT"); | ||
| boolean shouldRegister = ctxt.systemConfig().isFilePIDsEnabled() // We use file PIDs |
There was a problem hiding this comment.
I thinking about how we iterate though all the files below. Couldn't we short circuit that iteration and simply return early if file level PIDs are not enabled?
There was a problem hiding this comment.
pulled the test of shouldRegister out of the loop.
| logger.fine("IsFilePIDsEnabled: " + ctxt.systemConfig().isFilePIDsEnabled()); | ||
| logger.fine("RegWhenPub: " + !idServiceBean.registerWhenPublished()); | ||
| logger.fine("OK provider: " + ((currentGlobalIdProtocol.equals(protocol) && currentGlobalAuthority.equals(authority)) // the dataset PID is a protocol/authority Dataverse can create new PIDs in | ||
| || dataFilePIDFormat.equals("INDEPENDENT"))); |
There was a problem hiding this comment.
I know it's more code but rather than having hardcoded strings like "INDEPENDENT" my preference is to reference the place where the string comes from like this: SystemConfig.DataFilePIDFormat.INDEPENDENT.toString()
That way if you do a "find usages" in your editor, it easily comes up. And if we ever change the string (which is doubtful), it's easier to find.
This might just be a me think. Please feel free to get opinions from others.
There was a problem hiding this comment.
why have constants if you don't use them - changed "INDEPENDENT" and "DEPENDENT" everywhere
|
Sorry, sorry, I just noticed merge conflicts. @qqmyers can you please fix? |
IQSS/7068-Reserve_file_PIDs Conflicts: src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java
|
Thanks @qqmyers and @pdurbin for talking with me about this. I made release note additions and API doc edits because I thought we recommended reservation for file PIDs similar to how we did reservations for Dataset PIDs when we moved to the new pre-publish verification workflow in 5.0 (https://github.com/IQSS/dataverse/blob/develop/doc/release-notes/5.0-release-notes.md#dataverse-installations-using-datacite-upgrade-action-recommended). I backed these changes out as we don't need to reserve PIDs for existing files and the API does not currently support this. This does not result in a failure during publishing, it just means that those File PIDs will be registered at that time. |
Conflicts: src/main/java/edu/harvard/iq/dataverse/engine/command/impl/AbstractDatasetCommand.java src/main/java/propertyFiles/Bundle.properties
IQSS/7068-Reserve_file_PIDs
| * @param datafileId | ||
| * @param storageLocation | ||
| * @return {@code true} iff the global identifier is unique. | ||
| */ |
There was a problem hiding this comment.
Not related to the PR - this just doesn't fit the method below which doesn't involve PIDs
|
Looks good overall. What is the back story, was there a specific reason why it got stuck and sat for 4 years? Was it just because of the concerns about file uploads becoming slower? |
IQSS/7068-Reserve_file_PIDs
|
As far as I know, performance was the only potential issue - could still be an issue (although QDR uses it), but easier to manage overall since file PIDs can be enabled/disabled as desired now. |
landreev
left a comment
There was a problem hiding this comment.
(re-)approving, in order to move the PR along into "ready for QA".
What this PR does / why we need it: Reserves file PIDs at creation time, if enabled, analogous to how datasets are now handled.
Which issue(s) this PR closes:
Closes #7068
Special notes for your reviewer:
Suggestions on how to test this: Should be able to see file PIDs/DOIs after file creation in the datacite console (e.g. doi.test.datacite.org). Deleting a file, or the dataset, prior to publication should remove the draft PID. Nominally this code calls whatever PID provider you use, so it should work with other providers capable of reserving PIDs. I'm not sure it is necessary to test them though - if the PID provider works, and this code works with DOIs, it should also work for those others.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: no
Is there a release notes update needed for this change?:
Additional documentation: