Merged
Changes from 11 commits
22 commits
9baf2d5
File Metadata Update
stevenwinship Apr 18, 2025
6d871aa
new version checking
stevenwinship Apr 18, 2025
fd800fc
fix test
stevenwinship Apr 18, 2025
dea4904
Merge branch 'develop' into 11392-edit-file-metadata-empty-values
stevenwinship Apr 21, 2025
8dcb065
fix test
stevenwinship Apr 22, 2025
2fce5fd
add to test
stevenwinship Apr 22, 2025
26f07b7
adding info for debugging jenkins test failure
stevenwinship Apr 22, 2025
10939ce
remove jenkins debug
stevenwinship Apr 23, 2025
f5ddcff
per review comments
stevenwinship Apr 24, 2025
1615fb9
per review comments
stevenwinship Apr 24, 2025
8a57fc8
refactor to use last update timestamp instead of version number
stevenwinship Apr 25, 2025
6fedf6e
comment on data/timestamp compare
stevenwinship Apr 25, 2025
eaf49a9
refactor so both datafiles and datasets validate update timestamp the…
stevenwinship Apr 29, 2025
1320d51
refactor optional qp name from sourceInternalVersionTimestamp to sour…
stevenwinship Apr 30, 2025
b61bb1c
Merge branch 'develop' into 11392-edit-file-metadata-empty-values
stevenwinship May 29, 2025
8ef92d2
Merge branch 'develop' into 11392-edit-file-metadata-empty-values
stevenwinship Jun 24, 2025
7563cf8
Suggested doc edits (#11590)
qqmyers Jun 24, 2025
7a7a84f
Merge branch 'develop' into 11392-edit-file-metadata-empty-values
stevenwinship Jun 24, 2025
0eaca6c
Merge branch 'develop' into 11392-edit-file-metadata-empty-values
stevenwinship Jun 26, 2025
0c72a89
remove unused bundle entry
stevenwinship Jul 14, 2025
509e55a
update changelog to move this PR to 6.8
stevenwinship Jul 14, 2025
c465180
Merge branch 'develop' into 11392-edit-file-metadata-empty-values
stevenwinship Jul 21, 2025
7 changes: 7 additions & 0 deletions doc/release-notes/11392-edit-file-metadata-empty-values.md
@@ -0,0 +1,7 @@
### Edit File Metadata empty values should clear data

Previously, the API POST /files/{id}/metadata ignored fields with empty values. Now the API updates those fields with the empty values, effectively clearing the data. Missing fields are still ignored.

An optional query parameter (sourceInternalVersionTimestamp) was added to ensure that a metadata update based on stale data doesn't overwrite newer changes.

See also [the guides](https://dataverse-guide--11359.org.readthedocs.build/en/11359/api/native-api.html#updating-file-metadata), #11392, and #11359.
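
The distinction between an empty field and a missing field can be illustrated with a minimal sketch. This is not the Dataverse implementation: the class `FileMetadataMergeSketch` and its `applyUpdate` method are hypothetical names, and a plain `Map` stands in for the file metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Dataverse implementation.
class FileMetadataMergeSketch {

    // Keys absent from the update are left untouched, while keys present
    // with an empty value overwrite (and so clear) the stored value.
    static Map<String, Object> applyUpdate(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> result = new HashMap<>(stored);
        result.putAll(update);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("description", "My description");
        stored.put("categories", List.of("Data"));
        stored.put("provFreeform", "Test prov freeform");

        Map<String, Object> update = new HashMap<>();
        update.put("description", "");                      // present but empty: cleared
        update.put("categories", new ArrayList<String>());  // present but empty: cleared
        // "provFreeform" is not included, so it keeps its stored value

        System.out.println(applyUpdate(stored, update));
    }
}
```

In this sketch, leaving a key out of the update is the only way to keep the stored value, which mirrors the rule stated in the release note.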
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -11,6 +11,7 @@ v6.7
----

- An undocumented :doc:`search` parameter called "show_my_data" has been removed. It was never exercised by tests and is believed to be unused. API users should use the :ref:`api-mydata` API instead.
- For POST /api/files/{id}/metadata passing an empty string (“description”:””) or array (“categories”:[]) will no longer be ignored. Empty fields will now clear out the values in the file's metadata. To ignore the fields simply do not include them in the Json string.
Member

Are we trying to get this in 6.7? If not, this should be placed under "v6.8" in the changelog.

sourceInternalVersionNumber is gone right? Should we mention that?

Also, I would suggest replacing the smart quotes with straight quotes and writing "JSON" in all caps.

Contributor Author

fixed


v6.6
----
7 changes: 5 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -4612,6 +4612,8 @@ Updating File Metadata

Updates the file metadata for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you want to update a specific field first get the json with the above command and alter the fields you want.

An optional query parameter, ``sourceInternalVersionTimestamp``, can be added to verify that the Dataset Version being edited is the latest version, which prevents edits based on stale data. The value is a datetime in the format ``yyyy-MM-dd'T'HH:mm:ss'Z'`` and comes from ``lastUpdateTime`` in the response to a GET ``$SERVER_URL/api/files/$ID`` API call.

A curl example using an ``ID``

.. code-block:: bash
@@ -4639,18 +4641,19 @@ A curl example using a ``PERSISTENT_ID``
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
export UPDATE_TIME=2025-04-25T13:58:28Z

curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
-F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"dataFileTags":["Survey"],"restrict":false}' \
"$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
"$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID&sourceInternalVersionTimestamp=$UPDATE_TIME"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
-F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"dataFileTags":["Survey"],"restrict":false}' \
"https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
"https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000&sourceInternalVersionTimestamp=2025-04-25T13:58:28Z"

Note: To update the 'tabularTags' property of file metadata, use the 'dataFileTags' key when making API requests. This property is used to update the 'tabularTags' of the file metadata.

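
For clients that assemble the request URL in code rather than in a shell, the following is a minimal sketch of building the update URL with the optional parameter. The class `UpdateUrlSketch` and the method `buildUpdateUrl` are hypothetical names. The sketch percent-encodes the query values, which the curl example in the guide does not do; that the server accepts the encoded form is an assumption based on standard query-string decoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative and not part of Dataverse.
class UpdateUrlSketch {

    // Builds the file metadata update URL for a persistent id. When
    // lastUpdateTime is null the optional check is simply left out.
    static String buildUpdateUrl(String serverUrl, String persistentId, String lastUpdateTime) {
        String url = serverUrl + "/api/files/:persistentId/metadata?persistentId="
                + URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        if (lastUpdateTime != null) {
            url += "&sourceInternalVersionTimestamp="
                    + URLEncoder.encode(lastUpdateTime, StandardCharsets.UTF_8);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(buildUpdateUrl(
                "https://demo.dataverse.org",
                "doi:10.5072/FK2/AAA000",
                "2025-04-25T13:58:28Z"));
    }
}
```

The timestamp passed in would be the ``lastUpdateTime`` value read from a prior GET of the file, so the client sends back exactly what it last saw.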
17 changes: 17 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/AbstractApiBean.java
@@ -27,6 +27,7 @@
import edu.harvard.iq.dataverse.search.savedsearch.SavedSearchServiceBean;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.DateUtil;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
import edu.harvard.iq.dataverse.util.json.JsonParser;
@@ -51,6 +52,7 @@

import java.io.InputStream;
import java.net.URI;
import java.time.Instant;
import java.util.*;
import java.util.concurrent.Callable;
import java.util.logging.Level;
@@ -451,6 +453,21 @@ protected void validateInternalVersionNumberIsNotOutdated(Dataset dataset, int i
}
}

protected void validateInternalTimestampIsNotOutdated(DataFile dataFile, String sourceInternalVersionTimestamp) throws WrappedResponse {
Member

Assuming we go this route, I think we'll want to replace the method above as well - perhaps convenience methods that take a file or dataset calling one method having a datasetversion param that has the main logic?

Contributor Author

Are you suggesting overloading
validateInternalVersion(Dataset ds, String v)
validateInternalVersion(Datafile fd, String v)

or a single method
validateInternalVersion(DvObject dvo, String v)

Contributor Author

@stevenwinship stevenwinship Apr 25, 2025

Are you suggesting changing the dataset method to also compare the timestamp?

Member

I'm mostly suggesting that we have a validateInternalVersion(DatasetVersion dv, String v) method - with the logic to get that datasetversion from the dataset or file outside that. I hadn't thought about whether that's two methods or one for dvObject (or whether it would be easier to get the right DatasetVersion back in the calling methods so we don't have to find it again for this validation test).

I am also suggesting that whatever we decide to do should be done for both the file and dataset api calls, so ~yes, assuming we're agreed on using the timestamp, it would mean removing the existing dataset check and using your new code.

Contributor Author

I think refactoring the dataset api should be done in a separate issue

Contributor Author

@qqmyers Since the previous issue that introduced the sourceInternalVersionNumber hasn't been released I will refactor it with this issue

Contributor

@stevenwinship Please, after you are done with this PR let me know if there are any breaking changes we will need to address regarding the dataset metadata update endpoint usage, thanks 🫡

Contributor Author

@stevenwinship stevenwinship Apr 30, 2025

@g-saracca Ok. Just so you know the qp sourceInternalVersionNumber is removed and replaced with sourceLastUpdateTime. So to update you'll need to send the lastUpdateTime instead of the internalVersionNumber

Contributor Author

@stevenwinship stevenwinship Apr 30, 2025

Now that I look at the new qp I realize I don't like the name. I will be refactoring the qp name from sourceInternalVersionTimestamp to sourceLastUpdateTime to match the json attribute

Date date = sourceInternalVersionTimestamp != null ? DateUtil.parseDate(sourceInternalVersionTimestamp, "yyyy-MM-dd'T'HH:mm:ss'Z'") : null;
if (date == null) {
throw new WrappedResponse(
badRequest(BundleUtil.getStringFromBundle("jsonparser.error.parsing.date", Collections.singletonList(sourceInternalVersionTimestamp)))
);
}
Instant instant = date.toInstant();
Comment thread
stevenwinship marked this conversation as resolved.
if (dataFile.getFileMetadata().getDatasetVersion().getLastUpdateTime().toInstant().getEpochSecond() != instant.getEpochSecond()) {
Member

Should a > relationship be used here? Is there ever a case where the exact timestamp isn't known? E.g. some edit operation (like changing a file restriction or editing terms) where you'd know the local time of the api call but not the exact db timestamp and would want to avoid an extra call to get it (while still protecting from someone else having changed it in the next few minutes)? I'm torn - anything but != opens a small hole, but the simplification might be worth the small risk (although there are possible time skew issues).

Contributor Author

@stevenwinship stevenwinship Apr 25, 2025

I would think if you were restricting access you would not need to send the timestamp. I can't imagine one person restricting and another person unrestricting.

If we don't know the timestamp then we need to have a configuration value to say an update can't be made within X minutes of another update. I think it's normal to get the data before updating. Any UI would do this, and I don't think a straight API call to make a blind change is something we should condone.

Member

Well, we allow blind by making this parameter optional.
FWIW: This whole idea of enabling the SPA to work like the JSF and maintain consistency is new and we haven't really discussed whether it means we always need to be returning the timestamp on any PUT/POST call, or if there's a better approach, or if we want some hybrid. The JSF ~works/at least detects the problem because the bean for the view holds the state and, as soon as we make any change, we cause the view to be updated. The SPA could work the same way, but it might be nice if we could allow changes to restricting a file or editing its metadata without requiring the whole dataset/dataset version to be reloaded.

throw new WrappedResponse(
badRequest(BundleUtil.getStringFromBundle("abstractApiBean.error.datafileInternalVersionTimestampIsOutdated", Collections.singletonList(sourceInternalVersionTimestamp)))
);
}
}

protected DataFile findDataFileOrDie(String id) throws WrappedResponse {
DataFile datafile;
if (id.equals(PERSISTENT_ID_KEY)) {
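
The equality check discussed in the review thread above can be sketched in isolation with `java.time`. This is a simplified stand-in rather than the PR's code: the class `TimestampCheckSketch` and its methods are hypothetical names, and treating the literal `Z` as UTC is an assumption (the PR parses the value through `DateUtil.parseDate`, whose time zone handling is not shown in this diff).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; not the code from AbstractApiBean.
class TimestampCheckSketch {

    // Same pattern string as in the PR; 'Z' is matched as a literal character.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Parses the client-supplied value, assuming it is expressed in UTC.
    // Throws DateTimeParseException when the value does not match the pattern.
    static Instant parse(String value) {
        return LocalDateTime.parse(value, FORMAT).toInstant(ZoneOffset.UTC);
    }

    // The client value is outdated when it differs from the stored time,
    // compared at whole-second precision so stored fractions are ignored.
    static boolean isOutdated(Instant storedLastUpdate, String clientTimestamp) {
        return storedLastUpdate.getEpochSecond() != parse(clientTimestamp).getEpochSecond();
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2025-04-25T13:58:28.750Z");
        System.out.println(isOutdated(stored, "2025-04-25T13:58:28Z"));
        System.out.println(isOutdated(stored, "2025-04-25T13:58:27Z"));
    }
}
```

Comparing epoch seconds with `!=` keeps the check strict, as in the PR, while tolerating the sub-second precision that is lost when the timestamp is formatted for the JSON response.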
11 changes: 9 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -410,8 +410,8 @@ public Response deleteFileInDataset(@Context ContainerRequestContext crc, @PathP
@AuthRequired
@Path("{id}/metadata")
public Response updateFileMetadata(@Context ContainerRequestContext crc, @FormDataParam("jsonData") String jsonData,
@PathParam("id") String fileIdOrPersistentId
) throws DataFileTagException, CommandException {
@PathParam("id") String fileIdOrPersistentId, @QueryParam("sourceInternalVersionTimestamp") String sourceInternalVersionTimestamp
) throws CommandException {

FileMetadata upFmd = null;

@@ -429,6 +429,13 @@ public Response updateFileMetadata(@Context ContainerRequestContext crc, @FormDa
return error(BAD_REQUEST, "Error attempting get the requested data file.");
}

if (sourceInternalVersionTimestamp != null) {
try {
validateInternalTimestampIsNotOutdated(df, sourceInternalVersionTimestamp);
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}

//You shouldn't be trying to edit a datafile that has been replaced
List<Long> result = em.createNamedQuery("DataFile.findDataFileThatReplacedId", Long.class)
@@ -194,46 +194,28 @@ public boolean getTabIngest() {
return this.tabIngest;
}

public boolean hasCategories(){
if ((categories == null)||(this.categories.isEmpty())){
return false;
}
return true;
public boolean hasCategories() {
return categories != null;
}

public boolean hasFileDataTags(){
if ((dataFileTags == null)||(this.dataFileTags.isEmpty())){
return false;
}
return true;
public boolean hasFileDataTags() {
return dataFileTags != null;
}

public boolean hasDescription(){
if ((description == null)||(this.description.isEmpty())){
return false;
}
return true;
return description != null;
}

public boolean hasDirectoryLabel(){
if ((directoryLabel == null)||(this.directoryLabel.isEmpty())){
return false;
}
return true;
public boolean hasDirectoryLabel() {
return directoryLabel != null;
}

public boolean hasLabel(){
if ((label == null)||(this.label.isEmpty())){
return false;
}
return true;
public boolean hasLabel() {
return label != null;
}

public boolean hasProvFreeform(){
if ((provFreeForm == null)||(this.provFreeForm.isEmpty())){
return false;
}
return true;
public boolean hasProvFreeform() {
return provFreeForm != null;
}

public boolean hasStorageIdentifier() {
@@ -245,15 +227,15 @@ public String getStorageIdentifier() {
}

public boolean hasFileName() {
return ((fileName!=null)&&(!fileName.isEmpty()));
return fileName != null;
}

public String getFileName() {
return fileName;
}

public boolean hasMimetype() {
return ((mimeType!=null)&&(!mimeType.isEmpty()));
return mimeType != null;
}

public String getMimeType() {
@@ -266,7 +248,7 @@ public void setCheckSum(String checkSum, ChecksumType type) {
}

public boolean hasCheckSum() {
return ((checkSumValue!=null)&&(!checkSumValue.isEmpty()));
return checkSumValue != null;
}

public String getCheckSum() {
@@ -294,15 +276,10 @@ public void setFileSize(long fileSize) {
* @param tags
*/
public void setCategories(List<String> newCategories) {

if (newCategories != null) {
newCategories = Util.removeDuplicatesNullsEmptyStrings(newCategories);
if (newCategories.isEmpty()) {
newCategories = null;
}
this.categories = newCategories;
}

this.categories = newCategories;
}

/**
@@ -495,27 +472,20 @@ private void addFileDataTags(List<String> potentialTags) throws DataFileTagExcep
}

potentialTags = Util.removeDuplicatesNullsEmptyStrings(potentialTags);

if (potentialTags.isEmpty()){
return;
}


// Make a new list
this.dataFileTags = new ArrayList<>();
List<String> newList = new ArrayList<>();

// Add valid potential tags to the list
for (String tagToCheck : potentialTags){
if (DataFileTag.isDataFileTag(tagToCheck)){
this.dataFileTags.add(tagToCheck);
newList.add(tagToCheck);
}else{
String errMsg = BundleUtil.getStringFromBundle("file.addreplace.error.invalid_datafile_tag");
throw new DataFileTagException(errMsg + " [" + tagToCheck + "]. Please use one of the following: " + DataFileTag.getListofLabelsAsString());
}
}
// Shouldn't happen....
if (dataFileTags.isEmpty()){
dataFileTags = null;
}
this.dataFileTags = newList;
}

private void msg(String s){
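
The effect of the changes above, where the `has...` methods now test only for `null` and the setter no longer converts an empty list to `null`, can be sketched as follows. The class `OptionalParamsSketch` is a hypothetical reduction to a single field, and `clean` is a stand-in for `Util.removeDuplicatesNullsEmptyStrings` whose behavior is assumed from its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical reduction of the parameter holder to one field.
class OptionalParamsSketch {

    private List<String> categories;

    // Assumed behavior: drop nulls, empty strings, and duplicates, keeping order.
    static List<String> clean(List<String> in) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String s : in) {
            if (s != null && !s.isEmpty()) {
                seen.add(s);
            }
        }
        return new ArrayList<>(seen);
    }

    // An empty list is kept as an empty list, so it can signal "clear this field".
    void setCategories(List<String> newCategories) {
        if (newCategories != null) {
            newCategories = clean(newCategories);
        }
        this.categories = newCategories;
    }

    // Only a field that was never supplied (null) counts as absent.
    boolean hasCategories() {
        return categories != null;
    }

    List<String> getCategories() {
        return categories;
    }

    public static void main(String[] args) {
        OptionalParamsSketch params = new OptionalParamsSketch();
        System.out.println(params.hasCategories());
        params.setCategories(new ArrayList<>());
        System.out.println(params.hasCategories());
        params.setCategories(Arrays.asList("Data", null, "", "Data"));
        System.out.println(params.getCategories());
    }
}
```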
@@ -873,7 +873,8 @@ public static JsonObjectBuilder json(DataFile df, FileMetadata fileMetadata, boo
.add("tabularData", df.isTabularData())
.add("tabularTags", getTabularFileTags(df))
.add("creationDate", df.getCreateDateFormattedYYYYMMDD())
.add("publicationDate", df.getPublicationDateFormattedYYYYMMDD());
.add("publicationDate", df.getPublicationDateFormattedYYYYMMDD())
.add("lastUpdateTime", format(fileMetadata.getDatasetVersion().getLastUpdateTime()));
Dataset dfOwner = df.getOwner();
if (dfOwner != null) {
builder.add("fileAccessRequest", dfOwner.isFileAccessRequest());
1 change: 1 addition & 0 deletions src/main/java/propertyFiles/Bundle.properties
@@ -3205,3 +3205,4 @@ updateDatasetFieldsCommand.api.processDatasetUpdate.parseError=Error parsing dat

#AbstractApiBean.java
abstractApiBean.error.datasetInternalVersionNumberIsOutdated=Dataset internal version number {0} is outdated
Member

Can we remove this bundle entry? It seems to be unused now.

Contributor Author

removed

abstractApiBean.error.datafileInternalVersionTimestampIsOutdated=File Metadata internal version timestamp {0} is outdated