Commit d40ce32

Merge pull request #10781 from IQSS/10623-globus-improvements
Improved handling of Globus uploads (experimental async framework)
2 parents ea02478 + 682c89f commit d40ce32

27 files changed

Lines changed: 1146 additions & 313 deletions
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+A new alternative implementation of Globus polling during upload data transfers has been added in this release. This experimental framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. See `globus-use-experimental-async-framework` under [Feature Flags](https://dataverse-guide--10781.org.readthedocs.build/en/10781/installation/config.html#feature-flags) and [dataverse.files.globus-monitoring-server](https://dataverse-guide--10781.org.readthedocs.build/en/10781/installation/config.html#dataverse-files-globus-monitoring-server) in the Installation Guide. See also #10623 and #10781.

doc/sphinx-guides/source/developers/big-data-support.rst

Lines changed: 2 additions & 0 deletions
@@ -187,3 +187,5 @@ As described in that document, Globus transfers can be initiated by choosing the
 An overview of the control and data transfer interactions between components was presented at the 2022 Dataverse Community Meeting and can be viewed in the `Integrations and Tools Session Video <https://youtu.be/3ek7F_Dxcjk?t=5289>`_ around the 1 hr 28 min mark.

 See also :ref:`Globus settings <:GlobusSettings>`.
+
+An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`.

doc/sphinx-guides/source/developers/globus-api.rst

Lines changed: 2 additions & 0 deletions
@@ -185,6 +185,8 @@ As the transfer can take significant time and the API call is asynchronous, the

 Once the transfer completes, Dataverse will remove the write permission for the principal.

+An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
+
 Note that when using a managed endpoint that uses the Globus S3 Connector, the checksum should be correct as Dataverse can validate it. For file-based endpoints, the checksum should be included if available but Dataverse cannot verify it.

 In the remote/reference case, where there is no transfer to monitor, the standard /addFiles API call (see :ref:`direct-add-to-dataset-api`) is used instead. There are no changes for the Globus case.

doc/sphinx-guides/source/installation/config.rst

Lines changed: 14 additions & 2 deletions
@@ -3312,6 +3312,13 @@ The email for your institution that you'd like to appear in bag-info.txt. See :r

 Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_BAGIT_SOURCEORG_EMAIL``.

+.. _dataverse.files.globus-monitoring-server:
+
+dataverse.files.globus-monitoring-server
+++++++++++++++++++++++++++++++++++++++++
+
+This setting is required in conjunction with the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`). Setting it to true designates the Dataverse instance to serve as the dedicated polling server. It is needed so that the new framework can be used in a multi-node installation.
+
 .. _feature-flags:

 Feature Flags
@@ -3348,7 +3355,10 @@ please find all known feature flags below. Any of these flags can be activated u
       - Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call.
       - ``Off``
     * - disable-dataset-thumbnail-autoselect
-      - Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image.
+      - Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image or upload a dedicated thumbnail image.
+      - ``Off``
+    * - globus-use-experimental-async-framework
+      - Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4. Affects :ref:`:GlobusPollingInterval`. Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance.
       - ``Off``

 **Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
@@ -4828,10 +4838,12 @@ The list of parent dataset field names for which the LDN Announce workflow step

 The URL where the `dataverse-globus <https://github.com/scholarsportal/dataverse-globus>`_ "transfer" app has been deployed to support Globus integration. See :ref:`globus-support` for details.

+.. _:GlobusPollingInterval:
+
 :GlobusPollingInterval
 ++++++++++++++++++++++

-The interval in seconds between Dataverse calls to Globus to check on upload progress. Defaults to 50 seconds. See :ref:`globus-support` for details.
+The interval in seconds between Dataverse calls to Globus to check on upload progress. Defaults to 50 seconds (or to 10 minutes, when the ``globus-use-experimental-async-framework`` feature flag is enabled). See :ref:`globus-support` for details.

 :GlobusSingleFileTransfer
 +++++++++++++++++++++++++
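
The documented defaults for `:GlobusPollingInterval` and the role of `dataverse.files.globus-monitoring-server` can be summarized in a small sketch. The class and method names below are hypothetical and are not part of Dataverse. The sketch also assumes that an explicitly configured interval takes precedence over either default, which the documentation implies but does not state outright.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper for illustration only; not a Dataverse class.
final class GlobusPollingSketch {

    // Documented default when the feature flag is off.
    static final Duration DEFAULT_INTERVAL = Duration.ofSeconds(50);
    // Documented default when globus-use-experimental-async-framework is on.
    static final Duration DEFAULT_ASYNC_INTERVAL = Duration.ofMinutes(10);

    // configuredSeconds models an explicit :GlobusPollingInterval value, if any.
    // Assumption: an explicit value overrides either default.
    static Duration effectiveInterval(Optional<Long> configuredSeconds,
                                      boolean asyncFrameworkEnabled) {
        if (configuredSeconds.isPresent()) {
            return Duration.ofSeconds(configuredSeconds.get());
        }
        return asyncFrameworkEnabled ? DEFAULT_ASYNC_INTERVAL : DEFAULT_INTERVAL;
    }

    // Under the async framework, only the instance designated via
    // dataverse.files.globus-monitoring-server is expected to poll.
    static boolean runsAsyncPolling(boolean asyncFrameworkEnabled,
                                    boolean monitoringServer) {
        return asyncFrameworkEnabled && monitoringServer;
    }

    public static void main(String[] args) {
        System.out.println(effectiveInterval(Optional.empty(), false).getSeconds());
        System.out.println(effectiveInterval(Optional.empty(), true).getSeconds());
        System.out.println(runsAsyncPolling(true, false));
    }
}
```

In a multi-node installation this would mean the flag is on everywhere while `runsAsyncPolling` is true on exactly one node.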

src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java

Lines changed: 10 additions & 1 deletion
@@ -412,12 +412,20 @@ public boolean checkDatasetLock(Long datasetId) {
         List<DatasetLock> lock = lockCounter.getResultList();
         return lock.size()>0;
     }
-
+
+    public List<DatasetLock> getLocksByDatasetId(Long datasetId) {
+        TypedQuery<DatasetLock> locksQuery = em.createNamedQuery("DatasetLock.getLocksByDatasetId", DatasetLock.class);
+        locksQuery.setParameter("datasetId", datasetId);
+        return locksQuery.getResultList();
+    }
+
     public List<DatasetLock> getDatasetLocksByUser( AuthenticatedUser user) {

         return listLocks(null, user);
     }

+    // @todo: we'll be better off getting rid of this method and using the other
+    // version of addDatasetLock() (that uses datasetId instead of Dataset).
     @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
     public DatasetLock addDatasetLock(Dataset dataset, DatasetLock lock) {
         lock.setDataset(dataset);
@@ -467,6 +475,7 @@ public DatasetLock addDatasetLock(Long datasetId, DatasetLock.Reason reason, Lon
      * is {@code aReason}.
      * @param dataset the dataset whose locks (for {@code aReason}) will be removed.
      * @param aReason The reason of the locks that will be removed.
+     * @todo this should probably take dataset_id, not a dataset
      */
     @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
     public void removeDatasetLocks(Dataset dataset, DatasetLock.Reason aReason) {

src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java

Lines changed: 6 additions & 2 deletions
@@ -2127,8 +2127,12 @@ public void handleFileUpload(FileUploadEvent event) throws IOException {
     }

     /**
-     * Using information from the DropBox choose, ingest the chosen files
-     * https://www.dropbox.com/developers/dropins/chooser/js
+     * External, aka "Direct" Upload.
+     * The file(s) have been uploaded to physical storage (such as S3) directly,
+     * this call is to create and add the DataFiles to the Dataset on the Dataverse
+     * side. The method does NOT finalize saving the datafiles in the database -
+     * that will happen when the user clicks 'Save', similar to how the "normal"
+     * uploads are handled.
      *
      * @param event
      */
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+package edu.harvard.iq.dataverse;
+
+import jakarta.persistence.Column;
+import jakarta.persistence.Index;
+import jakarta.persistence.NamedQueries;
+import jakarta.persistence.NamedQuery;
+import jakarta.persistence.Table;
+import java.io.Serializable;
+import jakarta.persistence.Entity;
+import jakarta.persistence.GeneratedValue;
+import jakarta.persistence.GenerationType;
+import jakarta.persistence.Id;
+
+/**
+ *
+ * @author landreev
+ *
+ * The name of the class is provisional. I'm open to better-sounding alternatives,
+ * if anyone can think of any.
+ * But I wanted to avoid having the word "Globus" in the entity name. I'm adding
+ * it specifically for the Globus use case. But I'm guessing there's a chance
+ * this setup may come in handy for other types of datafile uploads that happen
+ * externally. (?)
+ */
+@NamedQueries({
+    @NamedQuery(name = "ExternalFileUploadInProgress.deleteByTaskId",
+            query = "DELETE FROM ExternalFileUploadInProgress f WHERE f.taskId=:taskId"),
+    @NamedQuery(name = "ExternalFileUploadInProgress.findByTaskId",
+            query = "SELECT f FROM ExternalFileUploadInProgress f WHERE f.taskId=:taskId")})
+@Entity
+@Table(indexes = {@Index(columnList="taskid")})
+public class ExternalFileUploadInProgress implements Serializable {
+
+    private static final long serialVersionUID = 1L;
+    @Id
+    @GeneratedValue(strategy = GenerationType.IDENTITY)
+    private Long id;
+
+    public Long getId() {
+        return id;
+    }
+
+    public void setId(Long id) {
+        this.id = id;
+    }
+
+    /**
+     * Rather than saving various individual fields defining the datafile,
+     * which would essentially replicate the DataFile table, we are simply
+     * storing the full json record as passed to the API here.
+     */
+    @Column(columnDefinition = "TEXT", nullable=false)
+    private String fileInfo;
+
+    /**
+     * This is Globus-specific task id associated with the upload in progress
+     */
+    @Column(nullable=false)
+    private String taskId;
+
+    public ExternalFileUploadInProgress() {
+    }
+
+    public ExternalFileUploadInProgress(String taskId, String fileInfo) {
+        this.taskId = taskId;
+        this.fileInfo = fileInfo;
+    }
+
+    public String getFileInfo() {
+        return fileInfo;
+    }
+
+    public void setFileInfo(String fileInfo) {
+        this.fileInfo = fileInfo;
+    }
+
+    public String getTaskId() {
+        return taskId;
+    }
+
+    public void setTaskId(String taskId) {
+        this.taskId = taskId;
+    }
+
+    @Override
+    public int hashCode() {
+        int hash = 0;
+        hash += (id != null ? id.hashCode() : 0);
+        return hash;
+    }
+
+    @Override
+    public boolean equals(Object object) {
+        // TODO: Warning - this method won't work in the case the id fields are not set
+        if (!(object instanceof ExternalFileUploadInProgress)) {
+            return false;
+        }
+        ExternalFileUploadInProgress other = (ExternalFileUploadInProgress) object;
+        if ((this.id == null && other.id != null) || (this.id != null && !this.id.equals(other.id))) {
+            return false;
+        }
+        return true;
+    }
+
+    @Override
+    public String toString() {
+        return "edu.harvard.iq.dataverse.ExternalFileUploadInProgress[ id=" + id + " ]";
+    }
+
+}
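
To illustrate what the two named queries on this entity provide, here is a minimal in-memory stand-in for the table. It is a sketch only: the class `PendingUploadStoreSketch` and its methods are hypothetical, it uses a plain list instead of JPA, and it says nothing about how the actual polling service consumes these records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the ExternalFileUploadInProgress table.
final class PendingUploadStoreSketch {

    // Mirrors the two non-id columns of the entity: taskId and fileInfo.
    static final class PendingUpload {
        final String taskId;
        final String fileInfo; // full JSON record as passed to the API

        PendingUpload(String taskId, String fileInfo) {
            this.taskId = taskId;
            this.fileInfo = fileInfo;
        }
    }

    private final List<PendingUpload> rows = new ArrayList<>();

    // Records the upload request so that its state survives independently
    // of whichever thread or instance later polls for completion.
    void save(String taskId, String fileInfo) {
        rows.add(new PendingUpload(taskId, fileInfo));
    }

    // Counterpart of the findByTaskId named query.
    List<PendingUpload> findByTaskId(String taskId) {
        List<PendingUpload> matches = new ArrayList<>();
        for (PendingUpload row : rows) {
            if (row.taskId.equals(taskId)) {
                matches.add(row);
            }
        }
        return matches;
    }

    // Counterpart of the deleteByTaskId named query; returns rows removed.
    int deleteByTaskId(String taskId) {
        int before = rows.size();
        rows.removeIf(row -> row.taskId.equals(taskId));
        return before - rows.size();
    }
}
```

Keying everything on the task id lets a poller that restarts mid-transfer look up the stored JSON again once Globus reports the task as finished, then delete the row.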

src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java

Lines changed: 26 additions & 1 deletion
@@ -624,6 +624,7 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio
                         comment
                 )) ;
                 return downloadCompletedMessage;
+
             case GLOBUSUPLOADCOMPLETEDWITHERRORS:
                 dataset = (Dataset) targetObject;
                 messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html");
@@ -634,8 +635,30 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio
                         comment
                 )) ;
                 return uploadCompletedWithErrorsMessage;
+
+            case GLOBUSUPLOADREMOTEFAILURE:
+                dataset = (Dataset) targetObject;
+                messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html");
+                String uploadFailedRemotelyMessage = messageText + BundleUtil.getStringFromBundle("notification.mail.globus.upload.failedRemotely", Arrays.asList(
+                        systemConfig.getDataverseSiteUrl(),
+                        dataset.getGlobalId().asString(),
+                        dataset.getDisplayName(),
+                        comment
+                )) ;
+                return uploadFailedRemotelyMessage;

-            case GLOBUSDOWNLOADCOMPLETEDWITHERRORS:
+            case GLOBUSUPLOADLOCALFAILURE:
+                dataset = (Dataset) targetObject;
+                messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html");
+                String uploadFailedLocallyMessage = messageText + BundleUtil.getStringFromBundle("notification.mail.globus.upload.failedLocally", Arrays.asList(
+                        systemConfig.getDataverseSiteUrl(),
+                        dataset.getGlobalId().asString(),
+                        dataset.getDisplayName(),
+                        comment
+                )) ;
+                return uploadFailedLocallyMessage;
+
+            case GLOBUSDOWNLOADCOMPLETEDWITHERRORS:
                 dataset = (Dataset) targetObject;
                 messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html");
                 String downloadCompletedWithErrorsMessage = messageText + BundleUtil.getStringFromBundle("notification.mail.globus.download.completedWithErrors", Arrays.asList(
@@ -764,6 +787,8 @@ public Object getObjectOfNotification (UserNotification userNotification){
                 return versionService.find(userNotification.getObjectId());
             case GLOBUSUPLOADCOMPLETED:
             case GLOBUSUPLOADCOMPLETEDWITHERRORS:
+            case GLOBUSUPLOADREMOTEFAILURE:
+            case GLOBUSUPLOADLOCALFAILURE:
             case GLOBUSDOWNLOADCOMPLETED:
             case GLOBUSDOWNLOADCOMPLETEDWITHERRORS:
                 return datasetService.find(userNotification.getObjectId());
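
The two new `case` branches differ only in the bundle key they look up. The sketch below shows that mapping in isolation. `GlobusUploadFailureMailSketch` and its methods are hypothetical and do not exist in Dataverse; only the enum constant names, the bundle keys and the argument order are taken from the diff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two new upload-failure notifications.
final class GlobusUploadFailureMailSketch {

    // Only the two notification types added in this commit are modelled.
    enum Type { GLOBUSUPLOADREMOTEFAILURE, GLOBUSUPLOADLOCALFAILURE }

    // Bundle keys exactly as they appear in the MailServiceBean diff.
    static String bundleKey(Type type) {
        switch (type) {
            case GLOBUSUPLOADREMOTEFAILURE:
                return "notification.mail.globus.upload.failedRemotely";
            case GLOBUSUPLOADLOCALFAILURE:
                return "notification.mail.globus.upload.failedLocally";
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Same argument order as the Arrays.asList(...) calls in the diff:
    // site URL, dataset global id, dataset display name, comment.
    static List<String> messageArguments(String siteUrl, String globalId,
                                         String displayName, String comment) {
        return Arrays.asList(siteUrl, globalId, displayName, comment);
    }
}
```

A shared helper along these lines might reduce the repetition across the Globus cases, though the commit itself keeps each case written out.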

src/main/java/edu/harvard/iq/dataverse/UserNotification.java

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,8 @@ public enum Type {
         CHECKSUMIMPORT, CHECKSUMFAIL, CONFIRMEMAIL, APIGENERATED, INGESTCOMPLETED, INGESTCOMPLETEDWITHERRORS,
         PUBLISHFAILED_PIDREG, WORKFLOW_SUCCESS, WORKFLOW_FAILURE, STATUSUPDATED, DATASETCREATED, DATASETMENTIONED,
         GLOBUSUPLOADCOMPLETED, GLOBUSUPLOADCOMPLETEDWITHERRORS,
-        GLOBUSDOWNLOADCOMPLETED, GLOBUSDOWNLOADCOMPLETEDWITHERRORS, REQUESTEDFILEACCESS;
+        GLOBUSDOWNLOADCOMPLETED, GLOBUSDOWNLOADCOMPLETEDWITHERRORS, REQUESTEDFILEACCESS,
+        GLOBUSUPLOADREMOTEFAILURE, GLOBUSUPLOADLOCALFAILURE;

         public String getDescription() {
             return BundleUtil.getStringFromBundle("notification.typeDescription." + this.name());

src/main/java/edu/harvard/iq/dataverse/api/ApiConstants.java

Lines changed: 4 additions & 0 deletions
@@ -17,4 +17,8 @@ private ApiConstants() {
     public static final String DS_VERSION_LATEST = ":latest";
     public static final String DS_VERSION_DRAFT = ":draft";
     public static final String DS_VERSION_LATEST_PUBLISHED = ":latest-published";
+
+    // addFiles call
+    public static final String API_ADD_FILES_COUNT_PROCESSED = "Total number of files";
+    public static final String API_ADD_FILES_COUNT_SUCCESSFUL = "Number of files successfully added";
 }
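
The two new constants look like labels for the counts reported by the addFiles call. The following sketch shows one way such a summary could be assembled and checked. `AddFilesSummarySketch` is hypothetical, and a plain `Map` stands in for whatever JSON structure the real API response uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of using the two addFiles count labels.
final class AddFilesSummarySketch {

    // Values copied from ApiConstants in the diff.
    static final String API_ADD_FILES_COUNT_PROCESSED = "Total number of files";
    static final String API_ADD_FILES_COUNT_SUCCESSFUL = "Number of files successfully added";

    // Builds an ordered summary of processed vs. successfully added files.
    static Map<String, Integer> summarize(int processed, int successful) {
        if (processed < 0 || successful < 0 || successful > processed) {
            throw new IllegalArgumentException("Inconsistent file counts");
        }
        Map<String, Integer> summary = new LinkedHashMap<>();
        summary.put(API_ADD_FILES_COUNT_PROCESSED, processed);
        summary.put(API_ADD_FILES_COUNT_SUCCESSFUL, successful);
        return summary;
    }

    // True when every processed file was added successfully.
    static boolean allSucceeded(Map<String, Integer> summary) {
        return summary.get(API_ADD_FILES_COUNT_PROCESSED)
                .equals(summary.get(API_ADD_FILES_COUNT_SUCCESSFUL));
    }
}
```

Sharing the labels as constants means the code that writes the summary and the code that reads it back cannot drift apart on the key strings.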
