Commit b4fe522

Merge branch 'develop' into 12001-api-support-termofuse-guestbook
2 parents c095efc + 8db3dd2 commit b4fe522

37 files changed

Lines changed: 923 additions & 2718 deletions

doc/release-notes/12167-ore-bag-archiving-changes.md

Lines changed: 0 additions & 54 deletions
This file was deleted.

doc/sphinx-guides/source/admin/big-data-administration.rst

Lines changed: 0 additions & 1 deletion
@@ -302,7 +302,6 @@ There are a broad range of options (that are not turned on by default) for impro
 - :ref:`:DisableSolrFacetsWithoutJsession` - disables facets for users who have disabled cookies (e.g. for bots)
 - :ref:`:DisableUncheckedTypesFacet` - only disables the facet showing the number of collections, datasets, files matching the query (this facet is potentially less useful than others)
 - :ref:`:StoreIngestedTabularFilesWithVarHeaders` - by default, Dataverse stores ingested files without headers and dynamically adds them back at download time. Once this setting is enabled, Dataverse will leave the headers in place (for newly ingested files), reducing the cost of downloads
-- :ref:`dataverse.bagit.zip.max-file-size`, :ref:`dataverse.bagit.zip.max-data-size`, and :ref:`dataverse.bagit.zip.holey` - options to control the size and temporary storage requirements when generating archival Bags - see :ref:`BagIt Export`
 
 
 Scaling Infrastructure

doc/sphinx-guides/source/installation/config.rst

Lines changed: 0 additions & 40 deletions
@@ -2259,22 +2259,10 @@ These archival Bags include all of the files and metadata in a given dataset ver
 
 The Dataverse Software offers an internal archive workflow which may be configured as a PostPublication workflow via an admin API call to manually submit previously published Datasets and prior versions to a configured archive such as Chronopolis. The workflow creates a `JSON-LD <http://www.openarchives.org/ore/0.9/jsonld>`_ serialized `OAI-ORE <https://www.openarchives.org/ore/>`_ map file, which is also available as a metadata export format in the Dataverse Software web interface.
 
-The size of the zipped archival Bag can be limited for all archivers: files that don't fit within that limit can either be transferred separately (placed so that they are correctly positioned according to the BagIt specification when the zipped bag is unzipped in place) or just referenced for later download (using the BagIt concept of a 'holey' bag with a list of files in a ``fetch.txt`` file). These settings allow for managing large datasets by excluding files over a certain size or total data size, which can be useful for archivers with size limitations or to reduce transfer times. See the :ref:`dataverse.bagit.zip.max-file-size`, :ref:`dataverse.bagit.zip.max-data-size`, and :ref:`dataverse.bagit.zip.holey` JVM options for more details.
-
 At present, archiving classes include the DuraCloudSubmitToArchiveCommand, LocalSubmitToArchiveCommand, GoogleCloudSubmitToArchive, and S3SubmitToArchiveCommand, which all extend the AbstractSubmitToArchiveCommand and use the configurable mechanisms discussed below. (A DRSSubmitToArchiveCommand, which works with Harvard's DRS, also exists and, while specific to DRS, is a useful example of how Archivers can support single-version-only semantics and archiving only from specified collections, with collection-specific parameters.)
 
 All current options support the :ref:`Archival Status API` calls and the same status is available in the dataset page version table (for contributors/those who could view the unpublished dataset, with more detail available to superusers).
 
-Two settings that can be used with all current Archivers are:
-
-- \:BagGeneratorThreads - the number of threads to use when adding data files to the zipped bag. The default is 2. Values of 4 or more may increase performance on larger machines but may cause problems if file access is throttled
-- \:ArchiveOnlyIfEarlierVersionsAreArchived - when true, requires dataset versions to be archived in order by confirming that all prior versions have been successfully archived before allowing a new version to be archived. Default is false
-
-These must be included in the \:ArchiverSettings for the Archiver to work.
-
-Archival Bags are created per dataset version. By default, if a version is republished (via the superuser-only 'Update Current Version' publication option in the UI/API), a new archival bag is not created for the version.
-If the archiver used is capable of deleting existing bags (Google, S3, and File Archivers), superusers can trigger a manual update of the archival bag, and, if the :ref:`dataverse.bagit.archive-on-version-update` flag is set to true, this will be done automatically when 'Update Current Version' is used.
-
 .. _Duracloud Configuration:
 
 Duracloud Configuration
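
The 'holey' bag mechanism removed above comes from the BagIt specification (RFC 8493): each line of ``fetch.txt`` gives a URL the file can be retrieved from, its length in bytes (or ``-`` if unknown), and the path where the file belongs inside the bag. A hypothetical example (URL and path invented for illustration):

```
https://demo.dataverse.org/api/access/datafile/42 1048576 data/large-file.csv
```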
@@ -3727,14 +3715,6 @@ The email for your institution that you'd like to appear in bag-info.txt. See :r
 
 Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_BAGIT_SOURCEORG_EMAIL``.
 
-.. _dataverse.bagit.archive-on-version-update:
-
-dataverse.bagit.archive-on-version-update
-+++++++++++++++++++++++++++++++++++++++++
-
-Indicates whether archival bag creation should be triggered (if configured) when a version is updated and was already successfully archived,
-i.e. via the Update-Current-Version publication option. Setting the flag to true only works if the archiver being used supports deleting existing archival bags.
-
 .. _dataverse.files.globus-monitoring-server:
 
 dataverse.files.globus-monitoring-server
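
Like ``dataverse.bagit.sourceorg.email`` in the context above, the removed ``dataverse.bagit.archive-on-version-update`` option was documented as MicroProfile-configurable. Under the standard MicroProfile Config environment-variable mapping (non-alphanumeric characters become underscores, name uppercased), it would presumably have been settable as below — shown only to illustrate the mapping, not as a documented variable:

```
export DATAVERSE_BAGIT_ARCHIVE_ON_VERSION_UPDATE=true
```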
@@ -3897,21 +3877,6 @@ This can instead be restricted to only superusers who can publish the dataset us
 
 Example: ``dataverse.coar-notify.relationship-announcement.notify-superusers-only=true``
 
-.. _dataverse.bagit.zip.holey:
-
-``dataverse.bagit.zip.holey``
-A boolean that, if true, will cause the BagIt archiver to create a "holey" bag. In a holey bag, files that are not included in the bag are listed in the ``fetch.txt`` file with a URL from which they can be downloaded. This is used in conjunction with ``dataverse.bagit.zip.max-file-size`` and/or ``dataverse.bagit.zip.max-data-size``. Default: false.
-
-.. _dataverse.bagit.zip.max-data-size:
-
-``dataverse.bagit.zip.max-data-size``
-The maximum total (uncompressed) size of data files (in bytes) to include in a BagIt zip archive. If the total size of the dataset files exceeds this limit, files will be excluded from the zipped bag (starting from the largest) until the total size is under the limit. Excluded files will be handled as defined by ``dataverse.bagit.zip.holey`` - just listed if that setting is true, or transferred separately and placed next to the zipped bag otherwise. When not set, there is no limit.
-
-.. _dataverse.bagit.zip.max-file-size:
-
-``dataverse.bagit.zip.max-file-size``
-The maximum (uncompressed) size of a single file (in bytes) to include in a BagIt zip archive. Any file larger than this will be excluded. Excluded files will be handled as defined by ``dataverse.bagit.zip.holey`` - just listed if that setting is true, or transferred separately and placed next to the zipped bag otherwise. When not set, there is no limit.
-
 .. _feature-flags:
 
 Feature Flags
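
The selection rule described in the removed ``max-file-size``/``max-data-size`` text — drop any file over the per-file limit, then exclude the largest remaining files until the total fits — can be sketched as a small, self-contained routine. This is a hypothetical illustration of that rule, not Dataverse's actual implementation (class and method names are invented):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the documented size-limit rule for zipped Bags:
// 1) exclude any file larger than maxFileSize;
// 2) drop the largest remaining files until the total is under maxDataSize.
public class BagSizeFilter {

    // Takes file sizes in bytes; returns the sizes of files kept in the
    // zipped bag, largest first (excluded files would be listed in fetch.txt
    // or transferred separately, per the removed documentation).
    static List<Long> filesToInclude(List<Long> sizes, long maxFileSize, long maxDataSize) {
        List<Long> included = new ArrayList<>();
        for (Long s : sizes) {
            if (s <= maxFileSize) { // per-file limit
                included.add(s);
            }
        }
        // Exclude files starting from the largest until the total fits.
        included.sort(Comparator.reverseOrder());
        long total = included.stream().mapToLong(Long::longValue).sum();
        while (!included.isEmpty() && total > maxDataSize) {
            total -= included.remove(0); // remove the current largest
        }
        return included;
    }

    public static void main(String[] args) {
        // 2000 exceeds the per-file limit; 900 is then dropped to fit the total.
        System.out.println(filesToInclude(List.of(500L, 2000L, 300L, 900L), 1500L, 1000L));
        // prints [500, 300]
    }
}
```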
@@ -5405,11 +5370,6 @@ This setting specifies which storage system to use by identifying the particular
 
 For examples, see the specific configuration above in :ref:`BagIt Export`.
 
-:ArchiveOnlyIfEarlierVersionsAreArchived
-++++++++++++++++++++++++++++++++++++++++
-
-This setting, if true, only allows creation of an archival Bag for a dataset version if all prior versions have been successfully archived. The default is false (any version can be archived independently, as long as other settings allow it).
-
 :ArchiverSettings
 +++++++++++++++++
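
Database-style settings such as ``:ArchiverSettings`` are managed through Dataverse's admin settings API. A sketch of such a call — the setting list shown is an invented example, not a recommended configuration:

```
# Hypothetical example: tell the archiver which settings it should read.
curl -X PUT -d ':BagItLocalPath, :BagGeneratorThreads' \
  http://localhost:8080/api/admin/settings/:ArchiverSettings
```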

src/main/java/edu/harvard/iq/dataverse/DataFile.java

Lines changed: 6 additions & 27 deletions
@@ -109,22 +109,18 @@ public class DataFile extends DvObject implements Comparable {
      * The list of types should be limited to the list above in the technote
      * because the string gets passed into MessageDigest.getInstance() and you
      * can't just pass in any old string.
-     *
-     * The URIs are used in the OAI_ORE export. They are taken from the associated XML Digital Signature standards.
      */
     public enum ChecksumType {
 
-        MD5("MD5", "http://www.w3.org/2001/04/xmldsig-more#md5"),
-        SHA1("SHA-1", "http://www.w3.org/2000/09/xmldsig#sha1"),
-        SHA256("SHA-256", "http://www.w3.org/2001/04/xmlenc#sha256"),
-        SHA512("SHA-512", "http://www.w3.org/2001/04/xmlenc#sha512");
+        MD5("MD5"),
+        SHA1("SHA-1"),
+        SHA256("SHA-256"),
+        SHA512("SHA-512");
 
         private final String text;
-        private final String uri;
 
-        private ChecksumType(final String text, final String uri) {
+        private ChecksumType(final String text) {
             this.text = text;
-            this.uri = uri;
         }
 
         public static ChecksumType fromString(String text) {
@@ -135,30 +131,13 @@ public static ChecksumType fromString(String text) {
                     }
                 }
             }
-            throw new IllegalArgumentException(
-                    "ChecksumType must be one of these values: " + Arrays.asList(ChecksumType.values()) + ".");
-        }
-
-        public static ChecksumType fromUri(String uri) {
-            if (uri != null) {
-                for (ChecksumType checksumType : ChecksumType.values()) {
-                    if (uri.equals(checksumType.uri)) {
-                        return checksumType;
-                    }
-                }
-            }
-            throw new IllegalArgumentException(
-                    "ChecksumType must be one of these values: " + Arrays.asList(ChecksumType.values()) + ".");
+            throw new IllegalArgumentException("ChecksumType must be one of these values: " + Arrays.asList(ChecksumType.values()) + ".");
         }
 
         @Override
         public String toString() {
             return text;
         }
-
-        public String toUri() {
-            return uri;
-        }
     }
 
     //@Expose
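
After this change, ``ChecksumType`` carries only the display text passed to ``MessageDigest.getInstance()``; the URI field and its ``fromUri()``/``toUri()`` accessors are gone. Assembled from the hunks above into a standalone, compilable form (the enum is nested inside ``DataFile`` in the real source):

```java
import java.util.Arrays;

// Standalone reassembly of the simplified enum left by this diff:
// only the checksum's display text remains.
public enum ChecksumType {
    MD5("MD5"),
    SHA1("SHA-1"),
    SHA256("SHA-256"),
    SHA512("SHA-512");

    private final String text;

    ChecksumType(final String text) {
        this.text = text;
    }

    // Looks up a type by its display text, e.g. "SHA-256".
    public static ChecksumType fromString(String text) {
        if (text != null) {
            for (ChecksumType checksumType : ChecksumType.values()) {
                if (text.equals(checksumType.text)) {
                    return checksumType;
                }
            }
        }
        throw new IllegalArgumentException("ChecksumType must be one of these values: " + Arrays.asList(ChecksumType.values()) + ".");
    }

    @Override
    public String toString() {
        return text;
    }
}
```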
