Commit 055ae30

Merge branch 'develop' into 11912-edit-template-api
2 parents 7f6c9fa + 89cf927

69 files changed

Lines changed: 4387 additions & 999 deletions


.github/workflows/container_base_push.yml

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ jobs:
       # In case this is a push to develop, we care about buildtime.
       # Configure a remote ARM64 build host in addition to the local AMD64 in two steps.
       - name: Setup SSH agent
-        uses: webfactory/ssh-agent@v0.9.1
+        uses: webfactory/ssh-agent@v0.10.0
         with:
           ssh-private-key: ${{ secrets.BUILDER_ARM64_SSH_PRIVATE_KEY }}
       - name: Provide the known hosts key and the builder config
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
## Highlights

### Review Datasets

Dataverse now supports review datasets, a type of dataset that can be used to review resources such as other datasets in the Dataverse installation itself or various resources in external data repositories. APIs and a new "review" metadata block (with an "Item Reviewed" field) are in place, but the UI for this feature will only be available in a future version of the new React-based [Dataverse Frontend](https://github.com/IQSS/dataverse-frontend). See also the [guides](https://dataverse-guide--11753.org.readthedocs.build/en/11753/api/native-api.html#add-dataset-type), #11747, #12015, #11887, #12115, and #11753.

## Other Features Added

- Citation Style Language (CSL) output now includes "type:software" or "type:review" when those dataset types are used. See the [guides](https://dataverse-guide--11753.org.readthedocs.build/en/11753/api/native-api.html#get-citation-in-other-formats) and #11753.

## Updated APIs

- The Change Collection Attributes API now supports `allowedDatasetTypes`. See the [guides](https://dataverse-guide--11753.org.readthedocs.build/en/11753/api/native-api.html#change-collection-attributes), #12115, and #11753. An example is sketched below.
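  For example, the call might look something like this (a sketch only; the collection alias, token, and server URL are placeholders, and the linked guides are authoritative):

  ```shell
  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=http://localhost:8080

  # Allow the "review" and default "dataset" types in the collection "myCollection"
  curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
    "$SERVER_URL/api/dataverses/myCollection/attribute/allowedDatasetTypes?value=review,dataset"
  ```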
## Bugs Fixed

- 500 error when deleting dataset type by name. See #11833 and #11753.
- Dataset Type facet works in JSF but not the SPA. See #11758 and #11753.

## Backward Incompatible Changes

### Dataset Types Must Be Allowed, Per-Collection, Before Use

In previous releases of Dataverse, as soon as additional dataset types were added (such as "software", "workflow", etc.), they could be used by all users when creating datasets (via API only). As of this release, on a per-collection basis, superusers must allow these dataset types to be used. See #12115 and #11753.
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
## Archiving, OAI-ORE, and BagIt Export

This release includes multiple updates to the OAI-ORE metadata export and the process of creating archival bags, improving performance, fixing bugs, and adding significant new functionality.

### General Archiving Improvements

- Multiple performance and scaling improvements have been made for creating archival bags for large datasets, including:
  - The duration of archiving tasks triggered from the version table or API is no longer limited by the transaction time limit.
  - Temporary storage space requirements have increased by `1/:BagGeneratorThreads` of the zipped bag size. (This is a consequence of changes to avoid timeout errors on larger files/datasets.)
- The size of individual data files and the total dataset size that will be included in an archival bag can now be limited. Admins can choose whether files above these limits are transferred along with, but outside, the zipped bag (creating a complete archival copy) or are just referenced (using the concept of a "holey" bag and just listing the oversized files and the Dataverse URLs from which they can be retrieved in a `fetch.txt` file). In the holey bag case, an active service on the archiving platform must retrieve the oversized files (using appropriate credentials as needed) to make a complete copy.
- Superusers can now see a pending status in the dataset version table while archiving is active.
- Workflows are now triggered outside the transactions related to publication, assuring that workflow locks and status updates are always recorded.
- Potential conflicts between archiving/workflows, indexing, and metadata exports after publication have been resolved, avoiding cases where the status/last update times for these actions were not recorded.
- A bug has been fixed where superusers would incorrectly see the "Submit" button to launch archiving from the dataset page version table.
- The local, S3, and Google archivers have been updated to support deleting existing archival files for a version, allowing the bag for that version to be re-created.
- For archivers that support file deletion, it is now possible to recreate an archival bag after "Update Current Version" has been used (replacing the original bag). By default, Dataverse will mark the current version's archive as out-of-date, but will not automatically re-archive it.
- A new 'obsolete' status has been added to indicate when an archival bag exists for a version but it was created prior to an "Update Current Version" change.
- Improvements have been made to file retrieval for bagging, including retries on errors and when download requests are being throttled.
- A bug causing `:BagGeneratorThreads` to be ignored has been fixed, and the default has been reduced to 2.
- Retrieval of files for inclusion in an archival bag is no longer counted as a download.
- It is now possible to require that all previous versions have been successfully archived before archiving of a newly published version can succeed. (This is intended to support use cases where deduplication of files between dataset versions will be done and is a step towards supporting the Oxford Common File Layout (OCFL).)
- The pending status now uses the same JSON format as other statuses.

### OAI-ORE Export Updates

- The export now uses URIs for checksum algorithms, conforming with JSON-LD requirements.
- A bug causing failures with deaccessioned versions has been fixed. This occurred when the deaccession note ("Deaccession Reason" in the UI) was null, which is permissible via the API.
- The `https://schema.org/additionalType` value has been updated to "Dataverse OREMap Format v1.0.2" to reflect format changes.

### Archival Bag (BagIt) Updates

- The `bag-info.txt` file now correctly includes information for dataset contacts, fixing a bug where nothing was included when multiple contacts were defined. (Multiple contacts were always included in the OAI-ORE file in the bag; only the `bag-info.txt` file was affected.)
- Values used in the `bag-info.txt` file that may be multi-line (i.e., with embedded CR or LF characters) are now properly indented and wrapped per the BagIt specification (`Internal-Sender-Identifier`, `External-Description`, `Source-Organization`, `Organization-Address`).
- The dataset name is no longer used as a subdirectory within the `data/` directory, to reduce issues with unzipping long paths on some filesystems.
- For dataset versions with no files, the empty `manifest-<alg>.txt` file will now use the algorithm from the `:FileFixityChecksumAlgorithm` setting instead of defaulting to MD5.
- A new key, `Dataverse-Bag-Version`, has been added to `bag-info.txt` with the value "1.0" to allow for tracking changes to Dataverse's archival bag generation over time.
- When using the `holey` bag option discussed above, the required `fetch.txt` file will be included; a sketch of its format appears below.
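Per the BagIt specification, each line of `fetch.txt` gives a retrieval URL, a size in bytes, and the file's path inside the bag. The lines below are an invented illustration (URLs, sizes, and paths are made up), not output from this release:

```
https://demo.dataverse.org/api/access/datafile/42 1073741824 data/survey-videos.tar
https://demo.dataverse.org/api/access/datafile/43 5368709120 data/genome-reads.fastq.gz
```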
### New Configuration Settings

This release introduces several new settings to control archival and bagging behavior.

- `:ArchiveOnlyIfEarlierVersionsAreArchived` (Default: `false`)

  When set to `true`, dataset versions must be archived in order. That is, all prior versions of a dataset must be archived before the latest version can be archived.

The following JVM options (MicroProfile Config settings) control bag size and holey bag support:

- `dataverse.bagit.zip.holey`
- `dataverse.bagit.zip.max-data-size`
- `dataverse.bagit.zip.max-file-size`

- `dataverse.bagit.archive-on-version-update` (Default: `false`)

  Indicates whether archival bag creation should be triggered (if configured) when a version that was already successfully archived is updated, i.e., via the "Update Current Version" publication option. Setting the flag to `true` only works if the archiver being used supports deleting existing archival bags.
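As a sketch of how these might be set, assuming the standard admin settings API and the usual MicroProfile environment-variable naming (the values are placeholders, not recommendations; check the guides for units and defaults):

```shell
# Database setting, via the standard admin settings API
curl -X PUT -d true http://localhost:8080/api/admin/settings/:ArchiveOnlyIfEarlierVersionsAreArchived

# JVM options can be supplied as MicroProfile Config environment variables
export DATAVERSE_BAGIT_ZIP_HOLEY=true
export DATAVERSE_BAGIT_ZIP_MAX_FILE_SIZE=10737418240    # placeholder value
export DATAVERSE_BAGIT_ZIP_MAX_DATA_SIZE=107374182400   # placeholder value
export DATAVERSE_BAGIT_ARCHIVE_ON_VERSION_UPDATE=true
```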
### Backward Incompatibility

The name of the archival zipped bag produced by the LocalSubmitToArchiveCommand archiver now has a '.' character before the version number, mirroring the names used by other archivers. For example, the name will be `doi-10-5072-fk2-fosg5q.v1.0.zip` rather than `doi-10-5072-fk2-fosg5qv1.0.zip`.
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
{
  "datasetType": "review",
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "typeName": "title",
            "value": "Review of Percent of Children That Have Asthma",
            "typeClass": "primitive",
            "multiple": false
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Wazowski, Mike",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "value": "mwazowski@mailinator.com",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "This is a review of a dataset.",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          },
          {
            "value": {
              "itemReviewedUrl": {
                "value": "https://datacommons.org/tools/statvar#sv=Percent_Person_Children_WithAsthma",
                "typeClass": "primitive",
                "multiple": false,
                "typeName": "itemReviewedUrl"
              },
              "itemReviewedType": {
                "value": "Dataset",
                "typeClass": "controlledVocabulary",
                "multiple": false,
                "typeName": "itemReviewedType"
              },
              "itemReviewedCitation": {
                "value": "\"Statistical Variable Explorer - Data Commons.\" Datacommons.org, 2026, datacommons.org/tools/statvar#sv=Percent_Person_Children_WithAsthma. Accessed 9 Mar. 2026.",
                "typeClass": "primitive",
                "multiple": false,
                "typeName": "itemReviewedCitation"
              }
            },
            "typeClass": "compound",
            "multiple": false,
            "typeName": "itemReviewed"
          }
        ]
      }
    }
  }
}

doc/sphinx-guides/source/admin/big-data-administration.rst

Lines changed: 1 addition & 0 deletions
@@ -302,6 +302,7 @@ There are a broad range of options (that are not turned on by default) for improving
 - :ref:`:DisableSolrFacetsWithoutJsession` - disables facets for users who have disabled cookies (e.g. for bots)
 - :ref:`:DisableUncheckedTypesFacet` - only disables the facet showing the number of collections, datasets, and files matching the query (this facet is potentially less useful than others)
 - :ref:`:StoreIngestedTabularFilesWithVarHeaders` - by default, Dataverse stores ingested files without headers and dynamically adds them back at download time. Once this setting is enabled, Dataverse will leave the headers in place (for newly ingested files), reducing the cost of downloads
+- :ref:`dataverse.bagit.zip.max-file-size`, :ref:`dataverse.bagit.zip.max-data-size`, and :ref:`dataverse.bagit.zip.holey` - options to control the size and temporary storage requirements when generating archival Bags - see :ref:`BagIt Export`

 Scaling Infrastructure

doc/sphinx-guides/source/admin/dataverses-datasets.rst

Lines changed: 58 additions & 0 deletions
@@ -109,6 +109,64 @@ If the :AllowedCurationLabels setting has a value, one of the available choices

Individual datasets can be configured to use specific curationLabelSets as well. See the "Datasets" section below.

.. _review-datasets-setup:

Configure a Collection for Review Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`review-datasets-user` are a specialized type of dataset that can be used to review resources (such as datasets) in the Dataverse installation itself or resources in external data repositories.

Review datasets require some setup, as described below.

Load the Review Metadata Block
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, download the Review metadata block TSV file from :ref:`experimental-metadata`.

Then, load the block and update Solr; a quick sketch follows this list. See the following sections of :doc:`metadatacustomization` for details:

- :ref:`load-tsv`
- :ref:`update-solr-schema`
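As a quick sketch of the loading step (the TSV filename here is illustrative, and the Solr update is environment-specific, so follow the linked sections for the authoritative commands):

.. code-block:: bash

  # Load the review metadata block into Dataverse (filename is an assumption)
  curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file review.tsv

  # Then update the Solr schema as described in :ref:`update-solr-schema`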
Create and Enable Custom "Rubric" Metadata Blocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Review metadata block gives you a few basic fields common to all reviews, such as the URL of the item being reviewed.

You will probably want to create your own metadata blocks specific to the resources you are reviewing, your own "rubric". See :doc:`metadatacustomization` for details on creating and enabling custom metadata blocks.

Instead of creating a new custom metadata block from scratch (if you simply want to evaluate the feature, for example), you can use the metadata blocks at https://github.com/IQSS/dataverse.harvard.edu

After loading the block, don't forget to update the Solr schema!

Create a Review Dataset Type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Review datasets are built on the :ref:`dataset-types` feature. Dataset types can only be created via API, so follow the steps under :ref:`api-add-dataset-type`. Copy and paste from below or download :download:`review.json <../../../../scripts/api/data/datasetTypes/review.json>` and pass it to the API.

.. literalinclude:: ../../../../scripts/api/data/datasetTypes/review.json
   :language: json
Create a Collection for Reviews and Configure Permissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Follow the normal steps:

- :ref:`create-dataverse`
- :ref:`dataverse-permissions`

Allow the Review Dataset Type for the Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataset types other than the default, such as the "review" type, are only available once they have been enabled for the collection, via API.

Using the API :ref:`collection-attributes-api`, change the ``allowedDatasetTypes`` attribute so that it includes "review". If you only want to allow reviews, you can pass just ``review``. If you want to allow multiple dataset types, you can pass a comma-separated list, such as ``review,dataset``.
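A sketch, assuming the attribute-change endpoint described in the linked section (the collection alias ``reviews`` is a placeholder):

.. code-block:: bash

  # Allow both the "review" and the default "dataset" types for the collection
  curl -H "X-Dataverse-key:$API_TOKEN" -X PUT "$SERVER_URL/api/dataverses/reviews/attribute/allowedDatasetTypes?value=review,dataset"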
Invite Users to Create Review Datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At this point, users should be able to create review datasets via API, if you gave them permission on the collection. You can point them to :ref:`creating-a-review-dataset` for details.

Datasets
--------

doc/sphinx-guides/source/admin/metadatacustomization.rst

Lines changed: 2 additions & 4 deletions
@@ -444,6 +444,8 @@ Please note that metadata fields share a common namespace so they must be unique

 We'll use this command again below to update the Solr schema to accommodate metadata fields we've added.

+.. _load-tsv:
+
 Loading TSV files into a Dataverse Installation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -664,10 +666,6 @@ When creating new metadata blocks, please review the :doc:`/style/text` section

 If there are tips that you feel are omitted from this document, please open an issue at https://github.com/IQSS/dataverse/issues and consider making a pull request to make improvements. You can find this document at https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/admin/metadatacustomization.rst

-Alternatively, you are welcome to request "edit" access to this "Tips for Dataverse Software metadata blocks from the community" Google doc: https://docs.google.com/document/d/1XpblRw0v0SvV-Bq6njlN96WyHJ7tqG0WWejqBdl7hE0/edit?usp=sharing
-
-The thinking is that the tips can become issues and the issues can eventually be worked on as features to improve the Dataverse Software metadata system.
-
 Development Tasks Specific to Changing Fields in Core Metadata Blocks
 ---------------------------------------------------------------------

doc/sphinx-guides/source/admin/user-administration.rst

Lines changed: 1 addition & 0 deletions
@@ -98,6 +98,7 @@ This enables additional settings for each user in the notifications tab of their
 * ``CREATEDS`` Your dataset is created
 * ``CREATEDV`` Dataverse collection is created
 * ``DATASETCREATED`` Dataset was created by user
+* ``DATASETMENTIONED`` Dataset was mentioned via :doc:`/api/linkeddatanotification`
 * ``DATASETMOVED`` Dataset was moved by user
 * ``FILESYSTEMIMPORT`` Dataset has been successfully uploaded and verified
 * ``GRANTFILEACCESS`` Access to file is granted

doc/sphinx-guides/source/api/changelog.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ This API changelog is experimental and we would love feedback on its usefulness.
 v6.9
 ----

+- When creating datasets that contain a datasetType, that datasetType must be allowed at the collection level. This can be accomplished by passing ``allowedDatasetTypes`` to the :ref:`collection-attributes-api` API.
 - The POST /api/admin/makeDataCount/{id}/updateCitationsForDataset processing is now asynchronous, and the response no longer includes the number of citations. The response can be OK if the request is queued or 503 if the queue is full (default queue size is 1000).
 - The way to set per-format size limits for tabular ingest has changed. JSON input is now used. See :ref:`:TabularIngestSizeLimit`.
 - In the past, the settings API would accept any key and value. This is no longer the case because validation has been added. See :ref:`settings_put_single`, for example.
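As an illustration of the new asynchronous citation-update behavior, a call might look like this (the dataset id ``24`` is a placeholder):

.. code-block:: bash

  # Returns OK if the update is queued, or 503 if the queue (default size 1000) is full
  curl -X POST "http://localhost:8080/api/admin/makeDataCount/24/updateCitationsForDataset"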
