Commit 4aaf954

add docs and release note for slim version of Croissant #12123
1 parent f7e176a commit 4aaf954

3 files changed

Lines changed: 6 additions & 5 deletions

doc/release-notes/11254-croissant-builtin.md

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
-## Croissant Support Is Now Built In
+## Croissant Support Is Now Built In, Slim Version Added

 Croissant is a metadata export format for machine learning datasets that (until this release) was optional and implemented as external exporter. The code has been merged into the main Dataverse code base which means the Croissant format is automatically available in your installation of Dataverse, alongside older formats like Dublin Core and DDI. If you were using the external Croissant exporter, the merged code is equivalent to version 0.1.6. Croissant bugs and feature requests should now be filed against the main Dataverse repo (https://github.com/IQSS/dataverse) and the old repo (https://github.com/gdcc/exporter-croissant) should be considered retired.

 As described in the [Discoverability](https://dataverse-guide--12130.org.readthedocs.build/en/12130/admin/discoverability.html#id6) section of the Admin Guide, Croissant is inserted into the "head" of the HTML of dataset landing pages, as requested by the [Google Dataset Search](https://datasetsearch.research.google.com) team so that their tool can filter by datasets that support Croissant. In previous versions of Dataverse, when Croissant was optional and hadn't been enabled, we used the older "Schema.org JSON-LD" format in the "head". If you'd like to keep this behavior, you can use the feature flag [dataverse.legacy.schemaorg-in-html-head](https://dataverse-guide--12130.org.readthedocs.build/en/12130/installation/config.html#dataverse.legacy.schemaorg-in-html-head).

-We are aware that the amount of data in the "head" of the HTML can grow quite large for both Croissant and Schema.org JSON-LD. This is especially true of Croissant which exposes variable-level information. We plan to address this in https://github.com/IQSS/dataverse/issues/12123 . We also plan to support Croissant 1.1 in the future and are tracking this at https://github.com/IQSS/dataverse/issues/12014 .
+Both Croissant and Schema.org JSON-LD formats can become quite large when the dataset has many files or (for Croissant) when the files have many variables. As of this release, the "head" of the HTML contains a "slim" version of Croissant that doesn't contain information about files or variables. The original, full version of Croissant is still available via the "Export Metadata" dropdown. Both "croissant" and "croissantSlim" formats are available via API.

-See also #11254 and #12130.
+See also #11254, #12123, #12130, and #12191.

 ## New Settings

doc/sphinx-guides/source/admin/discoverability.rst

Lines changed: 2 additions & 2 deletions
@@ -37,11 +37,11 @@ Croissant Metadata in the ``<head>`` of Dataset Landing Pages

 `Croissant <https://github.com/mlcommons/croissant>`_ is a metadata format for machine learning datasets.

-In Dataverse, the ``<head>`` of the HTML source of a dataset landing page includes Croissant metadata like this::
+In Dataverse, the ``<head>`` of the HTML source of a dataset landing page includes a "slim" version of Croissant metadata like this::

   <script type="application/ld+json">{"@context":..."cr":"http://mlcommons.org/croissant/"...

-This is the same Croissant file you can download from a dataset landing page by clicking "Metadata" then "Export Metadata" (see :ref:`metadata-export-formats`) and the API (see ``croissant`` at :ref:`export-dataset-metadata-api`).
+This slim version does not have any information about files or variables but is otherwise is identical to the Croissant file you can download from a dataset landing page by clicking "Metadata" then "Export Metadata" (see :ref:`metadata-export-formats`). From the API you can download both ``croissant`` and ``croissantSlim`` formats (see :ref:`export-dataset-metadata-api`).

 We include Croissant in the ``<head>`` because it's `recommended <https://github.com/mlcommons/croissant/issues/530#issuecomment-1964227662>`_ by Google for `Google Dataset Search <https://datasetsearch.research.google.com>`_, where they offer a filter to narrow results to only datasets with support for Croissant.
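
Note on the hunk above: one way to check what a landing page actually embeds is to pull the JSON-LD out of the HTML and look for the Croissant ``cr`` prefix shown in the snippet. The following is only a minimal sketch using the Python standard library. The sample HTML, the class name ``JsonLdExtractor``, and the helper ``looks_like_croissant`` are hypothetical and not part of Dataverse, and the sketch assumes ``cr`` is a key of the ``@context`` object.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def looks_like_croissant(doc):
    """Heuristic (an assumption): the @context object maps "cr" to the Croissant namespace."""
    context = doc.get("@context")
    return isinstance(context, dict) and context.get("cr") == "http://mlcommons.org/croissant/"


# Made-up stand-in for the HTML of a dataset landing page.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">{"@context": {"cr": "http://mlcommons.org/croissant/"}, "name": "Example"}</script>
</head><body></body></html>"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_HTML)
croissant_docs = [doc for doc in parser.documents if looks_like_croissant(doc)]
```

Checking the parsed result for file-level or variable-level keys would then indicate whether a page carries the slim or the full form. Which keys to look for depends on the exporter output, which this commit does not show.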

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 1 addition & 0 deletions
@@ -2007,6 +2007,7 @@ Available Dataset Metadata Exporters
 The following dataset metadata exporters ship with Dataverse:

 - ``croissant``
+- ``croissantSlim``
 - ``Datacite``
 - ``dataverse_json``
 - ``dcterms``
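
Note on the hunk above: both exporter names can be passed to the dataset metadata export API. The sketch below only builds the request URLs. It assumes the ``/api/datasets/export`` endpoint with ``exporter`` and ``persistentId`` query parameters as described in the Native API guide. The server URL and DOI are placeholders, and ``export_url`` is a hypothetical helper, not a Dataverse function.

```python
from urllib.parse import urlencode

# Exporter names as listed in native-api.rst in this commit.
CROISSANT_EXPORTERS = ("croissant", "croissantSlim")


def export_url(server_url, persistent_id, exporter):
    """Build a metadata export URL for one dataset and one exporter.

    Assumes the /api/datasets/export endpoint; verify the path against
    the Native API guide for your Dataverse version.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server_url.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    # Placeholder server and DOI, for illustration only.
    for name in CROISSANT_EXPORTERS:
        print(export_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE", name))
```

Fetching the two URLs and comparing response sizes is a quick way to see how much file-level and variable-level detail the slim export leaves out for a given dataset.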
