Make Croissant built-in to Dataverse (no longer an external exporter) (#12130)
* copy post-0.1.6 croissant code from external repo
This commit, specifically:
gdcc/exporter-croissant@a0c3b80
* add spotless config, limit it to croissant for now
* put Croissant in <head> by default, add flag for old behavior #11254
* add release note #11254
* remove generated files
* gitignore files generated by tests
* add croissant to expected export formats #11254
* list new setting and fix typo #11254
* convert feature flag to jvm option #11254
* wire ui:param to backing bean, group related settings #11254
Croissant is a metadata export format for machine learning datasets that (until this release) was optional and implemented as an external exporter. The code has been merged into the main Dataverse code base, which means the Croissant format is automatically available in your installation of Dataverse, alongside older formats like Dublin Core and DDI. If you were using the external Croissant exporter, the merged code is equivalent to version 0.1.6. Croissant bugs and feature requests should now be filed against the main Dataverse repo (https://github.com/IQSS/dataverse) and the old repo (https://github.com/gdcc/exporter-croissant) should be considered retired.
As described in the [Discoverability](https://dataverse-guide--12130.org.readthedocs.build/en/12130/admin/discoverability.html#id6) section of the Admin Guide, Croissant is inserted into the "head" of the HTML of dataset landing pages, as requested by the [Google Dataset Search](https://datasetsearch.research.google.com) team so that their tool can filter by datasets that support Croissant. In previous versions of Dataverse, when Croissant was optional and hadn't been enabled, we used the older "Schema.org JSON-LD" format in the "head". If you'd like to keep this behavior, you can use the feature flag [dataverse.legacy.schemaorg-in-html-head](https://dataverse-guide--12130.org.readthedocs.build/en/12130/installation/config.html#dataverse.legacy.schemaorg-in-html-head).
We are aware that the amount of data in the "head" of the HTML can grow quite large for both Croissant and Schema.org JSON-LD. This is especially true of Croissant which exposes variable-level information. We plan to address this in https://github.com/IQSS/dataverse/issues/12123 . We also plan to support Croissant 1.1 in the future and are tracking this at https://github.com/IQSS/dataverse/issues/12014 .
If you enable the Croissant metadata export format (see :ref:`external-exporters`) the ``<head>`` will show Croissant metadata instead. It looks similar, but you should see ``"cr": "http://mlcommons.org/croissant/"`` in the output.
+This is the same Croissant file you can download from a dataset landing page by clicking "Metadata" then "Export Metadata" (see :ref:`metadata-export-formats`) and the API (see ``croissant`` at :ref:`export-dataset-metadata-api`).

-For backward compatibility, if you enable Croissant, the older Schema.org JSON-LD format (``schema.org`` in the API) will still be available from both the web interface (see :ref:`metadata-export-formats`) and the API (see :ref:`export-dataset-metadata-api`).
+We include Croissant in the ``<head>`` because it's `recommended <https://github.com/mlcommons/croissant/issues/530#issuecomment-1964227662>`_ by Google for `Google Dataset Search <https://datasetsearch.research.google.com>`_, where they offer a filter to narrow results to only datasets with support for Croissant.

-The Dataverse team has been working with Google on both formats. Google has `indicated <https://github.com/mlcommons/croissant/issues/530#issuecomment-1964227662>`_ that for `Google Dataset Search <https://datasetsearch.research.google.com>`_ (the main reason we started adding this extra metadata in the ``<head>`` of dataset pages), Croissant is the successor to the older format.
+Before Croissant was invented, Google recommended a different format that Dataverse refers to as "Schema.org JSON-LD" in the user interface (and ``schema.org`` in the API). If you prefer to put that older format in the ``<head>``, which was the behavior in older versions of Dataverse, see :ref:`dataverse.legacy.schemaorg-in-html-head`.
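The text above says Croissant output can be recognized by the ``"cr": "http://mlcommons.org/croissant/"`` entry. As a rough illustration, the following Python sketch guesses which format a dataset page carries in its ``<head>``. It assumes the metadata is embedded in a ``<script type="application/ld+json">`` element and that the ``cr`` entry sits in the JSON-LD ``@context`` object; neither detail is spelled out in this change, and the names ``JsonLdCollector`` and ``head_metadata_format`` are hypothetical.

```python
import json
from html.parser import HTMLParser

# Namespace quoted in the docs as the marker of Croissant output.
CROISSANT_NS = "http://mlcommons.org/croissant/"


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the metadata lives in a JSON-LD script element.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def head_metadata_format(html):
    """Return "croissant", "schema.org", or None if no JSON-LD is found."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(doc, dict):
            continue
        context = doc.get("@context")
        # Assumption: Croissant declares the "cr" prefix in an @context object.
        if isinstance(context, dict) and context.get("cr") == CROISSANT_NS:
            return "croissant"
        return "schema.org"
    return None
```

A check like this could be pointed at the saved HTML of a dataset landing page before and after setting ``dataverse.legacy.schemaorg-in-html-head`` to confirm which format is being served.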
doc/sphinx-guides/source/api/native-api.rst (2 additions, 1 deletion)
@@ -2006,6 +2006,7 @@ Available Dataset Metadata Exporters
 The following dataset metadata exporters ship with Dataverse:

+- ``croissant``
 - ``Datacite``
 - ``dataverse_json``
 - ``dcterms``
@@ -2034,7 +2035,7 @@ Please note that the ``schema.org`` format has changed in backwards-incompatible
 Both forms are valid according to Google's Structured Data Testing Tool at https://search.google.com/structured-data/testing-tool . Schema.org JSON-LD is an evolving standard that permits a great deal of flexibility. For example, https://schema.org/docs/gs.html#schemaorg_expected indicates that even when objects are expected, it's ok to just use text. As with all metadata export formats, we will try to keep the Schema.org JSON-LD format backward-compatible to make integrations more stable, despite the flexibility that's afforded by the standard.

-The standard has further evolved into a format called Croissant. For details, see :ref:`schema.org-head` in the Admin Guide.
+The standard has further evolved into a format called Croissant. For details, see :ref:`croissant-head` in the Admin Guide.

 The ``schema.org`` format changed after Dataverse 6.4 as well. Previously its content type was "application/json" but now it is "application/ld+json".
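For integrations that call the export API, a small helper along these lines may make the exporter name and the content type change easier to handle. This is only a sketch: the ``/api/datasets/export`` path and its ``exporter`` and ``persistentId`` query parameters are assumptions taken from the Native API guide rather than from this diff, and the server name and DOI in the usage note are made up.

```python
from urllib.parse import urlencode


def export_url(server, exporter, persistent_id):
    """Build a metadata export URL (endpoint path is an assumption)."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


def media_type(content_type_header):
    """Strip parameters such as charset from a Content-Type header value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_current_schemaorg_type(content_type_header):
    """True if the header matches the newer "application/ld+json" type."""
    return media_type(content_type_header) == "application/ld+json"
```

For example, ``export_url("https://dataverse.example.edu", "croissant", "doi:10.5072/FK2/ABC123")`` builds a URL for the new built-in exporter, and ``is_current_schemaorg_type`` lets a client accept the newer content type while flagging the older "application/json".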
doc/sphinx-guides/source/developers/coding-style.rst (14 additions, 1 deletion)
@@ -13,6 +13,8 @@ Java
 Formatting Code
 ~~~~~~~~~~~~~~~

+How to format Java code is being discussed on `Zulip <https://dataverse.zulipchat.com/#narrow/channel/379673-dev/topic/code.20formatting.20.28Spotless.2C.20Checkstyle.2C.20etc.2E.29/near/432974039>`_ and the `dev mailing list <https://groups.google.com/g/dataverse-dev/c/y2Jpk3szTf8/m/NhTJvXblAgAJ>`_.
+
 Tabs vs. Spaces
 ^^^^^^^^^^^^^^^
@@ -59,10 +61,21 @@ Place curly braces according to the style below, which is an example you can see
 }
 }

+Format Code with Spotless
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In some of our libraries we've had success formatting code with `Spotless <https://github.com/diffplug/spotless>`_. See https://github.com/gdcc/xoai/issues/35 for an early discussion.
+
+We've added Spotless to the main repo but have limited it to certain files. If you'd like to use Spotless on files you're editing, update the config in pom.xml to include them.
+
+To run Spotless on your code:
+
+``mvn spotless:apply``
+
 Format Code You Changed with Netbeans
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-IQSS has standardized on Netbeans. It is much appreciated when you format your code (but only the code you touched!) using the out-of-the-box Netbeans configuration. If you have created an entirely new Java class, you can just click Source -> Format. If you are adjusting code in an existing class, highlight the code you changed and then click Source -> Format. Keeping the "diff" in your pull requests small makes them easier to code review.
+For a long time IQSS standardized on Netbeans. For files not included in the Spotless config mentioned above, it is much appreciated when you format your code (but only the code you touched!) using the out-of-the-box Netbeans configuration. If you have created an entirely new Java class, you can just click Source -> Format. If you are adjusting code in an existing class, highlight the code you changed and then click Source -> Format. Keeping the "diff" in your pull requests small makes them easier to code review.
Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_API_MDC_MIN_DELAY_MS``.

+.. _dataverse.legacy.schemaorg-in-html-head:
+
+dataverse.legacy.schemaorg-in-html-head
++++++++++++++++++++++++++++++++++++++++
+
+Instead of Croissant, use the legacy format (Schema.org JSON-LD) in the head of dataset landing pages by setting ``dataverse.legacy.schemaorg-in-html-head=true``. See :ref:`croissant-head`.
+
+Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_LEGACY_SCHEMAORG_IN_HTML_HEAD``.
+
 .. dataverse.ldn

 Linked Data Notifications (LDN) Allowed Hosts
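The environment variable names quoted in the settings above follow from the option names by the MicroProfile Config mapping rule: every character that is not a letter, digit, or underscore becomes an underscore, and the result is uppercased. The Python sketch below shows just that final mapping step; the function name ``env_var_name`` is hypothetical and the code is illustrative, not part of Dataverse.

```python
import re


def env_var_name(option):
    """Map a config option name to its uppercase environment variable form."""
    # Replace anything that is not alphanumeric or "_" with "_", then uppercase.
    return re.sub(r"[^A-Za-z0-9_]", "_", option).upper()
```

Applied to ``dataverse.legacy.schemaorg-in-html-head``, this yields ``DATAVERSE_LEGACY_SCHEMAORG_IN_HTML_HEAD``, the variable named in the new setting's documentation.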
@@ -4033,7 +4042,6 @@ Only contact DataCite to update a DOI after checking to see if DataCite has outd
Once a dataset has been published, its metadata can be exported in a variety of other metadata standards and formats, which help make datasets more :doc:`discoverable </admin/discoverability>` and usable in other systems, such as other data repositories. On each dataset page's metadata tab, the following exports are available: