This repository contains a DCAT‑3 exporter for Dataverse. It produces RDF metadata conforming to DCAT 3.0 and the Dutch application profile DCAT‑AP‑NL 3.0. The exporter is designed for extensibility and strict validation using SHACL shapes tailored to DCAT‑AP‑NL.
The system is deliberately split into four layers that follow the flow from Dataverse input to DCAT output:
- Configuration Parsing (Loaders)
  - Purpose: Read and parse configuration files that describe the mapping from the Dataverse input structure to DCAT‑3 resources and properties.
  - Main components:
    - `RootConfigLoader`: loads the root configuration, including global prefixes, element lists, relation definitions, and base directory settings.
    - `ResourceConfigLoader`: loads per‑element resource configuration files (one per DCAT resource type), defining JSON path extraction, RDF type, property mappings, and value transformations.
  - Outcome: In‑memory configuration model objects that reflect the structure of the configuration (see next section).
- Mapping (Model Construction)
  - Purpose: Transform Dataverse metadata (obtained via `ExportDataProvider`) into Jena RDF `Model`s according to the configuration.
  - Main components:
    - `JaywayJsonFinder`: navigates the input JSON tree with JSON path expressions.
    - `ResourceMapper`: builds a `Model` for each configured DCAT element, asserts `rdf:type`, sets properties, expands CURIEs using `Prefixes`, and produces RDF resources (subjects).
  - Outcome: One `Model` per element, plus subject collections identified by `rdf:type`.
- Validation (Model Validation)
  - Purpose: Help the user provide correct configuration data by reporting problems with meaningful messages.
  - Main components:
    - `Validators`: calls all the specific validators.
  - Outcome: A valid (or at least reasonably valid) configuration, which avoids problems in the later layers.
- Writing (Serialization)
  - Purpose: Merge the element models, apply the configured relations (n:m), then serialize the combined model.
  - Main components:
    - `Dcat3ExporterBase`: shared orchestration that loads the root config, builds the element models, applies relations, and writes via a format‑specific Jena writer.
    - Format implementations:
      - `Dcat3ExporterTurtle` → writer `"TURTLE"`, media type `text/turtle`.
      - `Dcat3ExporterJsonLd` → writer `"JSON-LD"`, media type `application/ld+json`.
      - `Dcat3ExporterRdfXml` → writer `"RDF/XML"`, media type `application/rdf+xml`.
  - Outcome: Deterministic, profile‑compliant RDF output. The format is fixed by the exporter class rather than by a configuration key.
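The n:m relation step of the Writing layer can be pictured as a cross product between the subjects of two elements. The following is a minimal sketch, not the exporter's actual code: it assumes subjects are plain IRI strings keyed by element id, whereas the real implementation works on Jena `Model`s. `Relation`, `Triple`, and `RelationApplier` are hypothetical stand-ins introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a configured relation: subject element id, predicate, object element id. */
record Relation(String subjectElementId, String predicate, String objectElementId) {}

/** Hypothetical stand-in for an RDF statement; the real exporter adds statements to a Jena Model. */
record Triple(String subject, String predicate, String object) {}

final class RelationApplier {
    private RelationApplier() {}

    /**
     * Links every subject of the relation's subject element to every subject
     * of its object element (n:m). Unknown element ids contribute no triples.
     */
    static List<Triple> apply(List<Relation> relations,
                              Map<String, List<String>> subjectsByElementId) {
        List<Triple> triples = new ArrayList<>();
        for (Relation r : relations) {
            List<String> subjects = subjectsByElementId.getOrDefault(r.subjectElementId(), List.of());
            List<String> objects = subjectsByElementId.getOrDefault(r.objectElementId(), List.of());
            for (String s : subjects) {
                for (String o : objects) {
                    triples.add(new Triple(s, r.predicate(), o));
                }
            }
        }
        return triples;
    }
}
```

With one dataset subject and two distribution subjects, a single relation from the dataset element to the distribution element yields two triples, one per distribution.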
Configuration is represented by strongly‑typed model classes that parallel file content:
- Root level (`RootConfig`)
  - `baseDir`: base directory for locating per‑element files.
  - `prefixes`: map of CURIE prefixes → IRIs.
  - `elements`: list of `Element` descriptors, each pointing to a resource configuration file and the element’s RDF type (`typeCurieOrIri`).
  - `relations`: list of `Relation` descriptors (subject element id, predicate CURIE/IRI, object element id).
  - `trace`: optional diagnostics to log the input data snapshot.
- Resource level (`ResourceConfig`)
  - Declares the mapping rules for a single DCAT resource type, including value extraction (JSON paths), constant values, conditional mappings, and property targets (CURIE/IRI expansion via `Prefixes`).
This mirroring ensures loaders can validate and report configuration issues early and gives the mapper a stable, explicit contract.
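To make the role of the `prefixes` map concrete, here is a minimal sketch of CURIE expansion. It is an illustration under simplifying assumptions, not the behaviour of the repository's `Prefixes` class: the name `CurieExpander` is hypothetical, and any value whose prefix is not declared is simply treated as a full IRI and returned unchanged.

```java
import java.util.Map;

/** Hypothetical helper illustrating CURIE expansion against a prefix map. */
final class CurieExpander {
    private CurieExpander() {}

    /**
     * Expands "prefix:local" using the prefix map. Values without a colon, or
     * whose prefix is not declared (for example "https://example.org/x", where
     * "https" is not a configured prefix), are returned unchanged.
     */
    static String expand(String curieOrIri, Map<String, String> prefixes) {
        int colon = curieOrIri.indexOf(':');
        if (colon < 0) {
            return curieOrIri;
        }
        String namespace = prefixes.get(curieOrIri.substring(0, colon));
        if (namespace == null) {
            return curieOrIri;
        }
        return namespace + curieOrIri.substring(colon + 1);
    }
}
```

For example, with `dcat` mapped to `http://www.w3.org/ns/dcat#`, the value `dcat:Dataset` expands to `http://www.w3.org/ns/dcat#Dataset`. A production implementation would likely also reject malformed input rather than pass it through.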
`RootConfigLoader` loads the main configuration file from the classpath or filesystem. `ResourceConfigLoader` loads per‑element config files referenced by `RootConfig.elements[i].file`. `FileResolver.resolveElementFile(baseDir, element.file)` applies the following fallback chain to locate configuration:
- Absolute path: if `element.file` is absolute, use it as‑is.
- Relative to `baseDir`: if `element.file` is relative and `baseDir` is set, resolve `baseDir/element.file`.
- Classpath resource: if not found on the filesystem, attempt to load `element.file` from the application classpath (e.g., `src/main/resources`).
- Failure: emit a clear error stating the search order and the path(s) attempted.
This mechanism lets you bundle defaults within the JAR and override them with deployment‑specific files.
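The fallback chain can be sketched with the JDK alone. This is a hypothetical illustration, not the repository's `FileResolver`: the class name `FileResolverSketch`, the `open` method, and its `InputStream` return type are assumptions, as is the choice to fall through to the classpath when an absolute path does not exist.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the lookup order: absolute path, baseDir, classpath, failure. */
final class FileResolverSketch {
    private FileResolverSketch() {}

    static InputStream open(String baseDir, String file) throws IOException {
        List<String> attempted = new ArrayList<>();
        Path path = Path.of(file);

        if (path.isAbsolute()) {
            // 1. Absolute path: use it as-is.
            attempted.add("absolute: " + path);
            if (Files.isRegularFile(path)) {
                return Files.newInputStream(path);
            }
        } else if (baseDir != null && !baseDir.isBlank()) {
            // 2. Relative path: resolve against baseDir when it is set.
            Path candidate = Path.of(baseDir).resolve(path).normalize();
            attempted.add("baseDir: " + candidate);
            if (Files.isRegularFile(candidate)) {
                return Files.newInputStream(candidate);
            }
        }

        // 3. Classpath resource: lets defaults bundled in the JAR act as a fallback.
        attempted.add("classpath: " + file);
        InputStream in = FileResolverSketch.class.getClassLoader().getResourceAsStream(file);
        if (in != null) {
            return in;
        }

        // 4. Failure: report the search order and every location tried.
        throw new FileNotFoundException(
                "Cannot locate '" + file + "'; tried in order: " + attempted);
    }
}
```

Because the filesystem is consulted before the classpath, a deployment‑specific file under `baseDir` takes precedence over a bundled default with the same relative name.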
- DCAT‑AP‑NL 3.0: The exporter targets the Dutch application profile of DCAT 3.0. Mappings and prefixes should reflect AP‑NL vocabularies and constraints.
- SHACL: Generated RDF can be validated against SHACL shapes (e.g., node shapes for `dcat:Dataset`, `dcat:Distribution`, and `vcard:Kind`). The SHACL shapes enforce:
  - Cardinalities (`sh:minCount`, `sh:maxCount`).
  - Datatypes (`xsd:string`, `rdf:langString`, `xsd:date`, `xsd:dateTime`).
  - Node kinds (IRI vs literal) and nested shape conformance (`sh:node`).
Recommendation: Include a validation step in CI using Apache Jena SHACL to assert conformance to DCAT‑AP‑NL. Store the shapes under `src/main/resources/shacl/` and fail builds on violations.
All three format exporters are annotated with `@AutoService(Exporter.class)`. During the build, `META-INF/services/io.gdcc.spi.export.Exporter` entries are generated automatically so Dataverse can discover them via `ServiceLoader`.
- Build: `mvn package`.
- Copy the JAR to the Dataverse SPI exporters directory.
- Restart Payara.
- In Dataverse, use Metadata → Export Metadata and select the desired DCAT‑3 format.
- Unit tests can assert mapping behavior by comparing expected models to the output (`mvn test`).
- Include a validation script (e.g., `validate.sh`) to run SHACL checks on produced RDF.
- Provide scripts to update expected outputs after mapping changes.
To add a new format:
- Create a subclass of `Dcat3ExporterBase`.
- Implement `getFormatName()`, `getMediaTypeValue()`, and `getJenaWriterName()`.
- Annotate with `@AutoService(Exporter.class)` and rebuild.
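As an illustration of these steps, the sketch below adds a hypothetical N‑Triples exporter. To stay self‑contained it declares a stand‑in `Dcat3ExporterBase` that exposes only the three methods named above; the `String` return types, the method visibility, the class name `Dcat3ExporterNTriples`, and the format name `dcat3_ntriples` are all assumptions. In the real project you would extend the repository's base class and add the `@AutoService(Exporter.class)` annotation.

```java
/** Stand-in for the repository's Dcat3ExporterBase, reduced to the three methods named above (assumed signatures). */
abstract class Dcat3ExporterBase {
    abstract String getFormatName();
    abstract String getMediaTypeValue();
    abstract String getJenaWriterName();
}

// In the real project this class would also carry @AutoService(Exporter.class).
final class Dcat3ExporterNTriples extends Dcat3ExporterBase {
    @Override
    String getFormatName() {
        return "dcat3_ntriples"; // hypothetical format identifier
    }

    @Override
    String getMediaTypeValue() {
        return "application/n-triples"; // registered media type for N-Triples
    }

    @Override
    String getJenaWriterName() {
        return "N-TRIPLES"; // Jena writer/language name for N-Triples
    }
}
```

The existing Turtle, JSON‑LD, and RDF/XML exporters follow the same pattern, differing only in the three returned values.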
To add a new DCAT element:
- Add an `Element` entry to `RootConfig` referencing a new `ResourceConfig` file.
- Provide the resource mapping rules in the config file.
- (Optional) Update SHACL shapes if the element introduces new constraints.
- Split the TSV file into:
  - A DCAT‑AP‑NL 3.0 extension of the metadata.
  - An organization-specific TSV for custom fields.