
DCAT-3 Export Properties Mechanism — Quick Reference

This document describes the declarative properties/config files used by the DCAT-3 exporter. It covers root config, resource config (elements), scoping, value sources, and formatting.

1. Root config (dcat-root.properties)

Defines output format, prefixes, elements, and relations.

trace.enabled = false

# Prefixes for CURIEs used in configs
prefix.dcat   = http://www.w3.org/ns/dcat#
prefix.dct    = http://purl.org/dc/terms/
prefix.foaf   = http://xmlns.com/foaf/0.1/
prefix.vcard  = http://www.w3.org/2006/vcard/ns#
prefix.skos   = http://www.w3.org/2004/02/skos/core#
prefix.rdfs   = http://www.w3.org/2000/01/rdf-schema#
prefix.xsd    = http://www.w3.org/2001/XMLSchema#
prefix.spdx   = http://spdx.org/rdf/terms#

# Elements (each loads its own resource config file)
element.catalog.id    = catalog
element.catalog.type  = dcat:Catalog
element.catalog.file  = dcat-catalog.properties

element.dataset.id    = dataset
element.dataset.type  = dcat:Dataset
element.dataset.file  = dcat-dataset.properties

element.distribution.id   = distribution
element.distribution.type = dcat:Distribution
element.distribution.file = dcat-distribution.properties

# Relations between element subjects
relation.catalog_has_dataset.subject     = catalog
relation.catalog_has_dataset.predicate   = dcat:dataset
relation.catalog_has_dataset.object      = dataset
relation.catalog_has_dataset.cardinality = 0..n

relation.dataset_has_distribution.subject     = dataset
relation.dataset_has_distribution.predicate   = dcat:distribution
relation.dataset_has_distribution.object      = distribution
relation.dataset_has_distribution.cardinality = 0..n
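To show how the root file's key families fit together, here is a minimal sketch that reads them with `java.util.Properties`. It groups the `prefix.*`, `element.*` and `relation.*` keys and expands a CURIE such as `dcat:Dataset` against the declared prefixes. The class and method names (`RootConfigSketch`, `prefixes`, `ids`, `expandCurie`) are hypothetical and are not the exporter's API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not the exporter's API: reads the key families of a root config.
final class RootConfigSketch {

    // "prefix.<p> = <namespace>" entries -> map p -> namespace
    static Map<String, String> prefixes(Properties props) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("prefix.")) {
                out.put(key.substring("prefix.".length()), props.getProperty(key).trim());
            }
        }
        return out;
    }

    // Distinct <id> parts of "<family>.<id>.<field>" keys, e.g. element or relation ids
    static Set<String> ids(Properties props, String family) {
        Set<String> out = new TreeSet<>();
        String start = family + ".";
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(start)) continue;
            String rest = key.substring(start.length());
            int dot = rest.indexOf('.');
            if (dot > 0) out.add(rest.substring(0, dot));
        }
        return out;
    }

    // "dcat:Dataset" -> "http://www.w3.org/ns/dcat#Dataset"; absolute http(s) IRIs pass through
    static String expandCurie(String curieOrIri, Map<String, String> prefixes) {
        if (curieOrIri.startsWith("http://") || curieOrIri.startsWith("https://")) {
            return curieOrIri;
        }
        int colon = curieOrIri.indexOf(':');
        if (colon <= 0) throw new IllegalArgumentException("Not a CURIE: " + curieOrIri);
        String namespace = prefixes.get(curieOrIri.substring(0, colon));
        if (namespace == null) throw new IllegalArgumentException("Unknown prefix: " + curieOrIri);
        return namespace + curieOrIri.substring(colon + 1);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(String.join("\n",
            "prefix.dcat = http://www.w3.org/ns/dcat#",
            "element.dataset.id   = dataset",
            "element.dataset.type = dcat:Dataset",
            "relation.catalog_has_dataset.predicate = dcat:dataset")));
        System.out.println(ids(props, "element"));   // [dataset]
        System.out.println(ids(props, "relation"));  // [catalog_has_dataset]
        System.out.println(expandCurie(props.getProperty("element.dataset.type"), prefixes(props)));
    }
}
```

Expanding CURIEs up front makes an unknown prefix fail early. This matches the spirit of the prefix checks listed under Validation.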

trace option

Setting trace.enabled = true traces the internal JSON data that the exporter receives from Dataverse. This lets you inspect its structure and write matching JSONPath queries.

TIP: When exploring the structure of the traced JSON, you can use helpful external tools:

harvestable vs availableToUsers

  • This exporter provides DCAT serializations in RDF/XML, Turtle, and JSON‑LD.
  • Dataverse harvesters support only XML formats, so only the RDF/XML variant is harvestable.
  • For Turtle and JSON‑LD, the harvestable property is ignored and effectively overridden to false, regardless of its value in dcat-root.properties.
  • The availableToUsers flag only controls visibility in the Dataverse UI: when set to true, the format will appear in the Metadata → Export menu for manual export by users.

Example (effective behavior):

dcat.format.rdfXml.availableToUsers = true
dcat.format.rdfXml.harvestable     = true

dcat.format.turtle.availableToUsers = true
# harvestable is ignored and overridden to false for Turtle
dcat.format.turtle.harvestable      = false

dcat.format.jsonLd.availableToUsers = true
# harvestable is ignored and overridden to false for JSON-LD
dcat.format.jsonLd.harvestable      = false
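The effective behavior above can be summarized in a few lines. This is a hedged sketch, not the exporter's code. The class name `ExportFlagsSketch` is hypothetical, and treating a missing key as `false` is an assumption.

```java
import java.util.Properties;

// Hypothetical sketch of the effective export flags; not the exporter's code.
final class ExportFlagsSketch {

    static boolean flag(Properties props, String format, String name) {
        // Treating a missing key as false is an assumption of this sketch.
        return Boolean.parseBoolean(
            props.getProperty("dcat.format." + format + "." + name, "false").trim());
    }

    // Harvesters only accept XML, so only rdfXml can ever be harvestable.
    static boolean effectiveHarvestable(Properties props, String format) {
        return "rdfXml".equals(format) && flag(props, format, "harvestable");
    }

    // availableToUsers is used as configured; it only controls the Metadata -> Export menu.
    static boolean effectiveAvailableToUsers(Properties props, String format) {
        return flag(props, format, "availableToUsers");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("dcat.format.rdfXml.harvestable", "true");
        props.setProperty("dcat.format.turtle.harvestable", "true"); // ignored: not XML
        props.setProperty("dcat.format.turtle.availableToUsers", "true");
        for (String format : new String[] {"rdfXml", "turtle", "jsonLd"}) {
            System.out.println(format + ": harvestable=" + effectiveHarvestable(props, format)
                + ", availableToUsers=" + effectiveAvailableToUsers(props, format));
        }
    }
}
```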

relation

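Elements declare which entities (here catalog, dataset and distribution) belong to the application profile. Each element loads its own resource config file via element.<id>.file. Relations link the subjects of two elements through a predicate, e.g. catalog dcat:dataset dataset, with the given cardinality.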

2. Resource config (e.g., dcat-distribution.properties)

Controls how to build a resource model (subjects, properties, nodes).

2.1 Scope

Use scope.json to iterate over parts of the input JSON:

# Iterate over each file
scope.json = $.datasetFileDetails[*]

If you accidentally use $.datasetFileDetails (no [*]), the mapper will auto-iterate the array.
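The auto-iteration rule can be pictured as a normalization step on the JSONPath matches. The sketch below assumes the JSONPath has already been evaluated into plain Java `List`/`Map` objects. `ScopeSketch.scopes` is a hypothetical name, not part of the mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: normalizes the matches of scope.json into one scope object per item.
// JSONPath evaluation itself is assumed to have happened already.
final class ScopeSketch {

    static List<Object> scopes(List<?> matches) {
        List<Object> out = new ArrayList<>();
        for (Object match : matches) {
            if (match instanceof List) {
                out.addAll((List<?>) match); // "$.datasetFileDetails": the array itself matched
            } else {
                out.add(match);              // "$.datasetFileDetails[*]": one match per file
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> files = List.of(Map.of("id", 1), Map.of("id", 2));
        System.out.println(scopes(files).size());          // 2: matches of "[*]"
        System.out.println(scopes(List.of(files)).size()); // 2: single array match, auto-iterated
    }
}
```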

2.2 Subject

Define the resource subject IRI:

# Mint an IRI per file id
subject.iri.json   = $.id
subject.iri.format = https://dataverse.nl/distribution/${value}
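Minting one subject per scope object amounts to reading the id and substituting it into the template. This hedged sketch uses a plain map lookup as a stand-in for evaluating `subject.iri.json`. It also checks the result is absolute, because RDF/XML and relations need absolute IRIs. `SubjectMintingSketch.mint` is a hypothetical helper.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of subject.iri.json + subject.iri.format for one scope object.
final class SubjectMintingSketch {

    static String mint(Map<String, ?> scope, String key, String format) {
        Object value = scope.get(key); // stands in for evaluating subject.iri.json (e.g. $.id)
        if (value == null) {
            throw new IllegalStateException("No value for subject key '" + key + "'");
        }
        String iri = format.replace("${value}", String.valueOf(value));
        // Relations and RDF/XML need absolute IRIs, so reject anything without a scheme.
        if (!URI.create(iri).isAbsolute()) {
            throw new IllegalStateException("Minted subject is not absolute: " + iri);
        }
        return iri;
    }

    public static void main(String[] args) {
        String format = "https://dataverse.nl/distribution/${value}";
        for (Map<String, Integer> file : List.of(Map.of("id", 101), Map.of("id", 102))) {
            System.out.println(mint(file, "id", format));
        }
    }
}
```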

2.3 Properties (ValueSource)

Each props.<id>.* block describes one property. Supported keys:

  • predicate – CURIE/IRI of the predicate (resolved against prefixes)
  • as – literal | iri | node-ref
  • lang – language tag for literals (optional)
  • datatype – datatype IRI (CURIE allowed), for typed literals (optional)
  • json – JSONPath to read a value (supports $$ for root lookup)
  • json.N – indexed JSONPaths; allows composition in format using ${1}, ${2}, …
  • const – constant value
  • map.* – mapping table (e.g., map.python = text/x-python)
  • format – template to compose values. Supports:
    • ${value} – the base value (from the stream or json)
    • ${1}, ${2}, … – from json.1, json.2, …
    • inline JSONPath placeholders: ${$.path} or ${$$.path}
  • multi – true to emit multiple values from a multi-match JSONPath
  • node – node id for as=node-ref (see 2.4 Nodes)
  • when – future conditional emission (reserved)
  • onUnMappedValue – fallback value when input exists but doesn't match any map key (for literal properties)
  • onNoInputValue – fallback value when no input is present from JSON path (for literal properties)
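Because every key of a property shares the `props.<id>.` prefix, a loader can group them into one map per property before interpreting `as`, `json.N`, `map.*` and so on. The sketch below is an assumption about how such grouping could look. `ValueSourceGroupingSketch` is hypothetical, and `nodes.<id>.props.*` keys are deliberately left out.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: groups "props.<id>.<key>" entries into one key->value map per property id.
final class ValueSourceGroupingSketch {

    static Map<String, Map<String, String>> group(Properties props) {
        Map<String, Map<String, String>> byId = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("props.")) continue;       // nodes.<id>.props.* is not handled here
            String rest = key.substring("props.".length()); // e.g. "hasVersion.json.1"
            int dot = rest.indexOf('.');
            if (dot <= 0) continue;
            String id = rest.substring(0, dot);             // "hasVersion"
            String field = rest.substring(dot + 1);         // "json.1"
            byId.computeIfAbsent(id, k -> new TreeMap<>()).put(field, props.getProperty(key).trim());
        }
        return byId;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("props.title.predicate", "dct:title");
        props.setProperty("props.title.as", "literal");
        props.setProperty("props.title.lang", "en");
        props.setProperty("props.title.json", "$.filename");
        props.setProperty("props.hasVersion.json.1", "$$.datasetJson.datasetVersion.versionNumber");
        System.out.println(group(props));
    }
}
```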

Examples

# Literal title taken from file name
props.title.predicate = dct:title
props.title.as        = literal
props.title.lang      = en
props.title.json      = $.filename

# Typed literal (byte size)
props.byteSize.predicate = dcat:byteSize
props.byteSize.as        = literal
props.byteSize.datatype  = xsd:nonNegativeInteger
props.byteSize.json      = $.filesize

# Media type literal
props.mediaType.predicate = dcat:mediaType
props.mediaType.as        = literal
props.mediaType.json      = $.contentType

# Dataset access URL read from the global root (not per file)
props.accessURL.predicate = dcat:accessURL
props.accessURL.as        = iri
props.accessURL.json      = $$.datasetJson.persistentUrl

# Email IRI using format
nodes.contact.props.email.predicate = vcard:hasEmail
nodes.contact.props.email.as        = iri
nodes.contact.props.email.json      = $..metadataBlocks.citation.fields[?(@.typeName=='datasetContact')].value[0].datasetContactEmail.value
nodes.contact.props.email.format    = mailto:${value}

# Version composed from two JSON paths
props.hasVersion.predicate = dct:hasVersion
props.hasVersion.as        = literal
props.hasVersion.json.1    = $$.datasetJson.datasetVersion.versionNumber
props.hasVersion.json.2    = $$.datasetJson.datasetVersion.versionMinorNumber
props.hasVersion.format    = V${1}.${2}

# Alternate one-liner using inline JSONPaths
# props.hasVersion.format = V${$$.datasetJson.datasetVersion.versionNumber}.${$$.datasetJson.datasetVersion.versionMinorNumber}
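The `format` placeholders used above (`${value}`, `${1}`, `${2}`, and inline `${$$.path}`) can be rendered with a single regex pass. In the sketch below, `pathEval` stands in for a real JSONPath evaluator. Replacing an unresolved placeholder with an empty string is an assumption, not documented exporter behavior. All names are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "format" placeholder substitution.
final class FormatTemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * @param value    the base value for ${value}, may be null
     * @param indexed  values of json.1, json.2, ... (index 0 holds json.1)
     * @param pathEval stand-in for evaluating inline JSONPaths such as ${$$.a.b}
     */
    static String render(String template, String value, List<String> indexed,
                         Function<String, String> pathEval) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement;
            if (name.equals("value")) {
                replacement = value;
            } else if (name.matches("\\d+")) {
                int i = Integer.parseInt(name);
                replacement = (i >= 1 && i <= indexed.size()) ? indexed.get(i - 1) : null;
            } else if (name.startsWith("$")) {
                replacement = pathEval.apply(name); // inline ${$.path} or ${$$.path}
            } else {
                replacement = null;
            }
            // Assumption: unresolved placeholders become an empty string.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement == null ? "" : replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("V${1}.${2}", null, List.of("1", "0"), path -> null));              // V1.0
        System.out.println(render("mailto:${value}", "info@example.org", List.of(), path -> null)); // mailto:info@example.org
    }
}
```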

2.3.1 Fallback Values for Literal Properties

When working with literal properties that use mapping tables (map.*), you can provide fallback values for cases where:

  1. No input is present: Use onNoInputValue when the JSON path returns no data
  2. Input doesn't match mapping: Use onUnMappedValue when input exists but doesn't match any key in the mapping table

# Status property with mapping and fallbacks
props.status.predicate = dct:status
props.status.as        = literal
props.status.json      = $.publicationState
props.status.map.published = published
props.status.map.draft    = draft
props.status.onUnMappedValue = unknown
props.status.onNoInputValue  = not specified

Behavior:

  • If $.publicationState contains "published" → emits "published"
  • If $.publicationState contains "draft" → emits "draft"
  • If $.publicationState contains "archived" → emits "unknown" (unmapped fallback)
  • If $.publicationState is missing/null → emits "not specified" (no input fallback)
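The four cases above reduce to a small decision. This is a minimal sketch under stated assumptions: a `null` input means "no input", and a property without any `map.*` entries passes its input through unchanged. The class name `LiteralFallbackSketch` is hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of map.* lookup with onUnMappedValue / onNoInputValue fallbacks.
final class LiteralFallbackSketch {

    /** Returns the literal to emit, or null when nothing should be emitted. */
    static String resolve(String input, Map<String, String> map,
                          String onUnMappedValue, String onNoInputValue) {
        if (input == null) {
            return onNoInputValue;        // JSON path matched nothing / value is null
        }
        if (map.isEmpty()) {
            return input;                 // assumption: without a mapping table the input is used as-is
        }
        String mapped = map.get(input);
        return mapped != null ? mapped : onUnMappedValue; // input present but no matching key
    }

    public static void main(String[] args) {
        Map<String, String> status = Map.of("published", "published", "draft", "draft");
        for (String input : new String[] {"published", "draft", "archived", null}) {
            System.out.println(input + " -> " + resolve(input, status, "unknown", "not specified"));
        }
    }
}
```

The same decision applies to IRI nodes with mapping tables (2.3.2). There the resolved string is an IRI instead of a literal.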

2.3.2 Fallback Values for IRI Nodes

For IRI nodes referenced via as=node-ref, similar fallback logic applies when the node uses mapping tables:

# Access rights node with fallbacks
nodes.accessRights.kind      = iri
nodes.accessRights.type      = dct:RightsStatement
nodes.accessRights.iri.json  = $.accessLevel
nodes.accessRights.map.public    = http://publications.europa.eu/resource/authority/access-right/PUBLIC
nodes.accessRights.map.restricted = http://publications.europa.eu/resource/authority/access-right/RESTRICTED
nodes.accessRights.onUnMappedValue = http://publications.europa.eu/resource/authority/access-right/NON_PUBLIC
nodes.accessRights.onNoInputValue  = http://publications.europa.eu/resource/authority/access-right/PUBLIC

# Reference the node
props.accessRights.predicate = dct:accessRights
props.accessRights.as        = node-ref
props.accessRights.node      = accessRights

Behavior:

  • If $.accessLevel contains "public" → creates IRI node http://publications.europa.eu/resource/authority/access-right/PUBLIC
  • If $.accessLevel contains "internal" → creates IRI node http://publications.europa.eu/resource/authority/access-right/NON_PUBLIC (unmapped fallback)
  • If $.accessLevel is missing/null → creates IRI node http://publications.europa.eu/resource/authority/access-right/PUBLIC (no input fallback)

2.4 Nodes

Use nodes.<id>.* to describe embedded nodes for as=node-ref:

# checksum node
props.checksum.predicate = spdx:checksum
props.checksum.as        = node-ref
props.checksum.node      = checksum

# kind is "bnode" or "iri" (for "iri", also set nodes.checksum.iri.const)
nodes.checksum.kind = bnode
nodes.checksum.type = spdx:Checksum
nodes.checksum.props.checksumValue.predicate = spdx:checksumValue
nodes.checksum.props.checksumValue.as        = literal
nodes.checksum.props.checksumValue.json      = $.checksum.value
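To make the checksum example concrete, the sketch below renders the node-ref property and its embedded node as N-Triples lines. The `kind` switch mirrors `nodes.<id>.kind`. The blank-node label is passed in explicitly to keep the output deterministic. `NodeRefSketch` and its methods are hypothetical and only illustrate the triples that result; this is not how the exporter builds its model.

```java
import java.util.List;

// Hypothetical sketch: renders props.checksum (node-ref) and its bnode as N-Triples lines.
final class NodeRefSketch {
    static final String SPDX = "http://spdx.org/rdf/terms#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // nodes.<id>.kind decides whether the node is a blank node or a named IRI node.
    static String nodeTerm(String kind, String iri, String bnodeLabel) {
        switch (kind) {
            case "bnode": return "_:" + bnodeLabel;
            case "iri":   return "<" + iri + ">";
            default: throw new IllegalArgumentException("kind must be bnode or iri, got: " + kind);
        }
    }

    static List<String> checksumTriples(String subjectIri, String checksumValue, String bnodeLabel) {
        String node = nodeTerm("bnode", null, bnodeLabel);
        return List.of(
            "<" + subjectIri + "> <" + SPDX + "checksum> " + node + " .",  // props.checksum
            node + " <" + RDF_TYPE + "> <" + SPDX + "Checksum> .",          // nodes.checksum.type
            // assumes the checksum value needs no escaping (e.g. a hex digest)
            node + " <" + SPDX + "checksumValue> \"" + checksumValue + "\" .");
    }

    public static void main(String[] args) {
        checksumTriples("https://dataverse.nl/distribution/101", "0123abcd", "c101")
            .forEach(System.out::println);
    }
}
```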

3. Root vs scoped JSONPath

  • $ paths (e.g., $.filename, $..metadataBlocks) – evaluated against the current scope (e.g., the file object in datasetFileDetails[*]).
  • $$ paths (e.g., $$.datasetJson.persistentUrl) – evaluated against the original document root.
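The difference between `$` and `$$` is only which object a path starts from. The sketch below resolves simple dotted paths over nested `Map`s. It does not implement real JSONPath features such as filters, `[*]` or recursive descent (`$..`), and `ScopedPathSketch` is a hypothetical name. The DOI in `main` is a made-up placeholder.

```java
import java.util.Map;

// Hypothetical sketch: resolves simple dotted paths against the scope ($) or the root ($$).
// Real JSONPath features (filters, [*], recursive descent "..") are not handled here.
final class ScopedPathSketch {

    static Object resolve(String path, Map<String, ?> scope, Map<String, ?> root) {
        Object current;
        String rest;
        if (path.startsWith("$$.")) {          // check "$$." first: it also starts with "$"
            current = root;
            rest = path.substring(3);
        } else if (path.startsWith("$.")) {
            current = scope;
            rest = path.substring(2);
        } else {
            throw new IllegalArgumentException("Path must start with $. or $$.: " + path);
        }
        for (String segment : rest.split("\\.")) {
            if (!(current instanceof Map)) return null; // path leaves the object tree
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> file = Map.of("filename", "data.csv");
        Map<String, Object> root = Map.of("datasetJson",
            Map.of("persistentUrl", "https://doi.org/10.1234/example")); // placeholder DOI
        System.out.println(resolve("$.filename", file, root));                   // data.csv
        System.out.println(resolve("$$.datasetJson.persistentUrl", file, root)); // https://doi.org/10.1234/example
        System.out.println(resolve("$.datasetJson.persistentUrl", file, root));  // null (wrong scope)
    }
}
```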

4. Serialization caveats

  • RDF/XML requires absolute IRIs. Use format (e.g., mailto:${value}) to make email addresses valid IRIs.
  • Turtle will show typed literals with quotes (e.g., "4026"^^xsd:nonNegativeInteger). This is correct.
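Both caveats are easy to check programmatically. The sketch below uses `java.net.URI.isAbsolute()` as an approximate absolute-IRI test. `java.net.URI` follows RFC 2396, not the IRI spec, so treat it as a heuristic. It also builds the Turtle form of a typed literal. The class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch illustrating the two serialization caveats.
final class SerializationCaveatsSketch {

    // RDF/XML needs absolute IRIs: a bare e-mail address has no scheme, "mailto:" adds one.
    static boolean isAbsoluteIri(String candidate) {
        try {
            return URI.create(candidate).isAbsolute();
        } catch (IllegalArgumentException e) {
            return false; // not even parseable as a URI reference
        }
    }

    // Turtle form of a typed literal: the lexical value is always quoted.
    static String turtleTypedLiteral(String lexical, String datatypeCurie) {
        return "\"" + lexical + "\"^^" + datatypeCurie;
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteIri("jane@example.org"));        // false
        System.out.println(isAbsoluteIri("mailto:jane@example.org")); // true
        System.out.println(turtleTypedLiteral("4026", "xsd:nonNegativeInteger"));
    }
}
```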

5. Troubleshooting

  • If a JSONPath fails, enable tracing and check the scope you are in; ensure you use $ vs $$ appropriately.
  • When linking elements (dataset → distribution), ensure the subjects are minted (absolute IRIs) and relations are applied after model merging.

6. Validation

The following validations are carried out:

Prefixes

  • Empty prefix keys / invalid IRIs → ERROR
  • Missing prefixes → WARNING

Elements

  • Missing id, type (typeCurieOrIri) or file → ERROR
  • type (typeCurieOrIri) not a CURIE/IRI, or unknown CURIE prefix → ERROR

Relations

  • Missing subject/object/predicate → ERROR
  • Predicate not CURIE/IRI or unknown prefix → ERROR

Subject

  • No minting strategy at all (const/template/json) → WARNING
  • iriFormat provided without template or json → ERROR

ValueSource

  • Missing predicate → ERROR
  • Invalid as value (anything other than literal, iri or node-ref) → ERROR
  • as=node-ref without a node reference (node) → ERROR
  • No source (json|const|json.*|node) → WARNING

NodeTemplate

  • Empty id → ERROR
  • kind must be bnode or iri → ERROR
  • type must be CURIE/IRI; check prefixes → ERROR
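As an illustration of the ValueSource rules only, here is a hedged sketch that checks one property's grouped keys and returns severity-prefixed messages. The message texts, the `ValueSourceValidationSketch` name and the choice to skip the `as` check when `as` is absent are assumptions, not the exporter's actual validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the ValueSource checks only; messages and structure are illustrative.
final class ValueSourceValidationSketch {
    private static final Set<String> AS_VALUES = Set.of("literal", "iri", "node-ref");

    static List<String> validate(String id, Map<String, String> fields) {
        List<String> issues = new ArrayList<>();
        if (isBlank(fields.get("predicate"))) {
            issues.add("ERROR: props." + id + ".predicate is missing");
        }
        String as = fields.get("as");
        if (as != null && !AS_VALUES.contains(as)) {
            issues.add("ERROR: props." + id + ".as must be literal, iri or node-ref");
        }
        if ("node-ref".equals(as) && isBlank(fields.get("node"))) {
            issues.add("ERROR: props." + id + " uses node-ref without a node");
        }
        boolean hasSource = fields.containsKey("json") || fields.containsKey("const")
            || fields.containsKey("node")
            || fields.keySet().stream().anyMatch(k -> k.matches("json\\.\\d+"));
        if (!hasSource) {
            issues.add("WARNING: props." + id + " has no source (json, const, json.N or node)");
        }
        return issues;
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(validate("title", Map.of("predicate", "dct:title", "as", "literal", "json", "$.filename")));
        System.out.println(validate("checksum", Map.of("as", "node-ref")));
    }
}
```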

7. Real-World Example: Aggregate Access Control (Administrator Responsibility Pattern)

Scenario

A dataset contains multiple files with different access restrictions:

  • Some files are public (restricted=false)
  • Some files are restricted (restricted=true)

Requirement: Dataset-level accessRights should reflect the most restrictive file access level.

Solution Pattern

This pattern demonstrates handling aggregation of file-level properties to dataset level without adding new DSL features. The key principle is that data governance decisions belong with administrators, not automation logic.

Distribution-Level (Per File) — Automatic Mapping

Each distribution's rights are automatically derived from its file's restricted flag:

# Each file's restricted boolean is directly mapped
nodes.rights.kind      = iri
nodes.rights.type      = dct:RightsStatement
nodes.rights.iri.json  = $.restricted
nodes.rights.map.true  = http://publications.europa.eu/resource/authority/access-right/RESTRICTED
nodes.rights.map.false = http://publications.europa.eu/resource/authority/access-right/PUBLIC
props.rights.predicate = dct:rights
props.rights.as        = node-ref
props.rights.node      = rights

Dataset-Level (Aggregate) — Administrator Responsibility

The dataset's accessRights are set through administrator configuration via metadata:

# Dataset rights are configured by administrator via metadata field
# Admin must set DCATaccessRights to match the most restrictive file level
nodes.ar.kind                = iri
nodes.ar.type                = dct:RightsStatement
nodes.ar.iri.json            = $..DCATaccessRights
nodes.ar.map.public          = http://publications.europa.eu/resource/authority/access-right/PUBLIC
nodes.ar.map.restricted      = http://publications.europa.eu/resource/authority/access-right/RESTRICTED
nodes.ar.map.non-public      = http://publications.europa.eu/resource/authority/access-right/NON_PUBLIC
props.accessRights.predicate = dct:accessRights
props.accessRights.as        = node-ref
props.accessRights.node      = ar

Key Principles

  1. Distribution-level rights are automatic: Each file's restricted boolean determines its distribution's rights
  2. Dataset-level rights are manual: Administrator/curator explicitly configures metadata to reflect governance policy
  3. Each can differ appropriately: DCAT-AP-NL 3.0 treats Dataset and Distribution as separate concerns
  4. Mapping system stays focused: Configuration-driven mapping, not business logic

Administrator Workflow

  1. Publish dataset with files (some may be access-restricted)
  2. Set DCATaccessRights metadata field:
    • If any file is restricted → select "restricted"
    • If all files are public → select "public"
  3. Export to DCAT
    • Mapping reads configured metadata and outputs it
    • Each distribution has its own rights (per file)
    • Dataset has aggregate rights (per admin configuration)
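For step 2, an administrator may want a quick cross-check of which value to select. The helper below is hypothetical and is not part of the mapping. The exporter still only reads the configured DCATaccessRights metadata. Returning "public" for a dataset with no files is an assumption.

```java
import java.util.List;

// Hypothetical helper for an administrator's cross-check; NOT part of the mapping.
final class MostRestrictiveSketch {

    // Mirrors the workflow rule: any restricted file -> "restricted", otherwise "public".
    static String recommendedAccessRights(List<Boolean> fileRestrictedFlags) {
        for (Boolean restricted : fileRestrictedFlags) {
            if (Boolean.TRUE.equals(restricted)) {
                return "restricted";
            }
        }
        return "public"; // all files public (assumption: also used when there are no files)
    }

    public static void main(String[] args) {
        System.out.println(recommendedAccessRights(List.of(false, true, false))); // restricted
        System.out.println(recommendedAccessRights(List.of(false, false)));       // public
    }
}
```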

Why This Approach

  • DCAT-AP-NL 3.0 compliant: treats Dataset and Distribution as separate concerns
  • Governance best practice: access policy decisions belong with data stewards, not automation
  • Simple mapping system: remains focused on configuration, not business logic
  • Extensible: organizations can later add pre-processing at Dataverse level if needed

Implementation Reference

See test case: src/test/java/io/gdcc/spi/export/dcat3/Issue49DatasetAccessRightsTest.java
See detailed explanation: ISSUE_49_SOLUTION.md


This mechanism is designed to be declarative, composable, and profile-friendly for DCAT/DCAT‑AP exports.

Contributing new application profiles (national / sectoral)

When adding a new Application Profile (AP), such as DCAT‑AP‑DE, DCAT‑AP‑NO or an organisation‑specific profile, place your mapping files and test fixtures in:

application_profiles/
    <profile_name>/
        mapping/
            dcat-root.properties
            dcat-dataset.properties
            dcat-distribution.properties
            ...
            README.md          # purpose, scope, external spec links

src/test/resources/application_profiles/
    <profile_name>/input/      # export_data_source_*.json fixtures
    <profile_name>/expected/   # optional expected RDF outputs; triple order in RDF output is non-deterministic

You can add a test case in the same way as is done for the NL profile.

Testing strategy

  • Unit tests use local JSON fixtures combined with the mapping files in the profile.
  • Integration tests load the Application Profile via the JVM system property:

-Ddataverse.dcat3.config=/path/to/profile/mapping/dcat-root.properties
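In a test helper, the system property can be read with `System.getProperty`. The fallback to a profile's `application_profiles/<profile_name>/mapping/` folder in the sketch below is a hypothetical convenience that follows the layout above. It is not documented exporter behavior, and `ConfigLocationSketch` is an illustrative name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical test-helper sketch for locating a profile's root config.
final class ConfigLocationSketch {
    static final String CONFIG_PROPERTY = "dataverse.dcat3.config";

    static Path rootConfig(String profileName) {
        String configured = System.getProperty(CONFIG_PROPERTY);
        if (configured != null && !configured.isBlank()) {
            return Paths.get(configured); // -Ddataverse.dcat3.config=... wins
        }
        // Hypothetical fallback following the application_profiles/ layout.
        return Paths.get("application_profiles", profileName, "mapping", "dcat-root.properties");
    }

    public static void main(String[] args) {
        System.out.println(rootConfig("my_profile"));
    }
}
```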

Why this layout

  • Keeps all Application Profiles self‑contained.
  • Allows multiple national/organisational profiles to coexist without clashes.
  • Ensures test‑only data does not pollute production mappings.