diff --git a/docs/adr/0014-standardized-provenance-recording.md b/docs/adr/0014-standardized-provenance-recording.md
new file mode 100644
index 00000000..4eb6df4e
--- /dev/null
+++ b/docs/adr/0014-standardized-provenance-recording.md
@@ -0,0 +1,62 @@
+
+# Standardized provenance recording
+
+* Status: proposed
+* Deciders: sdruskat, skernchen, notactuallyfinn
+* Date: 2025-10-17
+
+Technical story:
+* https://github.com/softwarepub/hermes/pull/442
+* https://github.com/softwarepub/hermes/issues/363
+
+## Context and Problem Statement
+
+To make metadata traceable, and to support resolution based on metadata sources (e.g., in case of duplicates), we need to record the provenance of metadata values in a __standardized__ way.
+To achieve this, we use the [PROV-O ontology](https://www.w3.org/TR/prov-o/) serialized as [JSON-LD](https://www.w3.org/TR/json-ld/). Additionally, HERMES should make it possible to record as much of the provenance as possible *centrally*, i.e., as part of the core codebase. This keeps plugin developers from having to supply their own provenance solutions.
+
+To do this, we need to specify which provenance information is recorded and how recording can be implemented in HERMES so that it is easy to use.
+
+## Considered Options
+
+* Provide HERMES API-methods that also document themselves
+
+## Decision Outcome
+
+Chosen option:
+
+## Pros and Cons of the Options
+
+### Provide HERMES API-methods that also document themselves
+
+Provide API-methods for loading, writing, making web requests, etc. that document themselves.
+Those methods also take the function that should be used for the task at hand, and merely define a framework in which we implement the provenance recording.
+Like so:
+```python
+class HermesPlugin:
+    def load(self, func, path: str, *args, **kwargs):
+        # TODO: handle and record byte formats properly
+        with open(path) as fi:
+            data = func(fi, *args, **kwargs)
+        # `prov` stands for the central provenance recorder (not defined in this sketch)
+        prov.record("load", path, func.__name__, data)  # also module of func
+        return data
+
+    def write(self, func, path: str, data, *args, **kwargs):
+        # TODO: handle and record byte formats properly
+        with open(path, "w") as fo:  # open for writing, not reading
+            func(fo, data, *args, **kwargs)
+        prov.record("write", path, func.__name__, data)  # also module of func
+```
+
+* Good, because it allows recording provenance information for the plugins
+* Good, because it doesn't make plugin development harder
+* Bad, because the API methods may not cover all I/O functionality Python provides
+* Bad, because it doesn't cover merging, mapping, etc.
+
+All provenance information should be recorded in the following format, where additional properties of agents, activities, and entities are values from suitable vocabularies (Schema.org, CodeMeta, and potentially other schemas):
+
+![](./hermes-prov-diagram/hermes-prov.svg)
+source: [hermes-prov.drawio](./hermes-prov-diagram/hermes-prov.drawio)
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.drawio b/docs/adr/hermes-prov-diagram/hermes-prov.drawio
new file mode 100644
index 00000000..4173db36
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.drawio
@@ -0,0 +1,2138 @@
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.drawio.license b/docs/adr/hermes-prov-diagram/hermes-prov.drawio.license
new file mode 100644
index 00000000..2e24f7a4
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.drawio.license
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR)
+
+SPDX-License-Identifier: CC-BY-SA-4.0
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.svg b/docs/adr/hermes-prov-diagram/hermes-prov.svg
new file mode 100644
index 00000000..dbe5a2ae
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.svg
@@ -0,0 +1,4 @@
+[Text labels of hermes-prov.svg: PROV-O graph of the HERMES workflow across the phases HARVEST, PROCESS, CURATE, DEPOSIT, POST-PROCESS.]
+[Node labels: hermes (version); User; HERMESCache; cff-plugin and pyproject-plugin (harvest-plugin and postprocess-plugin; version, settings); process (process-"plugin"; strategies); curate (curate-plugin; version, settings); deposit 1 and deposit 2 (deposit-plugin; version, settings); user added merging strategies; load, map, merge, merge and process, write, update metadata (load and write with func, args, kwargs); CITATION.cff and pyproject.toml (text, path uri); codemeta.json, context.json, expanded.json under .hermes/harvest/cff/, .hermes/harvest/pyproject/, .hermes/process/result/, .hermes/curate/result/, .hermes/deposit/result/ (text, path uri); intermediate data such as CITATION data, pyproject data, processed data, curated data, deposit data, deposit data after deposit 1, deposit data after deposit 2.]
+[Edge labels: used, wasGeneratedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf, wasInfluencedBy.]
+[Legend (design: meaning): separate shapes for provenance Agent, Entity, and Activity, each with name and properties; bold text: record those properties always; solid lining: record as detailed as possible; dashed lining: record without many details; grayed out: optional / not always existent; red background: documentation unclear.]
\ No newline at end of file
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.svg.license b/docs/adr/hermes-prov-diagram/hermes-prov.svg.license
new file mode 100644
index 00000000..2e24f7a4
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.svg.license
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR)
+
+SPDX-License-Identifier: CC-BY-SA-4.0
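
Note on the ADR in this diff: it calls `prov.record(...)` without showing what a recorded entry could look like. The following is a minimal sketch of one possibility, assuming a hypothetical in-memory `ProvRecorder` that emits PROV-O terms as JSON-LD. The class name, the `urn:example:` identifiers, and the use of `schema:name` for the activity label are illustrative assumptions and not part of the HERMES API; only the PROV-O property names (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`, `prov:wasAssociatedWith`) come from the ontology the ADR references.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Namespace prefixes used in the emitted JSON-LD document.
CONTEXT = {"prov": "http://www.w3.org/ns/prov#", "schema": "https://schema.org/"}


class ProvRecorder:
    """Hypothetical in-memory recorder; not the HERMES implementation."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        # The plugin acting on the data is modelled as a software agent.
        self.nodes = [{"@id": agent_id, "@type": "prov:SoftwareAgent"}]

    def record(self, kind, path, func_name):
        """Record one load or write activity on the file at `path`."""
        activity_id = f"urn:example:activity:{len(self.nodes)}"
        entity_id = Path(path).resolve().as_uri()
        activity = {
            "@id": activity_id,
            "@type": "prov:Activity",
            "schema:name": f"{kind} via {func_name}",
            "prov:wasAssociatedWith": {"@id": self.agent_id},
            "prov:endedAtTime": datetime.now(timezone.utc).isoformat(),
        }
        entity = {"@id": entity_id, "@type": "prov:Entity"}
        if kind == "load":
            # Loading: the activity used an existing file.
            activity["prov:used"] = {"@id": entity_id}
        else:
            # Writing: the file was generated by the activity.
            entity["prov:wasGeneratedBy"] = {"@id": activity_id}
            entity["prov:wasAttributedTo"] = {"@id": self.agent_id}
        self.nodes += [activity, entity]
        return activity_id

    def to_jsonld(self):
        return json.dumps({"@context": CONTEXT, "@graph": self.nodes}, indent=2)


recorder = ProvRecorder("urn:example:agent:cff-plugin")
recorder.record("load", "CITATION.cff", "yaml.safe_load")
recorder.record("write", ".hermes/harvest/cff/codemeta.json", "json.dump")
document = json.loads(recorder.to_jsonld())
print(recorder.to_jsonld())
```

Keeping the recorder separate from the `load`/`write` wrappers is one way to let the core codebase own the serialization while plugins only pass in the function they want to run; whether HERMES adopts this split is left open by the ADR.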