diff --git a/docs/adr/0014-standardized-provenance-recording.md b/docs/adr/0014-standardized-provenance-recording.md
new file mode 100644
index 00000000..4eb6df4e
--- /dev/null
+++ b/docs/adr/0014-standardized-provenance-recording.md
@@ -0,0 +1,62 @@
+
+# Standardized provenance recording
+
+* Status: proposed
+* Deciders: sdruskat, skernchen, notactuallyfinn
+* Date: 2025-10-17
+
+Technical story:
+* https://github.com/softwarepub/hermes/pull/442
+* https://github.com/softwarepub/hermes/issues/363
+
+## Context and Problem Statement
+
+To make metadata traceable, and to support resolution based on metadata sources (e.g., in case of duplicates), we need to record the provenance of metadata values in a __standardized__ way.
+To achieve this, we use the [PROV-O ontology](https://www.w3.org/TR/prov-o/) serialized as [JSON-LD](https://www.w3.org/TR/json-ld/). Additionally, HERMES should make it possible to record as much of the provenance as possible *centrally*, i.e., as part of the core codebase. This keeps plugin developers from having to supply their own provenance solutions.
+
+To do this, we need to specify which provenance information is recorded and how recording can be implemented in HERMES so that it is easy to use.
+
+## Considered Options
+
+* Provide HERMES API methods that also document themselves
+
+## Decision Outcome
+
+Chosen option:
+
+## Pros and Cons of the Options
+
+### Provide HERMES API methods that also document themselves
+
+Provide API methods for loading, writing, making web requests, etc. that document themselves.
+These methods also take the function that should be used for the task at hand, and only define a framework in which we implement the recording of provenance data.
+Like so:
+```python
+class HermesPlugin:
+    # `prov` stands for the central provenance recorder provided by the HERMES core.
+    def load(self, func, path: str, *args, **kwargs):
+        # TODO: handle and record byte formats properly
+        with open(path) as fi:
+            data = func(fi, *args, **kwargs)
+        prov.record("load", path, func.__name__, data)  # also module of func
+        return data
+
+    def write(self, func, path: str, data, *args, **kwargs):
+        # TODO: handle and record byte formats properly
+        with open(path, "w") as fo:  # open for writing, not reading
+            func(fo, data, *args, **kwargs)
+        prov.record("write", path, func.__name__, data)  # also module of func
+```
+
+* Good, because it allows recording provenance information for the plugins
+* Good, because it doesn't make plugin development harder
+* Bad, because the API methods may not cover all I/O functionality Python provides
+* Bad, because it doesn't cover merging, mapping, etc.
+
+All provenance information should be recorded in the following format, where additional properties of agents, activities, and entities are values from suitable vocabularies (Schema.org, CodeMeta, and potentially other schemas):
+
+
+source: [hermes-prov.drawio](./hermes-prov-diagram/hermes-prov.drawio)
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.drawio b/docs/adr/hermes-prov-diagram/hermes-prov.drawio
new file mode 100644
index 00000000..4173db36
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.drawio
@@ -0,0 +1,2138 @@
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.drawio.license b/docs/adr/hermes-prov-diagram/hermes-prov.drawio.license
new file mode 100644
index 00000000..2e24f7a4
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.drawio.license
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR)
+
+SPDX-License-Identifier: CC-BY-SA-4.0
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.svg b/docs/adr/hermes-prov-diagram/hermes-prov.svg
new file mode 100644
index 00000000..dbe5a2ae
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.svg
@@ -0,0 +1,4 @@
+[Figure: hermes-prov.svg, a PROV diagram of the HERMES workflow. Recoverable text labels:]
+[Phases: HARVEST, PROCESS, CURATE, DEPOSIT, POST-PROCESS]
+[Relations: used, wasGeneratedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf, wasInfluencedBy]
+[Activities: load (func, args, kwargs), map, merge, merge and process, write (func, args, kwargs), update metadata]
+[Agents: hermes (version), User, cff-plugin and pyproject-plugin (harvest-plugin and postprocess-plugin; version, settings), process (process-"plugin"; strategies), curate (curate-plugin; version, settings), deposit 1 and deposit 2 (deposit-plugin; version, settings)]
+[Entities: CITATION.cff and pyproject.toml (text, path uri); codemeta.json, context.json, and expanded.json under .hermes/harvest/cff/, .hermes/harvest/pyproject/, and .hermes/process/result/; context.json and expanded.json under .hermes/curate/result/ and .hermes/deposit/result/; HERMESCache; user added merging strategies; intermediate data such as CITATION data, pyproject data, processed data, curated data, and deposit data]
+[Legend (design: meaning): provenance Agent, Entity, and Activity (each with name and properties); bold text: record those properties always; solid lining: record as detailed as possible; dashed lining: record without many details; grayed out: optional / not always existent; red background: documentation unclear]
\ No newline at end of file
diff --git a/docs/adr/hermes-prov-diagram/hermes-prov.svg.license b/docs/adr/hermes-prov-diagram/hermes-prov.svg.license
new file mode 100644
index 00000000..2e24f7a4
--- /dev/null
+++ b/docs/adr/hermes-prov-diagram/hermes-prov.svg.license
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR)
+
+SPDX-License-Identifier: CC-BY-SA-4.0
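
The ADR's example calls `prov.record(...)` without specifying what it produces. The following is a minimal sketch of how such a central recorder could emit PROV-O terms as JSON-LD, using only the Python standard library. The `ProvRecorder` class, its method names, the `urn:example:hermes` agent IRI, and the use of `rdfs:label` to carry the function name are all assumptions made for illustration; none of them come from HERMES, and the sketch leaves out the recorded data itself.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Namespace prefixes used in the JSON-LD output.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


class ProvRecorder:
    """Hypothetical in-memory recorder; an illustration, not the HERMES API."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        # The graph starts with the software agent that all activities refer to.
        self.graph = [{"@id": agent_id, "@type": "prov:SoftwareAgent"}]

    def record(self, kind: str, path: str, func_name: str) -> str:
        """Record a "load" or "write" activity on a file; return the activity IRI."""
        if kind not in ("load", "write"):
            raise ValueError(f"unsupported activity kind: {kind!r}")
        activity_id = f"urn:uuid:{uuid.uuid4()}"
        entity_id = Path(path).resolve().as_uri()
        now = datetime.now(timezone.utc).isoformat()
        activity = {
            "@id": activity_id,
            "@type": "prov:Activity",
            "rdfs:label": f"{kind} via {func_name}",
            "prov:endedAtTime": {"@value": now, "@type": "xsd:dateTime"},
            "prov:wasAssociatedWith": {"@id": self.agent_id},
        }
        entity = {"@id": entity_id, "@type": "prov:Entity"}
        if kind == "load":
            # A load activity used the file it read.
            activity["prov:used"] = {"@id": entity_id}
        else:
            # A written file was generated by the write activity.
            entity["prov:wasGeneratedBy"] = {"@id": activity_id}
        self.graph.extend([activity, entity])
        return activity_id

    def to_jsonld(self) -> str:
        """Serialize the recorded graph as a JSON-LD document."""
        return json.dumps({"@context": CONTEXT, "@graph": self.graph}, indent=2)


recorder = ProvRecorder("urn:example:hermes")
recorder.record("load", "CITATION.cff", "yaml.safe_load")
print(recorder.to_jsonld())
```

Recording the same file twice yields two nodes with the same `@id`, which JSON-LD processors merge into one node, so the sketch does not deduplicate entities itself.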