Merged
Changes from 3 commits
68 changes: 68 additions & 0 deletions docs/adr/0014-standardized-provenance-recording.md
@@ -0,0 +1,68 @@
<!--
SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR), Forschungszentrum Jülich, Helmholtz-Zentrum Dresden-Rossendorf

SPDX-License-Identifier: CC-BY-SA-4.0
-->
# Standardized provenance recording

* Status: proposed
* Deciders: sdruskat, skernchen, notactuallyfinn
* Date: 2025-10-17

Technical story:
* https://github.com/softwarepub/hermes/pull/442
* https://github.com/softwarepub/hermes/issues/363

## Context and Problem Statement

To make the metadata traceable, and to support resolution based on metadata sources (e.g., in case of duplicates), we need to record the provenance of metadata values in a __standardized__ way.<br>
Additionally, we use the [PROV-O ontology](https://www.w3.org/TR/prov-o/) and [JSON-LD](https://www.w3.org/TR/json-ld/), and we want HERMES itself to record as much of the provenance as possible so that plugin development does not become overcomplicated.<br>
To do this, we need to specify which provenance information is recorded and how the recording can be implemented in HERMES so that it is easy to use.
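
To illustrate what a standardized record could look like, here is a minimal sketch that describes a single load step as a PROV-O activity serialized as JSON-LD. The function name `make_load_record`, the `urn:example:` identifiers, and the choice of recorded fields are assumptions made for illustration; they are not part of HERMES or of this ADR. Only the `prov:` terms themselves come from PROV-O.

```python
import json
from datetime import datetime, timezone

# Namespace prefixes used in the record below.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def make_load_record(path: str, loader_name: str, plugin_name: str) -> dict:
    """Describe one 'load' step as a PROV-O activity (hypothetical layout)."""
    file_id = f"urn:example:file:{path}"
    agent_id = f"urn:example:agent:{plugin_name}"
    started = datetime.now(timezone.utc).isoformat()
    return {
        "@context": CONTEXT,
        "@graph": [
            {"@id": file_id, "@type": "prov:Entity"},
            {"@id": agent_id, "@type": "prov:SoftwareAgent"},
            {
                "@id": f"urn:example:activity:load:{path}",
                "@type": "prov:Activity",
                "rdfs:label": f"load via {loader_name}",
                "prov:used": {"@id": file_id},
                "prov:wasAssociatedWith": {"@id": agent_id},
                "prov:startedAtTime": {"@value": started, "@type": "xsd:dateTime"},
            },
        ],
    }


if __name__ == "__main__":
    record = make_load_record("codemeta.json", "json.load", "example-plugin")
    print(json.dumps(record, indent=2))
```

Keeping the record a plain dictionary means it can be serialized with the standard `json` module and later expanded or compacted by any JSON-LD processor.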

## Considered Options

* Provide HERMES API-methods that also document themselves

## Decision Outcome

Chosen option: "Provide HERMES API-methods that also document themselves", because it comes out best.
Contributor


Suggested change
Chosen option: "Provide HERMES API-methods that also document themselves", because comes out best.

As this is in the proposal stage, I think we need more feedback on the actual solution. What are alternatives for this solution? This ADR is already a very good track record of our thinking so far, but I think we need more buy-in before making the actual decision.

That said, this isn't a blocker for the PR, it just says: we need a decision about this further down the line.


## Pros and Cons of the Options

### Provide HERMES API-methods that also document themselves

Provide API-methods for loading, writing, making web requests, etc. that document themselves.<br>
Those methods also take the function that should be used for the task at hand and merely define a framework in which we implement the provenance-data recording.<br>
Like so:
```python
class HermesPlugin:
    # `prov` is assumed to be a provenance recorder provided by HERMES.

    def load(self, func, path: str, *args, **kwargs):
        # TODO: handle and record byte formats properly
        with open(path) as fi:
            data = func(fi, *args, **kwargs)
        prov.record("load", path, func.__name__, data)  # also module of func
        return data

    def write(self, func, path: str, data, *args, **kwargs):
        # TODO: handle and record byte formats properly
        # Open in write mode; the default mode is read-only.
        with open(path, "w") as fo:
            func(fo, data, *args, **kwargs)
        prov.record("write", path, func.__name__, data)  # also module of func
```
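
To show how a plugin author might call such methods, here is a self-contained sketch with an in-memory stand-in for the recorder. `ProvRecorder`, `dump_json`, and `roundtrip_demo` are hypothetical names introduced only for this example, and the module-level functions mirror the methods above rather than any existing HERMES API.

```python
import json
import os
import tempfile


class ProvRecorder:
    """Hypothetical in-memory stand-in for the HERMES provenance recorder."""

    def __init__(self):
        self.records = []

    def record(self, action, path, func_name, data):
        # A real recorder would likely also describe `data`; omitted here.
        self.records.append({"action": action, "path": path, "function": func_name})


prov = ProvRecorder()


def load(func, path, *args, **kwargs):
    with open(path) as fi:
        data = func(fi, *args, **kwargs)
    # Record the module as well, as the comment in the ADR suggests.
    prov.record("load", path, f"{func.__module__}.{func.__name__}", data)
    return data


def write(func, path, data, *args, **kwargs):
    with open(path, "w") as fo:
        func(fo, data, *args, **kwargs)
    prov.record("write", path, f"{func.__module__}.{func.__name__}", data)


def dump_json(fo, data):
    # json.dump expects (obj, fp); adapt it to the (file, data) order used above.
    json.dump(data, fo)


def roundtrip_demo():
    """Write a small metadata file, read it back, and return the loaded data."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metadata.json")
        write(dump_json, path, {"name": "hermes"})
        return load(json.load, path)
```

The plugin only passes in the function it would have called anyway (`json.load` here), so the recording happens without extra effort on the plugin side.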

* Good, because it allows provenance information to be recorded for the plugins
* Good, because it doesn't make plugin development harder
* Bad, because the API methods may not cover all I/O functionality Python provides
Contributor


True, but we can make a best effort to cover as many as make sense.
When we provide the respective API (load_file, make_request, etc.)
and make it usable enough so that developers don't see themselves forced to come up with their own solutions,
I think we can get good coverage.

Probably mostly a question of documentation, plugin templates, etc.

* Bad, because it doesn't cover merging, mapping, etc.
Contributor


This specific solution doesn't, but can we build provenance into mapping via the model API (if so, track in an issue/sub-issue to the provenance issue)?
For merging, if this is about merging full models during the hermes process step, I think we can easily also build this into the treatment classes, right? (Again, should be tracked in a respective issue/the over-arching issue for how we plan to do process and postprocess.)
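
To make the merging question more concrete, the following is a minimal sketch of a merge step that keeps track of which source each value came from. The function `merge_with_provenance`, the "first source wins" rule, and the layout of the returned provenance dictionary are all assumptions for illustration; they do not describe the existing HERMES process step or its treatment classes.

```python
def merge_with_provenance(sources):
    """Merge metadata dicts in priority order and note where each value came from.

    `sources` is a list of (source_name, metadata_dict) pairs; earlier entries win.
    Returns the merged metadata and a per-key provenance dictionary.
    """
    merged = {}
    provenance = {}
    for source_name, metadata in sources:
        for key, value in metadata.items():
            if key not in merged:
                # First source to provide this key supplies the value.
                merged[key] = value
                provenance[key] = {"source": source_name, "overridden": []}
            elif merged[key] != value:
                # Keep the conflicting value so the decision stays traceable.
                provenance[key]["overridden"].append(
                    {"source": source_name, "value": value}
                )
    return merged, provenance


if __name__ == "__main__":
    merged, info = merge_with_provenance(
        [
            ("codemeta", {"name": "hermes", "version": "1.0"}),
            ("cff", {"name": "HERMES", "license": "Apache-2.0"}),
        ]
    )
    print(merged)
    print(info["name"])
```

Recording the overridden values alongside the winning source is one way to support the duplicate resolution mentioned in the problem statement.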


All provenance information should be recorded in the format shown in the diagram below, which uses the following design choices:
| design choice | meaning |
| -------------- | ------------------------------- |
| bold text | record those values always |
| solid lining | record as detailed as possible |
| dashed lining | record but without many details |
| grayed out | optional / not always there |
| red background | way of documentation unclear |

![](./hermes-prov-diagram/hermes-prov.svg)<br>
source: [hermes-prov.drawio](./hermes-prov-diagram/hermes-prov.drawio)