Document standardized provenance recording #442
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
| @@ -0,0 +1,62 @@ | ||||
| <!-- | ||||
| SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR), Forschungszentrum Jülich, Helmholtz-Zentrum Dresden-Rossendorf | ||||
|
|
||||
| SPDX-License-Identifier: CC-BY-SA-4.0 | ||||
| --> | ||||
| # Standardized provenance recording | ||||
|
|
||||
| * Status: proposed | ||||
| * Deciders: sdruskat, skernchen, notactuallyfinn | ||||
| * Date: 2025-10-17 | ||||
|
|
||||
| Technical story: | ||||
| * https://github.com/softwarepub/hermes/pull/442 | ||||
| * https://github.com/softwarepub/hermes/issues/363 | ||||
|
|
||||
| ## Context and Problem Statement | ||||
|
|
||||
| To make metadata traceable, and to resolve conflicts such as duplicates based on the metadata sources, we need to record the provenance of metadata values in a __standardized__ way. | ||||
| Additionally, we use the [PROV-O ontology](https://www.w3.org/TR/prov-o/) and [JSON-LD](https://www.w3.org/TR/json-ld/), and we want HERMES itself to record as much of the provenance as possible so that plugin development is not overcomplicated. | ||||
|
|
||||
| To do this, we need to specify what provenance information is recorded and how it can be implemented in HERMES to make it easy to use. | ||||
|
|
||||
| ## Considered Options | ||||
|
|
||||
| * Provide HERMES API-methods that also document themselves | ||||
|
|
||||
| ## Decision Outcome | ||||
|
|
||||
| Chosen option: "Provide HERMES API-methods that also document themselves", because it comes out best. | ||||
|
Contributor
As this is in the proposal stage, I think we need more feedback on the actual solution. What are alternatives for this solution? This ADR is already a very good track record of our thinking so far, but I think we need more buy-in before making the actual decision. That said, this isn't a blocker for the PR, it just says: we need a decision about this further down the line.
||||
|
|
||||
| ## Pros and Cons of the Options | ||||
|
|
||||
| ### Provide HERMES API-methods that also document themselves | ||||
|
|
||||
| Provide API-methods for loading, writing, making web requests, etc. that document themselves.<br> | ||||
| These methods also take the function that should be used for the task at hand and merely define a framework in which we implement the recording of provenance data.<br> | ||||
| Like so: | ||||
| ```python | ||||
| class HermesPlugin: | ||||
|     # `prov` stands for the HERMES provenance recorder; it is not defined in this sketch. | ||||
|     def load(self, func, path: str, *args, **kwargs): | ||||
|         # TODO: handle and record byte formats properly | ||||
|         with open(path) as fi: | ||||
|             data = func(fi, *args, **kwargs) | ||||
|         prov.record("load", path, func.__name__, data)  # also module of func | ||||
|         return data | ||||
|
|
||||
|     def write(self, func, path: str, data, *args, **kwargs): | ||||
|         # TODO: handle and record byte formats properly | ||||
|         # Open for writing; the default mode would open the file read-only. | ||||
|         with open(path, "w") as fo: | ||||
|             func(fo, data, *args, **kwargs) | ||||
|         prov.record("write", path, func.__name__, data)  # also module of func | ||||
| ``` | ||||
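The wrapper idea above can be tried out end to end with a small stand-in recorder. This is only a sketch under assumptions: `ProvRecorder`, `SketchPlugin`, `dump_json` and `demo` are hypothetical names that do not exist in HERMES, and the recorded fields are illustrative rather than the format the ADR will settle on. It shows that the plugin author only passes a callable and a path, while the helper does the recording.

```python
import json
import tempfile
from pathlib import Path


class ProvRecorder:
    """Hypothetical in-memory recorder; not part of HERMES."""

    def __init__(self):
        self.records = []

    def record(self, activity, path, func, data):
        # Keep the callable's module next to its name, as the ADR comment suggests.
        self.records.append({
            "activity": activity,
            "path": str(path),
            "function": f"{func.__module__}.{func.__name__}",
            "data_type": type(data).__name__,
        })


class SketchPlugin:
    """Stand-in for a plugin base class with self-documenting I/O helpers."""

    def __init__(self, prov):
        self.prov = prov

    def load(self, func, path, *args, **kwargs):
        with open(path) as fi:
            data = func(fi, *args, **kwargs)
        self.prov.record("load", path, func, data)
        return data

    def write(self, func, path, data, *args, **kwargs):
        with open(path, "w") as fo:
            func(fo, data, *args, **kwargs)
        self.prov.record("write", path, func, data)


def dump_json(fo, data):
    # Adapter: the helpers pass the file object first, json.dump expects it second.
    json.dump(data, fo)


def demo():
    prov = ProvRecorder()
    plugin = SketchPlugin(prov)
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "metadata.json"
        plugin.write(dump_json, target, {"name": "hermes"})
        loaded = plugin.load(json.load, target)
    return loaded, prov.records
```

Note the small adapter: because the helpers call `func(file, data)`, functions such as `json.dump` that expect the data first need wrapping, which may be worth considering when the real signatures are designed.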
|
|
||||
| * Good, because it allows recording provenance information for the plugins | ||||
| * Good, because it does not make plugin development harder | ||||
| * Bad, because the API methods may not cover all I/O functionality Python provides | ||||
|
Contributor
True, but we can make a best effort to cover as many as make sense. Probably mostly a question of documentation, plugin templates, etc.
||||
| * Bad, because it doesn't cover merging, mapping, etc. | ||||
|
Contributor
This specific solution doesn't, but can we build provenance into mapping via the model API (if so, track in an issue/sub-issue to the provenance issue)?
||||
|
|
||||
| All provenance information should be recorded in the following format, where additional properties of agents, activities, and entities are values of schema and codemeta fields: | ||||
|
|
||||
|
|
||||
| <br> | ||||
| source: [hermes-prov.drawio](./hermes-prov-diagram/hermes-prov.drawio) | ||||
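Since the diagram itself is not reproduced in this text, the following is only a rough sketch of what one such record might look like as JSON-LD with PROV-O terms. The function name `make_load_record`, the blank-node identifiers, and the choice of `schema:name` and `schema:url` as the additional properties are assumptions made for illustration; the actual structure is the one defined in the diagram, and codemeta fields are left out here.

```python
import json


def make_load_record(path, func_name, plugin_name):
    """Build one illustrative JSON-LD provenance record using PROV-O terms."""
    return {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "schema": "https://schema.org/",
        },
        "@graph": [
            {
                # The plugin acts as the agent responsible for the activity.
                "@id": "_:agent",
                "@type": "prov:SoftwareAgent",
                "schema:name": plugin_name,
            },
            {
                # The file that was read is the entity the activity used.
                "@id": "_:source",
                "@type": "prov:Entity",
                "schema:url": path,
            },
            {
                # The load call itself is the activity.
                "@id": "_:activity",
                "@type": "prov:Activity",
                "schema:name": f"load via {func_name}",
                "prov:used": {"@id": "_:source"},
                "prov:wasAssociatedWith": {"@id": "_:agent"},
            },
        ],
    }


record = make_load_record("codemeta.json", "json.load", "example-plugin")
serialized = json.dumps(record, indent=2)
```

Additional properties of the agent, activity, and entity appear as values of schema fields, which is the part of the sketch that mirrors the sentence above.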