Commit 8c4f529

Merge pull request #27 from softwarepub/create-validation-report
Create basic human-readable validation reports.
2 parents 71e12bd + 44cfb30 commit 8c4f529

7 files changed

Lines changed: 212 additions & 60 deletions

README.md

Lines changed: 42 additions & 38 deletions
@@ -6,48 +6,20 @@ SPDX-FileContributor: David Pape

 # Software CaRD Policies

-This repository contains example policies developed as part of the Software CaRD project, as well as a validator tool.
-
-## Conventions
-
-All examples in this repository use the following namespace prefix bindings:
-
-```turtle
-@prefix codemeta: <https://doi.org/10.5063/schema/codemeta-2.0#> .
-@prefix owl: <http://www.w3.org/2002/07/owl#> .
-@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
-@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
-@prefix sc: <https://schema.software-metadata.pub/software-card/2025-01-01/#> .
-@prefix scex: <https://schema.software-metadata.pub/software-card/2025-01-01/examples/#> .
-@prefix scimpl: <https://schema.software-metadata.pub/software-card/2025-01-01/implementation/#> .
-@prefix schema: <https://schema.org/> .
-@prefix sh: <http://www.w3.org/ns/shacl#> .
-@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
-```
-
-For Software CaRD, the prefixes
-[`sc:`](https://schema.software-metadata.pub/software-card/2025-01-01/#),
-[`scex:`](https://schema.software-metadata.pub/software-card/2025-01-01/examples/#), and
-[`scimpl:`](https://schema.software-metadata.pub/software-card/2025-01-01/implementation/#)
-were established and are used for the following purposes:
-
-- `sc:` contains terms exposed to users
-- `scex:` contains example uses of `sc:` and `sh:` terms
-- `scimpl:` contains internal implementation details
-
-The associated IRIs currently don't exist.
-A [search on prefix.cc](https://prefix.cc/sc) reveals prior usage of the prefix `sc:` by projects which seem to be
-defunct.
+This repository contains the `software_card_policies` Python library as well as the associated command line program
+`sc-validate` and example policies.
+The software was written as part of the [Software CaRD](https://helmholtz-metadaten.de/en/inf-projects/softwarecard)
+project.

 ## `sc-validate`

-A program that validates a given metadata file using a set of configurable policies.
+A command line program that validates a given metadata file using a set of configurable policies.

-The selection of policies to use can be configured via [`config.toml`](config.toml).
+The selection of policies can be configured via [`config.toml`](config.toml).
 Policies can be loaded using any of the protocols supported by
 [RDFlib's `Graph.parse` method](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph.parse)
 (e.g. local files, http, ...).
-All of the given policies are loaded and merged into one RDF graph (union of all triples of the parts).
+All of the given policies are loaded and unioned into one RDF graph.

 Policies can be implemented in a configurable fashion by defining an `sc:Parameter` and using it in place of a literal
 or list.
@@ -66,7 +38,7 @@ python -m pip install -e .

 ### Run

-Start a webserver hosting the policy files (run it in the background or use a separate terminal window):
+Start a webserver hosting the example policy files (run it in the background or use a separate terminal window):

 ```bash
 python -m http.server -b 127.0.0.1 -d examples/
@@ -79,13 +51,45 @@ sc-validate examples/data/hermes.ttl
 ```

 This will validate [`hermes.ttl`](examples/data/hermes.ttl) using the policies defined in [`config.toml`](config.toml)
-and print the result to the screen.
-If run in debug mode (with `--debug`), the following files are written to the current working directory:
+and print a validation report to the screen.
+If run in debug mode (with `--debug`), the report is more verbose, and the following files are written to the current
+working directory:

 - `debug-input-data.ttl`: the input data
 - `debug-shapes-processed.ttl`: the parameterized and combined policies
 - `debug-validation-report.ttl`: the detailed SHACL validation report (`sh:ValidationReport`)

+## Conventions
+
+All examples in this repository use the following namespace prefix bindings:
+
+```turtle
+@prefix codemeta: <https://doi.org/10.5063/schema/codemeta-2.0#> .
+@prefix owl: <http://www.w3.org/2002/07/owl#> .
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix sc: <https://schema.software-metadata.pub/software-card/2025-01-01/#> .
+@prefix scex: <https://schema.software-metadata.pub/software-card/2025-01-01/examples/#> .
+@prefix scimpl: <https://schema.software-metadata.pub/software-card/2025-01-01/implementation/#> .
+@prefix schema: <https://schema.org/> .
+@prefix sh: <http://www.w3.org/ns/shacl#> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+```
+
+For Software CaRD, the prefixes
+[`sc:`](https://schema.software-metadata.pub/software-card/2025-01-01/#),
+[`scex:`](https://schema.software-metadata.pub/software-card/2025-01-01/examples/#), and
+[`scimpl:`](https://schema.software-metadata.pub/software-card/2025-01-01/implementation/#)
+were established and are used for the following purposes:
+
+- `sc:` contains terms exposed to users
+- `scex:` contains example uses of `sc:` and `sh:` terms
+- `scimpl:` contains internal implementation details
+
+The associated IRIs currently don't exist.
+A [search on prefix.cc](https://prefix.cc/sc) reveals prior usage of the prefix `sc:` by projects which seem to be
+defunct.
+
 ## Acknowledgments

 [Software CaRD](https://helmholtz-metadaten.de/en/inf-projects/softwarecard) (`ZT-I-PF-3-080`) is funded by the
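
The README's statement that all policies end up in one RDF graph amounts to a set union over triples. The following is a minimal sketch of that idea using plain Python tuples instead of RDFlib, so that it runs without dependencies. The triples and the `union_graphs` helper are hypothetical and are not part of `sc_validate`.

```python
# Hypothetical stand-in for RDF graphs: each graph is a set of
# (subject, predicate, object) tuples. This is not the sc_validate implementation.
Triple = tuple[str, str, str]


def union_graphs(*graphs: set[Triple]) -> set[Triple]:
    """Return the union of all triples; duplicate triples collapse automatically."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


policy_a = {
    ("scex:licensePolicy", "rdf:type", "sh:NodeShape"),
    ("scex:licensePolicy", "sh:name", "License required"),
}
policy_b = {
    ("scex:licensePolicy", "rdf:type", "sh:NodeShape"),  # also present in policy_a
    ("scex:authorPolicy", "rdf:type", "sh:NodeShape"),
}

combined = union_graphs(policy_a, policy_b)
print(len(combined))  # 3 distinct triples
```

Because the result is a set, a triple that appears in several policy files is stored only once, which is the behavior the removed README wording ("union of all triples of the parts") described.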

pyproject.toml

Lines changed: 5 additions & 4 deletions
@@ -14,8 +14,9 @@ dependencies = [
     "rdflib>=7.1.1",
     "pydantic>=2.9.2",
     "pydantic-settings[toml]>=2.6.1",
+    "jinja2>=3.1.6",
 ]
-requires-python = ">=3.10"
+requires-python = ">=3.10,<3.13"  # problems compiling pyduktape2 when using Python 3.13
 authors = [
     {name = "David Pape", email = "d.pape@hzdr.de"},
 ]
@@ -30,11 +31,11 @@ classifiers = [
 [project.scripts]
 sc-validate = "sc_validate.__main__:main"

-[tool.ruff]
-target-version = "py310"
-
 [tool.ruff.lint]
 select = ["E", "F", "I", "N", "W"]

+[tool.ruff.lint.per-file-ignores]
+"src/sc_validate/namespaces.py" = ["N815"]  # mixed case class parameters
+
 [tool.setuptools_scm]
 version_file = "src/sc_validate/_version.py"

src/sc_validate/__main__.py

Lines changed: 4 additions & 1 deletion
@@ -17,6 +17,7 @@
     read_rdf_resource,
     validate_graph,
 )
+from sc_validate.report import create_report


 def path_or_url(path: str) -> pathlib.Path | str:
@@ -74,8 +75,10 @@ def main():
     if arguments.debug:
         validation_graph.serialize("debug-validation-report.ttl", "turtle")

+    report = create_report(validation_graph, debug=arguments.debug)
+    print(report)
+
     if not conforms:
-        print("validation failed", file=sys.stderr)
         sys.exit(1)

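
The change to `main()` means the report is always printed, and only the exit status distinguishes success from failure. A small sketch of that control flow follows. The `finish` helper is a hypothetical name for illustration, and it returns the status instead of calling `sys.exit` so that it can be exercised directly.

```python
def finish(report: str, conforms: bool) -> int:
    """Print the report unconditionally and return a process exit status.

    Hypothetical helper: 0 means the data conforms, 1 means validation failed.
    """
    print(report)
    return 0 if conforms else 1


status = finish("Validation succeeded!", conforms=True)
```

A caller would pass the returned value to `sys.exit`, which keeps the printing and the exit decision separate.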

src/sc_validate/namespaces.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# SPDX-FileCopyrightText: 2025 Helmholtz-Zentrum Dresden - Rossendorf (HZDR)
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileContributor: David Pape
+
+from rdflib.namespace import OWL, RDF, RDFS, SDO, SH, XSD, DefinedNamespace, Namespace
+from rdflib.term import URIRef
+
+
+class SC(DefinedNamespace):
+    """The Software CaRD schema."""
+
+    _NS = Namespace("https://schema.software-metadata.pub/software-card/2025-01-01/#")
+
+    Parameter: URIRef
+
+    parameterType: URIRef
+    parameterConfigPath: URIRef
+    parameterDefaultValue: URIRef
+
+
+class SCIMPL(DefinedNamespace):
+    """Software CaRD implementation details."""
+
+    _NS = Namespace(
+        "https://schema.software-metadata.pub/software-card/2025-01-01/implementation/#"
+    )
+
+
+class SCEX(DefinedNamespace):
+    """Software CaRD example components."""
+
+    _NS = Namespace(
+        "https://schema.software-metadata.pub/software-card/2025-01-01/examples/#"
+    )
+
+
+CODEMETA = Namespace("https://doi.org/10.5063/schema/codemeta-2.0#")
+"""The Codemeta schema. See: https://codemeta.github.io/"""
+
+
+PREFIXES = {
+    "codemeta": CODEMETA,
+    "owl": OWL,
+    "rdf": RDF,
+    "rdfs": RDFS,
+    "sc": SC,
+    "scex": SCEX,
+    "scimpl": SCIMPL,
+    "schema": SDO,
+    "sh": SH,
+    "xsd": XSD,
+}
+"""Default namespace prefix bindings."""

src/sc_validate/rdf.py

Lines changed: 3 additions & 17 deletions
@@ -8,29 +8,15 @@
 from pyshacl import validate
 from rdflib import BNode, Graph, Literal
 from rdflib.collection import Collection
-from rdflib.namespace import RDF, Namespace
+from rdflib.namespace import RDF

-# Just for better readability of serialized linked data.
-BINDINGS = {
-    "codemeta": "https://doi.org/10.5063/schema/codemeta-2.0#",
-    "owl": "http://www.w3.org/2002/07/owl#",
-    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
-    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
-    "sc": "https://schema.software-metadata.pub/software-card/2025-01-01/#",
-    "scex": "https://schema.software-metadata.pub/software-card/2025-01-01/examples/#",
-    "scimpl": "https://schema.software-metadata.pub/software-card/2025-01-01/implementation/#",
-    "schema": "https://schema.org/",
-    "sh": "http://www.w3.org/ns/shacl#",
-    "xsd": "http://www.w3.org/2001/XMLSchema#",
-}
-
-SC = Namespace("https://schema.software-metadata.pub/software-card/2025-01-01/#")
+from sc_validate.namespaces import PREFIXES, SC


 def read_rdf_resource(source: pathlib.Path | str) -> Graph:
     graph = Graph()
     graph.parse(source)
-    for prefix, iri in BINDINGS.items():
+    for prefix, iri in PREFIXES.items():
         graph.bind(prefix, iri)
     return graph
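
The loop in `read_rdf_resource` binds each prefix so that serialized graphs use short names instead of full IRIs. What such a binding does can be sketched without RDFlib. In this sketch the `compact` helper is hypothetical, and the prefix table uses plain strings with only three of the README's bindings, which is an assumption made to keep the example self-contained.

```python
# Subset of the README's prefix bindings as plain strings (assumption for this
# sketch; the project itself uses rdflib Namespace objects).
PREFIXES = {
    "sc": "https://schema.software-metadata.pub/software-card/2025-01-01/#",
    "scex": "https://schema.software-metadata.pub/software-card/2025-01-01/examples/#",
    "sh": "http://www.w3.org/ns/shacl#",
}


def compact(iri: str, prefixes: dict[str, str]) -> str:
    """Shorten an IRI to prefix:local, preferring the longest matching namespace."""
    matches = [(ns, prefix) for prefix, ns in prefixes.items() if iri.startswith(ns)]
    if not matches:
        return iri  # no binding applies, keep the full IRI
    namespace, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}:{iri[len(namespace):]}"


print(compact("http://www.w3.org/ns/shacl#NodeShape", PREFIXES))
print(compact(PREFIXES["sc"] + "Parameter", PREFIXES))
```

Under these assumptions the two calls yield `sh:NodeShape` and `sc:Parameter`. An IRI outside every bound namespace is returned unchanged.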

src/sc_validate/report.py

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# SPDX-FileCopyrightText: 2025 Helmholtz-Zentrum Dresden - Rossendorf (HZDR)
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileContributor: David Pape
+
+from dataclasses import dataclass
+from enum import Enum
+from typing import List
+
+from jinja2 import Environment, PackageLoader, select_autoescape
+from rdflib import Graph, Literal
+from rdflib.namespace import RDF, SH
+from rdflib.term import URIRef
+
+
+# TODO: This only works for constraints of type NodeShape. Is this enough?
+@dataclass
+class Policy:
+    name: str
+    description: str
+
+    @classmethod
+    def from_graph(cls, reference: URIRef, graph: Graph):
+        name = graph.value(reference, SH.name, None)
+        description = graph.value(reference, SH.description, None)
+        return cls(name=name, description=description)
+
+
+class Severity(Enum):
+    INFO = 1
+    WARNING = 2
+    VIOLATION = 3
+    OTHER = 4
+
+    def __str__(self):
+        return self.name.title()
+
+    @classmethod
+    def from_graph(cls, reference: URIRef, graph: Graph):
+        if reference == SH.Info:
+            return cls.INFO
+        if reference == SH.Warning:
+            return cls.WARNING
+        if reference == SH.Violation:
+            return cls.VIOLATION
+        return cls.OTHER
+
+
+@dataclass
+class ValidationResult:
+    severity: Severity
+    message: str
+    source_policy: Policy
+
+    @classmethod
+    def from_graph(cls, reference: URIRef, graph: Graph):
+        severity = graph.value(reference, SH.resultSeverity, None)
+        message = graph.value(reference, SH.resultMessage, None)
+        source_policy = graph.value(reference, SH.sourceShape, None)
+        return cls(
+            severity=Severity.from_graph(severity, graph),
+            message=message,
+            source_policy=Policy.from_graph(source_policy, graph),
+        )
+
+
+@dataclass
+class ValidationReport:
+    conforms: bool
+    results: List[ValidationResult]
+
+    @classmethod
+    def from_graph(cls, reference: URIRef, graph: Graph):
+        conforms = (reference, SH.conforms, Literal(True)) in graph
+        results = graph.objects(reference, SH.result)
+        return cls(
+            conforms=conforms,
+            results=[ValidationResult.from_graph(result, graph) for result in results],
+        )
+
+
+def create_report(validation_graph: Graph, debug=False) -> str:
+    shacl_report, *_ = validation_graph.subjects(RDF.type, SH.ValidationReport)
+    validation_report = ValidationReport.from_graph(shacl_report, validation_graph)
+    environment = Environment(
+        loader=PackageLoader("sc_validate"), autoescape=select_autoescape()
+    )
+    template = environment.get_template("report.j2")
+    return template.render(validation_report=validation_report, debug=debug)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+{#
+SPDX-FileCopyrightText: 2025 Helmholtz-Zentrum Dresden - Rossendorf (HZDR)
+SPDX-License-Identifier: Apache-2.0
+SPDX-FileContributor: David Pape
+#}
+{%- if validation_report.conforms -%}
+Validation succeeded!
+{%- else -%}
+Validation failed!
+
+{% for validation_result in validation_report.results %}
+{{ validation_result.severity }}:
+Breached policy: {{ validation_result.source_policy.name }}
+{{ validation_result.source_policy.description }}
+{% if debug %}Debug: {{ validation_result.message }}{% endif %}
+{% endfor %}
+{%- endif -%}
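
The report pipeline maps SHACL severities to an enum and then renders one block per validation result. The sketch below approximates that flow with the standard library only, so it runs without RDFlib or Jinja2. The `Result` class, the `from_iri` method, and the `render` function are hypothetical names, and the output spacing is an approximation rather than the exact output of the template.

```python
from dataclasses import dataclass
from enum import Enum

SHACL = "http://www.w3.org/ns/shacl#"


class Severity(Enum):
    INFO = 1
    WARNING = 2
    VIOLATION = 3
    OTHER = 4

    def __str__(self) -> str:
        return self.name.title()

    @classmethod
    def from_iri(cls, iri: str) -> "Severity":
        """Map a SHACL severity IRI to an enum member, falling back to OTHER."""
        known = {
            SHACL + "Info": cls.INFO,
            SHACL + "Warning": cls.WARNING,
            SHACL + "Violation": cls.VIOLATION,
        }
        return known.get(iri, cls.OTHER)


@dataclass
class Result:
    """Hypothetical flat record of one validation result."""

    severity: Severity
    policy_name: str
    policy_description: str
    message: str


def render(conforms: bool, results: list[Result], debug: bool = False) -> str:
    """Plain-string approximation of the report layout (not the Jinja2 template)."""
    if conforms:
        return "Validation succeeded!"
    lines = ["Validation failed!", ""]
    for result in results:
        lines.append(f"{result.severity}:")
        lines.append(f"Breached policy: {result.policy_name}")
        lines.append(result.policy_description)
        if debug:
            lines.append(f"Debug: {result.message}")
        lines.append("")
    return "\n".join(lines).rstrip()


# Example data is made up for illustration.
results = [
    Result(
        severity=Severity.from_iri(SHACL + "Violation"),
        policy_name="License required",
        policy_description="Software must declare a license.",
        message="Less than 1 values on schema:license",
    )
]
print(render(False, results))
```

Passing `debug=True` adds the raw result message to each block, mirroring the `{% if debug %}` branch of the template.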
