Commit 6c5de57

feat(cdi-preview): update documentation for CDIF Discovery Core shapes and remove Comunica references
1 parent a3421c3 commit 6c5de57

3 files changed

Lines changed: 193 additions & 402 deletions

CDIF_DISCOVERY_SHAPES_FIX.md

Lines changed: 118 additions & 19 deletions
@@ -1,38 +1,86 @@
1-
# Why the CDI Previewer Doesn't Support SPARQL
1+
# CDIF Discovery Core Shapes for Browser-Based Validation
22

3-
## Overview
3+
## Summary
44

5-
The CDI previewer uses **Core SHACL only** for validation. SPARQL-based SHACL features like `sh:SPARQLTarget` and `sh:SPARQLConstraint` are not supported in the browser-based application.
5+
We've created **cdif-core.ttl**, a browser-compatible implementation of the CDIF Discovery SHACL shapes for validating schema.org Dataset metadata. The shapes validate 20 properties (4 mandatory + 16 recommended) and work with the lightweight Core SHACL validator, avoiding the need for a 1.9MB SPARQL engine.
66

7-
## Technical Details
7+
**Quick start:** Select "CDIF Discovery Core" from the shape dropdown in the previewer to validate your CDIF metadata. Properties conforming to the shapes will display with blue "SHACL-defined" badges.
88

9-
Browser-based applications have strict constraints on bundle size and performance. Supporting SPARQL features like `sh:SPARQLTarget` and `sh:SPARQLConstraint` would require including a full SPARQL query engine in the browser.
9+
## Background: CDIF Discovery Validation
10+
11+
CDIF Discovery shapes validate schema.org Dataset descriptions to ensure they contain essential metadata for data discovery. The original CDIF Discovery shapes used SPARQL-based SHACL features (`sh:SPARQLTarget`) for hierarchical node selection.
12+
13+
## How to Use
14+
15+
1. Open the CDI previewer with your JSON-LD file
16+
2. Select **"CDIF Discovery Core"** from the shape dropdown
17+
3. Click "Validate"
18+
4. Review results:
19+
- Red violations: Missing mandatory properties
20+
- Orange warnings: Missing recommended properties
21+
- Blue badges: SHACL-defined properties present
22+
- Yellow badges: Extra properties not in shapes
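The red and orange categories above correspond to SHACL severities. The following is a minimal sketch of that mapping, not the previewer's actual code: `colorForSeverity` and the shape of the result objects are hypothetical, and the neutral fallback color is an assumption. Only the `sh:Violation` and `sh:Warning` IRIs come from the SHACL vocabulary.

```javascript
// Hypothetical helper: maps a SHACL severity IRI to the color used in the results list.
const SH = "http://www.w3.org/ns/shacl#";

function colorForSeverity(severityIri) {
  switch (severityIri) {
    case SH + "Violation":
      return "red"; // missing mandatory property
    case SH + "Warning":
      return "orange"; // missing recommended property
    default:
      return "grey"; // assumption: other severities are shown neutrally
  }
}

// Assumed result objects; a real validation report has a richer structure.
const results = [
  { path: "http://schema.org/name", severity: SH + "Violation" },
  { path: "http://schema.org/keywords", severity: SH + "Warning" },
];

const colored = results.map((r) => ({ ...r, color: colorForSeverity(r.severity) }));
console.log(colored.map((r) => r.color)); // [ 'red', 'orange' ]
```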
23+
24+
## Testing Notes
25+
26+
**Status:** Ready for testing with CDIF Discovery metadata files
27+
28+
**Expected results:**
29+
- Properties like `name`, `identifier`, `license`, `dateModified` should show blue "SHACL-defined" badges
30+
- Missing mandatory properties trigger red violation messages
31+
- Missing recommended properties trigger orange warning messages
32+
33+
**Known good test:** `examples/cdi/se_na2so4-XDI-CDI-CDIF.jsonld` validates correctly with recognized properties
34+
35+
---
36+
37+
## Technical Reference: Why Core SHACL Only?
40+
41+
The CDI previewer uses **Core SHACL only** for validation. SPARQL-based SHACL features like `sh:SPARQLTarget` and `sh:SPARQLConstraint` are not supported.
42+
43+
**Important distinction:** The previewer previously used Comunica for executing `sh:SPARQLTarget` queries to identify which nodes to validate, but Comunica does **not perform SHACL validation** - it only executes SPARQL queries. We have not found a JavaScript SHACL validation library that supports SPARQL constraints (`sh:SPARQLConstraint`).
44+
45+
### Bundle Size Comparison
46+
47+
Browser-based applications have strict constraints on bundle size and performance. Even limited SPARQL support for node targeting would require including a full SPARQL query engine in the browser.
1048

1149
**The numbers:**
1250
- **Current setup (Core SHACL only)**: ~400KB total
13-
- rdf-validate-shacl: ~120KB
51+
- rdf-validate-shacl: ~120KB (Core SHACL validation only)
1452
- N3.js (RDF parsing): ~150KB
1553
- jsonld.js (JSON-LD processing): ~130KB
1654

17-
- **With SPARQL support**: ~2.3MB total
18-
- Comunica QueryEngine: **1.9MB** (just for SPARQL!)
55+
- **Previous setup with Comunica**: ~2.3MB total
56+
- Comunica QueryEngine: **1.9MB** (for `sh:SPARQLTarget` support only)
1957
- Plus all the Core SHACL libraries above
58+
- Still no `sh:SPARQLConstraint` validation support
59+
60+
**What we tried:**
61+
- ✅ Comunica can execute SPARQL queries to find nodes matching `sh:SPARQLTarget`
62+
- ❌ Comunica cannot validate SHACL constraints
63+
- ❌ rdf-validate-shacl (the JavaScript SHACL validator) does not support `sh:SPARQLConstraint`
64+
- ❌ No other JavaScript library found that validates SPARQL-based SHACL constraints
2065

21-
Adding SPARQL would **increase the download size by 5-6x**, significantly slowing down the page load for all users, just to support a niche feature that Core SHACL can handle equally well.
66+
Adding Comunica for `sh:SPARQLTarget` support would **increase the download size by 5-6x**, significantly slowing down the page load for all users, just to support hierarchical node targeting that Core SHACL can approximate.
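The 5-6x figure can be checked from the approximate sizes quoted under "The numbers". The snippet below only redoes that arithmetic; the values are the rounded sizes from this document, not fresh measurements.

```javascript
// Approximate minified sizes in KB, as quoted in this document.
const coreLibs = { "rdf-validate-shacl": 120, "N3.js": 150, "jsonld.js": 130 };
const comunicaKB = 1900;

const coreTotalKB = Object.values(coreLibs).reduce((sum, kb) => sum + kb, 0);
const withComunicaKB = coreTotalKB + comunicaKB;
const ratio = withComunicaKB / coreTotalKB;

console.log(coreTotalKB, withComunicaKB, ratio.toFixed(2)); // prints: 400 2300 5.75
```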
2267

2368
**Technical reality:**
2469
- SPARQL engines are complex (query parsing, optimization, execution)
2570
- Comunica (the leading JavaScript SPARQL engine) is 1.9MB minified
71+
- SHACL validation with SPARQL constraints requires a different tool
2672
- Most SHACL shape files (including DDI-CDI Official) use Core SHACL only
27-
- Core SHACL provides sufficient expressiveness for validation
73+
- Core SHACL provides sufficient expressiveness for validation in most cases
2874

29-
**Our decision:** We've removed SPARQL support from the CDI previewer to keep it fast and lightweight for all users.
75+
**Our decision:** We removed SPARQL support to keep the previewer fast and lightweight. The 1.9MB cost for hierarchical node selection isn't justified when Core SHACL alternatives work well for real-world use cases.
3076

31-
## Core SHACL Alternatives to SPARQL Features
77+
### Conversion Patterns
3278

33-
If you have SHACL shapes that use SPARQL features, here are the Core SHACL patterns that achieve the same goals:
3480

35-
### 1. Node Selection: Use `sh:targetClass` instead of `sh:SPARQLTarget`
81+
When converting SPARQL-based shapes to Core SHACL, use these patterns:
82+
83+
#### Pattern 1: Node Selection with `sh:targetClass`
3684

3785
**Instead of:**
3886
```turtle
@@ -52,9 +100,9 @@ sh:target [
52100
sh:targetClass schema:Dataset ;
53101
```
54102

55-
This is simpler, more efficient, and functionally equivalent.
103+
This is simpler, more efficient, and functionally equivalent for most cases.
56104

57-
### 2. RDF List Validation: Use `sh:node` with recursion instead of `sh:SPARQLConstraint`
105+
#### Pattern 2: RDF List Validation with `sh:node`
58106

59107
**Instead of:**
60108
```turtle
@@ -94,19 +142,70 @@ ex:RDFListOfAgentsShape
94142

95143
This Core SHACL pattern validates lists of any length and works in both browser and server environments.
96144

145+
## CDIF Discovery Core Shapes
146+
147+
We created **cdif-core.ttl** as a browser-compatible alternative to the SPARQL-based CDIF Discovery shapes. This file is available in the previewer as the "CDIF Discovery Core" option.
148+
149+
### What We Converted
150+
151+
**Original CDIF Discovery shapes** (rules.shacl):
152+
- Used `sh:SPARQLTarget` to select nodes hierarchically
153+
- 2 shapes: `CDIFDatasetMandatoryShape` and `CDIFMetaMetadataShape`
154+
- 4 mandatory properties: `identifier`, `name`, `license` or `conditionsOfAccess`, `dateModified`
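To make the "`license` or `conditionsOfAccess`" rule concrete, here is a rough sketch of the mandatory check written as plain JavaScript rather than SHACL. `missingMandatory` is a hypothetical function, and it assumes a compacted JSON-LD node whose keys are bare schema.org terms; the real check is performed by the SHACL validator against the shapes file.

```javascript
// Hypothetical check mirroring the four mandatory properties listed above.
function missingMandatory(dataset) {
  // A property counts as present if it has a non-empty value.
  const has = (key) => {
    const value = dataset[key];
    if (value === undefined || value === null || value === "") return false;
    return !(Array.isArray(value) && value.length === 0);
  };

  const missing = [];
  if (!has("identifier")) missing.push("identifier");
  if (!has("name")) missing.push("name");
  // Either property satisfies the licensing requirement.
  if (!has("license") && !has("conditionsOfAccess")) {
    missing.push("license or conditionsOfAccess");
  }
  if (!has("dateModified")) missing.push("dateModified");
  return missing;
}

const example = {
  "@type": "Dataset",
  name: "Example dataset",
  conditionsOfAccess: "Open access",
};
console.log(missingMandatory(example)); // [ 'identifier', 'dateModified' ]
```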
155+
156+
**Our Core SHACL version** (previewers/betatest/shapes/cdif-core.ttl):
157+
- Converted `sh:SPARQLTarget` to `sh:targetClass schema:Dataset` and `sh:targetSubjectsOf schema:about`
158+
- Added `CDIFDatasetRecommendedShape` with 16 additional properties
159+
- **Total: 20 properties validated:**
160+
- **4 mandatory (severity: Violation):** `identifier`, `name`, `license`/`conditionsOfAccess`, `dateModified`
161+
- **16 recommended (severity: Warning):** `url`, `description`, `contributor`, `creator`, `keywords`, `distribution`, `measurementTechnique`, `variableMeasured`, `subjectOf`, `startDate`, `location`, `mainEntity`, `additionalProperty`, `relatedLink`, `additionalType`, `email`
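As an illustration of what the two Core SHACL targets select, the sketch below computes focus nodes over plain `{ s, p, o }` triples. It is a simplification under stated assumptions: `focusNodes` and the `ex:` identifiers are hypothetical, `rdfs:subClassOf` reasoning is ignored, and the two targets are simply unioned because this document does not say which shape carries which target.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const SCHEMA = "http://schema.org/";

// Simplified focus-node selection over an array of { s, p, o } triples.
function focusNodes(triples) {
  const nodes = new Set();
  for (const { s, p, o } of triples) {
    // sh:targetClass schema:Dataset: nodes typed directly as schema:Dataset
    if (p === RDF_TYPE && o === SCHEMA + "Dataset") nodes.add(s);
    // sh:targetSubjectsOf schema:about: subjects of any schema:about triple
    if (p === SCHEMA + "about") nodes.add(s);
  }
  return [...nodes];
}

const triples = [
  { s: "ex:dataset1", p: RDF_TYPE, o: SCHEMA + "Dataset" },
  { s: "ex:record1", p: SCHEMA + "about", o: "ex:dataset1" },
  { s: "ex:person1", p: RDF_TYPE, o: SCHEMA + "Person" },
];
console.log(focusNodes(triples)); // [ 'ex:dataset1', 'ex:record1' ]
```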
162+
163+
### Key Technical Fixes
164+
165+
1. **Namespace correction:** Used `http://schema.org/` (not `https://`)
166+
- schema.org's canonical namespace uses http:// protocol
167+
- This fixed property recognition in the UI (properties now show as "SHACL-defined" instead of "EXTRA")
168+
- All example files updated to use consistent http:// namespace
169+
170+
2. **Property classification bug fix:** Fixed array context handling in `cdi-shacl-helpers.js`
171+
- **Problem:** Code only checked `context[prefix]` directly, which failed when `@context` is an array
172+
- **Solution:** Iterate through array contexts to find prefix mappings
173+
- **Result:** Properties now correctly classified with blue badges (SHACL-defined) vs yellow badges (EXTRA)
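Both fixes can be illustrated together. The sketch below is a hypothetical re-creation, not the actual code in `cdi-shacl-helpers.js`: `resolvePrefix`, `expand`, and the remote context URL are invented for illustration, and only simple string prefix mappings are handled (expanded term definitions are skipped).

```javascript
// Look up a prefix in a JSON-LD @context that may be an object or an array.
function resolvePrefix(context, prefix) {
  // Normalize so single-object and array contexts share one code path.
  const entries = Array.isArray(context) ? context : [context];
  let resolved;
  for (const entry of entries) {
    // Skip string entries (remote context URLs) and nulls.
    if (entry && typeof entry === "object" && typeof entry[prefix] === "string") {
      resolved = entry[prefix]; // later entries override earlier ones
    }
  }
  return resolved;
}

// Expand a prefixed name such as "schema:name" to a full IRI.
function expand(context, curie) {
  const i = curie.indexOf(":");
  if (i < 0) return curie;
  const ns = resolvePrefix(context, curie.slice(0, i));
  return ns ? ns + curie.slice(i + 1) : curie;
}

const context = [
  "https://example.org/remote-context.jsonld", // placeholder URL
  { schema: "http://schema.org/" },
];
const shaclDefined = new Set(["http://schema.org/name"]);

console.log(context["schema"]); // undefined: the direct lookup fails on an array
console.log(shaclDefined.has(expand(context, "schema:name"))); // true
console.log(shaclDefined.has("https://schema.org/name")); // false: https:// does not match
```

The last line shows why the namespace mismatch caused properties to be classified as "EXTRA": IRIs are compared as exact strings, so `http://` and `https://` forms never match.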
174+
175+
### Trade-offs
176+
177+
**Benefits of Core SHACL approach:**
178+
- **Fast loading:** ~400KB vs 2.3MB (5-6x smaller)
179+
- **Enhanced coverage:** Expanded from 4 to 20 properties
180+
- **Browser compatibility:** Works everywhere without heavyweight dependencies
181+
- **Maintainability:** Simple, readable SHACL patterns
182+
- **Validation quality:** Same mandatory property checking
183+
184+
**Limitations compared to SPARQL approach:**
185+
- Direct class targeting (`sh:targetClass schema:Dataset`) instead of hierarchical selection
186+
- Dataset subclasses (e.g., `schema:MedicalDataset`) would need explicit shapes
187+
- In practice: This rarely matters since most files use `schema:Dataset` directly
188+
189+
**Bottom line:** The Core SHACL version provides equivalent validation for real-world use cases while being dramatically faster to load.
190+
97191
## Current Shape Options
98192

99-
The CDI previewer provides three shape selection options:
193+
The CDI previewer provides four shape selection options:
100194

101195
1. **DDI-CDI Official (Default)** - Full DDI-CDI 1.0 shapes from ddi-cdi.github.io
102196
- 300+ types covered
103197
- Core SHACL only (no SPARQL)
104198
- Comprehensive validation
105199

106-
2. **Local Fallback** - Embedded backup shapes
200+
2. **CDIF Discovery Core** - Browser-compatible CDIF Discovery shapes
201+
- 20 schema.org properties (4 mandatory + 16 recommended)
202+
- Converted from SPARQL-based shapes
203+
- Lightweight and fast
204+
205+
3. **Local Fallback** - Embedded backup shapes
107206
- Used if online shapes fail to load
108207
- Core SHACL only
109208

110-
3. **Custom URL** - Load shapes from any URL
209+
4. **Custom URL** - Load shapes from any URL
111210
- Must use Core SHACL only
112211
- SPARQL features will not work
