# `cdif_example.jsonld` – Minimal CDIF Discovery Example

`cdif_example.jsonld` is a **small, readable CDIF Discovery Core example** that mirrors how we expect CDIF to be used with `schema.org` datasets and the CDIF Discovery Core SHACL shapes.

It is intended as a reference for Steve and others when aligning the SHACL shapes with JSON-LD instance documents.

## Structure

The example has a single `schema:Dataset` node:

- `@id`: a stable HTTPS URI for the dataset.
- `@type`: `schema:Dataset`.
- Core CDIF Discovery properties on the dataset:
  - `schema:name` – title of the dataset.
  - `schema:identifier` – local identifier string.
  - `schema:description` – human-readable description (now covered explicitly by the `cdifd:descriptionProperty` shape).
  - `schema:creator` – a `schema:Person` with `schema:name`.
  - `schema:datePublished` – ISO 8601 date string (`YYYY-MM-DD`).
  - `schema:license` – IRI of the license.
  - `schema:keywords` – array of strings.
  - `schema:url` – landing page URL.
  - `schema:distribution` – a `schema:DataDownload` with `schema:name`, `schema:contentUrl`, and `schema:encodingFormat`.
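
Here is the same structure written out as data. This is a hypothetical Python rendering of a node with that shape. Every value is a placeholder, including the `example.org` URIs, the names, the date and the file, so this is not the actual contents of `cdif_example.jsonld`. The `missing_keys` helper is likewise illustrative only.

```python
# Hypothetical sketch: a node with the same structure as the one described above.
# All values are placeholders, NOT the contents of cdif_example.jsonld.
EXPECTED_KEYS = [
    "schema:name", "schema:identifier", "schema:description",
    "schema:creator", "schema:datePublished", "schema:license",
    "schema:keywords", "schema:url", "schema:distribution",
]

example_doc = {
    "@context": {"schema": "https://schema.org/"},
    "@graph": [
        {
            "@id": "https://example.org/dataset/demo-001",
            "@type": "schema:Dataset",
            "schema:name": "Demo dataset",
            "schema:identifier": "demo-001",
            "schema:description": "Placeholder dataset illustrating the structure.",
            "schema:creator": {"@type": "schema:Person", "schema:name": "Jane Doe"},
            "schema:datePublished": "2024-01-15",  # ISO 8601 (YYYY-MM-DD)
            "schema:license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "schema:keywords": ["demo", "cdif"],
            "schema:url": "https://example.org/dataset/demo-001",
            "schema:distribution": {
                "@type": "schema:DataDownload",
                "schema:name": "Demo data file",
                "schema:contentUrl": "https://example.org/dataset/demo-001/data.csv",
                "schema:encodingFormat": "text/csv",
            },
        }
    ],
}


def missing_keys(node: dict) -> list:
    """Return the expected core keys that a dataset node does not carry."""
    return [key for key in EXPECTED_KEYS if key not in node]


dataset = example_doc["@graph"][0]
print(missing_keys(dataset))  # []
```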

The dataset properties listed above correspond directly to the CDIF Discovery Core shapes in `CDIF-Discovery-Core-Shapes.ttl`:

- `cdifd:resourceIdentifierProperty` – `schema:identifier`.
- `cdifd:nameProperty` – `schema:name`.
- `cdifd:descriptionProperty` – `schema:description`.
- `cdifd:responsiblePartyProperty` – `schema:creator`.
- `cdifd:datePublishedProperty` – `schema:datePublished`.
- `cdifd:rightsProperty` – `schema:license` (via the alternative path `schema:license | schema:conditionsOfAccess`).
- `cdifd:keywordsResourceProperty` – `schema:keywords`.
- `cdifd:getResourceProperty` – `schema:url` / `schema:distribution`.
- `cdifd:distributionProperty` – `schema:distribution`.
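
The table above can be read as a lookup from an expanded property IRI to the shape or shapes that cover it. Below is a hypothetical Python model of that lookup. Each path is either one IRI or a tuple of alternatives, and the tuple stands in for an alternative path. Two things here are assumptions: treating `cdifd:getResourceProperty` as an alternative over `schema:url` and `schema:distribution`, and the names `SHAPE_PATHS` and `shapes_covering`.

```python
# Hypothetical model of the shape-to-path table. A path is one expanded IRI or
# a tuple of alternatives (standing in for an alternative path).
SCHEMA = "https://schema.org/"

SHAPE_PATHS = {
    "cdifd:resourceIdentifierProperty": SCHEMA + "identifier",
    "cdifd:nameProperty": SCHEMA + "name",
    "cdifd:descriptionProperty": SCHEMA + "description",
    "cdifd:responsiblePartyProperty": SCHEMA + "creator",
    "cdifd:datePublishedProperty": SCHEMA + "datePublished",
    "cdifd:rightsProperty": (SCHEMA + "license", SCHEMA + "conditionsOfAccess"),
    "cdifd:keywordsResourceProperty": SCHEMA + "keywords",
    # Assumption: modeled as an alternative over url / distribution.
    "cdifd:getResourceProperty": (SCHEMA + "url", SCHEMA + "distribution"),
    "cdifd:distributionProperty": SCHEMA + "distribution",
}


def path_matches(path, prop_iri: str) -> bool:
    """True if prop_iri equals a single-IRI path or any member of an alternative."""
    if isinstance(path, tuple):
        return prop_iri in path
    return prop_iri == path


def shapes_covering(prop_iri: str) -> list:
    """Names of the property shapes whose path matches prop_iri, in table order."""
    return [name for name, path in SHAPE_PATHS.items() if path_matches(path, prop_iri)]


print(shapes_covering(SCHEMA + "conditionsOfAccess"))  # ['cdifd:rightsProperty']
print(shapes_covering(SCHEMA + "distribution"))
# ['cdifd:getResourceProperty', 'cdifd:distributionProperty']
print(shapes_covering("http://schema.org/name"))  # [] because the scheme is http
```

One property can satisfy more than one shape, as `schema:distribution` does here.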

Because the example uses `"schema": "https://schema.org/"` in its `@context`, the expanded IRIs are exactly:

- `https://schema.org/name`
- `https://schema.org/identifier`
- `https://schema.org/description`
- etc.

The CDIF shapes now also use **HTTPS schema.org** consistently, so SPARQL and SHACL can match these predicates exactly. This matters because `http://schema.org/name` and `https://schema.org/name` are different IRIs. A shape whose `sh:path` uses one will not match data that uses the other.
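
Here is the HTTP/HTTPS point made concrete: a minimal Python sketch of compact-IRI expansion for a prefix-only `@context`. It is not a full JSON-LD processor, since it ignores term definitions, `@vocab`, `@base` and scoped contexts. `expand_compact_iri` is a hypothetical helper.

```python
# Minimal sketch of compact-IRI expansion for a prefix-only @context.
# Not a full JSON-LD processor: term definitions, @vocab, @base and scoped
# contexts are ignored.
def expand_compact_iri(term: str, context: dict) -> str:
    """Expand 'prefix:suffix' via the context's prefix map; return other terms unchanged."""
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        # 'https://...' splits into prefix 'https' and a suffix starting with '//':
        # that is already an absolute IRI, not a compact one.
        if not suffix.startswith("//") and prefix in context:
            return context[prefix] + suffix
    return term


https_ctx = {"schema": "https://schema.org/"}
http_ctx = {"schema": "http://schema.org/"}
shape_path = "https://schema.org/name"  # sh:path as declared in HTTPS shapes

print(expand_compact_iri("schema:name", https_ctx))                # https://schema.org/name
print(expand_compact_iri("schema:name", https_ctx) == shape_path)  # True
print(expand_compact_iri("schema:name", http_ctx) == shape_path)   # False
```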

## How the previewer classifies fields

The CDI Previewer does the following:

1. **Normalize to `@graph`** if needed (here the file already has `@graph`).
2. **Expand the JSON-LD** to get full IRIs for properties (`https://schema.org/name`, etc.).
3. **Run the SPARQL targets** from the CDIF Discovery shapes:
   - `cdifd:CDIFDatasetRecommendedShape` has a `sh:SPARQLTarget` that selects all `schema:Dataset` instances.
4. **Classify properties** for each dataset node:
   - Find the applicable NodeShape(s) (e.g. `cdifd:CDIFDatasetRecommendedShape`).
   - For each `sh:property` in that NodeShape, compare its `sh:path` with the expanded property IRI.
   - If a path matches, mark the field **REQUIRED** when that property shape has `sh:minCount` greater than 0, and **OPTIONAL** otherwise. A property that matches no `sh:path` is marked **EXTRA**.
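
Steps 1 and 4 above can be sketched in Python as follows. This is a hypothetical illustration, not the previewer's code. It assumes that properties are already expanded (step 2), that the dataset nodes are already selected (step 3), and that paths are single IRIs. The two property shapes and their `sh:minCount` values are made up for the example.

```python
# Hypothetical sketch of steps 1 and 4 (not the previewer's actual code).
# Assumes keys are already expanded IRIs (step 2) and the nodes have already
# been selected as schema:Dataset targets (step 3).
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyShape:
    path: str           # expanded sh:path IRI (single-IRI paths only)
    min_count: int = 0  # sh:minCount, 0 when absent


def normalize_to_graph(doc: dict) -> list:
    """Step 1: return the document's nodes as a list, wrapping a lone top-level node."""
    if "@graph" in doc:
        return list(doc["@graph"])
    return [{k: v for k, v in doc.items() if k != "@context"}]


def classify_node(node: dict, shapes: list) -> dict:
    """Step 4: label each property REQUIRED, OPTIONAL, or EXTRA."""
    by_path = {shape.path: shape for shape in shapes}
    labels = {}
    for prop in node:
        if prop.startswith("@"):  # skip JSON-LD keywords such as @id and @type
            continue
        shape = by_path.get(prop)
        if shape is None:
            labels[prop] = "EXTRA"
        elif shape.min_count > 0:
            labels[prop] = "REQUIRED"
        else:
            labels[prop] = "OPTIONAL"
    return labels


S = "https://schema.org/"
# Illustrative shapes only; minCount values are made up for the example.
shapes = [PropertyShape(S + "name", min_count=1), PropertyShape(S + "keywords")]
doc = {
    "@id": "https://example.org/ds/1",
    "@type": S + "Dataset",
    S + "name": "Demo",
    S + "keywords": ["demo"],
    S + "measurementTechnique": "XAS",
}

for node in normalize_to_graph(doc):
    print(classify_node(node, shapes))
# name -> REQUIRED, keywords -> OPTIONAL, measurementTechnique -> EXTRA
```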

For `cdif_example.jsonld`, all of the core fields listed above show up as **blue** (SHACL-defined) in the previewer, with **REQUIRED** or **OPTIONAL** badges according to the CDIF Discovery shapes.

## How this relates to Steve's examples

Steve's richer CDI/XAS examples (`FeXAS_...jsonld`, `se_na2so4-...jsonld`) use the *same* core schema.org properties on a `schema:Dataset` node, along with a few that the minimal example omits:

- `schema:name`
- `schema:identifier`
- `schema:description`
- `schema:license`
- `schema:distribution`
- `schema:keywords`
- `schema:variableMeasured`
- `schema:subjectOf` / `dcterms:conformsTo`

The CDIF Discovery Core shapes now:

- Use HTTPS `https://schema.org/` everywhere.
- Select all `schema:Dataset` nodes via SPARQL (no root-only filter).
- Include `cdifd:descriptionProperty` for `schema:description`.
- Include `cdifd:variableMeasuredProperty` in the main dataset NodeShape, so `schema:variableMeasured` is SHACL-defined.

This means the properties that are blue in this minimal example, together with `schema:variableMeasured`, are the ones we would *like* to see as blue on Steve's datasets as well:

- Name, identifier, description, license, keywords, distribution, variableMeasured, etc.
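
The second point in the shapes list above ("no root-only filter") is easiest to see on a tiny graph. The Python sketch below simulates the dataset target over a hypothetical list of triples. With the root-only filter, a dataset that another node points to via `schema:about` is dropped; without it, both datasets are selected. The `ex:` identifiers are placeholders.

```python
# Simplified simulation of the dataset SPARQL target over in-memory triples.
# The ex: identifiers are placeholders; only the effect of the root-only filter
# FILTER NOT EXISTS { ?s ?p ?this } is being illustrated.
S = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    ("ex:ds1", RDF_TYPE, S + "Dataset"),
    ("ex:ds2", RDF_TYPE, S + "Dataset"),
    ("ex:record", RDF_TYPE, S + "CreativeWork"),
    ("ex:record", S + "about", "ex:ds2"),  # ds2 is referenced by another node
]


def select_datasets(triples: list, root_only: bool) -> set:
    """Mimic SELECT ?this { ?this a schema:Dataset } with or without the root-only filter."""
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == S + "Dataset"}
    if not root_only:
        return datasets
    # NOT EXISTS { ?s ?p ?this }: drop any dataset that appears as an object.
    referenced = {o for _, _, o in triples}
    return {d for d in datasets if d not in referenced}


print(sorted(select_datasets(triples, root_only=True)))   # ['ex:ds1']
print(sorted(select_datasets(triples, root_only=False)))  # ['ex:ds1', 'ex:ds2']
```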

## Feedback for Steve

When updating CDIF Discovery shapes and examples, keep in mind the key points this file demonstrates:

1. **Use HTTPS schema.org consistently**
   - In JSON-LD contexts: `"schema": "https://schema.org/"`.
   - In SHACL/Turtle: `@prefix schema: <https://schema.org/> .`
   - In SPARQL and `sh:prefixes`: always `https://schema.org/`.

2. **Don't filter out referenced datasets in SPARQL targets**
   - The original `NOT EXISTS { ?s ?p ?this . }` filter excluded every dataset that appears as the object of some triple. In other words, it excluded any dataset referenced elsewhere in the graph, which is common in realistic CDI examples.
   - Removing this filter lets CDIF Discovery target any `schema:Dataset` node, including those linked via `schema:subjectOf`, `schema:about`, etc.

3. **Model core metadata on the dataset using schema.org keys**
   - `schema:name`, `schema:identifier`, `schema:description`, `schema:license`, `schema:keywords`, `schema:distribution`, `schema:variableMeasured`.
   - These align directly with the CDIF Discovery property shapes.

4. **Keep examples readable**
   - This file is intentionally small so people can see at a glance which properties CDIF Discovery expects and how they map to the SHACL shapes.
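
Point 1 can be checked mechanically before running any shapes. The following hypothetical lint helpers flag schema.org namespaces bound to `http://`, either in a JSON-LD `@context` or in Turtle/SPARQL prefix and `sh:namespace` declarations. They are regex-based, so treat them as a quick check rather than a parser.

```python
# Hypothetical lint helpers for point 1: flag schema.org namespaces bound to
# http:// instead of https://. Regex-based, so a quick check, not a parser.
import json
import re

HTTP_SCHEMA = re.compile(r"http://schema\.org/?")
DECLARATION = re.compile(r"@prefix|\bPREFIX\b|sh:namespace", re.IGNORECASE)


def http_schema_in_context(jsonld_text: str) -> list:
    """Return @context prefixes whose value is an http:// schema.org IRI."""
    ctx = json.loads(jsonld_text).get("@context", {})
    if not isinstance(ctx, dict):  # list-valued or remote contexts are not handled
        return []
    return [k for k, v in ctx.items() if isinstance(v, str) and HTTP_SCHEMA.match(v)]


def http_schema_in_prefixes(text: str) -> list:
    """Return 1-based line numbers of prefix/namespace declarations using http://schema.org."""
    return [
        n
        for n, line in enumerate(text.splitlines(), start=1)
        if DECLARATION.search(line) and HTTP_SCHEMA.search(line)
    ]


ctx_doc = '{"@context": {"schema": "http://schema.org/"}, "@graph": []}'
shapes_ttl = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix schema: <http://schema.org/> .\n"
)
print(http_schema_in_context(ctx_doc))      # ['schema']
print(http_schema_in_prefixes(shapes_ttl))  # [2]
```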

If your shapes and examples follow the same patterns as `cdif_example.jsonld`, the CDI Previewer (and other SHACL engines) should be able to classify fields reliably as CDIF-defined rather than EXTRA.

### Note on small example fixes

While reviewing Steve's FeXAS example (`FeXAS_Fe_c3d.001-NEXUS-HDF5-cdi-CDIF.jsonld`), we also fixed a minor typo where one nested variable used `schame:alternateName` instead of `schema:alternateName`. All `schema:alternateName` occurrences now use the correct `schema` prefix, matching the `"schema": "https://schema.org/"` context above. Without the fix, a JSON-LD processor would treat `schame:alternateName` as an absolute IRI with the unknown scheme `schame`, so it could never match a schema.org path.
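
A typo like `schame:` is easy to catch automatically. The sketch below is a hypothetical Python checker that walks a compact JSON-LD document and reports keys whose prefix is not declared in `@context`. Its limits:

- It assumes a single dict-valued `@context`.
- It only checks keys, not values such as `@type`.
- It would also flag legitimate absolute-IRI keys such as `urn:` ones.

The variable name and value in the example document are placeholders.

```python
# Hypothetical checker: report compact keys whose prefix is not declared in a
# single dict-valued @context. Only keys are checked (not values such as @type).
def undeclared_prefixed_keys(doc: dict) -> list:
    ctx = doc.get("@context", {})
    declared = set(ctx) if isinstance(ctx, dict) else set()
    found = []

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                if not key.startswith("@") and ":" in key:
                    prefix, suffix = key.split(":", 1)
                    # Keys like "https://..." are absolute IRIs, not compact ones.
                    if prefix not in declared and not suffix.startswith("//"):
                        found.append(key)
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(doc)
    return found


# Placeholder document reproducing the kind of typo described above.
doc = {
    "@context": {"schema": "https://schema.org/"},
    "@graph": [{
        "@type": "schema:Dataset",
        "schema:variableMeasured": [
            {"schema:name": "energy", "schame:alternateName": "E"},
        ],
    }],
}
print(undeclared_prefixed_keys(doc))  # ['schame:alternateName']
```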

We have adjusted the CDIF Discovery shapes and the previewer so that Steve's dataset *types* are recognized correctly via SPARQL targets and HTTPS schema.org IRIs. However, some of Steve's dataset properties still show up as EXTRA rather than SHACL-defined. The intent of this example and the shapes is clear, but follow-up work is still needed to fully align three things: the CDIF shapes, the previewer's classification logic, and Steve's richer CDI/XAS patterns.