- Environment: backend lives in `/home/eryk/projects/rdm-integration` (Dataverse + integration Go service); frontend is this repo (`rdm-integration-frontend`).
- Typical commands: backend `make dev_up` (brings up docker stack) / `make dev_down`; frontend `npm run test -- --watch=false`, `make fmt`, `make lint`, `ng serve` (via repo script if needed).
- Current state: DDI-CDI Angular tests now pass after spec fixes providing valid Turtle fixtures; SHACL warnings cleared after replacing placeholder Turtle in mocks.
- CDI cache: `tests/response.json` / `tests/response.ttl` regenerated from the per-file `cdi_generator.py` outputs; headerless samples now surface synthetic `col_*` variables so the cached graph stays metadata-only.
- Upload issue: earlier uploads sent only Turtle prefixes and triggered a 500 with a Go JSON decode error (`AddReplaceFileResponse.message` expects a string, but the backend returns an object when the upload fails). The frontend now merges SHACL edits back into the full CDI Turtle before upload; we still need to re-test the Dataverse call and harden response handling.
- Outstanding UI work: SHACL form error handling (see TODO below).
- SHACL tooling view: ULB Darmstadt’s generic SHACL form component renders fields directly from a SHACL shape graph; it’s viable once we ship full CDI-aligned shapes but currently stalls because we lack a stable root node shape. Without those shapes the renderer can’t reflect our UX or validation needs, so custom Angular forms may serve better until shapes are in place.
- SHACL shape graph research: official DDI Lifecycle SHACL exports live in the `ddimodel` GitHub releases, but no ready-made CDI-specific shapes turned up. The available CDI assets are RDF encodings in the `ddi-cdi` repo/spec, so we'd need to derive shapes ourselves or adapt the Lifecycle ones.
- Backend generator: Python entry point renamed to `cdi_generator.py`; Go job now emits a manifest and runs the script once per dataset while still honoring the legacy single-file flags when needed.
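The upload issue noted above comes down to a `message` field that is a string on success and an object on failure. The following is a minimal sketch of tolerant handling, written in Python purely for illustration; the real fix belongs in the Go decoder or the Angular client. The function name `normalize_message` and the sample payloads are assumptions, not the actual Dataverse response shapes.

```python
import json


def normalize_message(raw_body):
    """Return a printable message whether `message` is a string or an object."""
    payload = json.loads(raw_body)
    message = payload.get("message", "")
    if isinstance(message, str):
        return message
    # Non-string value (e.g. an error object): serialize it so no detail is lost.
    return json.dumps(message, sort_keys=True)
```

The same idea carries over to Go by decoding the field into `json.RawMessage` first and only then deciding whether it is a string or an object.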
- Turtle (TTL) is the compact syntax we use to serialize RDF triples; our generator emits CDI dataset descriptions in this format.
- RDF vocabularies like CDI define the predicates/classes (e.g., `cdi:Variable`, `dcterms:title`) that give the TTL meaning.
- SHACL is the constraint language layered on RDF; shapes describe what properties/structures should exist and power validation or auto-generated forms. A SHACL form renderer needs both the shape graph (rules) and the data graph (our CDI TTL) to work.
- Move the dataset selection dropdown out of the sticky menu and mirror the download component (label, layout, styles).
- Relocate the `Generate DDI-CDI` button into the right sticky menu and reuse the download component iconography.
- Replace the "Select" header text in the tree table with a toggleable checkmark matching the download component behaviour (supports select/deselect all).
- Investigate and resolve `Error: shacl root node shape not found`, using the captured `response.json` from `/api/common/cachedddicdioutput` to build tests/mocks.
- Ensure `Add to Dataset` always uploads clean Turtle output even when the SHACL editor fails to render.
- Wire the SHACL form integration to emit full CDI Turtle (not just prefixes) before invoking upload.
- Regenerate CDI Turtle after fixing `cdi_generator.py` (clear cached output and ensure only column-level variables remain).
- Improve `image/cdi_generator.py` so each run emits column-level variables only (handle headerless tables, avoid per-row logical datasets, dedupe roles) before merging into cached CDI.
- Diagnose the HTTP 500 when calling `api/datasets/:persistentId/add` (`json: cannot unmarshal object into ... AddReplaceFileResponse.message`); verify the request payload with the merged Turtle and align the response handling with backend expectations.
- Restrict `xconvert` usage to cases where `GetDataFileDDI` returns no output during the `cdi_ddi.go` job execution path.
- Extend `image/test_csv_to_cdi.py` to parse and assert against the `testdata/tmp_ddi8.xml` output generated by `GetDataFileDDI`.
- Host the SHACL shapes we design on the backend alongside the embedded frontend config (`Dockerfile`, `frontend.go` `go:embed all:dist/datasync`).
- Document SHACL shape hosting/contribution guidance in `ddi-cdi.md`, mirroring how `cdi_generator.py` participation is covered.
- Load the regenerated dataset in the SHACL form component and confirm the form renders without errors; capture logs/screenshots and note any remaining warnings.
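Two of the items above (clean uploads when the SHACL editor fails, and never sending prefixes alone) can be guarded by one check before upload. Below is a minimal sketch, in Python for illustration only, since the real helper lives in the Angular component. It uses a line-based heuristic rather than a real Turtle parser, and the names `has_triples` and `choose_upload_payload` are hypothetical.

```python
def has_triples(turtle):
    """True if the Turtle text holds anything besides prefix/base directives."""
    for line in turtle.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        lowered = stripped.lower()
        # Covers both Turtle (@prefix/@base) and SPARQL-style (PREFIX/BASE) directives.
        if lowered.startswith(("@prefix", "@base", "prefix ", "base ")):
            continue
        return True
    return False


def choose_upload_payload(edited, original):
    """Fall back to the original generator output when the edit is prefix-only."""
    return edited if has_triples(edited) else original
```

A heuristic like this only catches the empty-graph case; it does not validate the Turtle, so a proper parse is still needed wherever correctness matters.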
- Angular unit suite now green after updating `src/app/ddi-cdi/ddi-cdi.component.spec.ts` with valid Turtle fixtures and richer DOM mocks; rerun via `npm run test -- --watch=false`.
- Upload attempt previously failed server-side with a 500 due to the response message type mismatch; with the new Turtle merge helper the upload payload now includes the full dataset graph, pending verification against the real Dataverse API.
- DDI-CDI layout now mirrors the download component (dropdown placement, sticky action button, select-all icon); validated via `make test`.
- Cached response-driven regression tests cover SHACL root shape detection, unedited Turtle uploads, and merged SHACL edits; added in `src/app/ddi-cdi/ddi-cdi.component.spec.ts` and verified via `npm run test -- --watch=false`.
- Implemented Turtle merge helpers in `src/app/ddi-cdi/ddi-cdi.component.ts` so SHACL form submissions rehydrate the original graph, preserve prefixes, and keep uploads in sync with user edits.
- Replaced placeholder Turtle strings in SHACL-related mocks with valid CDI dataset snippets to eliminate parser warnings during tests.
- Regenerated `tests/response.json` / `tests/response.ttl` from the cleaned generator outputs; verified headerless files rely on synthetic `col_*` variables so no row-level values leak into the cached CDI graph.
- `image/cdi_generator.py` exposes a `--manifest` mode that profiles multiple files in one run, updates the aggregation logic to reuse the shared graph helper, and records summary JSON per manifest.
- Go backend `image/app/core/ddi_cdi.go` now builds manifest inputs and invokes the generator once, clearing cached node state between manifest entries and capturing Python warnings for job output; validated via `go test ./app/core`.
- Manifest builder now sets `allow_xconvert` only when the Dataverse DDI fetch fails, preventing unnecessary xconvert runs; regression covered via new Go unit tests.
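The `allow_xconvert` rule above can be illustrated with a small sketch. It is written in Python for readability, although the actual builder is Go code in `image/app/core/ddi_cdi.go`. Only the `allow_xconvert` key comes from these notes; the other keys, the `files` wrapper, and the choice to treat empty DDI output as a failed fetch are assumptions.

```python
import json


def build_manifest(files):
    """Build a manifest dict from (csv_path, ddi_xml_or_None) pairs."""
    entries = []
    for csv_path, ddi_xml in files:
        # Assumption: a missing or whitespace-only DDI payload counts as a failed fetch.
        ddi_missing = not (ddi_xml and ddi_xml.strip())
        entries.append({
            "csv_path": csv_path,            # hypothetical key
            "has_ddi": not ddi_missing,      # hypothetical key
            "allow_xconvert": ddi_missing,   # xconvert only as a fallback
        })
    return {"files": entries}


if __name__ == "__main__":
    sample = [("a.csv", "<codeBook/>"), ("b.csv", None)]
    print(json.dumps(build_manifest(sample), indent=2))
```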
- Verify the add-file request against integration logs now that uploads send merged Turtle; confirm the response schema matches expectations and tighten error handling if discrepancies remain.
- Normalize the CDI generator so columns emit as `cdi:Variable` definitions while record-level values remain in the physical dataset payload, then retest the SHACL form with a hosted shape graph that covers those variables.
- Monitor the manifest-backed Go job in integration and refresh cached fixtures once the new manifest summary JSON is available for the frontend.
- Keep the prompt reusable: add new discoveries or regressions to the checklist above.
- When work completes on any item, flip the checkbox and reference supporting commits/tests.
- Plan carefully and execute step-by-step.
- The cached `response.ttl` models the dataset DOI 10.5072/FK2/HWBVZM as a cdi:DataSet, links per-file cdi:LogicalDataSet entries, and now limits variables to column-level identifiers (e.g., `col_1`, `score`, `int_col`). Earlier merges leaked row values as variable names because headerless CSVs were treated as having headers; the refreshed cache eliminates those leaks.
- The file also inlines every Dataverse tabular file as a separate cdi:PhysicalDataSet, embedding either the harvested DDI Codebook XML literals or the 400 error payloads when a file was not ingested. The prov:wasGeneratedBy list still repeats cdi:ProcessStep blank nodes for each invocation; a future dedupe would further tidy the graph but is outside the immediate cache cleanup.
- Comparing with `cdi_generator.py`: the script streams one CSV, infers per-column stats, emits exactly one logical dataset, and never encodes row-level cells (regardless of header presence). Keeping the cache aligned now means running the tool once per tabular file or via a manifest (with header detection fixes) and merging outputs while filtering out any accidental row-derived variables. The manifest workflow now aggregates multiple files in a single invocation and writes an optional profiling summary JSON alongside the Turtle.
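The header-handling behaviour described in these notes can be sketched as follows. This is not the code in `cdi_generator.py`; the function name `column_names` is hypothetical, header presence is passed in rather than detected, and the 1-based numbering is inferred from the `col_1` example above.

```python
import csv
import io


def column_names(csv_text, has_header):
    """Return variable names for a CSV without ever using data cells as names."""
    reader = csv.reader(io.StringIO(csv_text))
    first_row = next(reader, [])
    if has_header:
        # Keep real header names; fill blank header cells with a synthetic name.
        return [name.strip() or f"col_{i}" for i, name in enumerate(first_row, start=1)]
    # Headerless file: the first row is data, so only its width is used.
    return [f"col_{i}" for i in range(1, len(first_row) + 1)]
```

Because the headerless branch reads only the width of the first row, row values cannot end up as variable names, which is the leak the refreshed cache removes.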