Support nested em rdf export #2558
…date tests for correct RDF export
Pull request overview
Adds proper RDF serialization for nested ExtendedMetadata (LinkedExtendedMetadata / LinkedExtendedMetadataMulti) by emitting blank nodes and recursively exporting nested attributes, addressing seek/issues/2557 and improving the queryability of exported RDF.
Changes:
- Refactors extended-metadata RDF export to recursively emit nested attributes as blank nodes (instead of stringifying nested hashes).
- Adds factories and unit tests covering nested EMT export (single + multi), PID-skipping behavior, and scalar datatype literal emission.
- Updates an existing Study RDF test to assert that nil EMT values do not produce RDF triples.
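
The recursive blank-node emission described in the first bullet can be sketched in plain Ruby. This is a minimal illustration only, not the code in `lib/seek/rdf/rdf_generation.rb`: triples are modelled as plain arrays, and `BlankNodes`, `emit_nested`, and the `ex:` predicates are hypothetical names chosen for the example.

```ruby
# Hypothetical, dependency-free sketch: a triple is a [subject, predicate, object] array.
# BlankNodes and emit_nested are illustrative names, not SEEK APIs.
class BlankNodes
  def initialize
    @count = 0
  end

  # Returns a fresh blank node label such as "_:b1".
  def next_id
    @count += 1
    "_:b#{@count}"
  end
end

def emit_nested(subject, data, nodes, triples = [])
  data.each do |predicate, value|
    case value
    when nil
      next # nil values produce no triple
    when Hash
      # A nested record becomes a blank node carrying its own attributes.
      node = nodes.next_id
      triples << [subject, predicate, node]
      emit_nested(node, value, nodes, triples)
    when Array
      # A multi-valued nested record gets one blank node per element
      # (elements are assumed to be hashes in this sketch).
      value.each do |item|
        node = nodes.next_id
        triples << [subject, predicate, node]
        emit_nested(node, item, nodes, triples)
      end
    else
      triples << [subject, predicate, value]
    end
  end
  triples
end

study = {
  'ex:title' => 'Example study',
  'ex:funder' => { 'ex:name' => 'ACME', 'ex:grant' => nil },
  'ex:contacts' => [{ 'ex:name' => 'Ada' }, { 'ex:name' => 'Bob' }]
}
triples = emit_nested('ex:study1', study, BlankNodes.new)
triples.each { |t| puts t.join(' ') }
```

Because each nested record gets its own node instead of being stringified, a query can match on the nested name or grant directly.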
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lib/seek/rdf/rdf_generation.rb | Implements recursive EMT RDF emission with blank nodes and typed literals for scalar base types. |
| test/unit/rdf_generation_test.rb | Adds unit tests for nested EMT RDF export and scalar XSD datatype literal behavior. |
| test/factories/extended_metadata_types.rb | Adds EMT/EMA factories to construct nested EMT shapes for RDF tests. |
| test/factories/sample_attribute_types.rb | Adds a date SampleAttributeType factory to support date typing tests. |
| test/unit/study_test.rb | Updates expectations so nil EMT values do not emit RDF triples. |
| db/seeds/extended_metadata_drafts/study_nested_emt_rdf_example.seeds.rb | Adds a seed example showcasing nested EMT RDF export structures. |
    next unless attribute.pid.present?

    value = data[attribute.accessor_name]
    next if value.nil?
emit_emt_attributes only skips nil values. For scalar types this can lead to incorrect RDF when the stored value is blank (e.g. optional integer/float/boolean fields often persist as ""), because typed_rdf_literal coerces "" to 0/0.0 or produces an invalid xsd:boolean literal. Consider skipping values that the attribute considers blank (e.g. attribute.test_blank?(value) / value.blank?) before calling typed_rdf_literal.
Suggested change:

    - next if value.nil?
    + next if value.nil?
    + next if attribute.respond_to?(:test_blank?) ? attribute.test_blank?(value) : value.blank?
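
The blank-value pitfall raised in this comment can be reproduced in plain Ruby. The sketch assumes the coercion behaves like `String#to_i` / `String#to_f`; `blank_value?` is a hypothetical stand-in for ActiveSupport's `blank?` so that the snippet runs without Rails.

```ruby
# Hypothetical helper (not a SEEK or Rails method): treats nil, empty, and
# whitespace-only values as blank. Unlike ActiveSupport's blank?, it keeps false,
# so a boolean false would still be exported.
def blank_value?(value)
  return true if value.nil?
  return value.strip.empty? if value.respond_to?(:strip)

  value.respond_to?(:empty?) && value.empty?
end

# An empty string silently coerces to a number, which would be exported as a real value.
puts ''.to_i # 0
puts ''.to_f # 0.0

# Filtering blanks first means only meaningful values reach the literal builder.
values = { 'age' => '', 'weight' => nil, 'count' => '3' }
exported = values.reject { |_, v| blank_value?(v) }
puts exported.keys.inspect
```

Filtering before coercion keeps an unset optional field from being exported as 0.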
        puts 'Created study_rdf_example EMT'
      end
    end
    # rubocop:enable Metrics/BlockLength
This seeds file ends with # rubocop:enable Metrics/BlockLength but there is no corresponding # rubocop:disable Metrics/BlockLength at the top. As written, the block-length cop will still run for this long file and the lone enable directive is likely unintended. Either add the matching rubocop:disable header (and keep the enable), or remove the enable line entirely.
Suggested change:

    - # rubocop:enable Metrics/BlockLength
    def typed_rdf_literal(attribute, value)
      case attribute.sample_attribute_type&.base_type
      when Seek::Samples::BaseType::DATE
        RDF::Literal(value.to_s, datatype: RDF::XSD.date)
      when Seek::Samples::BaseType::DATE_TIME
        RDF::Literal(value.to_s, datatype: RDF::XSD.dateTime)
      when Seek::Samples::BaseType::INTEGER
typed_rdf_literal emits xsd:date/xsd:dateTime literals using value.to_s. However the date/date-time validators accept many non-XSD lexical formats (e.g. "2 Feb 2015" or "Thu, 11 Feb 2016 15:39:55 +0000"), which would become invalid RDF typed literals when exported. To ensure valid RDF, consider normalizing these values before emitting (e.g. parse then iso8601 for dateTime and Date.parse(...).iso8601 for date).
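
The normalization suggested in this comment can be sketched with Ruby's standard `date` library. The helper names `xsd_date_lexical` and `xsd_date_time_lexical` are hypothetical, and returning `nil` for unparseable input is an assumption made for the example, not behavior taken from the pull request.

```ruby
require 'date'

# Hypothetical helpers (not part of SEEK): parse a loosely formatted value,
# then re-serialize it in the ISO 8601 form that xsd:date / xsd:dateTime expect.
# Returning nil on unparseable input is an assumption; a caller could skip such values.
def xsd_date_lexical(value)
  Date.parse(value.to_s).iso8601
rescue ArgumentError # Date::Error is a subclass of ArgumentError
  nil
end

def xsd_date_time_lexical(value)
  DateTime.parse(value.to_s).iso8601
rescue ArgumentError
  nil
end

puts xsd_date_lexical('2 Feb 2015')                           # 2015-02-02
puts xsd_date_time_lexical('Thu, 11 Feb 2016 15:39:55 +0000') # 2016-02-11T15:39:55+00:00
```

Parsing first and serializing second means the exported lexical form no longer depends on how the value was originally typed in.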
#2557