Add / adjust the structures in https://github.com/SciCodes/software-metadata-extraction-benchmark to set up a ground truth dataset for use in test suites / evaluation benchmarks.
Initial supported formats:
- CodeMeta (versioned)
- DataCite (versioned)
- Citation File Format
- any others?
Initial work will focus on purely deterministic transformation: use codemeticulous to convert from format A -> B for all items in the datasets. The output should be 100% non-lossy and deterministic.
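A minimal sketch of what a per-item check could look like, assuming each ground truth item stores a source document and a manually curated expected target document as JSON-like dicts. The `convert` callable, `check_item`, and `stub_convert` are hypothetical names used only for illustration; the real codemeticulous call would be wrapped to fit this signature, and its actual API is not assumed here.

```python
import json
from typing import Callable

# Hypothetical converter signature: (source_format, target_format, document) -> document.
# A thin wrapper around codemeticulous would be plugged in here.
Converter = Callable[[str, str, dict], dict]


def canonical(doc: dict) -> str:
    """Serialize with sorted keys so that key order does not affect comparison."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def check_item(
    convert: Converter,
    source_format: str,
    target_format: str,
    source: dict,
    expected: dict,
) -> dict:
    """Convert one item twice and compare against the curated ground truth."""
    first = convert(source_format, target_format, source)
    second = convert(source_format, target_format, source)
    return {
        # Same input must give the same output on every run.
        "deterministic": canonical(first) == canonical(second),
        # Output must equal the manually curated target document.
        "matches_ground_truth": canonical(first) == canonical(expected),
    }


def stub_convert(source_format: str, target_format: str, doc: dict) -> dict:
    """Stand-in converter for illustration only; NOT the codemeticulous API."""
    if (source_format, target_format) == ("cff", "codemeta"):
        return {"@type": "SoftwareSourceCode", "name": doc["title"]}
    raise ValueError(f"unsupported conversion: {source_format} -> {target_format}")


report = check_item(
    stub_convert,
    "cff",
    "codemeta",
    source={"title": "demo"},
    expected={"name": "demo", "@type": "SoftwareSourceCode"},
)
print(report)
```

Comparing canonical JSON strings rather than raw dicts keeps the check strict about values while ignoring key order, which is one reasonable definition of "deterministic output" for this benchmark; a stricter byte-for-byte comparison of serialized files is another option.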
Later work may include LLM augmentation, where an LLM-assisted transformation adds metadata that wasn't included in the original manually curated transformations. This starts to bleed into responsibilities and functionality that should exist in somef-core, though.
@SciCodes/2025-workshop-organizers