build test suite / evaluation benchmark with ideal metadata transformations #5

@alee

Add or adjust the structures in https://github.com/SciCodes/software-metadata-extraction-benchmark to set up a ground-truth dataset for use in test suites and evaluation benchmarks.

Initial supported formats:

  1. CodeMeta (versioned)
  2. DataCite (versioned)
  3. Citation File Format
  4. any others?

Initial work will focus on purely deterministic transformations: use codemeticulous to convert from format A -> B for all items in the datasets. The output should be 100% deterministic and non-lossy.
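A rough sketch of what such a check could look like, in case it helps the discussion. Everything here is an assumption for illustration: the case layout (`id`, `source_format`, `target_format`, `source`, `expected`), the `evaluate` helper, and the `convert` callable are hypothetical names, and the converter is passed in as a parameter rather than imported so that the sketch does not presume anything about the actual codemeticulous API.

```python
import json
from typing import Callable, Iterable

# Hypothetical converter signature: (source_format, target_format, source) -> dict.
# A thin wrapper around codemeticulous would be plugged in here.
Converter = Callable[[str, str, dict], dict]


def canonical(doc: dict) -> str:
    """Serialize with sorted keys so comparison ignores key order."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def evaluate(cases: Iterable[dict], convert: Converter) -> list:
    """Return (case id, reason) pairs for every case that fails.

    Each case is converted twice: differing outputs are reported as
    "non-deterministic"; a stable output that differs from the curated
    ground truth is reported as "mismatch".
    """
    failures = []
    for case in cases:
        args = (case["source_format"], case["target_format"], case["source"])
        first = canonical(convert(*args))
        second = canonical(convert(*args))
        if first != second:
            failures.append((case["id"], "non-deterministic"))
        elif first != canonical(case["expected"]):
            failures.append((case["id"], "mismatch"))
    return failures
```

An empty list from `evaluate` would mean every A -> B conversion in the dataset reproduced the curated ground truth exactly. Comparing canonical JSON strings is a deliberately strict choice; a field-level diff may be more useful once failures need to be triaged.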

Later work may include LLM augmentation, where an LLM-assisted transformation adds metadata that wasn't included in the original manually curated transformations. This starts to bleed into responsibilities and functionality that should exist in somef-core, though.

@SciCodes/2025-workshop-organizers
