
Tighten the workflow test schema #22566

Merged
mvdbeek merged 9 commits into galaxyproject:dev from
jmchilton:wf_test_job_schema
Apr 30, 2026

@jmchilton
Member

Pulls in updated versions of pieces from #17128 that didn't make it into my approach in #18884. It also goes some way toward unifying the Planemo workflow test format with the workflow test format used for Galaxy's framework tests.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

jmchilton and others added 9 commits April 22, 2026 09:53
Precursor to strict Job schema in TestJob. Covers 21 legacy files using
type: File|Directory|raw plus 7 files with dead inner type: on nested
Collection elements.

- WorkflowPopulator.run_workflow: new test_data_format="cwl_style" opt-in.
  Routes via stage_inputs, extracts bare scalars & pure-scalar lists as
  literal params, rejects legacy type: dict forms. Default None is
  bit-for-bit unchanged for all existing callers.
- test_framework_workflows.py: opts in to cwl_style.
- Fixture migrations:
  - type: File + value:[+file_type:] -> class: File + path:[+filetype:]
  - type: Directory + value: + file_type: -> class: Directory + path: + filetype:
  - type: raw + value: X -> bare scalar X (incl. null, "", lists)
  - Dropped dead inner type: from nested Collection elements (outer
    collection_type implies inner levels via galactic_job_json.to_elements).
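
The fixture migrations listed above can be sketched as a small conversion function. This is a minimal illustration only: `migrate_legacy_input` is a hypothetical helper, not code from this PR, and it handles just the top-level `type: File|Directory|raw` forms named in the list.

```python
from typing import Any

# Legacy "type" values that map onto a CWL-style "class" of the same name.
_LEGACY_CLASSES = {"File", "Directory"}


def migrate_legacy_input(value: Any) -> Any:
    """Convert one legacy job input to the CWL-style form (hypothetical helper)."""
    if not isinstance(value, dict) or "class" in value or "type" not in value:
        # Bare scalars, lists, and class:-tagged dicts are left untouched.
        return value
    legacy_type = value["type"]
    if legacy_type == "raw":
        # type: raw + value: X  ->  bare scalar X (including None, "", and lists)
        return value.get("value")
    if legacy_type in _LEGACY_CLASSES:
        # type: File|Directory + value:  ->  class: File|Directory + path:
        migrated = {"class": legacy_type, "path": value["value"]}
        if "file_type" in value:
            # file_type: is renamed to filetype:
            migrated["filetype"] = value["file_type"]
        return migrated
    raise ValueError(f"unrecognized legacy type: {legacy_type!r}")
```

Applied to `{"type": "File", "value": "1.bed", "file_type": "bed"}`, the sketch yields `{"class": "File", "path": "1.bed", "filetype": "bed"}`.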

IWC audit (119 *-tests.yml files in galaxyproject/iwc): 0 hits for
type: File|Directory|raw, 0 hits for content:, 0 hits for file_type:.
Legacy forms exist only in lib/galaxy_test/workflow/ — safe local cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to galaxyproject#18884 which modeled TestJob.outputs but left TestJob.job
as Dict[str, Any]. Schema defines the canonical CWL-style workflow-test
input syntax; legacy type: File|Directory|raw forms stay in populator
helpers and are no longer accepted in fixtures.

- New galaxy.tool_util_models.test_job with Job = RootModel[dict[str, ...]]
  - File variants (LocationFile, PathFile, ContentsFile, CompositeDataFile)
    dispatched via callable Discriminator on dict-shape
  - Collection with recursive CollectionElement, strict collection_type
  - Directory (supported by galactic_job_json; IWC unused but in-tree
    bwa_mem2_index fixture needs it)
  - HashEntry using galaxy.util.hash_util.HashFunctionNames
  - Non-recursive list axis (List[File|scalar]) — no observed deeper nesting
- Moved StrictModel + CollectionType + _check_collection_type to _base.py
  so test_job.py can reuse them without circular import.
- TestJob.job: Dict[str, Any] -> Job; Job re-exported at package top.
- test_test_format_model.py: 12 positive + 7 negative parametrized unit
  fixtures covering every union arm and the canonical rejection cases.
- replacement_parameters_legacy.gxwf-tests.yml parked on skip list with
  explanation — replacement_parameters: {...} is Planemo-era magic popped
  by run_workflow, not a canonical job input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
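
To illustrate the "callable Discriminator on dict-shape" idea from the commit above: pydantic's `Discriminator` accepts a callable that inspects the raw input and returns a tag. The following stdlib-only sketch shows what such a callable might look like. The variant names come from the commit message; the distinguishing keys (`location`, `path`, `contents`, `composite_data`) and the function name are assumptions, not the PR's actual code.

```python
from typing import Any, Optional

# Assumed mapping from a distinguishing key to a variant name. The variant names
# appear in the commit message; the key names are guesses for illustration.
_FILE_VARIANT_KEYS = (
    ("composite_data", "CompositeDataFile"),
    ("location", "LocationFile"),
    ("path", "PathFile"),
    ("contents", "ContentsFile"),
)


def file_variant_tag(value: Any) -> Optional[str]:
    """Return the variant tag for a File-like dict, or None if nothing matches."""
    if not isinstance(value, dict):
        return None
    for key, tag in _FILE_VARIANT_KEYS:
        if key in value:
            return tag
    return None
```

Dispatching on shape this way means a validator only needs to try the one matching variant instead of every member of the union.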
Collapses un-tagged Union into tagged File/Collection/scalar so
validation errors point at one branch, not three. Adds class_ field
to nested element types for symmetry. Fixes three gxwf-tests YAMLs
that were missing class: Collection on nested collection outputs —
the un-discriminated Union had been hiding the bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
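
Why a tagged union gives sharper errors can be shown with a rough, library-free sketch: the `class` tag selects exactly one branch, so a failure names that branch alone, and a missing tag is reported directly instead of being absorbed by another arm. All function names and the per-branch rules here are placeholders, not the schema's real checks.

```python
from typing import Any, Callable, Dict


def _check_file(value: dict) -> None:
    # Placeholder rule: a File entry must not carry collection-only keys.
    if "elements" in value:
        raise ValueError("File: unexpected key 'elements'")


def _check_collection(value: dict) -> None:
    # Placeholder rule: a Collection entry must describe its elements.
    if "elements" not in value:
        raise ValueError("Collection: missing key 'elements'")


_BRANCHES: Dict[str, Callable[[dict], None]] = {
    "File": _check_file,
    "Collection": _check_collection,
}


def validate_tagged(value: Any) -> str:
    """Validate against exactly one branch, chosen by the class tag."""
    if not isinstance(value, dict):
        return "scalar"
    tag = value.get("class")
    if tag not in _BRANCHES:
        # A nested collection that forgot class: Collection fails here,
        # rather than silently matching some other union arm.
        raise ValueError(f"missing or unknown class tag: {tag!r}")
    _BRANCHES[tag](value)
    return tag
```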
class: Directory + path: in CWL-style jobs only joined path with
job_dir — no fallback to test-data/. Broke directory_index_1 after
migrating fixtures to class/path form. Match upload_file/upload_tar
by using abs_path_or_uri, which falls back to resolve_data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
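
The fallback behaviour described in the commit above can be approximated as follows. This is a sketch under stated assumptions: `resolve_with_fallback` and its parameters are hypothetical, and it does not reproduce the actual logic of `abs_path_or_uri` or `resolve_data`.

```python
import os
from typing import Sequence


def resolve_with_fallback(path: str, job_dir: str, fallback_dirs: Sequence[str]) -> str:
    """Resolve path against job_dir first, then against each fallback directory."""
    if os.path.isabs(path):
        return path
    candidate = os.path.join(job_dir, path)
    if os.path.exists(candidate):
        return candidate
    # Joining with job_dir alone is where the pre-fix behaviour stopped;
    # here we keep looking in the fallback locations (e.g. a test-data directory).
    for directory in fallback_dirs:
        alternative = os.path.join(directory, path)
        if os.path.exists(alternative):
            return alternative
    # Nothing matched: return the job_dir candidate so a later error names it.
    return candidate
```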
Addresses the workflow-test schema usability regressions flagged against
galaxy-tool-util-schema 0.4.0:

- TestJob.doc: make optional (was incorrectly required).
- TestJob + BaseTestOutputModel: port per-field descriptions from the
  0.2.0 vendored schema.
- Tests: name the top-level RootModel; emit $schema, title
  (GalaxyWorkflowTests), and description.
- Job and assertion_list: promote type aliases to named RootModel
  subclasses so $defs keys aren't mangled.
- Drop field_title_generator from StrictModel/_StrictJobModel/
  ToolSourceBaseModel; add explicit title= on every Field and
  model_config(title=) on every test-schema model class. Codegen
  template emits titles on generated assertion models.
- Fix codegen models_path to point at tool_util_models/assertions.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surfaced by codespell on generated _schema.yaml once assertion
models got pulled into the OpenAPI schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

elements:
- identifier: el1
  class: Collection
  type: paired
Member


Could we allow this? I agree it's unnecessary, and humans probably won't be writing these, but they'd help me make sense of deeply nested structures. Not a must; I just have a slight preference for seeing these.

Member Author

@jmchilton jmchilton Apr 30, 2026


Claude's Response:

Good catch: the schema does support this, just under the key collection_type (not type). Collection.elements is recursive (File | Collection) and nested Collection entries accept collection_type, so e.g. collection_type: paired on those inner elements is valid. The annotations were dropped only because I conflated them with the unsupported type: key. Want me to restore them as collection_type across the gxwf-tests fixtures for readability?

note-to-self prompt (if requested):

Re-add explicit collection_type: annotations to nested class: Collection elements in lib/galaxy_test/workflow/*.gxwf-tests.yml where the parent collection_type implies the inner shape (e.g. inside sample_sheet:paired, list:paired, list:list:paired, etc.). Use the test_job.py Collection model (collection_type recursive on CollectionElement) to validate. Run gxwf validate on every touched file and the framework-workflows tests for the affected fixtures. Don't add type: — it's not a schema field. Keep changes purely additive/readability; no semantic edits.
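
A minimal sketch of the annotation step described in the note above, assuming the inner type is obtained by dropping the first colon-separated rank of the parent's collection_type (e.g. list:paired gives paired). `annotate_collection_types` is a hypothetical helper, not part of Galaxy or this PR, and an existing collection_type on an element is left as written.

```python
from typing import Any, Dict


def annotate_collection_types(collection: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with collection_type filled in on nested Collection elements."""
    outer = collection["collection_type"]
    # "list:paired" -> inner "paired"; a single-rank type has no nested collections.
    _, _, inner = outer.partition(":")
    annotated = dict(collection)
    if "elements" in collection:
        elements = []
        for element in collection["elements"]:
            if inner and element.get("class") == "Collection":
                # Keep an explicit collection_type if the element already has one.
                element = annotate_collection_types({"collection_type": inner, **element})
            elements.append(element)
        annotated["elements"] = elements
    return annotated
```

The function is purely additive: it copies each level it touches and never modifies the input fixture.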

Member


No, I think that's fine then.

Member

@mvdbeek mvdbeek left a comment


Very nice cleanup, thank you @jmchilton!

@mvdbeek mvdbeek merged commit 882f22a into galaxyproject:dev Apr 30, 2026
66 of 69 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to Done in Galaxy Dev - weeklies Apr 30, 2026