Tighten the workflow test schema #22566
Precursor to strict Job schema in TestJob. Covers 21 legacy files using
type: File|Directory|raw plus 7 files with dead inner type: on nested
Collection elements.
- WorkflowPopulator.run_workflow: new test_data_format="cwl_style" opt-in.
Routes via stage_inputs, extracts bare scalars & pure-scalar lists as
literal params, rejects legacy type: dict forms. Default None is
bit-for-bit unchanged for all existing callers.
- test_framework_workflows.py: opts in to cwl_style.
- Fixture migrations:
  - type: File + value:[+file_type:] -> class: File + path:[+filetype:]
  - type: Directory + value: + file_type: -> class: Directory + path: + filetype:
  - type: raw + value: X -> bare scalar X (incl. null, "", lists)
  - Dropped dead inner type: from nested Collection elements (outer
    collection_type implies inner levels via galactic_job_json.to_elements).
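The fixture migrations listed above can be sketched as a small standalone helper. This is only an illustration of the key renames described in the commit message, not code from the PR: `migrate_legacy_input` and `_KEY_RENAMES` are hypothetical names, and the helper deliberately leaves anything it does not recognise (including dead inner type: keys on Collection elements) untouched for manual review.

```python
from typing import Any

# Key renames described in the commit message: value -> path, file_type -> filetype.
_KEY_RENAMES = {"value": "path", "file_type": "filetype"}


def migrate_legacy_input(entry: Any) -> Any:
    """Convert one legacy job input to the CWL-style form (hypothetical helper)."""
    if not isinstance(entry, dict) or "type" not in entry:
        # Already a bare scalar, a list, or a class-based dict.
        return entry
    legacy_type = entry["type"]
    if legacy_type == "raw":
        # type: raw + value: X -> bare scalar X (including None, "", and lists).
        return entry.get("value")
    if legacy_type in ("File", "Directory"):
        migrated = {"class": legacy_type}
        for key, value in entry.items():
            if key != "type":
                migrated[_KEY_RENAMES.get(key, key)] = value
        return migrated
    # Unknown legacy type: leave untouched for manual review.
    return entry
```

For example, `migrate_legacy_input({"type": "File", "value": "1.bed", "file_type": "bed"})` yields a dict with `class`, `path`, and `filetype` keys.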
IWC audit (119 *-tests.yml files in galaxyproject/iwc): 0 hits for
type: File|Directory|raw, 0 hits for content:, 0 hits for file_type:.
Legacy forms exist only in lib/galaxy_test/workflow/ — safe local cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to galaxyproject#18884, which modeled TestJob.outputs but left TestJob.job as Dict[str, Any]. The schema defines the canonical CWL-style workflow-test input syntax; legacy type: File|Directory|raw forms stay in populator helpers and are no longer accepted in fixtures.
- New galaxy.tool_util_models.test_job with Job = RootModel[dict[str, ...]]
  - File variants (LocationFile, PathFile, ContentsFile, CompositeDataFile) dispatched via callable Discriminator on dict-shape
  - Collection with recursive CollectionElement, strict collection_type
  - Directory (supported by galactic_job_json; IWC unused but in-tree bwa_mem2_index fixture needs it)
  - HashEntry using galaxy.util.hash_util.HashFunctionNames
  - Non-recursive list axis (List[File|scalar]); no deeper nesting observed
- Moved StrictModel + CollectionType + _check_collection_type to _base.py so test_job.py can reuse them without circular import.
- TestJob.job: Dict[str, Any] -> Job; Job re-exported at package top.
- test_test_format_model.py: 12 positive + 7 negative parametrized unit fixtures covering every union arm and the canonical rejection cases.
- replacement_parameters_legacy.gxwf-tests.yml parked on skip list with explanation: replacement_parameters: {...} is Planemo-era magic popped by run_workflow, not a canonical job input.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
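The "callable Discriminator on dict-shape" idea can be sketched without pydantic as a plain function that inspects which keys a dict carries and returns a tag naming the matching File variant. This is a minimal sketch under assumptions: the key names (`location`, `path`, `contents`, `composite_data`) are guessed from the variant names in the commit message, and `file_variant_tag` is a hypothetical name, not the function used in test_job.py. In pydantic v2, a callable of this shape is the kind of thing that can be passed to `Discriminator`.

```python
from typing import Any, Optional

# ASSUMPTION: one identifying key per File variant, inferred from the variant names.
_FILE_VARIANT_KEYS = (
    ("composite_data", "CompositeDataFile"),
    ("location", "LocationFile"),
    ("path", "PathFile"),
    ("contents", "ContentsFile"),
)


def file_variant_tag(value: Any) -> Optional[str]:
    """Return the tag of the File variant a dict looks like, or None (hypothetical)."""
    if not isinstance(value, dict):
        return None
    for key, tag in _FILE_VARIANT_KEYS:
        if key in value:
            return tag
    # No identifying key present: let validation report a single clear error.
    return None
```

Dispatching on shape up front means a malformed input is validated against one candidate model rather than every arm of the union.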
Collapses un-tagged Union into tagged File/Collection/scalar so validation errors point at one branch, not three. Adds class_ field to nested element types for symmetry. Fixes three gxwf-tests YAMLs that were missing class: Collection on nested collection outputs; the un-discriminated Union had been hiding the bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CWL-style jobs using class: Directory + path: only joined path with job_dir, with no fallback to test-data/. This broke directory_index_1 after its fixtures were migrated to the class/path form. Match upload_file/upload_tar by using abs_path_or_uri, which falls back to resolve_data.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
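The fallback behaviour described in this commit can be illustrated with a standalone resolver. This is a hedged sketch, not Galaxy's abs_path_or_uri: `resolve_path_with_fallback` and its parameters are hypothetical, and the real helper delegates the fallback to resolve_data rather than joining against a directory directly.

```python
import os


def resolve_path_with_fallback(path: str, job_dir: str, test_data_dir: str) -> str:
    """Resolve a job path, falling back to a test-data directory (hypothetical)."""
    # URIs and absolute paths are passed through unchanged.
    if "://" in path or os.path.isabs(path):
        return path
    # Prefer a path relative to the directory holding the job file.
    candidate = os.path.join(job_dir, path)
    if os.path.exists(candidate):
        return candidate
    # Otherwise fall back to the shared test-data directory.
    return os.path.join(test_data_dir, path)
```

Joining only against job_dir, as the Directory branch did before this fix, corresponds to skipping the final fallback step.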
Addresses the workflow-test schema usability regressions flagged against galaxy-tool-util-schema 0.4.0:
- TestJob.doc: make optional (was incorrectly required).
- TestJob + BaseTestOutputModel: port per-field descriptions from the 0.2.0 vendored schema.
- Tests: name the top-level RootModel; emit $schema, title (GalaxyWorkflowTests), and description.
- Job and assertion_list: promote type aliases to named RootModel subclasses so $defs keys aren't mangled.
- Drop field_title_generator from StrictModel/_StrictJobModel/ToolSourceBaseModel; add explicit title= on every Field and model_config(title=) on every test-schema model class. Codegen template emits titles on generated assertion models.
- Fix codegen models_path to point at tool_util_models/assertions.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by codespell on generated _schema.yaml once assertion models got pulled into the OpenAPI schema.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    elements:
      - identifier: el1
        class: Collection
        type: paired
Could we allow this? I agree it's unnecessary, and maybe humans won't be writing these, but they'd help me make sense of deeply nested structures. Not a must; I just have a slight preference for seeing these.
Claude's Response:
Good catch. The schema does support this, just under the key collection_type (not type). Collection.elements is recursive (File | Collection) and nested Collection entries accept collection_type, so e.g. collection_type: paired on those inner elements is valid; the annotations were dropped only because I conflated them with the unsupported type: key. Want me to restore them as collection_type across the gxwf-tests fixtures for readability?
note-to-self prompt (if requested):
Re-add explicit collection_type: annotations to nested class: Collection elements in lib/galaxy_test/workflow/*.gxwf-tests.yml where the parent collection_type implies the inner shape (e.g. inside sample_sheet:paired, list:paired, list:list:paired, etc.). Use the test_job.py Collection model (collection_type recursive on CollectionElement) to validate. Run gxwf validate on every touched file and the framework-workflows tests for the affected fixtures. Don't add type: because it's not a schema field. Keep changes purely additive/readability; no semantic edits.
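A quick consistency check for such re-added annotations could look like the sketch below. It is not the test_job.py model: `check_nested_collection_types` is a hypothetical helper, and it assumes that the inner shape is obtained by dropping the first colon-separated segment of the parent's collection_type (so list:paired implies paired, and list:list:paired implies list:paired).

```python
from typing import Any, Dict, List


def check_nested_collection_types(collection: Dict[str, Any], path: str = "") -> List[str]:
    """Report nested collection_type values that contradict the parent (hypothetical)."""
    problems: List[str] = []
    outer = collection.get("collection_type") or ""
    # ASSUMPTION: the inner shape is everything after the first ':' segment.
    _, _, implied_inner = outer.partition(":")
    for element in collection.get("elements", []):
        if element.get("class") != "Collection":
            continue
        where = f"{path}/{element.get('identifier')}"
        declared = element.get("collection_type")
        if declared is not None and implied_inner and declared != implied_inner:
            problems.append(f"{where}: declared {declared!r}, parent implies {implied_inner!r}")
        # Recurse, filling in the implied type when none was declared.
        child = dict(element)
        if child.get("collection_type") is None:
            child["collection_type"] = implied_inner
        problems.extend(check_nested_collection_types(child, where))
    return problems
```

An empty result means every explicit inner annotation agrees with what its parent implies, which keeps the change purely additive.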
mvdbeek
left a comment
Very nice cleanup, thank you @jmchilton!
Pulls in some updated version of stuff from #17128 that didn't make it into my approach with #18884. Also goes some way toward unifying the Planemo workflow test format and the workflow test format we use for the framework tests in Galaxy.