Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #161 +/- ##
=======================================
Coverage 41.03% 41.03%
=======================================
Files 51 51
Lines 1974 1974
Branches 441 441
=======================================
Hits 810 810
Misses 1047 1047
Partials 117 117
New gxformat2/native.py with load_native(data, strict) that validates
native workflow dicts via pydantic. strict=False normalizes known Galaxy
quirks (tags as strings, scalar action_arguments) before validation.
from_galaxy_native() now parses input into NativeGalaxyWorkflow early,
replacing untyped dict access with typed attribute access throughout.
No longer mutates input dict (pop("name") removed). convert_tool_state
callback receives model_dump for backward compat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
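The lenient mode described in this commit can be pictured with a small stand-alone sketch. It does not use gxformat2 itself: `normalize_quirks` is a hypothetical helper, and the exact shapes of the quirks (a comma-separated tag string, a bare scalar wrapped under a `"value"` key) are assumptions made only for illustration. What it does show is the non-mutating approach the commit describes: clean a copy, then hand that copy to validation.

```python
from copy import deepcopy
from typing import Any, Dict


def normalize_quirks(workflow: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned copy of a native workflow dict; the input is not mutated."""
    cleaned = deepcopy(workflow)

    tags = cleaned.get("tags")
    if isinstance(tags, str):
        # Assumed quirk shape: tags stored as one comma-separated string.
        cleaned["tags"] = [t.strip() for t in tags.split(",") if t.strip()]

    for step in cleaned.get("steps", {}).values():
        for action in (step.get("post_job_actions") or {}).values():
            args = action.get("action_arguments")
            if args is not None and not isinstance(args, dict):
                # Assumed quirk shape: a bare scalar where a mapping is expected.
                # Wrapping it under "value" is an illustrative choice only.
                action["action_arguments"] = {"value": args}
    return cleaned
```

A strict mode would skip this step and let validation reject the raw data as-is.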
New gxformat2/normalized/ package providing a typed layer above the
auto-generated pydantic schema models. The normalized models narrow
loose union types into predictable, uniform representations:
Native (NormalizedNativeWorkflow, NormalizedNativeStep):
- tool_state: str|dict|None -> dict[str,Any] (always parsed)
- input_connections, inputs, outputs, workflow_outputs,
post_job_actions: X|None -> X (never None, empty defaults)
- tags: list[str]|None -> list[str]
- Subworkflows recursively normalized
Format2 (NormalizedFormat2, NormalizedWorkflowStep):
- steps, inputs, outputs, comments: list|dict -> list (always list)
- Input string shorthands ("data") expanded to WorkflowInputParameter
- Step in/out string shorthands expanded to WorkflowStepInput/Output
- Step ids always populated
- doc: str|list[str]|None -> str|None (joined)
- Subworkflows recursively normalized
Both reuse component models from the auto-generated schemas
(WorkflowStepInput, NativeInputConnection, StepPosition, etc.) -
only the container/workflow/step models are hand-crafted.
Entry points:
normalized_native(dict|str|Path|NativeGalaxyWorkflow)
normalized_format2(dict|str|Path|GalaxyWorkflow)
Goal: establish a layered architecture where raw dicts (layer 0),
schema-validated models (layer 1), and normalized models (layer 2)
give consumers clear typing guarantees based on the assumptions
they're willing to make about the workflow data.
Also adds DEPENDENCY_GXFORMAT2_ABSTRACTIONS.md documenting all
workflow abstractions available to downstream consumers.
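As a rough illustration of what "narrowing" means here, the sketch below uses plain dataclasses rather than the pydantic models from this PR. `StepSketch`, `narrow_tool_state`, `narrow_doc`, and `narrow_step` are hypothetical names, the newline used to join `doc` is an assumption, and the native and Format2 rules are mixed into one class purely for brevity.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass(frozen=True)
class StepSketch:
    """Narrow view of a step: no field needs a None check or a type check."""
    tool_state: Dict[str, Any] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)
    doc: Optional[str] = None


def narrow_tool_state(raw: Union[str, Dict[str, Any], None]) -> Dict[str, Any]:
    # str | dict | None -> dict (always parsed)
    if raw is None:
        return {}
    if isinstance(raw, str):
        return json.loads(raw)
    return dict(raw)


def narrow_doc(raw: Union[str, List[str], None]) -> Optional[str]:
    # str | list[str] | None -> str | None (joined; the separator is assumed)
    if isinstance(raw, list):
        return "\n".join(raw)
    return raw


def narrow_step(raw: Dict[str, Any]) -> StepSketch:
    return StepSketch(
        tool_state=narrow_tool_state(raw.get("tool_state")),
        tags=list(raw.get("tags") or []),
        doc=narrow_doc(raw.get("doc")),
    )
```

Consumers of the narrowed object can then write `step.tool_state["x"]` without first checking whether the state is a JSON string, a dict, or missing.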
…workflow
These conflated format conversion with Galaxy API interaction. After import_tool removal, galaxy_interface was threaded through ConversionContext but never dereferenced. convert_and_import_workflow was only used by Galaxy test infrastructure, not production code.
- Remove interface.py (ImporterGalaxyInterface, BioBlendImporterGalaxyInterface)
- Remove main.py (convert_and_import_workflow)
- Remove galaxy_interface from ConversionContext, python_to_workflow, yaml_to_workflow
- Make galaxy_interface param optional (default None) for backward compat
- Drop bioblend runtime dependency
- Remove MockGalaxyInterface from test helpers
- Clean up unused imports in tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Widen WorkflowStep.run for subworkflows, add ConversionOptions for controlling expansion behavior, and implement expanded_format2/expanded_native with URL resolution and @import support.
New gxformat2/options.py:
- ConversionOptions: unified options for both conversion directions, replaces ImportOptions. Adds expand flag, url_resolver callback, convert_tool_state, compact.
- default_url_resolver: HTTP fetch + YAML parse, TRS URL descriptor extraction, base64:// decode
- is_trs_url: GA4GH TRS v2 URL pattern detection
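Two of the resolver behaviors listed above are easy to sketch in isolation. The code below is not the `default_url_resolver` or `is_trs_url` from this commit. `looks_like_trs_url` and `decode_base64_url` are hypothetical stand-ins, and the regular expression is an assumption based on the GA4GH TRS v2 path layout (`/ga4gh/trs/v2/tools/{id}/versions/{version_id}`), not the pattern the library actually uses.

```python
import base64
import re

# Assumed pattern: a TRS v2 tool-version URL, with an optional trailing slash.
TRS_PATTERN = re.compile(r"^https?://.+/ga4gh/trs/v2/tools/[^/]+/versions/[^/]+/?$")


def looks_like_trs_url(url: str) -> bool:
    """Heuristic check for a GA4GH TRS v2 tool-version URL."""
    return bool(TRS_PATTERN.match(url))


def decode_base64_url(url: str) -> str:
    """Decode the payload of a base64:// pseudo-URL into text."""
    prefix = "base64://"
    if not url.startswith(prefix):
        raise ValueError(f"not a base64:// URL: {url!r}")
    return base64.b64decode(url[len(prefix):]).decode("utf-8")
```

A pluggable resolver could dispatch on checks like these before falling back to a plain HTTP fetch and YAML parse.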
New gxformat2/to_native.py with to_native() entry point that converts Format2 workflows to NormalizedNativeWorkflow using typed models throughout. No dict mutation: reads from NormalizedFormat2, constructs NormalizedNativeStep instances via _build_* functions.
- _build_input_step: converts WorkflowInputParameter to native input step
- _build_tool_step: handles state encoding, connections, PJAs, user tools
- _build_subworkflow_step: inline or URL ref subworkflows
- _build_pause_step, _build_pick_value_step: simple step types
Uses @overload for expand=True -> ExpandedNativeWorkflow return type. Internal _ConversionContext replaces the old mutable ConversionContext for label tracking and step_output resolution.
Also fixes normalized_format2 to default missing inputs/outputs/steps fields to empty dicts before validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
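The `@overload` arrangement mentioned in this commit can be sketched as follows. This is not the real `to_native()` signature: `convert`, `NativeSketch`, and `ExpandedNativeSketch` are hypothetical, and the body is a placeholder. The point is only how a `Literal[True]` flag lets a type checker pick the narrower return type.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, Union, overload


@dataclass
class NativeSketch:
    steps: Dict[str, Any]


@dataclass
class ExpandedNativeSketch(NativeSketch):
    """Marker subclass: every subworkflow reference has been resolved."""


@overload
def convert(workflow: Dict[str, Any], expand: Literal[True]) -> ExpandedNativeSketch:
    ...


@overload
def convert(workflow: Dict[str, Any], expand: Literal[False] = ...) -> NativeSketch:
    ...


def convert(
    workflow: Dict[str, Any], expand: bool = False
) -> Union[NativeSketch, ExpandedNativeSketch]:
    # Placeholder body: a real implementation would build typed steps here.
    steps = dict(workflow.get("steps", {}))
    if expand:
        return ExpandedNativeSketch(steps=steps)
    return NativeSketch(steps=steps)
```

With this shape, `convert(wf, expand=True)` is seen by mypy as returning `ExpandedNativeSketch`, so callers do not need a cast or an `isinstance` check.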
New gxformat2/to_format2.py with to_format2() entry point that converts native workflows to NormalizedFormat2 using typed models. Builds WorkflowInputParameter, WorkflowStepInput, WorkflowStepOutput, NormalizedWorkflowStep models directly instead of OrderedDicts.
Handles: tool steps (with convert_tool_state callback), subworkflows (inline + URL passthrough), pause, pick_value, user-defined tools (tool_representation → run), post-job-actions → step outputs, comments, compact mode.
Also adds tool_representation field to NormalizedNativeStep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New primary API: to_native() and to_format2() with ConversionOptions. Backward compat: python_to_workflow, from_galaxy_native, ImportOptions still exported.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Route python_to_workflow and from_galaxy_native through new typed code paths. Fix cross-schema type handling, label resolution, idmap ordering, comment round-tripping, dropped input fields, content_source mapping, and subworkflow context propagation.
converter.py and export.py now delegate entirely to to_native.py and to_format2.py. Remove ~995 lines of dead code:
- converter.py: _python_to_workflow, all transform_* functions, BaseConversionContext/ConversionContext/SubworkflowConversionContext, _populate_*, _ensure_*, _action, _preprocess_graphs, convert_inputs_to_steps, run_workflow_to_step, run_user_tool_to_step. Keep: python_to_workflow/yaml_to_workflow (compat wrappers), ImportOptions, _compat_fixup_native, main/CLI.
- export.py: all _convert_*_step functions, _copy_properties, _copy_common_properties, _convert_input_connections, _convert_post_job_actions, _convert_comments_to_format2, _tool_state, _to_source. Keep: from_galaxy_native (compat wrapper), idmap helpers, main/CLI.
Fix steps_as_list imports in abstract.py and normalize.py to import from model.py directly instead of via converter.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…isions
Add discriminated union types for comments and creators in Format2 schema. Replace enum-based type discriminators with string + Literal to fix JSON-LD predicate collisions in schema-salad codegen. Regenerate native schemas with discriminator getattr fix.
Enable pydantic mypy plugin, fix type alias syntax for Python <3.10, reformat long lines/overload stubs, fix docstring formatting (D205/D209/D400/D107) and E704 flake8 errors.
Add stub/marker models for NormalizedWorkflowStep.run:
- GalaxyUserToolStub: opaque marker for user-defined tools
- ImportReference: @import path, resolved during expansion
Normalization eagerly converts inline subworkflow dicts to NormalizedFormat2. field_validator prevents pydantic auto-coercion (NormalizedFormat2 with extra=allow matches any dict). ExpandedWorkflowStep narrows run to ExpandedFormat2 | GalaxyUserToolStub | None, so no imports or URL refs remain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Recurse into inline subworkflows (run: {class: GalaxyWorkflow})
in lint_format2, matching lint_ga's existing recursion pattern.
Add _validate_output_sources to catch dangling outputSource refs
pointing to nonexistent steps — fixes the nested_no_steps bug.
Add LintContext.child(prefix) for nested error context paths
e.g. "[step nested_workflow] Output 'x' references step 'y'..."
Closes the xfail test — linter now catches exit code 2.
See #162 for expanded lint mode (URL/@import subworkflows).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
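The recursion and the dangling-reference check from this commit can be sketched on plain dicts. This is not the linter's actual code: `LintSketch` and `lint_workflow` are hypothetical, the workflow is assumed to have `inputs`, `outputs`, and `steps` as dicts keyed by label, and the message wording after "references step" is invented.

```python
from typing import Any, Dict, List, Optional


class LintSketch:
    """Collects errors; child contexts share the list but extend the prefix."""

    def __init__(self, prefix: str = "", errors: Optional[List[str]] = None) -> None:
        self.prefix = prefix
        self.errors: List[str] = errors if errors is not None else []

    def child(self, prefix: str) -> "LintSketch":
        return LintSketch(f"{self.prefix}{prefix} ", self.errors)

    def error(self, message: str) -> None:
        self.errors.append(self.prefix + message)


def lint_workflow(workflow: Dict[str, Any], ctx: LintSketch) -> None:
    steps = workflow.get("steps", {})
    inputs = workflow.get("inputs", {})
    for name, output in workflow.get("outputs", {}).items():
        source = output.get("outputSource", "")
        if not source:
            continue
        # "step/output" -> "step"; a bare label may also name a workflow input.
        label = source.split("/", 1)[0]
        if label not in steps and label not in inputs:
            ctx.error(f"Output '{name}' references step '{label}' which does not exist")
    for label, step in steps.items():
        run = step.get("run")
        # Descend only into inline Format2 subworkflows.
        if isinstance(run, dict) and run.get("class") == "GalaxyWorkflow":
            lint_workflow(run, ctx.child(f"[step {label}]"))
```

Because child contexts append to the same error list, a problem found several levels down still surfaces at the top, with the nesting path as its prefix.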
class ConversionOptions:
    """Options for workflow format conversion and expansion.

    Subsumes the old ``ImportOptions`` and adds native→Format2 options,
Reading "Subsumes the old ``ImportOptions``" in the future is going to be awkward. Can you ask Claude to drop references to old state in comments and docstrings?
ExpandedWorkflowStep.model_rebuild()
ExpandedFormat2.model_rebuild()
ExpandedNativeStep.model_rebuild()
ExpandedNativeWorkflow.model_rebuild()
from .options import ConversionOptions
from .to_native import to_native
to_native.py defined to_native() - this isn't even on Claude - this was probably a @jmchilton request. It reads awkwardly for sure. to_native.py is just a shim in #164, so that is less confusing, but there is no reason to have a shim for code introduced in this PR 😆 - I will clean up the imports in #164 and have it drop the shim altogether. There is a big cross-cutting change in that PR that makes rebasing on this one ... tricky, so I'm going to just merge this before doing that and do it on the HEAD there. It is a good catch though and I will fix it.
Issue galaxyproject#187 reported that Planemo was broken by two gxformat2 changes:
1. The `lint_format2` / `lint_ga` signatures changed from accepting raw dicts with a `path=` kwarg to requiring normalized pydantic models. Back-compat was already added on main; this adds a regression test and also makes `LintContext._emit` use %-style substitution for positional args so messages coming from `galaxy.tool_util.lint` callers (which use `%s`/`%d`) render correctly. Keyword args still use `.format()`.
2. `gxformat2.interface` was removed in PR galaxyproject#161 but Planemo still imports `BioBlendImporterGalaxyInterface` and `ImporterGalaxyInterface` from it. Restore the module as a deprecated compatibility shim. `bioblend` is now an optional dependency (install via `gxformat2[bioblend]`) and is imported lazily inside `BioBlendImporterGalaxyInterface.__init__` so the shim imports cleanly without it.
https://claude.ai/code/session_012NSTQsTHKwEnpiDc9ivBoK
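Both fixes in this commit follow common patterns that can be sketched without gxformat2. `render_message` and `require_optional` below are hypothetical helpers, not the code in `LintContext._emit` or in the restored shim; they only illustrate %-style substitution for positional arguments alongside `.format()` for keyword arguments, and deferring an optional import until it is actually needed.

```python
import importlib
from types import ModuleType
from typing import Any


def render_message(message: str, *args: Any, **kwargs: Any) -> str:
    """Positional args use %-style substitution; keyword args use str.format()."""
    if args:
        message = message % args
    if kwargs:
        message = message.format(**kwargs)
    return message


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required here; install it via gxformat2[{extra}]"
        ) from exc
```

Calling something like `require_optional("bioblend", "bioblend")` from inside a constructor, rather than at module top level, is what lets a shim module import cleanly when the optional package is absent.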
Summary
gxformat2 bridges Galaxy's two workflow formats: Format2 (YAML, human-authored) and native (.ga, JSON, editor-produced). Until now that bridge was built entirely out of raw dicts: every consumer re-discovered the same structural quirks independently, compensating with its own defensive `get()` calls and type checks. The real contracts lived in convention, not in types. This PR replaces that foundation with a normalized model layer that makes the contracts explicit and, in doing so, sets up a follow-up PR that migrates every consumer in the library to typed model access.

`NormalizedFormat2` and `NormalizedNativeWorkflow` are Pydantic models that accept the full flexibility of both formats on the way in but present narrow, guaranteed structure on the way out. Steps are always lists with populated IDs. Inputs are always expanded objects. `tool_state` is always a parsed dict. Comments are discriminated unions. The conversion engine (`to_native()`, `to_format2()`) is rebuilt on top of these models, so typed guarantees propagate to every downstream consumer for free.

The schema layer gets a matching upgrade: comments split from a monolithic `WorkflowComment` into `TextComment | MarkdownComment | FrameComment | FreehandComment` with literal-type discriminators. Creators become `CreatorPerson | CreatorOrganization` following schema.org semantics. The `run` field on steps is widened to accept inline subworkflows, URL strings, and `@import` dicts, finally matching what real workflows actually contain.

Cross-format subworkflows are a particularly exciting capability this unlocks. The expansion system can now resolve a Format2 workflow whose step `run:`s a native `.ga` URL (or vice versa), fetch and convert it across formats, and inline the result as a fully typed subworkflow model, recursively, with circular-reference detection and a configurable depth limit. This means a single `ensure_format2(wf, expand=True)` call can chase TRS references, base64-encoded workflows, HTTP URLs, and local file imports across format boundaries, returning one coherent, fully-resolved model tree.

With models carrying the structural guarantees, old infrastructure becomes dead weight. `ImporterGalaxyInterface` and `BioBlendImporterGalaxyInterface` are removed: workflow conversion is not workflow import, and gxformat2 shouldn't own the Galaxy API call. ~650 lines of transform functions in `converter.py` are replaced by immutable `_build_*` step constructors. The public API shrinks to `to_native()`, `to_format2()`, and `ConversionOptions`, while `python_to_workflow()` / `yaml_to_workflow()` remain as thin backward-compat wrappers.

Linting picks up two capabilities the model layer makes straightforward: recursive subworkflow linting (descending into inline Format2 subworkflows with child lint contexts) and output source validation (verifying every `outputSource` resolves to an actual step or input label). These are a taste of what the follow-up PR delivers across the rest of the codebase.

What changed
New: `gxformat2/normalized/` package
- `_format2.py`: `NormalizedFormat2`, `NormalizedWorkflowStep`, `GalaxyUserToolStub`, `ImportReference`, `SourceReference`, `resolve_source_reference()`
- `_native.py`: `NormalizedNativeWorkflow`, `NormalizedNativeStep` with type-discriminating properties (`is_tool_step`, `is_subworkflow_step`, `is_input_step`, etc.)
- `_conversion.py`: unified conversion engine with `to_native()` / `to_format2()` (overloaded signatures) and an expansion system (`ExpandedFormat2`, `ExpandedNativeWorkflow`) with URL/TRS/base64 resolution and circular-reference detection

New: `gxformat2/options.py`
- `ConversionOptions` consolidating `ImportOptions` + Format2 export options + expansion config + pluggable URL resolver
- `default_url_resolver()` handling base64://, GA4GH TRS v2, and plain HTTP

Schema (`gxformat2/schema/`)
- `CreatorPerson | CreatorOrganization`
- `WorkflowStep.run` typed as `GalaxyWorkflow | str | dict | None`
- `action_arguments` widened from `dict[str, str]` to `dict[str, Any]`

Removed
- `gxformat2/interface.py`: `ImporterGalaxyInterface`, `BioBlendImporterGalaxyInterface`
- `gxformat2/main.py`: `convert_and_import_workflow()`
- `converter.py`

Linting
- `_validate_output_sources()` checking `outputSource` targets exist

Tests
- `test_normalized.py`: `$graph` handling, subworkflow resolution, expansion, circular ref detection
- `test_to_native_model.py`: typed `to_native()` API coverage
- `test_to_format2_model.py`: typed `to_format2()` API, compact mode
- `test_load_native.py`: native workflow loading
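The recursive expansion with circular-reference detection and a depth limit, described in the summary, can be sketched on plain dicts. This is not the expansion system in this PR: `expand`, `ExpansionError`, and the `resolver` callback shape are hypothetical, string `run` values stand in for any URL or import reference, and the default limit of 10 is an arbitrary choice.

```python
from typing import Any, Callable, Dict, Tuple


class ExpansionError(Exception):
    """Raised on a circular reference or when nesting exceeds the limit."""


def expand(
    workflow: Dict[str, Any],
    resolver: Callable[[str], Dict[str, Any]],
    max_depth: int = 10,
    _seen: Tuple[str, ...] = (),
) -> Dict[str, Any]:
    """Return a copy of the workflow with every string `run` reference inlined."""
    steps: Dict[str, Any] = {}
    for label, step in workflow.get("steps", {}).items():
        run = step.get("run")
        if isinstance(run, str):
            if run in _seen:
                chain = " -> ".join(_seen + (run,))
                raise ExpansionError(f"circular reference: {chain}")
            if len(_seen) >= max_depth:
                raise ExpansionError(f"reference nesting exceeds {max_depth}")
            # Resolve the reference, then expand whatever it points to.
            run = expand(resolver(run), resolver, max_depth, _seen + (run,))
        elif isinstance(run, dict):
            # Inline subworkflow: recurse without extending the reference chain.
            run = expand(run, resolver, max_depth, _seen)
        steps[label] = dict(step) if run is None else {**step, "run": run}
    return {**workflow, "steps": steps}
```

Tracking the chain of references per branch, rather than in one global set, means the same subworkflow may be used by two sibling steps without being reported as a cycle.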