Overhaul and synchronize collection type algebra verbs #22565
Open
jmchilton wants to merge 3 commits into galaxyproject:dev
Replace overloaded can_match_type/canMatch with two named lattice ops:
- accepts(candidate): asymmetric subtype check, used at edge validation
- compatible(other): symmetric, used at sibling map-over sites
Eliminates order-dependent sibling-matching in Python (Tree.compatible_shape) and TypeScript (mappingConstraints). Sample_sheet asymmetry guard stays in has_subcollections_of_type per safety analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
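The split between the two operations can be illustrated with a minimal Python sketch. This is not Galaxy's `CollectionTypeDescription`: `ToyType` and `SUBTYPE_EDGES` are hypothetical names, only single-rank types are modeled, and the two subtype edges are the ones named in the PR description (`sample_sheet <: list`, `paired <: paired_or_unpaired`).

```python
from dataclasses import dataclass

# Assumption for illustration: (input slot type, output type) pairs where the
# output is accepted even though the two type names differ.
SUBTYPE_EDGES = {
    ("list", "sample_sheet"),
    ("paired_or_unpaired", "paired"),
}


@dataclass(frozen=True)
class ToyType:
    """Hypothetical stand-in for a collection type description."""

    collection_type: str

    def accepts(self, other: "ToyType") -> bool:
        """Asymmetric: can an output of type `other` feed an input slot of type `self`?"""
        if self.collection_type == other.collection_type:
            return True
        return (self.collection_type, other.collection_type) in SUBTYPE_EDGES

    def compatible(self, other: "ToyType") -> bool:
        """Symmetric: neither side is an input slot, so try both directions."""
        return self.accepts(other) or other.accepts(self)


list_t = ToyType("list")
sheet_t = ToyType("sample_sheet")

print(list_t.accepts(sheet_t))     # True
print(sheet_t.accepts(list_t))     # False
print(list_t.compatible(sheet_t))  # True
print(sheet_t.compatible(list_t))  # True
```

Because `compatible` is defined in terms of `accepts` in both directions, swapping its operands cannot change the answer, which is the property the sibling map-over sites need.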
Rename Python ``has_subcollections_of_type`` -> ``can_map_over`` to match TypeScript ``CollectionTypeDescription.canMapOver``. Both encode the same operational question (output.canMapOver(input)); naming them alike makes the cross-language correspondence obvious to readers.
Drop ``is_subcollection_of_type`` (directional inverse helper, single caller). Inline at ``query.py`` reading ``hdca_type.can_map_over(input)``.
Clean up ``canMatch`` local variable leftovers in ``terminals.ts`` - residue of the old conflated name; renamed to ``directlyAccepted`` / inlined where the alias was redundant.
Update ``collection_semantics.yml`` algebra section to document three operations (accepts / compatible / can_map_over) instead of two.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop type-theory framing (requirement/candidate, ``zipped``, ``substituted``) in favor of Galaxy-concrete framing (input slot / output collection type, ``match for sibling iteration``, ``connected to``). ``zip`` collides with the ``paired`` collection operation; ``requirement``/``candidate`` is abstract where Galaxy readers think directly in terms of input slots and output shapes.
Conventions in docstrings now read ``input_type.accepts(output_type)`` and ``output_type.can_map_over(input_type)``; the direction is unchanged from prior comments, only the names of the roles change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
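The ``output_type.can_map_over(input_type)`` direction can be sketched in Python as a proper-suffix test on colon-separated collection types. This is a simplified illustration under stated assumptions, not Galaxy's implementation: the free function below is hypothetical, it matches ranks by exact name only, and it deliberately omits the ``paired_or_unpaired`` and ``sample_sheet`` special cases mentioned elsewhere in this PR.

```python
def can_map_over(output_type: str, input_type: str) -> bool:
    """Toy rule: True iff output_type has proper subcollections of type input_type.

    Collection types are colon-separated ranks, outermost first
    (for example "list:paired" is a list of pairs).
    """
    out_ranks = output_type.split(":")
    in_ranks = input_type.split(":")
    # The output must have strictly more rank than the input slot consumes.
    if len(out_ranks) <= len(in_ranks):
        return False
    # The innermost ranks of the output must be exactly what the input expects.
    return out_ranks[-len(in_ranks):] == in_ranks


# Reads as: "can a list:paired output be mapped over a paired input slot?"
print(can_map_over("list:paired", "paired"))  # True
print(can_map_over("paired", "list:paired"))  # False
print(can_map_over("list", "list"))           # False
print(can_map_over("list:paired", "list"))    # False
```

The third call shows why the check is "proper": an output of the same type as the input is a direct connection (the ``accepts`` question), not a map-over.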
jmchilton added a commit to jmchilton/galaxy that referenced this pull request on Apr 26, 2026
…e to PR galaxyproject#22565 vocabulary.
Three threads land here, all Galaxy-side. They unblock the TypeScript connection validator (galaxy-tool-util-ts) by replacing hand-ported Python tests with declarative YAML corpora that sync verbatim, and adopt the new verbs from galaxyproject#22565 across the workflow_state validator.
WI-4: type-algebra truth table
test/unit/tool_util/workflow_state/connection_type_cases.yml: 91-case YAML driving can_match / can_map_over / effective_map_over / compatible. Sentinels NULL / ANY resolve to NULL_COLLECTION_TYPE / ANY_COLLECTION_TYPE. test_connection_types.py shrinks to a parametrized loader plus the property-style sentinel tests that aren't naturally table-shaped. Same corpus is the source of truth for the eventual TS consumer.
WI-5: workflow_format_validation + algebra coverage
collection_semantics.yml (lib/galaxy/model/dataset_collections/types/) gains two test-tracking keys per example:
- workflow_format_validation.fixture: stem in connection_workflows/
- algebra[]: {op, output, input, [expected]} cross-refs into connection_type_cases.yml.
semantics.py models them (WorkflowFormatValidationTest, AlgebraCaseRef); semantics.check() runs validate_workflow_format_validation_refs and validate_algebra_refs to ensure fixture stems exist on disk and algebra rows resolve in the truth table.
test_collection_semantics_coverage.py adds bidirectional gating: every fixture is referenced by an example or KNOWN_ORPHANS, every example has algebra: / workflow_format_validation: coverage or sits in EXPECTED_NEITHER. Current state: 40/42 algebra, 7/42 fixture; two examples in EXPECTED_NEITHER (BASIC_MAPPING_INCLUDING_SINGLE_DATASET, BASIC_MAPPING_TWO_INPUTS_WITH_IDENTICAL_STRUCTURE; both runtime-only).
Helpers connection_workflows_dir(), connection_type_cases_path(), load_examples() exposed for cross-language reuse.
Post-rebase verb adoption (PR galaxyproject#22565)
connection_types.py / connection_validation.py move to the accepts / can_map_over / compatible vocabulary:
- can_match_type → accepts
- has_subcollections_of_type → can_map_over
- new compatible() free function with sentinel handling, wraps the symmetric CollectionTypeDescription.compatible.
_resolve_step_map_over rewritten: pairs of contributions checked with compatible() (was raw collection_type string equality); resolved map-over picks the highest-rank compatible type. Sibling map-over resolution is now order-independent and matches TS mappingConstraints.
LIST_NOT_MATCHES_SAMPLE_SHEET and LIST_PAIRED_NOT_MATCHES_SAMPLE_SHEET_PAIRED promoted to algebra coverage now that the asymmetry guard makes the rejection enforceable from accepts / can_map_over directly.
Fixture adds (WI-5 sweep + WI-6 conversions)
connection_workflows/ gains 12 .gxwf.yml + 12 expected sidecars:
- ok_list_paired_to_paired_or_unpaired
- ok_list_list_paired_to_paired_or_unpaired
- ok_list_to_paired_or_unpaired
- ok_list_list_over_list_paired_or_unpaired
- ok_list_to_dataset
- ok_paired_and_data_no_map_over
- ok_sample_sheet_to_multi_data
- ok_simple_chain_dataset
- ok_subworkflow_list_propagation
- ok_subworkflow_map_over
- ok_subworkflow_passthrough
- ok_two_list_inputs_map_over
All reuse existing functional test tools; no new tool XMLs.
Refs: vault INTEROP_CONNECTION_TESTING_PLAN.md (WI-4, WI-5), INTEROP_CONNECTION_TESTING_HARDEN_PLAN.md (sweep/conversion), PR galaxyproject#22565.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
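Why pairwise `compatible()` checks make sibling resolution order-independent can be shown with a small Python sketch. Everything here is an assumption for illustration: `order_dependent_match` is a hypothetical fold that compares each sibling against whichever arrived first using the asymmetric check, `pairwise_match` is a hypothetical pairwise version, and neither reproduces `_resolve_step_map_over` (in particular, picking the resolved type is not modeled).

```python
from itertools import combinations, permutations
from typing import Sequence

# Assumption: per-rank subtype edges as (input rank, output rank).
SUBTYPE_EDGES = {("list", "sample_sheet"), ("paired_or_unpaired", "paired")}


def accepts(input_type: str, output_type: str) -> bool:
    """Asymmetric toy check, applied rank by rank on colon-separated types."""
    ins, outs = input_type.split(":"), output_type.split(":")
    if len(ins) != len(outs):
        return False
    return all(i == o or (i, o) in SUBTYPE_EDGES for i, o in zip(ins, outs))


def compatible(a: str, b: str) -> bool:
    """Symmetric toy check: accept in either direction."""
    return accepts(a, b) or accepts(b, a)


def order_dependent_match(siblings: Sequence[str]) -> bool:
    """Hypothetical fold: the first sibling plays the role of the input slot."""
    first, rest = siblings[0], siblings[1:]
    return all(accepts(first, other) for other in rest)


def pairwise_match(siblings: Sequence[str]) -> bool:
    """Every unordered pair of siblings must be compatible."""
    return all(compatible(a, b) for a, b in combinations(siblings, 2))


siblings = ["list", "sample_sheet"]
for ordering in permutations(siblings):
    print(ordering, order_dependent_match(ordering), pairwise_match(ordering))
# ('list', 'sample_sheet') True True
# ('sample_sheet', 'list') False True
```

The fold's answer flips when the two siblings swap places, while the pairwise check returns the same result for every ordering because it only ever asks a symmetric question.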
jmchilton added a commit to jmchilton/galaxy that referenced this pull request on Apr 27, 2026
Summary
Splits the overloaded `can_match_type`/`canMatch` operation into two clearly named lattice operations and eliminates order-dependent behavior in sibling-matching paths in both Python and TypeScript. Additionally renames Python `has_subcollections_of_type` → `can_map_over` to match TypeScript `canMapOver`, so the same operational question reads alike across languages.
Motivation
`can_match_type`/`canMatch` was answering two distinct questions with one operation:
- An asymmetric edge-validation question: can this output be connected to this input slot? (A `list` input accepts a `sample_sheet` output, but not the reverse.)
- A symmetric sibling-matching question, asked by Python `Tree.can_match` and at connection time by TypeScript `mappingConstraints` checks.
Routing both questions through the asymmetric operation made sibling matching order-dependent: which sibling input arrived first changed whether a workflow validated. An earlier patch added a `sample_sheet` asymmetry guard inside `can_match_type` itself, which fixed connection-time edge correctness but propagated the asymmetry into sibling matching where it does not belong.
Approach
Three operations on `CollectionTypeDescription`, each named for the question it answers:
- `accepts(other)`: asymmetric direct-edge check. `input_type.accepts(output_type)` returns true iff an output of type `other` can be connected to an input slot of type `self`. Encodes the type lattice (`sample_sheet <: list`, `paired <: paired_or_unpaired`).
- `compatible(other)`: symmetric sibling-matching check. Implemented as `self.accepts(other) or other.accepts(self)`. Used at sites where neither side is an input slot; both are concrete sibling shapes.
- `can_map_over(other)`: asymmetric nesting check. `output_type.can_map_over(input_type)` returns true iff `self` has proper subcollections of type `other`, i.e. the output has more rank than the input and can be sliced to feed it. (Renamed from Python `has_subcollections_of_type` to match the existing TypeScript `canMapOver`.) The sample-sheet asymmetry guard is kept here as load-bearing for `multiply`/`effective_collection_type` arithmetic, and is cross-referenced to `accepts` for future unification.
Renames are mechanical (direction unchanged at every callsite). Two TS sites at `terminals.ts:516`/`:673` switch from `accepts` to `compatible` because their operands are sibling map-over states. Python `Tree.can_match` becomes `Tree.compatible_shape` and uses `compatible` internally.
The single caller of the directional inverse helper `is_subcollection_of_type` is inlined as `hdca_type.can_map_over(input_desc)` and the helper removed.
Behavior changes
Documentation
The canonical write-up lives in `lib/galaxy/model/dataset_collections/types/collection_semantics.yml` under a new "Type Compatibility Algebra" section: lattice diagram, three-row operations table, where-each-is-used breakdown, and worked examples cross-referenced to test labels. Module headers in `type_description.py` and `collectionTypeDescription.ts` point at it. Docstrings use Galaxy-native vocabulary (input slot / output collection type / "match for sibling iteration") rather than abstract type-theory framing.
How to test the changes?
(Select all options that apply)
License