
Overhaul and synchronize collection type algebra verbs#22565

Open
jmchilton wants to merge 3 commits into galaxyproject:dev from jmchilton:map_match_logic


@jmchilton
Member

Summary

Splits the overloaded can_match_type / canMatch operation into two clearly named lattice operations and eliminates order-dependent behavior in sibling-matching paths in both Python and TypeScript. Additionally renames Python has_subcollections_of_type to can_map_over to match TypeScript canMapOver, so the same operational question reads alike across languages.

Motivation

can_match_type / canMatch was answering two distinct questions with one operation:

  1. Direct-edge check at workflow-editor connection time: "does input slot type A accept output type B?" — asymmetric (e.g. a list input accepts a sample_sheet output, but not the reverse).
  2. Sibling-matching check: "do A and B match such that they could drive a common map-over over sibling inputs of one tool?" — symmetric. Used at runtime by Python Tree.can_match and at connection time by TypeScript mappingConstraints checks.

Routing both questions through the asymmetric operation made sibling matching order-dependent: which sibling input arrived first changed whether a workflow validated. An earlier patch added a sample_sheet asymmetry guard inside can_match_type itself; that fixed connection-time edge checks but also propagated the asymmetry into sibling matching, where it does not belong.
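To make the order dependence concrete, here is a minimal Python sketch over a toy single-segment lattice containing only sample_sheet <: list. The names siblings_match_asymmetric and siblings_match_symmetric are hypothetical and do not correspond to Galaxy code: the first models a check where whichever sibling arrives first plays the input-slot role, and the second checks every pair of siblings with a symmetric operation built from the asymmetric one.

```python
from __future__ import annotations

# Toy single-segment lattice (assumption): sample_sheet <: list.
SUBTYPE_OF = {"sample_sheet": "list"}


def accepts(slot_type: str, output_type: str) -> bool:
    """Asymmetric direct-edge check: can output_type connect to slot_type?"""
    return slot_type == output_type or SUBTYPE_OF.get(output_type) == slot_type


def compatible(a: str, b: str) -> bool:
    """Symmetric sibling check built from accepts."""
    return accepts(a, b) or accepts(b, a)


def siblings_match_asymmetric(arrivals: list[str]) -> bool:
    """Hypothetical old-style check: the first sibling seen acts as the 'slot'."""
    first, *rest = arrivals
    return all(accepts(first, other) for other in rest)


def siblings_match_symmetric(arrivals: list[str]) -> bool:
    """Order-independent check: every pair of siblings must be compatible."""
    return all(
        compatible(a, b)
        for i, a in enumerate(arrivals)
        for b in arrivals[i + 1:]
    )


print(siblings_match_asymmetric(["list", "sample_sheet"]))  # True
print(siblings_match_asymmetric(["sample_sheet", "list"]))  # False: order matters
print(siblings_match_symmetric(["sample_sheet", "list"]))   # True either way
```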

Approach

Three operations on CollectionTypeDescription, each named for the question it answers:

  • accepts(other) — asymmetric direct-edge check. input_type.accepts(output_type) returns true iff an output of type other can be connected to an input slot of type self. Encodes the type lattice (sample_sheet <: list, paired <: paired_or_unpaired).
  • compatible(other) — symmetric sibling-matching check. Implemented as self.accepts(other) or other.accepts(self). Used at sites where neither side is an input slot — both are concrete sibling shapes.
  • can_map_over(other) — asymmetric nesting check. output_type.can_map_over(input_type) returns true iff self has proper subcollections of type other; that is, the output has higher rank than the input and can be sliced to feed it. (Renamed from Python has_subcollections_of_type to match the existing TypeScript canMapOver.) The sample_sheet asymmetry guard is kept here because the multiply / effective_collection_type arithmetic depends on it, and it is cross-referenced to accepts for future unification.
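A minimal Python sketch of how the three operations relate, assuming collection types are colon-separated strings (e.g. list:paired) and that the lattice applies segment by segment. CollectionType here is a hypothetical stand-in, not Galaxy's CollectionTypeDescription, and it models only the two subtype pairs named above; it will not reproduce every Galaxy rule (the fixture ok_list_to_paired_or_unpaired, for instance, relies on behavior this toy lattice does not model).

```python
from __future__ import annotations

from dataclasses import dataclass

# Simplified lattice (assumption): (subtype, supertype) pairs for a single
# collection-type segment: sample_sheet <: list, paired <: paired_or_unpaired.
SEGMENT_SUBTYPES = {("sample_sheet", "list"), ("paired", "paired_or_unpaired")}


def _segment_accepts(input_seg: str, output_seg: str) -> bool:
    return input_seg == output_seg or (output_seg, input_seg) in SEGMENT_SUBTYPES


@dataclass(frozen=True)
class CollectionType:
    """Hypothetical stand-in for CollectionTypeDescription, e.g. "list:paired"."""

    collection_type: str

    @property
    def segments(self) -> list[str]:
        return self.collection_type.split(":")

    def accepts(self, other: CollectionType) -> bool:
        """input_type.accepts(output_type): can `other` connect to slot `self`?"""
        mine, theirs = self.segments, other.segments
        return len(mine) == len(theirs) and all(
            _segment_accepts(i, o) for i, o in zip(mine, theirs)
        )

    def compatible(self, other: CollectionType) -> bool:
        """Symmetric sibling check: self.accepts(other) or other.accepts(self)."""
        return self.accepts(other) or other.accepts(self)

    def can_map_over(self, other: CollectionType) -> bool:
        """output_type.can_map_over(input_type): proper subcollections of `other`?"""
        outer, inner = self.segments, other.segments
        if len(outer) <= len(inner):
            return False  # strictly more rank is needed to slice the output
        suffix = CollectionType(":".join(outer[len(outer) - len(inner):]))
        # Directional: the input slot must accept the sliced suffix, so a
        # list:list output cannot feed a sample_sheet input.
        return other.accepts(suffix)


CT = CollectionType
assert CT("list").accepts(CT("sample_sheet"))
assert not CT("sample_sheet").accepts(CT("list"))
assert CT("sample_sheet").compatible(CT("list"))
assert CT("list:paired").can_map_over(CT("paired"))
```

Defining compatible directly in terms of accepts keeps the two operations from drifting apart: any change to the lattice is picked up by both.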

Renames are mechanical (direction unchanged at every callsite). Two TS sites at terminals.ts:516/:673 switch from accepts to compatible because their operands are sibling map-over states. Python Tree.can_match becomes Tree.compatible_shape and uses compatible internally.

The single caller of the directional inverse helper is_subcollection_of_type now reads hdca_type.can_map_over(input_desc) inline, and the helper is removed.

Behavior changes

  • Connection-time edge validation: unchanged. List output → sample_sheet input still rejected; sample_sheet output → list input still accepted.
  • Python runtime sibling matching: now order-independent. A sample_sheet HDCA and a list HDCA on sibling inputs match in either order.
  • TypeScript connection-time sibling map-over checks: now order-independent. Same fix class as Python.

Documentation

The canonical write-up lives in lib/galaxy/model/dataset_collections/types/collection_semantics.yml under a new "Type Compatibility Algebra" section: lattice diagram, three-row operations table, where-each-is-used breakdown, and worked examples cross-referenced to test labels. Module headers in type_description.py and collectionTypeDescription.ts point at it. Docstrings use Galaxy-native vocabulary (input slot / output collection type / "match for sibling iteration") rather than abstract type-theory framing.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

jmchilton and others added 3 commits April 25, 2026 10:45
Replace overloaded can_match_type/canMatch with two named lattice ops:
- accepts(candidate): asymmetric subtype check, used at edge validation
- compatible(other): symmetric, used at sibling map-over sites

Eliminates order-dependent sibling-matching in Python (Tree.compatible_shape)
and TypeScript (mappingConstraints). Sample_sheet asymmetry guard stays in
has_subcollections_of_type per safety analysis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename Python ``has_subcollections_of_type`` -> ``can_map_over`` to match
TypeScript ``CollectionTypeDescription.canMapOver``. Both encode the same
operational question (output.canMapOver(input)); naming them alike makes
the cross-language correspondence obvious to readers.

Drop ``is_subcollection_of_type`` (directional inverse helper, single
caller). Inline at ``query.py`` reading ``hdca_type.can_map_over(input)``.

Clean up ``canMatch`` local variable leftovers in ``terminals.ts`` -
residue of the old conflated name; renamed to ``directlyAccepted`` /
inlined where the alias was redundant.

Update ``collection_semantics.yml`` algebra section to document three
operations (accepts / compatible / can_map_over) instead of two.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop type-theory framing (requirement/candidate, ``zipped``,
``substituted``) in favor of Galaxy-concrete framing (input slot /
output collection type, ``match for sibling iteration``,
``connected to``). ``zip`` collides with the ``paired`` collection
operation; ``requirement``/``candidate`` is abstract where Galaxy
readers think directly in terms of input slots and output shapes.

Conventions in docstrings now read ``input_type.accepts(output_type)``
and ``output_type.can_map_over(input_type)`` — direction unchanged from
prior comments, only the names of the roles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (bot) added this to the 26.1 milestone Apr 25, 2026
jmchilton added a commit to jmchilton/galaxy that referenced this pull request Apr 26, 2026
…e to PR galaxyproject#22565 vocabulary.

Three threads land here, all Galaxy-side. They unblock the TypeScript
connection validator (galaxy-tool-util-ts) by replacing hand-ported
Python tests with declarative YAML corpora that sync verbatim, and
adopt the new verbs from galaxyproject#22565 across the
workflow_state validator.

WI-4 — type-algebra truth table
  test/unit/tool_util/workflow_state/connection_type_cases.yml: 91-case
  YAML driving can_match / can_map_over / effective_map_over / compatible.
  Sentinels NULL / ANY resolve to NULL_COLLECTION_TYPE / ANY_COLLECTION_TYPE.
  test_connection_types.py shrinks to a parametrized loader plus the
  property-style sentinel tests that aren't naturally table-shaped.
  Same corpus is the source of truth for the eventual TS consumer.
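  A stdlib-only Python sketch of the table-driven idea: each row names an
  op plus output/input types and an expected result, and the NULL / ANY
  spellings are resolved to sentinel objects before the op runs. The row
  keys, run_cases, and the toy can_map_over rule are assumptions for
  illustration; the real corpus is YAML consumed by a parametrized test
  module, and the placeholder sentinels here are not Galaxy's objects.

```python
from __future__ import annotations

from typing import Any, Callable

# Placeholder sentinels (hypothetical); Galaxy's real code resolves NULL / ANY
# to NULL_COLLECTION_TYPE / ANY_COLLECTION_TYPE instead.
NULL_SENTINEL = object()
ANY_SENTINEL = object()
SENTINELS = {"NULL": NULL_SENTINEL, "ANY": ANY_SENTINEL}


def resolve(value: str) -> Any:
    """Map the table's sentinel spellings to sentinel objects; pass others through."""
    return SENTINELS.get(value, value)


def run_cases(cases: list[dict], ops: dict[str, Callable[[Any, Any], Any]]) -> list[str]:
    """Evaluate every truth-table row and describe each mismatch."""
    failures = []
    for case in cases:
        got = ops[case["op"]](resolve(case["output"]), resolve(case["input"]))
        if got != case["expected"]:
            failures.append(f"{case['op']}({case['output']}, {case['input']}) -> {got!r}")
    return failures


def toy_can_map_over(output: Any, input_: Any) -> bool:
    """Toy rule (assumption): output wraps input in one or more extra outer levels."""
    return isinstance(output, str) and isinstance(input_, str) and output.endswith(":" + input_)


CASES = [  # hypothetical rows shaped like {op, output, input, expected}
    {"op": "can_map_over", "output": "list:paired", "input": "paired", "expected": True},
    {"op": "can_map_over", "output": "list", "input": "list", "expected": False},
]
print(run_cases(CASES, {"can_map_over": toy_can_map_over}))  # []
```

  Keeping the rows as plain data is what lets the same corpus drive a
  second-language consumer without re-porting the assertions.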

WI-5 — workflow_format_validation + algebra coverage
  collection_semantics.yml (lib/galaxy/model/dataset_collections/types/)
  gains two test-tracking keys per example:
    - workflow_format_validation.fixture: stem in connection_workflows/
    - algebra[]: {op, output, input, [expected]} cross-refs into
      connection_type_cases.yml.
  semantics.py models them (WorkflowFormatValidationTest, AlgebraCaseRef);
  semantics.check() runs validate_workflow_format_validation_refs and
  validate_algebra_refs to ensure fixture stems exist on disk and algebra
  rows resolve in the truth table.
  test_collection_semantics_coverage.py adds bidirectional gating: every
  fixture is referenced by an example or KNOWN_ORPHANS, every example
  has algebra: / workflow_format_validation: coverage or sits in
  EXPECTED_NEITHER. Current state: 40/42 algebra, 7/42 fixture; two
  examples in EXPECTED_NEITHER (BASIC_MAPPING_INCLUDING_SINGLE_DATASET,
  BASIC_MAPPING_TWO_INPUTS_WITH_IDENTICAL_STRUCTURE — runtime-only).
  Helpers connection_workflows_dir(), connection_type_cases_path(),
  load_examples() exposed for cross-language reuse.
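  The bidirectional gate can be pictured with a small in-memory Python
  sketch. Example, coverage_problems, and the (op, output, input) tuple
  keys are hypothetical simplifications of what semantics.py and
  test_collection_semantics_coverage.py do; this sketch ignores the
  optional expected field and reads nothing from disk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Example:
    """Hypothetical, trimmed-down model of one collection_semantics.yml example."""

    label: str
    fixture: Optional[str] = None  # workflow_format_validation.fixture stem
    algebra: list = field(default_factory=list)  # (op, output, input) tuples


def coverage_problems(examples, fixture_stems, truth_table_keys, known_orphans, expected_neither):
    """Return every violation of the bidirectional coverage rules."""
    problems = []
    referenced = set()
    for ex in examples:
        if ex.fixture is not None:
            referenced.add(ex.fixture)
            if ex.fixture not in fixture_stems:  # fixture stem must exist on disk
                problems.append(f"{ex.label}: missing fixture {ex.fixture}")
        for ref in ex.algebra:  # algebra rows must resolve in the truth table
            if ref not in truth_table_keys:
                problems.append(f"{ex.label}: unresolved algebra ref {ref}")
        if ex.fixture is None and not ex.algebra and ex.label not in expected_neither:
            problems.append(f"{ex.label}: no algebra or fixture coverage")
    # Reverse direction: every fixture on disk is referenced or a known orphan.
    for stem in sorted(set(fixture_stems) - referenced - set(known_orphans)):
        problems.append(f"orphan fixture: {stem}")
    return problems


examples = [
    Example("LIST_NOT_MATCHES_SAMPLE_SHEET", algebra=[("accepts", "list", "sample_sheet")]),
    Example("BASIC_MAPPING_INCLUDING_SINGLE_DATASET"),  # runtime-only
]
print(coverage_problems(
    examples,
    fixture_stems={"ok_list_to_dataset"},
    truth_table_keys={("accepts", "list", "sample_sheet")},
    known_orphans=set(),
    expected_neither={"BASIC_MAPPING_INCLUDING_SINGLE_DATASET"},
))  # ['orphan fixture: ok_list_to_dataset']
```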

Post-rebase verb adoption (PR galaxyproject#22565)
  connection_types.py / connection_validation.py move to the
  accepts / can_map_over / compatible vocabulary:
    - can_match_type → accepts
    - has_subcollections_of_type → can_map_over
    - new compatible() free function with sentinel handling, wraps the
      symmetric CollectionTypeDescription.compatible.
  _resolve_step_map_over rewritten: pairs of contributions checked with
  compatible() (was raw collection_type string equality); resolved
  map-over picks the highest-rank compatible type. Sibling map-over
  resolution is now order-independent and matches TS mappingConstraints.
  LIST_NOT_MATCHES_SAMPLE_SHEET and LIST_PAIRED_NOT_MATCHES_SAMPLE_SHEET_PAIRED
  promoted to algebra coverage now that the asymmetry guard makes the
  rejection enforceable from accepts / can_map_over directly.
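  A Python sketch of the rewritten resolution rule under stated
  assumptions: contributions are colon-separated type strings, None stands
  in for an input with no map-over (a stand-in for real sentinel handling),
  and ties at equal rank are broken by name purely so the result is
  deterministic. resolve_step_map_over is a hypothetical analogue of
  _resolve_step_map_over, not its implementation. Under this simplified
  lattice compatible types always share a rank, so the rank-first key only
  matters under Galaxy's richer rules.

```python
from __future__ import annotations

from itertools import combinations
from typing import Optional, Sequence

# Simplified per-segment lattice (assumption): sample_sheet <: list,
# paired <: paired_or_unpaired.
SUBTYPES = {("sample_sheet", "list"), ("paired", "paired_or_unpaired")}


def accepts(slot: str, output: str) -> bool:
    s, o = slot.split(":"), output.split(":")
    return len(s) == len(o) and all(a == b or (b, a) in SUBTYPES for a, b in zip(s, o))


def compatible(a: Optional[str], b: Optional[str]) -> bool:
    """Sentinel-aware symmetric check; None stands in for 'no map-over'."""
    if a is None or b is None:
        return True  # assumption: an input that is not mapped over constrains nothing
    return accepts(a, b) or accepts(b, a)


def resolve_step_map_over(contributions: Sequence[Optional[str]]) -> Optional[str]:
    """Hypothetical analogue of _resolve_step_map_over."""
    for a, b in combinations(contributions, 2):  # every pair, not first-vs-rest
        if not compatible(a, b):
            raise ValueError(f"incompatible sibling map-over: {a!r} vs {b!r}")
    concrete = [c for c in contributions if c is not None]
    if not concrete:
        return None
    # Rank first, then name as a deterministic tie-break (assumption), so the
    # result does not depend on input order.
    return max(concrete, key=lambda t: (len(t.split(":")), t))


print(resolve_step_map_over([None, "list:paired", "list:paired"]))  # list:paired
```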

Fixture adds (WI-5 sweep + WI-6 conversions)
  connection_workflows/ gains 12 .gxwf.yml + 12 expected sidecars:
    ok_list_paired_to_paired_or_unpaired
    ok_list_list_paired_to_paired_or_unpaired
    ok_list_to_paired_or_unpaired
    ok_list_list_over_list_paired_or_unpaired
    ok_list_to_dataset
    ok_paired_and_data_no_map_over
    ok_sample_sheet_to_multi_data
    ok_simple_chain_dataset
    ok_subworkflow_list_propagation
    ok_subworkflow_map_over
    ok_subworkflow_passthrough
    ok_two_list_inputs_map_over
  All reuse existing functional test tools; no new tool XMLs.

Refs: vault INTEROP_CONNECTION_TESTING_PLAN.md (WI-4, WI-5),
INTEROP_CONNECTION_TESTING_HARDEN_PLAN.md (sweep/conversion), PR galaxyproject#22565.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jmchilton added a commit to jmchilton/galaxy that referenced this pull request Apr 27, 2026

Projects

Status: Needs Review


1 participant