Commit 1eb5f07

jmchilton and claude committed
Algebra truth-table + workflow_format_validation tracking, post-rebase to PR galaxyproject#22565 vocabulary.

Three threads land here, all Galaxy-side. They unblock the TypeScript connection validator (galaxy-tool-util-ts) by replacing hand-ported Python tests with declarative YAML corpora that sync verbatim, and adopt the new verbs from galaxyproject#22565 across the workflow_state validator.

WI-4: type-algebra truth table

test/unit/tool_util/workflow_state/connection_type_cases.yml: 91-case YAML driving can_match / can_map_over / effective_map_over / compatible. Sentinels NULL / ANY resolve to NULL_COLLECTION_TYPE / ANY_COLLECTION_TYPE. test_connection_types.py shrinks to a parametrized loader plus the property-style sentinel tests that aren't naturally table-shaped. Same corpus is the source of truth for the eventual TS consumer.

WI-5: workflow_format_validation + algebra coverage

collection_semantics.yml (lib/galaxy/model/dataset_collections/types/) gains two test-tracking keys per example:
- workflow_format_validation.fixture: stem in connection_workflows/
- algebra[]: {op, output, input, [expected]} cross-refs into connection_type_cases.yml.

semantics.py models them (WorkflowFormatValidationTest, AlgebraCaseRef); semantics.check() runs validate_workflow_format_validation_refs and validate_algebra_refs to ensure fixture stems exist on disk and algebra rows resolve in the truth table.

test_collection_semantics_coverage.py adds bidirectional gating: every fixture is referenced by an example or KNOWN_ORPHANS, every example has algebra: / workflow_format_validation: coverage or sits in EXPECTED_NEITHER.

Current state: 40/42 algebra, 7/42 fixture; two examples in EXPECTED_NEITHER (BASIC_MAPPING_INCLUDING_SINGLE_DATASET, BASIC_MAPPING_TWO_INPUTS_WITH_IDENTICAL_STRUCTURE; runtime-only).

Helpers connection_workflows_dir(), connection_type_cases_path(), load_examples() exposed for cross-language reuse.

Post-rebase verb adoption (PR galaxyproject#22565)

connection_types.py / connection_validation.py move to the accepts / can_map_over / compatible vocabulary:
- can_match_type → accepts
- has_subcollections_of_type → can_map_over
- new compatible() free function with sentinel handling, wraps the symmetric CollectionTypeDescription.compatible.

_resolve_step_map_over rewritten: pairs of contributions checked with compatible() (was raw collection_type string equality); resolved map-over picks the highest-rank compatible type. Sibling map-over resolution is now order-independent and matches TS mappingConstraints.

LIST_NOT_MATCHES_SAMPLE_SHEET and LIST_PAIRED_NOT_MATCHES_SAMPLE_SHEET_PAIRED promoted to algebra coverage now that the asymmetry guard makes the rejection enforceable from accepts / can_map_over directly.

Fixture adds (WI-5 sweep + WI-6 conversions)

connection_workflows/ gains 12 .gxwf.yml + 12 expected sidecars:
  ok_list_paired_to_paired_or_unpaired
  ok_list_list_paired_to_paired_or_unpaired
  ok_list_to_paired_or_unpaired
  ok_list_list_over_list_paired_or_unpaired
  ok_list_to_dataset
  ok_paired_and_data_no_map_over
  ok_sample_sheet_to_multi_data
  ok_simple_chain_dataset
  ok_subworkflow_list_propagation
  ok_subworkflow_map_over
  ok_subworkflow_passthrough
  ok_two_list_inputs_map_over

All reuse existing functional test tools; no new tool XMLs.

Refs: vault INTEROP_CONNECTION_TESTING_PLAN.md (WI-4, WI-5), INTEROP_CONNECTION_TESTING_HARDEN_PLAN.md (sweep/conversion), PR galaxyproject#22565.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
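
The bidirectional gating described for test_collection_semantics_coverage.py can be sketched roughly as follows. This is an illustration only, not the test's actual code: the function name `coverage_gaps`, the dict shape of `examples`, and the `EXAMPLE_*` labels are hypothetical, and the pairing of labels to fixture stems is invented for the demo.

```python
def coverage_gaps(
    fixtures_on_disk: set[str],
    examples: dict[str, dict],
    known_orphans: set[str],
    expected_neither: set[str],
) -> tuple[list[str], list[str]]:
    """Return (orphan_fixtures, uncovered_examples); both empty means the gate passes."""
    # Direction 1: every fixture must be referenced by an example or allow-listed.
    referenced = {e["fixture"] for e in examples.values() if e.get("fixture")}
    orphan_fixtures = sorted(fixtures_on_disk - referenced - known_orphans)
    # Direction 2: every example needs fixture or algebra coverage, or an explicit waiver.
    uncovered = sorted(
        label
        for label, e in examples.items()
        if not e.get("fixture") and not e.get("algebra") and label not in expected_neither
    )
    return orphan_fixtures, uncovered


# Hypothetical data: labels and label-to-fixture pairings are made up.
examples = {
    "EXAMPLE_A": {"fixture": "ok_list_to_dataset", "algebra": []},
    "EXAMPLE_B": {"fixture": None, "algebra": [{"op": "can_match"}]},
    "EXAMPLE_C": {"fixture": None, "algebra": []},  # waived via expected_neither
    "EXAMPLE_D": {"fixture": None, "algebra": []},  # genuinely uncovered
}
orphans, uncovered = coverage_gaps(
    fixtures_on_disk={"ok_list_to_dataset", "ok_simple_chain_dataset"},
    examples=examples,
    known_orphans=set(),
    expected_neither={"EXAMPLE_C"},
)
print(orphans)    # ['ok_simple_chain_dataset']
print(uncovered)  # ['EXAMPLE_D']
```

Reporting both directions from one function keeps the two allow-lists (KNOWN_ORPHANS, EXPECTED_NEITHER) explicit, so a new fixture or example fails the gate until it is either wired up or deliberately waived.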
1 parent 3c30cef commit 1eb5f07

31 files changed

Lines changed: 1386 additions & 527 deletions

lib/galaxy/model/dataset_collections/types/collection_semantics.yml

Lines changed: 184 additions & 0 deletions
Large diffs are not rendered by default.

lib/galaxy/model/dataset_collections/types/semantics.py

Lines changed: 89 additions & 2 deletions
@@ -52,12 +52,36 @@ class WorkflowRuntimeTest(BaseModel):
     framework_test: Optional[str] = None


+class AlgebraCaseRef(BaseModel):
+    """Pointer into connection_type_cases.yml. Algebra cases describe the
+    pure type calculus (can_match / can_map_over / effective_map_over) used
+    by every surface that reasons about connection validity. They are NOT a
+    property of any one surface - they cover the shared library underneath
+    the editor, the workflow invocation system, and the format2 validator."""
+
+    op: Literal["can_match", "can_map_over", "compatible", "effective_map_over"]
+    output: Optional[str] = None
+    input: Optional[str] = None
+
+
+class WorkflowFormatValidationTest(BaseModel):
+    """Tracks format2-level validation coverage - specifically, whether the
+    format2 connection validator has an end-to-end fixture exercising this
+    example. Analogous to workflow_editor/workflow_runtime: scoped to one
+    execution surface. Algebra coverage is tracked separately under
+    ``tests.algebra`` because it cross-cuts every surface."""
+
+    fixture: Optional[str] = None
+
+
 class ExampleTests(BaseModel):
     model_config = ConfigDict(extra="forbid")

     tool_runtime: Optional[ToolRuntimeTest] = None
     workflow_runtime: Optional[WorkflowRuntimeTest] = None
     workflow_editor: Optional[str] = None
+    workflow_format_validation: Optional[WorkflowFormatValidationTest] = None
+    algebra: Optional[list[AlgebraCaseRef]] = None


 class DatasetsDeclaration(BaseModel):
@@ -374,20 +398,83 @@ def main(argv=None) -> None:
     generate_docs()


-def _load_examples() -> list["Example"]:
+def _workflow_state_test_dir() -> str:
+    return os.path.join(galaxy_directory(), "test", "unit", "tool_util", "workflow_state")
+
+
+def connection_workflows_dir() -> str:
+    return os.path.join(_workflow_state_test_dir(), "connection_workflows")
+
+
+def connection_type_cases_path() -> str:
+    return os.path.join(_workflow_state_test_dir(), "connection_type_cases.yml")
+
+
+def load_examples() -> list["Example"]:
     semantics_yaml = yaml.safe_load(
         resource_string("galaxy.model.dataset_collections.types", "collection_semantics.yml")
     )
     root = YAMLRootModel.model_validate(semantics_yaml)
     return [e.example for e in root.root if isinstance(e, ExampleEntry)]


+# Back-compat alias (previous name had a leading underscore).
+_load_examples = load_examples
+
+
 def check() -> list[str]:
-    examples = _load_examples()
+    examples = load_examples()
     errors: list[str] = []
     errors.extend(validate_api_test_refs(examples))
     errors.extend(validate_tool_refs(examples))
     errors.extend(validate_workflow_editor_refs(examples))
+    errors.extend(validate_workflow_format_validation_refs(examples))
+    errors.extend(validate_algebra_refs(examples))
+    return errors
+
+
+def _load_type_case_keys() -> set[tuple[str, Optional[str], Optional[str]]]:
+    """Return (op, output, input) tuples declared in connection_type_cases.yml.
+
+    Treat YAML ``null`` (i.e. Python ``None``, which is how unquoted ``NULL``
+    also deserializes) as the sentinel token for matching purposes."""
+    path = connection_type_cases_path()
+    if not os.path.exists(path):
+        return set()
+    with open(path) as f:
+        cases = yaml.safe_load(f) or []
+    return {(c["op"], c.get("output"), c.get("input")) for c in cases}
+
+
+def validate_workflow_format_validation_refs(examples: list["Example"]) -> list[str]:
+    errors: list[str] = []
+    fixtures_dir = connection_workflows_dir()
+    for ex in examples:
+        if not ex.tests or not ex.tests.workflow_format_validation:
+            continue
+        wfv = ex.tests.workflow_format_validation
+        if wfv.fixture:
+            path = os.path.join(fixtures_dir, f"{wfv.fixture}.gxwf.yml")
+            if not os.path.exists(path):
+                errors.append(f"[{ex.label}] workflow_format_validation fixture not found: {wfv.fixture}.gxwf.yml")
+    return errors
+
+
+def validate_algebra_refs(examples: list["Example"]) -> list[str]:
+    if not os.path.exists(connection_type_cases_path()):
+        return []
+    errors: list[str] = []
+    type_case_keys = _load_type_case_keys()
+    for ex in examples:
+        if not ex.tests or not ex.tests.algebra:
+            continue
+        for tc in ex.tests.algebra:
+            if (tc.op, tc.output, tc.input) not in type_case_keys:
+                errors.append(
+                    f"[{ex.label}] algebra entry not found in "
+                    f"connection_type_cases.yml: "
+                    f"op={tc.op} output={tc.output!r} input={tc.input!r}"
+                )
     return errors
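
The keyed lookup that validate_algebra_refs performs can be illustrated without Galaxy or PyYAML. The sketch below assumes the truth-table rows have already been parsed into dicts; the rows themselves and the helper names `case_keys` / `missing_refs` are hypothetical, not the contents of connection_type_cases.yml.

```python
from typing import Optional

Key = tuple[str, Optional[str], Optional[str]]


def case_keys(cases: list[dict]) -> set[Key]:
    # An absent key and an explicit None both land on None, mirroring how a
    # YAML null would arrive after parsing.
    return {(c["op"], c.get("output"), c.get("input")) for c in cases}


def missing_refs(refs: list[dict], cases: list[dict]) -> list[dict]:
    """Return the refs whose (op, output, input) triple has no truth-table row."""
    keys = case_keys(cases)
    return [r for r in refs if (r["op"], r.get("output"), r.get("input")) not in keys]


# Illustrative rows only; "expected" is carried along but is not part of the key.
cases = [
    {"op": "can_match", "output": "list", "input": "list", "expected": True},
    {"op": "can_match", "output": None, "input": "list", "expected": False},
]
refs = [
    {"op": "can_match", "output": "list", "input": "list"},    # resolves
    {"op": "can_match", "output": None, "input": "list"},      # resolves via None
    {"op": "can_map_over", "output": "list", "input": "list"},  # no such row
]
print(missing_refs(refs, cases))
```

Because `expected` is excluded from the key, a cross-reference only pins which row it points at; the outcome stays defined in one place, the truth table.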

lib/galaxy/tool_util/workflow_state/connection_types.py

Lines changed: 33 additions & 7 deletions
@@ -4,9 +4,9 @@
 (NULL_COLLECTION_TYPE, ANY_COLLECTION_TYPE) and free functions for
 connection validation (can_match, can_map_over, effective_map_over).

-The base class methods (can_match_type, has_subcollections_of_type,
-effective_collection_type) do the real work — this module handles
-sentinel dispatch and the dataset (non-collection) case.
+The base class methods (accepts, can_map_over, effective_collection_type)
+do the real work — this module handles sentinel dispatch and the dataset
+(non-collection) case.
 """

 from typing import (
@@ -72,7 +72,8 @@ def can_match(
 ) -> bool:
     """Can output directly satisfy input (no mapping)?

-    Convention: can_match(output, input) — matches TS inputType.canMatch(outputType).
+    Convention: can_match(output, input) — wraps the asymmetric edge check
+    input_type.accepts(output) with sentinel handling.
     """
     if output is NULL_COLLECTION_TYPE or input_type is NULL_COLLECTION_TYPE:
         return False
@@ -83,7 +84,7 @@ def can_match(
     assert isinstance(input_type, CollectionTypeDescription)
     assert isinstance(output, CollectionTypeDescription)
     for variant in _split_collection_type(input_type):
-        if variant.can_match_type(output):
+        if variant.accepts(output):
             return True
     return False

@@ -108,11 +109,36 @@ def can_map_over(
         return True
     assert isinstance(input_type, CollectionTypeDescription)
     for variant in _split_collection_type(input_type):
-        if output.has_subcollections_of_type(variant):
+        if output.can_map_over(variant):
             return True
     return False


+def compatible(
+    a: CollectionTypeOrSentinel,
+    b: CollectionTypeOrSentinel,
+) -> bool:
+    """Symmetric sibling-matching: do a and b match such that they could
+    drive a common map-over over sibling inputs?
+
+    Wraps the symmetric base method ``CollectionTypeDescription.compatible``
+    with sentinel handling. Order of arguments must not change the answer.
+
+    Used at sites resolving sibling map-over contributions where neither
+    side is the input slot — pre-rebase code used asymmetric can_match here
+    and produced order-dependent results.
+    """
+    if a is NULL_COLLECTION_TYPE and b is NULL_COLLECTION_TYPE:
+        return True
+    if a is NULL_COLLECTION_TYPE or b is NULL_COLLECTION_TYPE:
+        return False
+    if a is ANY_COLLECTION_TYPE or b is ANY_COLLECTION_TYPE:
+        return True
+    assert isinstance(a, CollectionTypeDescription)
+    assert isinstance(b, CollectionTypeDescription)
+    return a.compatible(b)
+
+
 def is_list_like(ctd: CollectionTypeDescription) -> bool:
     """Is this a list-like collection type (eligible for multi-data reduction)?

@@ -142,7 +168,7 @@ def effective_map_over(
     assert isinstance(input_type, CollectionTypeDescription)
     # Find the matching variant for comma-separated types
     for variant in _split_collection_type(input_type):
-        if output.has_subcollections_of_type(variant):
+        if output.can_map_over(variant):
             effective = output.effective_collection_type(variant)
             return COLLECTION_TYPE_DESCRIPTION_FACTORY.for_collection_type(effective)
     return None
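
The sentinel ordering in compatible() (NULL pair first, then any NULL, then ANY, then the base method) is what keeps the function symmetric. A standalone sketch of that dispatch follows, assuming stand-in sentinels and a toy string-equality base check in place of CollectionTypeDescription.compatible; `_Sentinel`, `NULL`, `ANY` and `toy_base_compatible` are hypothetical names for this illustration.

```python
from itertools import product


class _Sentinel:
    """Stand-in for NULL_COLLECTION_TYPE / ANY_COLLECTION_TYPE (identity-compared)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


NULL = _Sentinel("NULL")
ANY = _Sentinel("ANY")


def toy_base_compatible(a: str, b: str) -> bool:
    # Assumption: plain equality stands in for the real symmetric base method.
    return a == b


def compatible(a, b) -> bool:
    if a is NULL and b is NULL:
        return True
    if a is NULL or b is NULL:
        return False  # NULL wins over ANY because this branch runs first
    if a is ANY or b is ANY:
        return True
    return toy_base_compatible(a, b)


values = [NULL, ANY, "list", "paired"]
# Swapping the arguments never changes the answer.
symmetric = all(compatible(a, b) == compatible(b, a) for a, b in product(values, repeat=2))
print(symmetric)
print(compatible(NULL, ANY), compatible(ANY, "list"), compatible("list", "paired"))
```

Each sentinel branch tests both arguments with `and`/`or`, so symmetry of the whole function reduces to symmetry of the base check.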

lib/galaxy/tool_util/workflow_state/connection_validation.py

Lines changed: 12 additions & 5 deletions
@@ -37,7 +37,9 @@
 from .connection_types import (
     ANY_COLLECTION_TYPE,
     can_match,
+    collection_type_rank,
     CollectionTypeOrSentinel,
+    compatible,
     effective_map_over,
     is_list_like,
     NULL_COLLECTION_TYPE,
@@ -268,7 +270,7 @@ def _ok(mapping: Optional[str] = None):
         # Simple list -> multi-data: full reduction, no map-over
         # Deeper (list:list, list:paired, etc.) -> multi-data: outer levels map over
         inner_list = COLLECTION_TYPE_DESCRIPTION_FACTORY.for_collection_type("list")
-        if source_type.has_subcollections_of_type(inner_list):
+        if source_type.can_map_over(inner_list):
             remaining = source_type.effective_collection_type(inner_list)
             return _ok(mapping=remaining)
         return _ok()
@@ -302,18 +304,23 @@ def _resolve_step_map_over(
 ) -> Optional[CollectionTypeDescription]:
     """Resolve effective map-over from all connection contributions.

-    All non-None map-over types must be identical. If any disagree,
-    report an error (the step can't satisfy both map-over structures).
+    Pairs of non-None contributions are checked with the symmetric
+    ``compatible`` so order of arrival of sibling inputs doesn't change the
+    answer (matches TS ``mappingConstraints``). The resolved map-over is
+    the highest-rank compatible type — TS picks "most specific" the same
+    way.
     """
     non_none = [c for c in contributions if c is not None]
     if not non_none:
         return None

     best = non_none[0]
     for ctd in non_none[1:]:
-        if ctd.collection_type != best.collection_type:
+        if not compatible(best, ctd):
             step_result.errors.append(f"Incompatible map-over types: {best.collection_type} vs {ctd.collection_type}")
-            return best  # return something, error recorded
+            return best
+        if collection_type_rank(ctd) > collection_type_rank(best):
+            best = ctd

     return best
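
The order-independence claim for _resolve_step_map_over can be exercised with a small stand-in. The resolver loop below mirrors the diff, but `toy_compatible` and `toy_rank` are assumptions made for this sketch: collection_type_rank is not shown in this diff, so rank is taken to be nesting depth, and the compatibility rule is a toy relation rather than Galaxy's actual one.

```python
from typing import Callable, Optional


def toy_rank(collection_type: str) -> int:
    # Assumption: rank is the number of ":"-separated levels.
    return collection_type.count(":") + 1


def toy_compatible(a: str, b: str) -> bool:
    # Toy symmetric rule: equal, or one side is the other plus a trailing
    # ":paired_or_unpaired" level. Not Galaxy's real compatibility relation.
    short, long_ = sorted((a, b), key=len)
    return short == long_ or long_ == short + ":paired_or_unpaired"


def resolve_map_over(
    contributions: list[Optional[str]],
    compatible: Callable[[str, str], bool] = toy_compatible,
    rank: Callable[[str], int] = toy_rank,
) -> tuple[Optional[str], list[str]]:
    """Same loop shape as the diff: check pairs, keep the highest-rank type."""
    errors: list[str] = []
    non_none = [c for c in contributions if c is not None]
    if not non_none:
        return None, errors
    best = non_none[0]
    for ctd in non_none[1:]:
        if not compatible(best, ctd):
            errors.append(f"Incompatible map-over types: {best} vs {ctd}")
            return best, errors
        if rank(ctd) > rank(best):
            best = ctd
    return best, errors


forward = resolve_map_over(["list", None, "list:paired_or_unpaired"])
backward = resolve_map_over(["list:paired_or_unpaired", None, "list"])
print(forward == backward, forward)
print(resolve_map_over(["list", "paired"]))
```

Under the old string-equality check the first two calls would both have recorded an error; with a symmetric check plus rank selection, either arrival order yields the same resolved type in this toy setup.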
