Push UserToolSource semantic validation onto the pydantic model by mvdbeek · Pull Request #22615 · galaxyproject/galaxy

mvdbeek · 2026-05-01T15:52:32Z

Takes the validation in #22611 and moves it into the pydantic models so all consumers benefit from these rules.

Move the id pattern, container shape, citation DOI/BibTeX checks, blank required-field checks, undeclared inputs.<name> references in shell_command / configfiles[*].content, and dataset/collection output discovery requirements onto field/model validators on _DynamicToolSourceBase, UserToolSource, and Citation. Doing so benefits API callers of DynamicUnprivilegedToolCreatePayload, not just CustomToolAgent.

Tool ids accept [a-z][a-z0-9_-]* (hyphens allowed). Outputs of a tool with shell_command must claim their bytes via from_work_dir or discover_datasets; the prior "name appears in command" heuristic produced both false positives and false negatives.

format_validation_errors distills ValidationError to friendly bullet strings for reuse by the agent and API layers. CustomToolAgent walks the cause chain on UnexpectedModelBehavior to surface the underlying ValidationError as a low-confidence response with bullet list, and logs at debug since validation failure is the expected path while a user is editing.

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

Move the id pattern, container shape, citation DOI/BibTeX checks, blank required-field checks, undeclared `inputs.<name>` references in `shell_command` / `configfiles[*].content`, and dataset/collection output discovery requirements onto field/model validators on `_DynamicToolSourceBase`, `UserToolSource`, and `Citation`. Doing so benefits API callers of `DynamicUnprivilegedToolCreatePayload`, not just `CustomToolAgent`. Tool ids accept `[a-z][a-z0-9_-]*` (hyphens allowed). Outputs of a tool with `shell_command` must claim their bytes via `from_work_dir` or `discover_datasets`; the prior "name appears in command" heuristic produced both false positives and false negatives. `format_validation_errors` distills `ValidationError` to friendly bullet strings for reuse by the agent and API layers. `CustomToolAgent` walks the cause chain on `UnexpectedModelBehavior` to surface the underlying `ValidationError` as a low-confidence response with bullet list, and logs at debug since validation failure is the expected path while a user is editing.

jmchilton · 2026-05-01T17:29:10Z

+# references allow optional registry path segments and an optional tag.
+CONTAINER_PREFIXES: Tuple[str, ...] = ("quay.io/biocontainers/", "docker://", "oras://")
+DOCKER_IMAGE_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*(/[a-zA-Z0-9._-]+)*(:[\w][\w.-]*)?$")
+


This is a linter concern not a tool source validation concern. Really -.9 on this being here. This isn't a YAML tool concern it is an abstract ToolSource concern. Create a lint rule and require it to be met for Galaxy User Defined tools behind some flag that defaults to that behavior.

3c97502 any better ?

Builds on galaxyproject#22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.

Mirrors galaxyproject#22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.

Reviewer feedback on galaxyproject#22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.

Builds on galaxyproject#22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.

Mirrors galaxyproject#22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.

Reviewer feedback on #22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.

Builds on #22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.

Mirrors #22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.

Reviewer feedback on #22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.

Builds on #22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.

Mirrors #22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.

github-project-automation Bot added this to Galaxy Dev - weeklies May 1, 2026

github-project-automation Bot moved this to Needs Review in Galaxy Dev - weeklies May 1, 2026

mvdbeek requested review from dannon and jmchilton and removed request for dannon May 1, 2026 15:52

github-actions Bot added the area/testing label May 1, 2026

github-actions Bot added this to the 26.1 milestone May 1, 2026

mvdbeek force-pushed the more-validation-user-tool-sources branch from 0b5ed56 to 0258589 Compare May 1, 2026 16:07

mvdbeek requested a review from dannon May 1, 2026 16:57

jmchilton reviewed May 1, 2026

View reviewed changes

dannon mentioned this pull request May 1, 2026

Validate generated UserToolSource semantically before returning it #22611

Closed

2 tasks

mvdbeek force-pushed the more-validation-user-tool-sources branch from 25ca402 to 3c97502 Compare May 1, 2026 19:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Push UserToolSource semantic validation onto the pydantic model#22615

Push UserToolSource semantic validation onto the pydantic model#22615
mvdbeek wants to merge 2 commits intogalaxyproject:devfrom
mvdbeek:more-validation-user-tool-sources

mvdbeek commented May 1, 2026

Uh oh!

jmchilton May 1, 2026

Uh oh!

mvdbeek May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mvdbeek commented May 1, 2026

How to test the changes?

License

Uh oh!

jmchilton May 1, 2026

Choose a reason for hiding this comment

Uh oh!

mvdbeek May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants