Push UserToolSource semantic validation onto the pydantic model#22615
Open
mvdbeek wants to merge 2 commits intogalaxyproject:devfrom
Open
Push UserToolSource semantic validation onto the pydantic model#22615mvdbeek wants to merge 2 commits intogalaxyproject:devfrom
mvdbeek wants to merge 2 commits intogalaxyproject:devfrom
Conversation
Move the id pattern, container shape, citation DOI/BibTeX checks, blank required-field checks, undeclared `inputs.<name>` references in `shell_command` / `configfiles[*].content`, and dataset/collection output discovery requirements onto field/model validators on `_DynamicToolSourceBase`, `UserToolSource`, and `Citation`. Doing so benefits API callers of `DynamicUnprivilegedToolCreatePayload`, not just `CustomToolAgent`. Tool ids accept `[a-z][a-z0-9_-]*` (hyphens allowed). Outputs of a tool with `shell_command` must claim their bytes via `from_work_dir` or `discover_datasets`; the prior "name appears in command" heuristic produced both false positives and false negatives. `format_validation_errors` distills `ValidationError` to friendly bullet strings for reuse by the agent and API layers. `CustomToolAgent` walks the cause chain on `UnexpectedModelBehavior` to surface the underlying `ValidationError` as a low-confidence response with bullet list, and logs at debug since validation failure is the expected path while a user is editing.
0b5ed56 to
0258589
Compare
jmchilton
reviewed
May 1, 2026
| # references allow optional registry path segments and an optional tag. | ||
| CONTAINER_PREFIXES: Tuple[str, ...] = ("quay.io/biocontainers/", "docker://", "oras://") | ||
| DOCKER_IMAGE_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*(/[a-zA-Z0-9._-]+)*(:[\w][\w.-]*)?$") | ||
|
|
Member
There was a problem hiding this comment.
This is a linter concern not a tool source validation concern. Really -.9 on this being here. This isn't a YAML tool concern it is an abstract ToolSource concern. Create a lint rule and require it to be met for Galaxy User Defined tools behind some flag that defaults to that behavior.
dannon
added a commit
to dannon/galaxy
that referenced
this pull request
May 1, 2026
Builds on galaxyproject#22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.
dannon
added a commit
to dannon/galaxy
that referenced
this pull request
May 1, 2026
Mirrors galaxyproject#22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.
2 tasks
mvdbeek
added a commit
to mvdbeek/galaxy
that referenced
this pull request
May 1, 2026
Reviewer feedback on galaxyproject#22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.
Reviewer feedback on galaxyproject#22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.
25ca402 to
3c97502
Compare
dannon
added a commit
to dannon/galaxy
that referenced
this pull request
May 1, 2026
Builds on galaxyproject#22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.
dannon
added a commit
to dannon/galaxy
that referenced
this pull request
May 1, 2026
Mirrors galaxyproject#22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.
mvdbeek
added a commit
that referenced
this pull request
May 3, 2026
Reviewer feedback on #22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.
mvdbeek
pushed a commit
that referenced
this pull request
May 3, 2026
Builds on #22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.
mvdbeek
pushed a commit
that referenced
this pull request
May 3, 2026
Mirrors #22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.
mvdbeek
added a commit
that referenced
this pull request
May 4, 2026
Reviewer feedback on #22615: the container regex/prefix shape check is a linter concern, not a tool source validation concern, and applies to any ToolSource (not just YAML). Pull the shape check off `UserToolSource._check_container_shape` and surface it as a new `ContainerImageShape` linter under `galaxy.tool_util.linters.containers`, discovered automatically by the existing lint framework (`lint_tool_types = ["*"]`). The linter walks `tool_source.parse_requirements()` and warns on identifiers that do not match a recognized prefix or `<image>[:<tag>]` shape. For Galaxy User Defined tools, enforcement is unconditional: the unprivileged tools API and the `CustomToolAgent` both call a new `lint_user_tool_source` helper after pydantic validation succeeds and block creation if WARN+ messages are returned. The not-empty/whitespace check on `container` stays on the pydantic model — that is a basic field validity check rather than a "shape" concern. Container-shape tests move from `test_user_tool_source_validation.py` to a new `test_container_shape_lint.py` covering both the linter and the helper.
mvdbeek
pushed a commit
that referenced
this pull request
May 4, 2026
Builds on #22615's integrated UserToolSource validation. The producer agent runs with output_retries=0 so the reflection loop owns the retry and can pass a structured error list back to the prompt rather than pydantic-ai's generic 'try again' message. Two opt-in loops on top of pydantic validation: - Validator-driven retry (default on): if the producer's output fails validation, _produce_tool returns the formatted error list, and process() re-calls the producer once with those errors prefixed to the prompt. Cap of one retry; if it still fails, the agent returns a low-confidence validation_failed response. - Quality critic + refine (default off): an LLM critic agent reviews the validated tool for clarity (description, labels, help text) and idiomaticity (defaults, exposed options, container choice) -- the fuzzy dimensions pydantic can't see. If the critic flags significant issues (should_refine=true), the producer is re-rolled once with the critique. Cap of one refine; if refinement breaks validation, the original tool is kept rather than ship something worse. Both gates live under inference_services.custom_tool: validator_retry_enabled (default true) and quality_critic_enabled (default false). Defaulting the critic off keeps the cost neutral for deployments that don't opt in -- the critic doubles tool-creation latency / spend when enabled. The 170-line process() method got factored into named helpers (capability check, structured-output extraction, validation-failed response, success response, model-error handlers) to keep the reflection control flow readable.
mvdbeek
pushed a commit
that referenced
this pull request
May 4, 2026
Mirrors #22615's relaxed output discovery -- a UserToolSource with a shell_command must claim its outputs via from_work_dir or discover_datasets, but the prompt only mentioned the former. The producer was occasionally generating tools that pattern-matched on output names in the command, which the integrated validator now correctly rejects. Pre-commit's markdown formatter also normalized the YAML example indentation in the same file.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Takes the validation in #22611 and moves it into the pydantic models so all consumers benefit from these rules.
Move the id pattern, container shape, citation DOI/BibTeX checks, blank required-field checks, undeclared
inputs.<name>references inshell_command/configfiles[*].content, and dataset/collection output discovery requirements onto field/model validators on_DynamicToolSourceBase,UserToolSource, andCitation. Doing so benefits API callers ofDynamicUnprivilegedToolCreatePayload, not justCustomToolAgent.Tool ids accept
[a-z][a-z0-9_-]*(hyphens allowed). Outputs of a tool withshell_commandmust claim their bytes viafrom_work_dirordiscover_datasets; the prior "name appears in command" heuristic produced both false positives and false negatives.format_validation_errorsdistillsValidationErrorto friendly bullet strings for reuse by the agent and API layers.CustomToolAgentwalks the cause chain onUnexpectedModelBehaviorto surface the underlyingValidationErroras a low-confidence response with bullet list, and logs at debug since validation failure is the expected path while a user is editing.How to test the changes?
(Select all options that apply)
License