Skip to content

Give the chat router fast-path tools for browsing and inventory#22610

Merged
mvdbeek merged 7 commits intogalaxyproject:devfrom
dannon:agent-router-fast-path-tools
May 1, 2026
Merged

Give the chat router fast-path tools for browsing and inventory#22610
mvdbeek merged 7 commits intogalaxyproject:devfrom
dannon:agent-router-fast-path-tools

Conversation

@dannon
Copy link
Copy Markdown
Member

@dannon dannon commented Apr 30, 2026

The router was handing off every query to a specialist, even for trivial read-only operations. "What histories do I have?" round-tripped through HistoryAgent for a result that's just list_user_histories(). This adds a small set of @agent.tool functions on the router itself for browsing/inventory queries, so the easy cases skip the handoff round-trip while everything else still gets specialist routing.

Curation principle: stateless + read-only + single AgentOperationsManager call = router tool. Multi-step reasoning, structured output, or domain-specific prompting = specialist handoff.

The seven tools:

Tool Used for
list_histories(limit=10) "What histories do I have?"
get_history_summary(history_id) "Tell me about history abc123" (metadata only)
list_workflows(filter="") "What workflows do I have?"
search_workflows(query, limit=10) "Do I have an RNA-seq workflow?"
search_tools(query, limit=10) "Is FastQC installed?"
get_user_info() "Who am I?"
get_server_info() "What version of Galaxy is this?"

The router's prompt has a new "Fast-path tools" section listing each tool and a one-line cue for when to use it vs. hand off. Recommendation queries ("what tool should I use for my analysis?") still route to tool_recommendation -- the new tools are for availability, not ranking. IWC catalog search is deliberately out of scope; that's a different data source / ranking story and deserves its own follow-up.

Two safety details: search_tools clamps its limit argument so zero/negative values fall back to the default of 10 and oversized values cap at 50 (the underlying ops.search_tools doesn't take a limit parameter, so the wrapper slices). get_history_summary catches MalformedId and returns a structured error pointing the model back at list_histories rather than crashing the run.

Hard cap of 6-10 tools to keep routing quality from drifting. We're at 7; if more fast-paths are wanted later (workflow-run browsing -- get_invocations / get_invocation_details / get_workflow_details -- is the obvious Tier 2), the right move is probably a "browsing skill" rather than growing the router further.

Test plan

  • pytest test/unit/app/test_agents.py::TestAgentUnitMocked -- 35 passed (12 new: registration with cap guardrail, per-tool happy paths, MalformedId path, empty-filter path, oversized-result truncation, parametrized clamp tests for 0/-1/-5 limits, oversized-limit clamp at 50)
  • Manual smoke: ask the chat router "what histories do I have?" and confirm it answers via the tool rather than handing off

dannon added 5 commits April 29, 2026 23:05
Right now the router hands every browsing query off to a specialist, even
trivial ones -- "what histories do I have?" round-trips through the history
agent for what should be a single AgentOperationsManager call. This wires up
four @agent.tool fast-path tools on QueryRouterAgent so the router can answer
those directly: list_histories, get_history_summary, list_workflows, and
get_user_info. Each one is a thin async wrapper around AgentOperationsManager
(via anyio.to_thread.run_sync), so the existing service-layer permission and
encoding logic still runs.

The curation principle (also documented in router.md): router tools are for
stateless, low-cost, read-only browsing that maps to a single ops call.
Anything that needs domain prompting, multi-step context building, or
structured output -- summarizing a history, error analysis, tool generation
-- still hands off to the specialist. The original target list included
list_recent_jobs, but AgentOperationsManager doesn't expose a list-jobs
method (only get_job_status for a single id), so that one's dropped here. We
can add it later if/when the ops layer grows that capability.

Tests cover registration (the four tools show up in agent.toolsets) plus per-
tool unit tests that mock AgentOperationsManager and assert the right method
gets called with the right args, including a MalformedId path on
get_history_summary and the empty-filter -> search=None translation on
list_workflows.
Server metadata (version, brand, URL, capability flags) is a stateless
single-call lookup that doesn't need a specialist. Adds the tool to the
router, lists it in the prompt's fast-path section, and adds focused
unit-test coverage.
Inventory/availability questions like "is FastQC installed?" or "do I
have an RNA-seq workflow?" don't need specialist routing -- they're
single-call lookups against the toolbox or workflow index. The router
handling them directly avoids a handoff round-trip.

Boundary stays tight: these are availability/inventory only. "What tool
should I use for my analysis?" still routes to the tool_recommendation
specialist, and IWC catalog search is deliberately out of scope (different
data source, different ranking semantics) -- left for a follow-up.

ops.search_tools doesn't accept a limit, so the router wrapper slices
oversized result lists and flags the truncation back to the model.
Negative limits silently sliced from the tail (tools[:-5]) and very large
limits would have happily echoed thousands of results. Treat anything <= 0
as "use the default of 10," and cap at 50 to bound the per-turn payload
the router ships back to the model.
CLAUDE.md is explicit that test files shouldn't use inline imports.
Move the lone galaxy.exceptions.MalformedId import to the top of the
test file alongside the other galaxy.* imports.
Comment thread test/unit/app/test_agents.py Outdated
self.deps.get_agent.assert_called_once_with("history", self.deps)
assert "History summary" in response

def test_router_registers_fast_path_tools(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think these tests are useful - this first one seems to just be testing the literal implementation list of strings. All the rest of them are just sort of seem to be testing the mock data being patched?

I would prefer if you drop them. If you're unwilling to drop them would you trade for me adapting the test fixture static response for router testing - not work I want to do per se but I would prefer it to these unit tests.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, 100% fair -- they were mostly testing the mocks back at themselves. even the "real logic" ones tested 2-4 lines of arithmetic that don't prove the router actually picks the right tool. all gone in 9f5666d

dannon added 2 commits April 30, 2026 19:48
Mypy on CI didn't like the bare ``router.agent.toolsets[0].tools[name]``
expression repeated across the per-tool tests: ``toolsets[0]`` is
typed as ``AbstractToolset``, and ``tools`` is only an instance
attribute on the concrete ``FunctionToolset`` subclass.

Extract a single ``_fast_path_tool(router, name)`` helper using
``getattr`` (matching the pattern the registration test already uses)
so the cast lives in one place and returns Any -- the per-tool tests
then look up tools without tripping mypy.
John flagged in galaxyproject#22610 that the per-tool unit tests don't earn their
keep -- the registration test just enumerates strings, and the
per-tool tests largely confirm "the mock returned what I patched it
with". A few of them tested small bits of wrapper logic (limit
clamping, MalformedId handling, empty-filter mapping) but even those
were 2-4 lines of arithmetic each, with no coverage of the thing that
actually matters: whether the router decides to call these tools at
the right times.

Drop the whole TestAgentUnitMocked fast-path block plus the
``_fast_path_tool`` helper that just got added for mypy. John offered
to extend the existing static-response test fixture for router
testing as a follow-up, which gives us real router-decision coverage
instead of mock-passthrough coverage.
- Scientific analysis best practices
- Galaxy features and capabilities

## Fast-path tools
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section seems relatively verbose. Maybe we should have some post-processing for markdown consumed by LLMs that compacts these (caveman style) ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, let's follow up after I get the evals framework in and will have benchmarks to iterate on

@mvdbeek mvdbeek merged commit dc39768 into galaxyproject:dev May 1, 2026
52 of 57 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to Done in Galaxy Dev - weeklies May 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

This PR was merged without a "kind/" label, please correct.

@itisAliRH itisAliRH deleted the agent-router-fast-path-tools branch May 4, 2026 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

4 participants