Give the chat router fast-path tools for browsing and inventory#22610
Give the chat router fast-path tools for browsing and inventory#22610mvdbeek merged 7 commits intogalaxyproject:devfrom
Conversation
Right now the router hands every browsing query off to a specialist, even trivial ones -- "what histories do I have?" round-trips through the history agent for what should be a single AgentOperationsManager call. This wires up four @agent.tool fast-path tools on QueryRouterAgent so the router can answer those directly: list_histories, get_history_summary, list_workflows, and get_user_info. Each one is a thin async wrapper around AgentOperationsManager (via anyio.to_thread.run_sync), so the existing service-layer permission and encoding logic still runs. The curation principle (also documented in router.md): router tools are for stateless, low-cost, read-only browsing that maps to a single ops call. Anything that needs domain prompting, multi-step context building, or structured output -- summarizing a history, error analysis, tool generation -- still hands off to the specialist. The original target list included list_recent_jobs, but AgentOperationsManager doesn't expose a list-jobs method (only get_job_status for a single id), so that one's dropped here. We can add it later if/when the ops layer grows that capability. Tests cover registration (the four tools show up in agent.toolsets) plus per- tool unit tests that mock AgentOperationsManager and assert the right method gets called with the right args, including a MalformedId path on get_history_summary and the empty-filter -> search=None translation on list_workflows.
Server metadata (version, brand, URL, capability flags) is a stateless single-call lookup that doesn't need a specialist. Adds the tool to the router, lists it in the prompt's fast-path section, and adds focused unit-test coverage.
Inventory/availability questions like "is FastQC installed?" or "do I have an RNA-seq workflow?" don't need specialist routing -- they're single-call lookups against the toolbox or workflow index. The router handling them directly avoids a handoff round-trip. Boundary stays tight: these are availability/inventory only. "What tool should I use for my analysis?" still routes to the tool_recommendation specialist, and IWC catalog search is deliberately out of scope (different data source, different ranking semantics) -- left for a follow-up. ops.search_tools doesn't accept a limit, so the router wrapper slices oversized result lists and flags the truncation back to the model.
Negative limits silently sliced from the tail (tools[:-5]) and very large limits would have happily echoed thousands of results. Treat anything <= 0 as "use the default of 10," and cap at 50 to bound the per-turn payload the router ships back to the model.
CLAUDE.md is explicit that test files shouldn't use inline imports. Move the lone galaxy.exceptions.MalformedId import to the top of the test file alongside the other galaxy.* imports.
| self.deps.get_agent.assert_called_once_with("history", self.deps) | ||
| assert "History summary" in response | ||
|
|
||
| def test_router_registers_fast_path_tools(self): |
There was a problem hiding this comment.
I really don't think these tests are useful - this first one seems to just be testing the literal implementation list of strings. All the rest of them are just sort of seem to be testing the mock data being patched?
I would prefer if you drop them. If you're unwilling to drop them would you trade for me adapting the test fixture static response for router testing - not work I want to do per se but I would prefer it to these unit tests.
There was a problem hiding this comment.
yeah, 100% fair -- they were mostly testing the mocks back at themselves. even the "real logic" ones tested 2-4 lines of arithmetic that don't prove the router actually picks the right tool. all gone in 9f5666d
Mypy on CI didn't like the bare ``router.agent.toolsets[0].tools[name]`` expression repeated across the per-tool tests: ``toolsets[0]`` is typed as ``AbstractToolset``, and ``tools`` is only an instance attribute on the concrete ``FunctionToolset`` subclass. Extract a single ``_fast_path_tool(router, name)`` helper using ``getattr`` (matching the pattern the registration test already uses) so the cast lives in one place and returns Any -- the per-tool tests then look up tools without tripping mypy.
John flagged in galaxyproject#22610 that the per-tool unit tests don't earn their keep -- the registration test just enumerates strings, and the per-tool tests largely confirm "the mock returned what I patched it with". A few of them tested small bits of wrapper logic (limit clamping, MalformedId handling, empty-filter mapping) but even those were 2-4 lines of arithmetic each, with no coverage of the thing that actually matters: whether the router decides to call these tools at the right times. Drop the whole TestAgentUnitMocked fast-path block plus the ``_fast_path_tool`` helper that just got added for mypy. John offered to extend the existing static-response test fixture for router testing as a follow-up, which gives us real router-decision coverage instead of mock-passthrough coverage.
| - Scientific analysis best practices | ||
| - Galaxy features and capabilities | ||
|
|
||
| ## Fast-path tools |
There was a problem hiding this comment.
The section seems relatively verbose. Maybe we should have some post-processing for markdown consumed by LLMs that compacts these (caveman style) ?
There was a problem hiding this comment.
That's a good idea, let's follow up after I get the evals framework in and will have benchmarks to iterate on
|
This PR was merged without a "kind/" label, please correct. |
The router was handing off every query to a specialist, even for trivial read-only operations. "What histories do I have?" round-tripped through
HistoryAgentfor a result that's justlist_user_histories(). This adds a small set of@agent.toolfunctions on the router itself for browsing/inventory queries, so the easy cases skip the handoff round-trip while everything else still gets specialist routing.Curation principle: stateless + read-only + single AgentOperationsManager call = router tool. Multi-step reasoning, structured output, or domain-specific prompting = specialist handoff.
The seven tools:
list_histories(limit=10)get_history_summary(history_id)list_workflows(filter="")search_workflows(query, limit=10)search_tools(query, limit=10)get_user_info()get_server_info()The router's prompt has a new "Fast-path tools" section listing each tool and a one-line cue for when to use it vs. hand off. Recommendation queries ("what tool should I use for my analysis?") still route to
tool_recommendation-- the new tools are for availability, not ranking. IWC catalog search is deliberately out of scope; that's a different data source / ranking story and deserves its own follow-up.Two safety details:
search_toolsclamps itslimitargument so zero/negative values fall back to the default of 10 and oversized values cap at 50 (the underlyingops.search_toolsdoesn't take a limit parameter, so the wrapper slices).get_history_summarycatchesMalformedIdand returns a structured error pointing the model back atlist_historiesrather than crashing the run.Hard cap of 6-10 tools to keep routing quality from drifting. We're at 7; if more fast-paths are wanted later (workflow-run browsing --
get_invocations/get_invocation_details/get_workflow_details-- is the obvious Tier 2), the right move is probably a "browsing skill" rather than growing the router further.Test plan
pytest test/unit/app/test_agents.py::TestAgentUnitMocked-- 35 passed (12 new: registration with cap guardrail, per-tool happy paths, MalformedId path, empty-filter path, oversized-result truncation, parametrized clamp tests for 0/-1/-5 limits, oversized-limit clamp at 50)