Commit 4eb591f

Recommend IWC workflows alongside tools
The tool_recommendation agent now also surfaces IWC workflows when the ask is analysis-shaped rather than tool-shaped: "which tool sorts a BAM?" still returns a tool, but "RNA-seq from FASTQ to differential expression" returns a workflow.

Adds search_iwc_workflows / get_iwc_workflow_details as pydantic-ai tools on the agent (going through the module-level iwc helpers so they share the cached manifest with the MCP wrappers), extends SimplifiedToolRecommendationResult with a recommended_workflows field, and renders a Recommended IWC Workflows section in the formatted output.

Workflow recommendations produce a new WORKFLOW_IMPORT ActionSuggestion (parameters: trs_id, name) so the UI can wire that to the existing import_workflow_from_iwc operation -- one click from "this is the analysis you want" to "imported into your library."

When the agent returns both a tool and a workflow (ambiguous ask), the tool keeps priority 1 and the workflow drops to priority 2.

Prompt updated to teach the tool-vs-workflow heuristic and the new agent tools.

Five unit tests cover suggestion creation (with/without trs_id, with/without a competing tool), the rendered workflow section, and the search helper against a mocked manifest.
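The priority rule described in the message can be sketched in isolation. This is a minimal illustration only: `Suggestion` and `workflow_import_suggestion` are hypothetical stand-ins rather than Galaxy's classes, and the trsID value used with it would be a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Suggestion:
    """Hypothetical stand-in for ActionSuggestion, reduced to three fields."""

    action_type: str
    parameters: dict[str, Any] = field(default_factory=dict)
    priority: int = 1


def workflow_import_suggestion(
    primary_tools: list[dict[str, Any]],
    workflows: list[dict[str, Any]],
) -> Optional[Suggestion]:
    """Build an import suggestion for the top workflow, if it carries a TRS id."""
    if not workflows:
        return None
    top = workflows[0]
    # Accept either spelling of the key, as the diff does.
    trs_id = top.get("trsID") or top.get("trs_id")
    if not trs_id:
        return None
    return Suggestion(
        action_type="workflow_import",
        parameters={"trs_id": trs_id, "name": top.get("name", "IWC workflow")},
        # A competing tool recommendation pushes the workflow to priority 2.
        priority=1 if not primary_tools else 2,
    )
```

Note that the rule keys off whether any tool was recommended, not whether a tool suggestion was actually emitted.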
1 parent 10e53ab commit 4eb591f

4 files changed

Lines changed: 288 additions & 26 deletions

Lines changed: 37 additions & 22 deletions
@@ -1,42 +1,57 @@
-# Galaxy Tool Recommendation Agent
+# Galaxy Analysis Recommendation Agent
 
-You are a Galaxy Project expert specializing in tool discovery and recommendation.
+You are a Galaxy Project expert specializing in **analysis discovery**. Your job is to recommend the _right kind of thing_ for the user's request:
 
-Your goal is to help users find the right tools for their bioinformatics tasks by providing practical recommendations with clear reasoning.
+- A **tool** when the user asks for a single, atomic operation ("which tool sorts a BAM?", "I need to merge FASTQ files").
+- An **IWC workflow** when the user asks for a complete, multi-step analysis ("RNA-seq from FASTQ to differential expression", "variant calling pipeline", "ChIP-seq analysis").
+- **Both** when the user is unsure and could reasonably want either.
+
+Default to a tool for narrow asks. Default to a workflow for end-to-end asks. When in doubt, return both and let the user choose.
 
 ## CRITICAL: Tool Availability
 
 **This Galaxy server only has certain tools installed. You MUST verify tools exist before recommending them.**
 
-1. **ALWAYS call `search_galaxy_tools` FIRST** before making any recommendations
-2. **ONLY recommend tools that appear in the search results** - if a tool doesn't show up in the search, it is NOT installed on this server
-3. If your search returns no results for a common tool (like BWA, HISAT2, etc.), that means it's not installed
+1. **For tool recommendations: ALWAYS call `search_galaxy_tools` FIRST** before naming a tool.
+2. **ONLY recommend tools that appear in the search results** -- if a tool doesn't show up in the search, it is NOT installed on this server.
+3. If your search returns no results for a common tool (like BWA, HISAT2, etc.), that means it's not installed.
 4. When a well-known tool is not installed, tell the user: "While [tool name] would typically be recommended for this task, it doesn't appear to be installed on this Galaxy server. You may want to contact your administrator to request its installation."
 
+IWC workflows are a separate catalog -- they can be recommended even if not yet installed on this server, because the user can import them via `import_workflow_from_iwc`.
+
 ## Available Tools
 
-- **`search_galaxy_tools(query)`** - Search for tools by keyword. Always start here.
-- **`get_galaxy_tool_details(tool_id)`** - Get detailed info (inputs, outputs, version) for a specific tool. Use after searching to provide better recommendations.
-- **`get_galaxy_tool_categories()`** - List available tool categories. Use when user asks "what kinds of tools are available?" or to understand the server's capabilities.
+- **`search_galaxy_tools(query)`** -- Search this server's installed tools by keyword. Always start here for atomic asks.
+- **`get_galaxy_tool_details(tool_id)`** -- Get inputs, outputs, version for a specific tool.
+- **`get_galaxy_tool_categories()`** -- List tool categories on this server.
+- **`search_iwc_workflows(query, limit=5)`** -- Search the IWC catalog for end-to-end workflows. Use for analysis-shaped requests.
+- **`get_iwc_workflow_details(trs_id)`** -- Get full details (steps, tools, readme) for one IWC workflow before recommending it.
 
 ## Recommendation Process
 
-1. Understand the user's task and data types
-2. **Call `search_galaxy_tools` with relevant keywords** (e.g., "alignment", "mapping", "fastq")
-3. Optionally call `get_galaxy_tool_details` on promising candidates to get input/output format info
-4. Recommend tools from the search results, using their exact IDs
-5. If no suitable tools are found, be honest about the limitation
+1. Decide: is the user asking for a single step (tool) or a complete analysis (workflow)?
+2. For tools: call `search_galaxy_tools`, optionally `get_galaxy_tool_details`, populate `primary_tools` from the search results.
+3. For workflows: call `search_iwc_workflows`, optionally `get_iwc_workflow_details` for the top hit, populate `recommended_workflows` with the entries from the search (preserve `trsID`, `name`, `description`, `step_count`, `tools_used`, `match_score`).
+4. If the ask is ambiguous, populate both `primary_tools` and `recommended_workflows`.
+5. Always explain _why_ in the `reasoning` field, including the tool-vs-workflow choice.
+
+## Workflow Recommendations
+
+When recommending a workflow:
+
+- Always preserve the exact `trsID` from `search_iwc_workflows` -- this is what the import action needs.
+- Mention the step count and the key tools the workflow uses, so the user can judge fit.
+- Prefer workflows whose `tools_used` overlap with what's installed on this server, but do not require it.
 
 ## Tool IDs
 
-- Use ONLY the exact `id` field from search results
-- Never guess or fabricate tool IDs based on your training knowledge
-- If you know a tool exists in Galaxy generally but it's not in the search results, it's NOT available on this server
+- Use ONLY the exact `id` field from `search_galaxy_tools` results.
+- Never guess or fabricate tool IDs based on training data.
+- If a tool exists in Galaxy generally but is not in the search results, it's NOT available on this server.
 
 ## Best Practices
 
-- Prioritize tools that are well-maintained and widely used
-- Consider the user's experience level
-- Explain why you're recommending specific tools
-- Mention important parameters or configuration options
-- Suggest workflows when multiple tools are needed
+- Match the scope of the recommendation to the scope of the ask.
+- Explain which kind of recommendation you chose and why.
+- Mention important parameters or configuration options for tools.
+- For workflows, mention what the user gets end-to-end (input format -> outputs).
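The prompt asks the model to prefer workflows whose `tools_used` overlap with installed tools without requiring it. One way that soft preference could be made deterministic is sketched below. The function `rank_by_installed_overlap` is hypothetical and not part of this commit, and it assumes `tools_used` entries are directly comparable to installed tool identifiers, which the diff does not confirm.

```python
from typing import Any


def rank_by_installed_overlap(
    workflows: list[dict[str, Any]],
    installed_tool_ids: set[str],
) -> list[dict[str, Any]]:
    """Order workflows by the fraction of their tools that are installed.

    Nothing is filtered out: a workflow with zero overlap is still returned,
    only later in the list. Ties keep their original (search-rank) order
    because Python's sort is stable, including with reverse=True.
    """

    def overlap(wf: dict[str, Any]) -> float:
        used = wf.get("tools_used") or []
        if not used:
            return 0.0
        return sum(1 for tool in used if tool in installed_tool_ids) / len(used)

    return sorted(workflows, key=overlap, reverse=True)
```

Keeping zero-overlap entries matches the "do not require it" wording, since IWC workflows can be imported regardless of what is installed.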

lib/galaxy/agents/tools.py

Lines changed: 119 additions & 4 deletions
@@ -1,7 +1,13 @@
 """
 Tool recommendation agent for suggesting appropriate Galaxy tools.
+
+Despite the historical name, this agent recommends both atomic Galaxy tools
+and end-to-end IWC workflows. Atomic asks ("which tool sorts a BAM?") still
+get a tool back; analysis-shaped asks ("RNA-seq from FASTQ to differential
+expression") get a workflow back.
 """
 
+import asyncio
 import logging
 import re
 from pathlib import Path
@@ -16,6 +22,7 @@
 from pydantic_ai import Agent
 from pydantic_ai.tools import RunContext
 
+from galaxy.agents import iwc
 from galaxy.schema.agents import ConfidenceLevel
 from .base import (
     ActionSuggestion,
@@ -33,11 +40,25 @@
 log = logging.getLogger(__name__)
 
 
+def _iwc_search(query: str, limit: int) -> list[dict[str, Any]]:
+    workflows = iwc.all_workflows(iwc.fetch_manifest())
+    return iwc.search_workflows(workflows, query, limit=limit)
+
+
+def _iwc_details(trs_id: str) -> Optional[dict[str, Any]]:
+    workflows = iwc.all_workflows(iwc.fetch_manifest())
+    for wf in workflows:
+        if wf.get("trsID") == trs_id:
+            return iwc.enrich_workflow(wf, include_full_readme=False)
+    return None
+
+
 class SimplifiedToolRecommendationResult(BaseModel):
     """Tool recommendation result using simple types for local LLM compatibility."""
 
-    primary_tools: list[dict[str, Any]]
+    primary_tools: list[dict[str, Any]] = []
     alternative_tools: list[dict[str, Any]] = []
+    recommended_workflows: list[dict[str, Any]] = []
     workflow_suggestion: Optional[str] = None
     parameter_guidance: dict[str, Any] = {}
     confidence: ConfidenceLiteral
@@ -121,6 +142,49 @@ async def get_galaxy_tool_categories(ctx: RunContext[GalaxyAgentDependencies]) -
                 return "No tool categories found"
             return "Available tool categories:\n" + "\n".join(f"- {cat}" for cat in categories)
 
+        @agent.tool
+        async def search_iwc_workflows(ctx: RunContext[GalaxyAgentDependencies], query: str, limit: int = 5) -> str:
+            """Search the IWC (Intergalactic Workflows Commission) catalog for workflows.
+
+            Use this when the user is asking for a multi-step analysis (e.g. "run
+            an RNA-seq pipeline", "variant calling from FASTQ") rather than a
+            single tool. Returns ranked workflow entries with trsID, name,
+            description, step count, and the tools each workflow uses.
+            """
+            results = await self.search_iwc_workflows(query, limit=limit)
+            if not results:
+                return f"No IWC workflows found matching '{query}'"
+            lines = [f"Found {len(results)} IWC workflows for '{query}':"]
+            for wf in results:
+                lines.append(
+                    f"- trsID: {wf['trsID']}, name: {wf['name']}, steps: {wf['step_count']}, "
+                    f"tools: {', '.join(wf.get('tools_used', [])[:6])}, "
+                    f"description: {(wf.get('description') or '')[:160]}"
+                )
+            return "\n".join(lines)
+
+        @agent.tool
+        async def get_iwc_workflow_details(ctx: RunContext[GalaxyAgentDependencies], trs_id: str) -> str:
+            """Fetch the full enriched IWC entry for a single workflow.
+
+            Use after search_iwc_workflows to get the complete tool list,
+            authors, categories, and readme summary before recommending.
+            """
+            details = await self.get_iwc_workflow_details(trs_id)
+            if details is None:
+                return f"No IWC workflow found with trsID {trs_id}"
+            lines = [
+                f"Name: {details.get('name')}",
+                f"trsID: {details.get('trsID')}",
+                f"Steps: {details.get('step_count')}",
+                f"Tags: {', '.join(details.get('tags', []))}",
+                f"Categories: {', '.join(details.get('categories', []))}",
+                f"Tools used: {', '.join(details.get('tools_used', []))}",
+                f"Description: {details.get('description', '')}",
+                f"Readme: {details.get('readme_summary', '')}",
+            ]
+            return "\n".join(lines)
+
         return agent
 
     def get_system_prompt(self) -> str:
@@ -202,6 +266,22 @@ async def get_tool_details(self, tool_id: str) -> dict[str, Any]:
             log.warning(f"Error getting tool details for {tool_id}: {e}")
             return {"id": tool_id, "error": str(e)}
 
+    async def search_iwc_workflows(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
+        """Search the IWC manifest. Network-bound on cache miss; runs in a thread."""
+        try:
+            return await asyncio.to_thread(_iwc_search, query, limit)
+        except (OSError, ValueError) as e:
+            log.warning(f"IWC search failed for query={query!r}: {e}")
+            return []
+
+    async def get_iwc_workflow_details(self, trs_id: str) -> Optional[dict[str, Any]]:
+        """Fetch one workflow from the IWC manifest, fully enriched."""
+        try:
+            return await asyncio.to_thread(_iwc_details, trs_id)
+        except (OSError, ValueError) as e:
+            log.warning(f"IWC details lookup failed for {trs_id!r}: {e}")
+            return None
+
     async def get_tool_categories(self) -> list[str]:
         if not self.deps.toolbox:
             log.warning("Toolbox not available in agent dependencies")
@@ -300,8 +380,11 @@ async def process(self, query: str, context: Optional[dict[str, Any]] = None) ->
                 suggestions=suggestions,
                 agent_data={
                     "num_tools_found": len(recommendation.primary_tools),
+                    "num_workflows_found": len(recommendation.recommended_workflows),
                     "has_alternatives": bool(recommendation.alternative_tools),
-                    "has_workflow": bool(recommendation.workflow_suggestion),
+                    "has_workflow": bool(
+                        recommendation.recommended_workflows or recommendation.workflow_suggestion
+                    ),
                     "search_keywords": recommendation.search_keywords,
                 },
                 reasoning=recommendation.reasoning,
@@ -357,6 +440,23 @@ def _format_recommendation_response(self, recommendation: SimplifiedToolRecommen
                 tool_name = tool.get("name", tool.get("tool_name", "Unknown"))
                 parts.append(f"- **{tool_name}**: {tool.get('description', 'No description')}")
 
+        if recommendation.recommended_workflows:
+            parts.append("\n**Recommended IWC Workflows:**")
+            for i, wf in enumerate(recommendation.recommended_workflows[:3], 1):
+                wf_name = wf.get("name", "Unknown workflow")
+                trs_id = wf.get("trsID") or wf.get("trs_id") or ""
+                parts.append(f"\n{i}. **{wf_name}**")
+                if trs_id:
+                    parts.append(f" - trsID: `{trs_id}`")
+                if wf.get("description"):
+                    parts.append(f" - {wf['description']}")
+                if wf.get("step_count"):
+                    parts.append(f" - Steps: {wf['step_count']}")
+                if wf.get("tools_used"):
+                    parts.append(f" - Tools: {', '.join(wf['tools_used'][:6])}")
+                if wf.get("categories"):
+                    parts.append(f" - Categories: {', '.join(wf['categories'])}")
+
         if recommendation.workflow_suggestion:
             parts.append(f"\n**Workflow Suggestion:**\n{recommendation.workflow_suggestion}")
 
@@ -366,12 +466,13 @@ def _format_recommendation_response(self, recommendation: SimplifiedToolRecommen
                 parts.append(f"- {param}: {value}")
 
         if recommendation.reasoning:
-            parts.append(f"\n**Why these tools?**\n{recommendation.reasoning}")
+            parts.append(f"\n**Why this recommendation?**\n{recommendation.reasoning}")
 
         return "\n".join(parts)
 
     def _create_suggestions(self, recommendation: SimplifiedToolRecommendationResult) -> list[ActionSuggestion]:
         suggestions = []
+        action_confidence = ConfidenceLevel(recommendation.confidence.lower())
 
         if recommendation.primary_tools:
             top_tool = recommendation.primary_tools[0]
@@ -381,7 +482,6 @@ def _create_suggestions(self, recommendation: SimplifiedToolRecommendationResult
             log.debug(f"Extracted tool_name={tool_name}, tool_id={tool_id}")
 
             if tool_id and self._verify_tool_exists(tool_id):
-                action_confidence = ConfidenceLevel(recommendation.confidence.lower())
                 suggestions.append(
                     ActionSuggestion(
                         action_type=ActionType.TOOL_RUN,
@@ -394,6 +494,21 @@ def _create_suggestions(self, recommendation: SimplifiedToolRecommendationResult
             elif tool_id:
                 log.warning(f"Tool '{tool_id}' recommended but not found in toolbox - skipping suggestion")
 
+        if recommendation.recommended_workflows:
+            top_wf = recommendation.recommended_workflows[0]
+            trs_id = top_wf.get("trsID") or top_wf.get("trs_id")
+            wf_name = top_wf.get("name", "IWC workflow")
+            if trs_id:
+                suggestions.append(
+                    ActionSuggestion(
+                        action_type=ActionType.WORKFLOW_IMPORT,
+                        description=f"Import {wf_name} from IWC",
+                        parameters={"trs_id": trs_id, "name": wf_name},
+                        confidence=action_confidence,
+                        priority=1 if not recommendation.primary_tools else 2,
+                    )
+                )
+
         return suggestions
 
     def _verify_tool_exists(self, tool_id: str) -> bool:
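The agent methods added in this file push a blocking manifest lookup onto a worker thread and degrade to an empty result on failure. That pattern can be exercised on its own, as sketched below. `blocking_search` is a hypothetical stand-in for the manifest-backed helper and returns made-up entries; only `asyncio.to_thread` and the `(OSError, ValueError)` fallback mirror the diff.

```python
import asyncio
import logging
from typing import Any

log = logging.getLogger(__name__)


def blocking_search(query: str, limit: int) -> list[dict[str, Any]]:
    """Hypothetical stand-in for a blocking, possibly network-bound search."""
    if not query:
        raise ValueError("empty query")
    # Placeholder entries; real results would come from the cached manifest.
    return [{"trsID": f"example-{i}", "name": f"{query} workflow {i}"} for i in range(limit)]


async def search_off_loop(query: str, limit: int = 5) -> list[dict[str, Any]]:
    """Run the blocking search in a thread so the event loop stays responsive."""
    try:
        return await asyncio.to_thread(blocking_search, query, limit)
    except (OSError, ValueError) as e:
        # Exceptions raised in the worker thread propagate through the await.
        log.warning("search failed for query=%r: %s", query, e)
        return []
```

With this shape a failed lookup and a lookup with no matches look identical to the caller, so the model sees "no workflows found" in both cases; the warning log is the only place the difference is recorded.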

lib/galaxy/schema/agents.py

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,7 @@ class ActionType(str, Enum):
     CONTACT_SUPPORT = "contact_support"
     VIEW_EXTERNAL = "view_external"
     DOCUMENTATION = "documentation"
+    WORKFLOW_IMPORT = "workflow_import"
 
 
 class ActionSuggestion(BaseModel):
@@ -54,6 +55,9 @@ def validate_parameters(self) -> "ActionSuggestion":
         elif self.action_type == ActionType.VIEW_EXTERNAL:
             if not self.parameters.get("url"):
                 raise ValueError("VIEW_EXTERNAL requires 'url' parameter")
+        elif self.action_type == ActionType.WORKFLOW_IMPORT:
+            if not self.parameters.get("trs_id"):
+                raise ValueError("WORKFLOW_IMPORT requires 'trs_id' parameter")
         return self
 
 
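The effect of the new validation branch can be illustrated without pydantic. The sketch below uses a plain dataclass as a stand-in: `ActionSuggestionSketch` is hypothetical, and the real model has more fields and runs this check inside its own `validate_parameters` method.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ActionType(str, Enum):
    # Only the two members needed for the illustration.
    VIEW_EXTERNAL = "view_external"
    WORKFLOW_IMPORT = "workflow_import"


@dataclass
class ActionSuggestionSketch:
    """Hypothetical stand-in that rejects an import action without a TRS id."""

    action_type: ActionType
    description: str
    parameters: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.action_type == ActionType.WORKFLOW_IMPORT and not self.parameters.get("trs_id"):
            raise ValueError("WORKFLOW_IMPORT requires 'trs_id' parameter")
```

Because the check uses truthiness, an empty-string `trs_id` is rejected the same way as a missing key. That lines up with the `if trs_id:` guard in `_create_suggestions`, which skips the suggestion in both cases.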
