Hi Vespa team,
A lot of production RAG systems are now built on Vespa, and many of them struggle with subtle failure modes that are not really about ANN or ranking, but about how the whole RAG pipeline is wired.
I maintain an MIT-licensed project called the WFGY RAG 16 Problem Map, which focuses specifically on diagnosing RAG / LLM failure modes at the pipeline level.
Repo (MIT):
https://github.com/onestardao/WFGY
Main RAG failure map page:
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md
In short, WFGY provides:
- A 16-problem RAG failure taxonomy (retrieval, prompt, structure, infra)
- A ready-to-use triage prompt that takes a failing run (Q, evidence, prompt, answer, logs, metrics) and classifies it
- For each problem, concrete structural fix suggestions (not just “use a better model”)
The map is already integrated or cited in multiple ecosystems, for example:
- RAGFlow (RAG failure modes checklist in their docs)
- LlamaIndex (RAG troubleshooting pages)
- ToolUniverse – Harvard MIMS Lab (wrapped as an incident-triage tool)
- Rankify – University of Innsbruck (reranking failure analysis)
- Multimodal RAG Survey – QCRI LLM Lab
- Curated resources such as Awesome LLM Apps and Awesome Data Science – academic
Proposal
Add a short “RAG failure modes and debugging” section to the Vespa documentation that adopts the WFGY 16-problem map as a vocabulary for Vespa-based RAG systems. For example:
- A doc page that:
  - Explains that many production issues come from pipeline design, not the vector engine itself.
  - Maps typical Vespa-based RAG mistakes (index schema, document structure, query formulation, hybrid ranking configs, etc.) to the 16 WFGY failure types.
  - Shows how to capture a failing run (query, retrieved docs, ranking features, answer, logs) and feed it into the triage prompt.
- An example (blog or notebook) that:
  - Builds a small Vespa RAG demo.
  - Intentionally misconfigures it in several realistic ways.
  - Uses the WFGY triage prompt to classify each failure and then shows how to fix the Vespa and pipeline configuration.
This would give Vespa users a structured checklist for debugging RAG behavior, while keeping Vespa itself focused on high-performance search and ranking.
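To make the "capture a failing run" step more concrete, here is a rough Python sketch of what the doc page could show. It assumes the default Vespa JSON result layout (`root.children[]` with `id`, `relevance`, and `fields`), and that the rank profile exposes `matchfeatures`. The `text` field name, the `FailingRun` class, and both helper functions are hypothetical names chosen for illustration, and `triage_prompt` is only a placeholder for whatever prompt text the WFGY repo provides; none of this is an existing Vespa or WFGY API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class FailingRun:
    """One failing RAG interaction, captured for later triage (hypothetical shape)."""
    query: str
    answer: str
    hits: List[Dict[str, Any]] = field(default_factory=list)
    logs: List[str] = field(default_factory=list)


def extract_hits(vespa_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull id, relevance, and rank features out of a Vespa JSON search result."""
    children = vespa_response.get("root", {}).get("children", [])
    hits = []
    for child in children:
        fields = child.get("fields", {})
        hits.append({
            "id": child.get("id"),
            "relevance": child.get("relevance"),
            # Only present if the rank profile declares match-features.
            "matchfeatures": fields.get("matchfeatures", {}),
            # Assumption: the schema stores the chunk text in a field named "text".
            "text": fields.get("text", ""),
        })
    return hits


def build_triage_input(run: FailingRun, triage_prompt: str) -> str:
    """Append the captured run, serialized as JSON, to the triage prompt text."""
    payload = json.dumps(asdict(run), indent=2, sort_keys=True)
    return triage_prompt + "\n\n" + payload
```

The resulting string can be pasted into any LLM chat or sent through whichever client the application already uses. Serializing the run as JSON keeps the query, evidence, ranking features, and answer together in one record, so the same capture can also be stored alongside regular logs.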
If this sounds useful, I would be happy to draft a first version of the doc / example in the style you prefer and submit it as a PR.