Proposal: add WFGY 16-problem RAG failure map as a debugging reference for Vespa-based RAG #36063

Hi Vespa team,

Many production RAG systems are built on Vespa, and many of them struggle with subtle failure modes that have less to do with ANN search or ranking than with how the whole RAG pipeline is wired.

I maintain an MIT-licensed project called **WFGY RAG 16 Problem Map**, which focuses specifically on RAG / LLM failure modes and diagnostics at the pipeline level.

Repo (MIT):
https://github.com/onestardao/WFGY

Main RAG failure map page:
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

In short, WFGY provides:

- A **16-problem RAG failure taxonomy** (retrieval, prompt, structure, infra)
- A ready-to-use **triage prompt** that takes a failing run (Q, evidence, prompt, answer, logs, metrics) and classifies it against the 16 problem types
- For each problem, concrete **structural fix suggestions** (not just “use a better model”)
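To make "a failing run" concrete, here is a minimal sketch of a record holding the six inputs listed above and rendering them as labeled plain-text sections that can be pasted into the triage prompt. The class name `FailingRun`, its field names, and the section layout are hypothetical illustrations, not part of WFGY itself.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FailingRun:
    """One failing RAG run (hypothetical structure, not defined by WFGY)."""

    question: str
    evidence: list[str]  # retrieved passages, in the order they reached the LLM
    prompt: str  # the final prompt actually sent to the model
    answer: str
    logs: list[str] = field(default_factory=list)
    metrics: dict[str, Any] = field(default_factory=dict)

    def to_triage_input(self) -> str:
        """Render the run as labeled sections; empty sections show "(none)"."""
        evidence = "\n".join(
            f"[{i}] {text}" for i, text in enumerate(self.evidence, 1)
        ) or "(none)"
        logs = "\n".join(self.logs) or "(none)"
        metrics = "\n".join(
            f"{key}: {value}" for key, value in sorted(self.metrics.items())
        ) or "(none)"
        return (
            f"## Question\n{self.question}\n\n"
            f"## Evidence\n{evidence}\n\n"
            f"## Prompt\n{self.prompt}\n\n"
            f"## Answer\n{self.answer}\n\n"
            f"## Logs\n{logs}\n\n"
            f"## Metrics\n{metrics}\n"
        )


if __name__ == "__main__":
    run = FailingRun(
        question="What is the refund window?",
        evidence=["Refunds are accepted within 30 days of purchase."],
        prompt="Answer using only the evidence below.",
        answer="Refunds are accepted within 90 days.",
        metrics={"hits": 1},
    )
    print(run.to_triage_input())
```

Keeping the evidence numbered and in retrieval order makes it easier for the triage step (or a human) to see whether the answer contradicts the evidence or whether the right passage was never retrieved.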

The map is already integrated or cited in multiple ecosystems, for example:

- **RAGFlow** (RAG failure modes checklist in their docs)
- **LlamaIndex** (RAG troubleshooting pages)
- **ToolUniverse – Harvard MIMS Lab** (wrapped as an incident-triage tool)
- **Rankify – University of Innsbruck** (reranking failure analysis)
- **Multimodal RAG Survey – QCRI LLM Lab**
- Curated resources such as **Awesome LLM Apps** and **Awesome Data Science – academic**

### Proposal

Add a short **“RAG failure modes and debugging”** section to the Vespa documentation that adopts the WFGY 16-problem map as a vocabulary for Vespa-based RAG systems. For example:

1. A doc page that:
   - Explains that many production issues come from pipeline design, not the vector engine itself.
   - Maps typical Vespa-based RAG mistakes (index schema, document structure, query formulation, hybrid ranking configs, etc.) to the 16 WFGY failure types.
   - Shows how to capture a failing run (query, retrieved docs, ranking features, answer, logs) and feed it into the triage prompt.

2. An example (blog or notebook) that:
   - Builds a small Vespa RAG demo.
   - Intentionally misconfigures it in several realistic ways.
   - Uses the WFGY triage prompt to classify each failure and then shows how to fix the Vespa and pipeline configuration.
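As a starting point for the "capture a failing run" step, the sketch below queries Vespa's HTTP `/search/` endpoint with a JSON body and turns the response into a triage payload: each hit's document id, relevance, text, and ranking features, plus hit counts and any error messages. It assumes the schema stores passage text in a field named `text` (configurable via `text_field`), that the rank profile declares `summary-features` so they appear under `summaryfeatures` in each hit, and that errors are reported as a `root.errors` list with a `message` key. The function names, the `hybrid` rank profile, and the payload layout are hypothetical.

```python
import json
import urllib.request
from typing import Any, Iterable, Optional


def query_vespa(endpoint: str, body: dict[str, Any]) -> dict[str, Any]:
    """POST a query to Vespa's /search/ endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        f"{endpoint.rstrip('/')}/search/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def capture_failing_run(
    question: str,
    query_body: dict[str, Any],
    response: dict[str, Any],
    prompt: str,
    answer: str,
    text_field: str = "text",  # assumed schema field holding passage text
    logs: Optional[Iterable[str]] = None,
) -> dict[str, Any]:
    """Build a triage payload from a Vespa response (hypothetical layout)."""
    root = response.get("root", {})
    evidence = []
    for hit in root.get("children", []):
        fields = hit.get("fields", {})
        evidence.append({
            "id": hit.get("id"),
            "relevance": hit.get("relevance"),
            "text": fields.get(text_field, ""),
            # Present only if the rank profile declares summary-features.
            "ranking_features": fields.get("summaryfeatures", {}),
        })
    # Append any query-level error messages to the caller's own logs.
    error_messages = [err.get("message", "") for err in root.get("errors", [])]
    return {
        "question": question,
        "query": query_body,
        "evidence": evidence,
        "prompt": prompt,
        "answer": answer,
        "logs": list(logs or []) + error_messages,
        "metrics": {
            "total_count": root.get("fields", {}).get("totalCount"),
            "hits_returned": len(evidence),
            "rank_profile": query_body.get("ranking"),
        },
    }


if __name__ == "__main__":
    question = "What is the refund window?"
    body = {
        "yql": "select * from sources * where userQuery()",
        "query": question,
        "ranking": "hybrid",  # hypothetical rank profile name
        "hits": 5,
    }
    vespa_response = query_vespa("http://localhost:8080", body)
    run = capture_failing_run(
        question, body, vespa_response,
        prompt="Answer using only the evidence below.",
        answer="Refunds are accepted within 90 days.",
    )
    print(json.dumps(run, indent=2))
```

Recording the exact query body and rank profile next to the hits lets the triage step separate retrieval-side problems (wrong or missing passages, misleading ranking features) from prompt- or generation-side problems, which is the distinction the proposed doc page would rely on.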

This would give Vespa users a structured checklist for debugging RAG behavior, while keeping Vespa itself focused on high-performance search and ranking.

If this sounds useful, I would be happy to draft a first version of the doc / example in the style you prefer and submit it as a PR.
