Proposal: add WFGY 16-problem RAG failure map as a debugging reference for Vespa-based RAG #36063

@onestardao

Hi Vespa team,

Many production RAG systems are now built on Vespa, and many of them hit subtle failure modes that stem less from ANN or ranking than from how the overall RAG pipeline is wired.

I maintain an MIT-licensed project called WFGY RAG 16 Problem Map, which focuses specifically on RAG / LLM failure modes and diagnostics at the pipeline level.

Repo (MIT):
https://github.com/onestardao/WFGY

Main RAG failure map page:
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

In short, WFGY provides:

  • A 16-problem RAG failure taxonomy (retrieval, prompt, structure, infra)
  • A ready-to-use triage prompt that takes a failing run (question, evidence, prompt, answer, logs, metrics) and classifies it into one of the 16 problem types
  • For each problem, concrete structural fix suggestions (not just “use a better model”)

The map is already integrated or cited in multiple ecosystems, for example:

  • RAGFlow (RAG failure modes checklist in their docs)
  • LlamaIndex (RAG troubleshooting pages)
  • ToolUniverse – Harvard MIMS Lab (wrapped as an incident-triage tool)
  • Rankify – University of Innsbruck (reranking failure analysis)
  • Multimodal RAG Survey – QCRI LLM Lab
  • Curated resources such as Awesome LLM Apps and Awesome Data Science – academic

Proposal

Add a short “RAG failure modes and debugging” section to the Vespa documentation that adopts the WFGY 16-problem map as a vocabulary for Vespa-based RAG systems. For example:

  1. A doc page that:

    • Explains that many production issues come from pipeline design, not the vector engine itself.
    • Maps typical Vespa-based RAG mistakes (index schema, document structure, query formulation, hybrid ranking configs, etc.) to the 16 WFGY failure types.
    • Shows how to capture a failing run (query, retrieved docs, ranking features, answer, logs) and feed it into the triage prompt.
  2. An example (blog or notebook) that:

    • Builds a small Vespa RAG demo.
    • Intentionally misconfigures it in several realistic ways.
    • Uses the WFGY triage prompt to classify each failure and then shows how to fix the Vespa and pipeline configuration.

This would give Vespa users a structured checklist for debugging RAG behavior, while keeping Vespa itself focused on high-performance search and ranking.
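
To make the "capture a failing run" step more concrete, here is a minimal sketch of what such a capture could look like. It assumes the standard Vespa query result JSON layout (hits under `root.children`, each with `id`, `relevance`, and `fields`) and that the rank profile exposes `match-features`, which show up as `matchfeatures` in each hit's fields. `FailingRun`, `hits_from_vespa_response`, and `build_triage_input` are hypothetical names used only for illustration, and the triage prompt text is passed in as a placeholder rather than reproduced here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FailingRun:
    """One failing RAG interaction, captured for offline triage (hypothetical structure)."""
    query: str
    answer: str
    retrieved_docs: list = field(default_factory=list)
    logs: list = field(default_factory=list)


def hits_from_vespa_response(response: dict) -> list:
    """Flatten the hits of a Vespa query result (root.children) into plain dicts."""
    hits = []
    for child in response.get("root", {}).get("children", []):
        fields = child.get("fields", {})
        hits.append({
            "id": child.get("id"),
            "relevance": child.get("relevance"),
            # Only present if the rank profile declares match-features.
            "ranking_features": fields.get("matchfeatures", {}),
            "fields": {k: v for k, v in fields.items() if k != "matchfeatures"},
        })
    return hits


def build_triage_input(run: FailingRun, triage_prompt: str) -> str:
    """Append the serialized run to the triage prompt (prompt text is supplied by the caller)."""
    payload = json.dumps(asdict(run), indent=2, sort_keys=True)
    return f"{triage_prompt}\n\n--- FAILING RUN ---\n{payload}"


if __name__ == "__main__":
    # Made-up response, shaped like a Vespa query result, for illustration only.
    sample_response = {
        "root": {
            "children": [
                {
                    "id": "id:docs:doc::1",
                    "relevance": 0.42,
                    "fields": {
                        "title": "Unrelated page",
                        "matchfeatures": {"bm25(title)": 0.1},
                    },
                }
            ]
        }
    }
    run = FailingRun(
        query="How do I rotate API keys?",
        answer="(model answer that ignored the retrieved evidence)",
        retrieved_docs=hits_from_vespa_response(sample_response),
        logs=["retrieval returned 1 hit"],
    )
    print(build_triage_input(run, "<triage prompt goes here>"))
```

The resulting text can be pasted into whichever LLM runs the triage prompt. Keeping the ranking features next to each hit is a design choice meant to make it easier to tell retrieval-side problems from prompt-side ones; the field names are not prescribed by Vespa or by WFGY.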

If this sounds useful, I would be happy to draft a first version of the doc / example in the style you prefer and submit it as a PR.
