A personal, LLM-maintained knowledge wiki that compiles raw sources into structured, interlinked markdown — no RAG, no vector databases, just clean wiki pages built and maintained by Claude.
Inspired by Karpathy's LLM Knowledge Bases pattern.
```
raw/                        wiki/
  articles/                   entities/
  papers/        Ingest       concepts/         Query
  videos/     ----------->    comparisons/   ----------->   Answers
  repos/         Compile      summaries/        with citations
  books/                      synthesis/
  ...                         index.md
                              log.md

               Lint   ----->  Health report
               Evolve ----->  Gap analysis
```
Human curates sources. LLM handles everything else — extraction, compilation, linking, querying, maintenance, and gap analysis.
| Layer | Path | Owner | Purpose |
|---|---|---|---|
| Raw | raw/ | Human | Immutable source documents (articles, papers, videos, repos, etc.) |
| Wiki | wiki/ | LLM | Generated and maintained markdown pages with YAML frontmatter |
| Research | research/ | Human | Analysis, project ideas, meta-research |
| Operation | What it does |
|---|---|
| Ingest | Read a raw source, extract structured data via LLM, generate summary + entity + concept pages, update indexes |
| Compile | Scan all raw sources, detect changes via content hashes, batch-ingest new/modified sources |
| Query | Search wiki pages using BM25 ranking, synthesize answers with inline citations using Claude |
| Lint | Health checks: dead links, orphan pages, staleness, frontmatter validation, source coverage, wikilink cycles |
| Evolve | Gap analysis: under-linked concepts, missing page types, connection opportunities, new page suggestions |
```bash
# Clone and set up
git clone https://github.com/Asun28/LLM-Knowledge-Base.git
cd LLM-Knowledge-Base

# Create virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Unix

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Configure API key (optional — not needed with Claude Code Max)
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY (only for kb ingest / kb query CLI commands)

# Verify installation
kb --version
```

Drop a markdown file into raw/ and ingest it:
```bash
# Web article to markdown (pick one)
trafilatura -u https://example.com/article > raw/articles/article-name.md
crwl https://example.com/article -o markdown > raw/articles/article-name.md

# Ingest into the wiki
kb ingest raw/articles/article-name.md --type article
```

The ingest pipeline:
- Reads the raw source and checks for duplicate content (hash-based dedup against manifest)
- Calls Claude (Sonnet) to extract title, key claims, entities, concepts
- Creates `wiki/summaries/article-name.md`
- Creates/updates entity pages in `wiki/entities/` (with context from extraction data)
- Creates/updates concept pages in `wiki/concepts/` (with context from extraction data)
- Injects retroactive wikilinks into existing pages that mention newly created page titles
- Updates `wiki/index.md`, `wiki/_sources.md`, `wiki/log.md`
- Returns affected pages (backlinks + shared sources) for cascade review
- Detects and warns on slug collisions (e.g., "GPT 4" and "GPT-4" both → `gpt-4`)
- Short sources (<1000 chars) can defer entity/concept creation to prevent stub proliferation
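The hash-based dedup step can be sketched in a few lines. This is a minimal illustration that assumes a manifest mapping source paths to SHA-256 digests; the real layout of `.data/hashes.json` may differ:

```python
import hashlib
import json
from pathlib import Path

def content_hash(path: Path) -> str:
    """SHA-256 digest of the raw file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def is_duplicate(path: Path, manifest_path: Path) -> bool:
    """True if this exact content was already ingested (digest present in the manifest)."""
    if not manifest_path.exists():
        return False
    manifest = json.loads(manifest_path.read_text())
    return content_hash(path) in manifest.values()
```

Because the digest covers the file bytes, renaming a source does not defeat the check; only genuinely new content is ingested.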
Source type is auto-detected from the raw/ subdirectory, or specify with --type:
article, paper, repo, video, podcast, book, dataset, conversation
Process all new and changed sources at once:
```bash
kb compile          # Incremental (only new/changed sources)
kb compile --full   # Full recompile
```

Uses SHA-256 content hashes stored in .data/hashes.json to detect changes. The manifest is saved after each successful source ingest (crash-safe).
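A minimal sketch of this incremental, crash-safe loop — the manifest keys and the `ingest` hook here are assumptions, not the actual compiler code:

```python
import hashlib
import json
from pathlib import Path

def compile_incremental(raw_dir: Path, manifest_path: Path, ingest) -> list[str]:
    """Ingest only new/changed sources; persist the manifest after each success."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    processed = []
    for src in sorted(raw_dir.rglob("*.md")):
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        key = src.relative_to(raw_dir).as_posix()
        if manifest.get(key) == digest:
            continue  # unchanged since last compile
        ingest(src)
        manifest[key] = digest
        manifest_path.write_text(json.dumps(manifest))  # saved per source, not at the end
        processed.append(key)
    return processed
```

Saving the manifest after every source (rather than once at the end) is what makes a crash resumable: already-ingested sources are skipped on the next run.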
Ask questions and get answers with citations:
```bash
kb query "What is compile-not-retrieve?"
```

Searches wiki pages using BM25 ranking blended with PageRank (well-linked pages rank higher), builds context (truncated to 80K chars), and calls Claude (Opus) to synthesize an answer with [source: page_id] citations.
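The BM25 + PageRank blend can be illustrated with a toy scorer. The weighting, normalization, and `alpha` parameter below are assumptions for illustration, not the repo's actual formula:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Classic BM25 over tokenized docs: {doc_id: [tokens]} -> {doc_id: score}."""
    N = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / N
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))  # document frequency per term
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        s = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores[doc_id] = s
    return scores

def blended(query_terms, docs, pagerank, alpha=0.8):
    """Blend normalized BM25 relevance with PageRank so well-linked pages rank higher."""
    bm25 = bm25_scores(query_terms, docs)
    top = max(bm25.values()) or 1.0
    return {d: alpha * (bm25[d] / top) + (1 - alpha) * pagerank.get(d, 0.0)
            for d in docs}
```

The blend means a page that merely mentions a query term can still be outranked by a slightly less keyword-dense page that the rest of the wiki links to heavily.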
Run health checks:
```bash
kb lint        # Report issues
kb lint --fix  # Auto-fix dead links (replaces broken [[links]] with plain text)
```

Checks for:
- Dead wikilinks (broken [[references]]) — auto-fixable with --fix
- Orphan pages (no incoming links)
- Stale pages (not updated in 90+ days)
- Stub pages (body content under 100 chars)
- Invalid frontmatter (missing required fields)
- Uncovered raw sources (not referenced by any wiki page)
- Wikilink cycles (A → B → C → A)
- Low-trust pages flagged by query feedback
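The dead-wikilink check, for example, reduces to a regex scan against the set of existing page IDs — a simplified sketch, not the repo's implementation:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")  # [[target]] or [[target|label]]

def find_dead_links(wiki_dir: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs where [[target]] matches no existing wiki page."""
    pages = {p.relative_to(wiki_dir).with_suffix("").as_posix()
             for p in wiki_dir.rglob("*.md")}
    dead = []
    for page in wiki_dir.rglob("*.md"):
        for target in WIKILINK.findall(page.read_text()):
            if target not in pages:
                dead.append((page.relative_to(wiki_dir).as_posix(), target))
    return dead
```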
Analyze gaps and get improvement suggestions:
```bash
kb evolve
```

Reports:
- Coverage by page type (entities, concepts, comparisons, summaries, synthesis)
- Orphan concepts with no backlinks
- Unlinked pages that share terms (connection opportunities)
- Dead links that suggest new pages to create
- Disconnected graph components
- Coverage gaps from query feedback
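The connection-opportunity check can be sketched as a pairwise overlap test over pre-extracted significant terms; the threshold and the term sets are assumptions here:

```python
from itertools import combinations

def connection_opportunities(pages: dict[str, set[str]],
                             links: set[tuple[str, str]],
                             min_shared: int = 3) -> list[tuple[str, str, set[str]]]:
    """Suggest linking unlinked page pairs sharing at least min_shared significant terms."""
    suggestions = []
    for a, b in combinations(sorted(pages), 2):
        if (a, b) in links or (b, a) in links:
            continue  # already connected
        shared = pages[a] & pages[b]
        if len(shared) >= min_shared:
            suggestions.append((a, b, shared))
    return suggestions
```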
The knowledge base ships with a built-in MCP server with 25 tools. Claude Code is the default LLM — no API key needed. kb_query and kb_ingest use Claude Code for all intelligence; add use_api=true to call the Anthropic API instead.
```bash
# Start the MCP server standalone
kb mcp

# Or run as a Python module
python -m kb.mcp_server
```

Setup: Add to your .mcp.json (already configured in this repo):

```json
{
  "mcpServers": {
    "kb": {
      "command": ".venv/Scripts/python.exe",
      "args": ["-m", "kb.mcp_server"]
    }
  }
}
```

After restarting Claude Code, you get 25 tools:
| Tool | Description |
|---|---|
| `kb_query` | Query the wiki. Returns context for Claude Code to answer. Add use_api=true for API synthesis. |
| `kb_ingest` | Ingest a source file. Pass extraction_json with your extraction; omit it to get the prompt first. Add use_api=true for API extraction. |
| `kb_ingest_content` | One-shot: provide raw content + extraction JSON; saves to raw/ and creates all wiki pages. |
| `kb_save_source` | Save content to raw/ without ingesting (ingest later with kb_ingest). Errors if file already exists unless overwrite=true. |
| `kb_compile_scan` | List new/changed sources that need kb_ingest. |
| Tool | Description |
|---|---|
| `kb_search` | Keyword search across wiki pages (BM25 + PageRank blending) |
| `kb_read_page` | Read a specific wiki page by ID |
| `kb_list_pages` | List all pages, optionally filtered by type |
| `kb_list_sources` | List all raw source files |
| `kb_stats` | Page counts, graph metrics, coverage info |
| `kb_lint` | Health checks (dead links, orphans, staleness, stubs, low-trust pages) |
| `kb_evolve` | Gap analysis and connection suggestions |
| `kb_detect_drift` | Find wiki pages stale due to raw source changes |
| `kb_compile` | Compile wiki from raw sources (incremental or full) |
| `kb_graph_viz` | Export knowledge graph as Mermaid diagram (auto-prunes large graphs) |
| `kb_verdict_trends` | Show weekly quality trends from verdict history |
| Tool | Description |
|---|---|
| `kb_review_page` | Page + sources + checklist for quality review |
| `kb_refine_page` | Update page preserving frontmatter, with audit trail |
| `kb_lint_deep` | Source fidelity check (page vs raw source side-by-side) |
| `kb_lint_consistency` | Cross-page contradiction check |
| `kb_query_feedback` | Record query success/failure for trust scoring |
| `kb_reliability_map` | Page trust scores from feedback history |
| `kb_affected_pages` | Pages affected by a change (backlinks + shared sources) |
| `kb_save_lint_verdict` | Record lint/review verdict for persistent audit trail |
| `kb_create_page` | Create comparison/synthesis/any wiki page directly |
Workflows:

```
# Query (Claude Code answers directly)
kb_query("What is RAG?") -> returns wiki context -> Claude Code synthesizes answer

# Ingest a file in raw/
kb_ingest("raw/articles/rag.md") -> returns extraction prompt
kb_ingest("raw/articles/rag.md", extraction_json=...) -> creates wiki pages

# Ingest a URL (one-shot)
1. Fetch content from URL
2. Extract title, entities, concepts
3. kb_ingest_content(content, "article-name", "article", extraction_json)

# Batch compile
kb_compile_scan() -> lists sources -> kb_ingest each with extraction_json

# Quality review (Phase 2)
kb_review_page("concepts/rag") -> review context -> kb_refine_page if issues
kb_lint_deep("concepts/rag") -> fidelity check -> fix unsourced claims
kb_query_feedback(question, "useful", "concepts/rag") -> builds trust scores

# Create comparison/synthesis pages
kb_create_page("comparisons/rag-vs-finetuning", "RAG vs Fine-tuning", content)

# Record lint verdicts for audit trail
kb_save_lint_verdict("concepts/rag", "fidelity", "pass", notes="All claims traced")

# Visualize and monitor
kb_graph_viz(max_nodes=30) -> Mermaid diagram of knowledge graph
kb_verdict_trends() -> weekly quality improvement dashboard
kb_detect_drift() -> find wiki pages stale due to source changes
```
Example prompts in Claude Code:
- "Search my knowledge base for RAG" -> `kb_search`
- "What does my wiki say about transformers?" -> `kb_query`
- "Ingest this article into my wiki" -> `kb_ingest` or `kb_ingest_content`
- "Show me wiki health" -> `kb_lint`
- "What sources need processing?" -> `kb_compile_scan`
- "Review this wiki page for accuracy" -> `kb_review_page`
- "Show me the knowledge graph" -> `kb_graph_viz`
- "How is wiki quality trending?" -> `kb_verdict_trends`
| Type | Template | Capture Method |
|---|---|---|
| Article | templates/article.yaml | `trafilatura -u URL` or `crwl URL -o markdown` |
| Paper | templates/paper.yaml | `markitdown file.pdf` or `docling file.pdf` |
| Video | templates/video.yaml | `yt-dlp --write-auto-sub --skip-download URL` |
| Repo | templates/repo.yaml | Manual markdown summary |
| Podcast | templates/podcast.yaml | Transcript markdown |
| Book | templates/book.yaml | Manual notes or markitdown |
| Dataset | templates/dataset.yaml | Schema documentation |
| Conversation | templates/conversation.yaml | Chat/interview transcript |
| Comparison | templates/comparison.yaml | Created via kb_create_page (multi-source) |
| Synthesis | templates/synthesis.yaml | Created via kb_create_page (cross-source) |
Each template defines extraction fields and wiki output mappings. The LLM uses these to consistently extract structured data from any source type. Source types are validated against this whitelist before processing.
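For illustration, a template might look like the following — the field names and mapping keys here are hypothetical, not the repo's actual schema:

```yaml
# Hypothetical shape of templates/article.yaml (illustrative only)
type: article
extraction_fields:
  title: string
  key_claims: list
  entities: list
  concepts: list
wiki_output:
  summary: wiki/summaries/{slug}.md
  entities: wiki/entities/{slug}.md
  concepts: wiki/concepts/{slug}.md
```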
Every wiki page uses YAML frontmatter for metadata:
```markdown
---
title: Retrieval Augmented Generation
source:
  - raw/articles/rag-overview.md
created: 2026-04-06
updated: 2026-04-06
type: concept
confidence: stated
---
# Retrieval Augmented Generation

RAG combines retrieval with generation...

## Key Claims
- Claim 1
- Claim 2

## Entities Mentioned
- [[entities/openai|OpenAI]]

## Concepts
- [[concepts/vector-search|Vector Search]]
```

Page types: entity, concept, comparison, summary, synthesis
Confidence levels: stated (directly from source), inferred (derived from multiple sources), speculative (LLM reasoning)
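Frontmatter validation (one of the lint checks) can be sketched with a stdlib-only parser — a simplification that handles only the flat `key: value` plus list style shown above, not full YAML:

```python
REQUIRED = {"title", "source", "created", "updated", "type", "confidence"}
PAGE_TYPES = {"entity", "concept", "comparison", "summary", "synthesis"}

def parse_frontmatter(text: str) -> dict:
    """Parse the simple key/value frontmatter block between --- fences."""
    if not text.startswith("---"):
        return {}
    _, block, _ = text.split("---", 2)
    meta, key = {}, None
    for line in block.strip().splitlines():
        stripped = line.strip()
        if stripped.startswith("- ") and key:        # list item under previous key
            meta.setdefault(key, []).append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key = key.strip()
            meta[key] = value.strip() or []          # bare "key:" starts a list
    return meta

def lint_frontmatter(meta: dict) -> list[str]:
    """Report missing required fields and invalid page types."""
    issues = [f"missing field: {f}" for f in sorted(REQUIRED - meta.keys())]
    if meta.get("type") not in PAGE_TYPES:
        issues.append(f"invalid type: {meta.get('type')}")
    return issues
```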
The knowledge base includes a multi-layer quality system:
Trust scoring — Bayesian page trust based on query feedback. "Wrong" answers penalized 2x vs "incomplete". Pages below the trust threshold are automatically flagged during lint.
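A plausible shape for that score is a Beta-distribution mean with weighted failures. The exact formula below is an assumption for illustration, not the repo's implementation:

```python
def trust_score(useful: int, incomplete: int, wrong: int,
                prior_a: float = 1.0, prior_b: float = 1.0,
                wrong_weight: float = 2.0) -> float:
    """Beta posterior mean over feedback counts; 'wrong' counts double vs 'incomplete'."""
    failures = incomplete + wrong_weight * wrong
    return (useful + prior_a) / (useful + failures + prior_a + prior_b)
```

With a uniform prior, a page with no feedback sits at 0.5, and the same number of "wrong" votes drags trust down further than "incomplete" votes do.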
Review workflow — kb_review_page pairs wiki pages with their raw sources and a 6-item checklist (source fidelity, entity accuracy, wikilink validity, confidence match, no hallucination, title accuracy). Claude Code or a wiki-reviewer sub-agent evaluates and produces structured JSON reviews. Issues are fixed via kb_refine_page (max 2 rounds). Review history tracks content length and status for auditability.
Semantic lint — Deep fidelity checks (kb_lint_deep) compare page claims against source content. Consistency checks (kb_lint_consistency) group related pages by shared sources, wikilinks, and significant term overlap (with frontmatter stripping and common-word filtering) to detect contradictions.
Affected page tracking — After updating a page, kb_affected_pages identifies backlinks and shared-source pages that may need review. The ingest pipeline now returns affected_pages automatically for cascade review.
Verdict trends — kb_verdict_trends analyzes verdict history to show weekly pass/fail/warning rates and whether quality is improving, stable, or declining.
Graph visualization — kb_graph_viz exports the knowledge graph as a Mermaid diagram, auto-pruning to the most-connected nodes for large graphs. Compatible with Obsidian, GitHub, and VS Code.
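The export-with-pruning idea reduces to keeping the highest-degree nodes — a toy sketch, not the actual kb_graph_viz code:

```python
from collections import Counter

def to_mermaid(edges: list[tuple[str, str]], max_nodes: int = 30) -> str:
    """Render edges as a Mermaid flowchart, pruning to the most-connected nodes."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {n for n, _ in degree.most_common(max_nodes)}
    lines = ["graph LR"]
    for a, b in edges:
        if a in keep and b in keep:
            # Mermaid node IDs can't contain '/', so page IDs are sanitized
            lines.append(f"    {a.replace('/', '_')} --> {b.replace('/', '_')}")
    return "\n".join(lines)
```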
LLM resilience — All API calls retry up to 3 times with exponential backoff on rate limits, overload, connection errors, and timeouts. Non-retryable errors (401/403) raise immediately with descriptive LLMError.
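A minimal version of that retry policy — the delay values and the retryable exception set here are illustrative:

```python
import time

RETRYABLE = (TimeoutError, ConnectionError)  # stand-ins for rate-limit/overload errors

def call_with_retry(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry fn() on transient errors with exponential backoff; re-raise anything else."""
    for attempt in range(retries):
        try:
            return fn()
        except RETRYABLE:
            if attempt == retries - 1:
                raise  # retries exhausted
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Non-retryable exceptions (such as auth failures) propagate on the first attempt, since they are not in the retryable set.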
The system uses three Claude model tiers to balance cost and quality. Override via environment variables:
| Tier | Model | Env Override | Used For |
|---|---|---|---|
| `scan` | Claude Haiku 4.5 | CLAUDE_SCAN_MODEL | Index reads, link checks, file diffs |
| `write` | Claude Sonnet 4.6 | CLAUDE_WRITE_MODEL | Article writing, extraction, summaries |
| `orchestrate` | Claude Opus 4.6 | CLAUDE_ORCHESTRATE_MODEL | Query answering, orchestration, verification |
```
LLM-Knowledge-Base/
  raw/                  # Immutable source documents
    articles/ papers/ repos/ videos/ podcasts/ books/ datasets/ conversations/ assets/
  wiki/                 # LLM-generated wiki pages
    entities/ concepts/ comparisons/ summaries/ synthesis/
    index.md            # Master catalog
    _sources.md         # Source traceability
    _categories.md      # Category tree
    log.md              # Activity log
    contradictions.md   # Conflict tracker
  research/             # Human-authored analysis
  templates/            # 10 YAML extraction schemas
  src/kb/               # Python package (~4,100 lines)
    cli.py              # Click CLI (6 commands)
    config.py           # Paths, model tiers, tuning constants
    mcp_server.py       # MCP entry point (thin wrapper)
    mcp/                # FastMCP server package (25 tools: core, browse, health, quality)
    models/             # WikiPage, RawSource, frontmatter validation
    ingest/             # Pipeline (dedup, cascade, tiering) + extractors (template-driven)
    compile/            # Hash-based incremental compiler (crash-safe) + linker (wikilink injection)
    query/              # BM25 + PageRank blended search + context truncation + citations
    lint/               # 8 mechanical checks + semantic context builders + verdict trends
    evolve/             # Coverage analysis + connection discovery
    graph/              # NetworkX graph builder + stats + Mermaid export
    feedback/           # Bayesian trust scoring + reliability analysis
    review/             # Page-source pairing + frontmatter-preserving refiner
    utils/              # Shared: hashing, markdown, LLM (retry/timeout), text, wiki_log, pages, io
  tests/                # 564 tests across 38 test files (~35s)
```
```bash
# Activate venv (always use project .venv)
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Unix

# Install
pip install -r requirements.txt && pip install -e .

# Run tests
python -m pytest

# Lint & format
ruff check src/ tests/
ruff check src/ tests/ --fix
ruff format src/ tests/
```

Python 3.12+. Ruff for linting (line length 100, rules E/F/I/W/UP).
- Phase 1 (complete, v0.3.0): 5 operations + graph + CLI + MCP server (12 tools), hash-based incremental compile, model tiering
- Phase 2 (complete, v0.4.0): Quality system — feedback loop with Bayesian trust scoring, Actor-Critic review workflow, semantic lint (fidelity + consistency), page refiner with audit trail. 7 new MCP tools, wiki-reviewer agent
- Phase 2.1 (complete, v0.5.0): Robustness — weighted trust formula, path canonicalization, YAML injection protection, extraction validation, config-driven tuning
- Phase 2.2 (complete, v0.6.0): DRY refactor — shared utilities eliminated all code duplication, source type validation, source field normalization, consolidated test fixtures. 180 tests
- Phase 2.3 (complete, v0.7.0): S+++ upgrade — MCP server split, graph PageRank/centrality, entity enrichment, persistent lint verdicts, case-insensitive wikilinks, template hash detection, comparison/synthesis templates, 2 new tools. 21 MCP tools, 234 tests
- Phase 3.0 (complete, v0.8.0): BM25 search engine — replaced bag-of-words with BM25 ranking. 252 tests
- Phase 3.1 (complete, v0.9.0): Hardening — path traversal, citation regex, MCP error handling, SDK fixes. 289 tests
- Phase 3.2 (complete, v0.9.1): Comprehensive audit — 93 new tests, MCP coverage 41%→95%. 382 tests
- Phase 3.3 (complete, v0.9.2): 15 bug fixes, input validation hardening. 414 tests
- Phase 3.4 (complete, v0.9.3): `kb_compile` + `kb lint --fix`. 431 tests
- Phase 3.5–3.8 (complete, v0.9.4–v0.9.7): Stub detection, drift detection, tier audits, observability. 490 tests
- Phase 3.9a (complete, v0.9.8): Structured outputs (`call_llm_json`), shared retry, atomic writes, extraction schema builder. 518 tests
- Phase 3.9 (complete, v0.9.9): Content growth infrastructure — env-configurable model tiers, PageRank-blended search, hash-based duplicate detection, verdict trend dashboard (`kb_verdict_trends`), Mermaid graph export (`kb_graph_viz`), retroactive wikilink injection (auto-triggered on ingest), content-length ingest tiering, cascade update detection (surfaced in `kb_ingest` MCP output). 3 new MCP tools (26 total), 46 new tests (564 total)
- Phase 4 (200+ pages): DSPy Teacher-Student optimization, RAGAS evaluation, Reweave, Pydantic extraction validation, arxiv MCP integration, semantic dependency tracking, URL-based smart routing. Research in research/agent-architecture-research.md
This project stands on the shoulders of these ideas, tools, and people:
| Project | Author | Contribution |
|---|---|---|
| LLM Knowledge Bases | Andrej Karpathy | The original "compile, don't retrieve" pattern that started it all |
| Project | What we learned |
|---|---|
| DocMason | Architecture diagram style, pre-publish validation gate, iterative retrieve/trace loop, answer trace enforcement, structured knowledge index |
| llm-wiki-compiler | Two-phase compile pipeline (extract across all sources before writing) |
| Graphify | Leiden community detection, health report generation, surprise scoring, per-claim confidence markers, deterministic pre-extraction |
| rvk7895/llm-knowledge-bases | Reference Claude Code plugin implementing compile/query/lint cycle for Obsidian |
| Project | What we learned |
|---|---|
| Ars Contexta | Individualized knowledge system generation through conversation |
| Remember.md | Session knowledge extraction, YAML frontmatter + wikilinks for Obsidian compatibility |
| kepano/obsidian-skills | Agent skills for working with Obsidian vaults |
| lean-ctx | Hybrid context optimization techniques for reducing token consumption |
| DSPy optimization patterns | Teacher-Student optimization for prompt tuning |
| Project | What we learned |
|---|---|
| awesome-llm-knowledge-bases | Curated tool list for LLM-powered personal knowledge bases |
| qmd | Markdown-native querying patterns |
| Quartz | Static site generation from wiki content |
| Microsoft GraphRAG | Graph-based retrieval augmented generation patterns |