This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LLM Knowledge Base — a personal, LLM-maintained knowledge wiki inspired by Karpathy's LLM Knowledge Bases pattern. The system compiles raw sources into structured, interlinked markdown rather than using RAG/vector retrieval.
Philosophy: Human curates sources and approves decisions; LLM handles all compilation, querying, maintenance, and coding.
Phase 3.93 complete (v0.9.12). 613 tests, 25 MCP tools, 12 modules. Cumulative history:
- Phase 1 core (5 operations + graph + CLI); Phase 2 quality system (feedback, review, semantic lint); v0.5.0 fixes; v0.6.0 DRY refactor.
- v0.7.0 S+++ upgrade: MCP server split into a package, graph PageRank/centrality, entity enrichment on multi-source ingestion, persistent lint verdicts, case-insensitive wikilinks, trust threshold fix, template hash change detection, comparison/synthesis templates, 2 new MCP tools.
- v0.8.0 BM25 search engine: replaces bag-of-words keyword matching with BM25 ranking (term frequency saturation, inverse document frequency, document length normalization).
- v0.9.0 hardening release: path traversal protection, citation regex fix, slug collision tracking, JSON fence hardening, MCP error handling, max_results bounds, MCP Phase 2 instructions.
Phase 1 modules: kb.config, kb.models, kb.utils, kb.ingest, kb.compile, kb.query, kb.lint, kb.evolve, kb.graph, kb.mcp_server, CLI (6 commands: ingest, compile, query, lint, evolve, mcp). MCP server split into kb.mcp package (app, core, browse, health, quality).
Phase 2 modules:
- `kb.feedback` — query feedback store (weighted Bayesian trust scoring — "wrong" penalized 2x vs "incomplete"), reliability analysis (flagged pages, coverage gaps)
- `kb.review` — page-source pairing, review context/checklist builder, page refiner (frontmatter-preserving updates with audit trail)
- `kb.lint.semantic` — fidelity, consistency, completeness context builders for LLM-powered evaluation
- `.claude/agents/wiki-reviewer.md` — Actor-Critic reviewer agent definition
- `kb.lint.verdicts` — persistent lint/review verdict storage (pass/fail/warning with audit trail)
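The weighted trust scoring used by the feedback store can be sketched as a Beta-style estimate. This is an illustrative sketch only: the function name, prior, and exact weights below are assumptions, not kb.feedback's real API — the source confirms only that "wrong" is penalized 2x relative to "incomplete".

```python
def trust_score(helpful: int, incomplete: int, wrong: int,
                prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Weighted Beta-Bernoulli trust estimate (hypothetical names/prior).

    'wrong' feedback counts twice as heavily against a page as
    'incomplete', mirroring the 2x penalty described above.
    """
    penalty = incomplete * 1.0 + wrong * 2.0
    return (helpful + prior_a) / (helpful + penalty + prior_a + prior_b)
```

With a uniform prior, a page with no feedback sits at 0.5, and a single "wrong" drags trust down faster than a single "incomplete".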
Phase 3+ (200+ pages): DSPy Teacher-Student optimization, RAGAS evaluation, Reweave. Research in research/agent-architecture-research.md.
```bash
# Activate venv (ALWAYS use project .venv, never global Python)
.venv\Scripts\activate      # Windows
source .venv/bin/activate   # Unix

# Environment setup
cp .env.example .env  # then fill in:
# ANTHROPIC_API_KEY (required), FIRECRAWL_API_KEY (optional), OPENAI_API_KEY (optional)

# Install deps + editable package (enables `kb` CLI command)
# NOTE: `pip install -e .` must be run before `kb` CLI or `from kb import ...` works outside pytest
pip install -r requirements.txt && pip install -e .

# Run all tests
python -m pytest

# Run single test
python -m pytest tests/test_models.py::test_extract_wikilinks -v

# Lint, auto-fix, & format
ruff check src/ tests/
ruff check src/ tests/ --fix
ruff format src/ tests/

# CLI (after `pip install -e .`)
kb --version
kb ingest raw/articles/example.md --type article
kb compile [--full]
kb query "What is compile-not-retrieve?"
kb lint [--fix]
kb evolve
kb mcp  # Start MCP server for Claude Code

# Playwright browser (needed by crawl4ai)
python -m playwright install chromium
```

Ruff config: line length 100, Python 3.12+, rules E/F/I/W/UP (see pyproject.toml).
- `raw/` — Immutable source documents. The LLM reads but never modifies files here. Subdirs: `articles/`, `papers/`, `repos/`, `videos/`, `podcasts/`, `books/`, `datasets/`, `conversations/`, `assets/`. Use Obsidian Web Clipper for web→markdown; download images to `raw/assets/`.
- `wiki/` — LLM-generated and LLM-maintained markdown. Page subdirs: `entities/`, `concepts/`, `comparisons/`, `summaries/`, `synthesis/`. All pages use YAML frontmatter (see template below).
- `research/` — Human-authored analysis, project ideas, and meta-research about the knowledge base approach.
- Ingest: User adds source to `raw/`. LLM reads it, writes summary to `wiki/summaries/`, updates `wiki/index.md`, modifies relevant entity/concept pages, appends to `wiki/log.md`.
- Compile: LLM builds/updates interlinked wiki pages. Uses hash-based incremental detection — only processes new/changed sources. Manifest saved after each source (crash-safe). Proposes diffs, not full rewrites.
- Query: User asks questions. LLM searches wiki using BM25 ranking (term frequency saturation, IDF weighting, length normalization), synthesizes answers with inline citations to wiki pages and raw sources. Title tokens boosted by `SEARCH_TITLE_WEIGHT` (repeated N times before indexing). Context truncated to 80K chars. Good answers become new wiki pages.
- Lint: Health check — orphan pages, dead links, staleness, wikilink cycles, coverage gaps, frontmatter validation. Produces a report, does not silently fix.
- Evolve: Gap analysis — what topics lack coverage, what concepts should be linked, what raw sources would fill gaps. User picks suggestions, LLM executes.
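The BM25 ranking the Query operation relies on can be sketched as follows. This is an illustrative implementation only: kb.query's real tokenizer, parameter values, and the `SEARCH_TITLE_WEIGHT` title boosting are not reproduced here.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against query terms with BM25.

    corpus is a list of token lists (all documents), used to compute
    inverse document frequency and average document length.
    """
    N = len(corpus)
    avg_len = sum(len(d) for d in corpus) / N
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # IDF weighting
        # k1 caps term-frequency gains (saturation); b scales the
        # document-length normalization against the corpus average.
        denom = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / denom
    return score
```

Repeating title tokens N times before indexing (as `SEARCH_TITLE_WEIGHT` does) simply inflates their term frequency, which this formula then rewards with saturation rather than linearly.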
Entry point: `kb = "kb.cli:cli"` in pyproject.toml. Version in `src/kb/__init__.py`. MCP server entry: `python -m kb.mcp_server` or `kb mcp`. MCP package: `kb.mcp` (split from monolithic `mcp_server.py`).
All paths, model tiers, page types, and confidence levels are defined in kb.config — import from there, never hardcode. PROJECT_ROOT resolves from config.py's location, so it works regardless of working directory.
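The PROJECT_ROOT resolution pattern can be sketched like this; the helper name and the assumed `src/kb/config.py` depth are illustrative, not kb.config's literal code.

```python
from pathlib import Path

def project_root(config_file: str) -> Path:
    """Resolve the project root from config.py's own file location,
    not the current working directory, so paths work from anywhere.
    Assumes a src/kb/config.py layout: config.py -> kb -> src -> root."""
    return Path(config_file).resolve().parents[2]

root = project_root("/tmp/proj/src/kb/config.py")
```

Inside the real module this would use `__file__` instead of an explicit path argument.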
Key APIs (non-obvious behavior — for full signatures, read the source):
- `call_llm(prompt, tier="write")` / `call_llm_json(prompt, tier, schema)` — In `kb.utils.llm`. Tiers: `scan` (Haiku), `write` (Sonnet), `orchestrate` (Opus). `call_llm_json` uses forced tool_use for guaranteed structured JSON — no fence-stripping needed. Raises `LLMError` on failure, `ValueError` on invalid tier.
- `ingest_source(path, source_type=None, extraction=None, *, defer_small=False)` — In `kb.ingest.pipeline`. Returns dict with `pages_created`, `pages_updated`, `pages_skipped`, `affected_pages`, `wikilinks_injected`, and `duplicate: True` on hash match. Pass `extraction` dict to skip LLM call (Claude Code mode).
- `load_all_pages(wiki_dir=None)` — In `kb.utils.pages`. Returns list of dicts. Gotcha: `raw_content` field is pre-lowercased (for BM25), not verbatim.
- `slugify(text)` / `yaml_escape(value)` — In `kb.utils.text`. Single source of truth — imported everywhere, never duplicate.
- `build_extraction_schema(template)` — In `kb.ingest.extractors`. Builds JSON Schema from template fields. `load_template()` is LRU-cached.
- `refine_page(page_id, content, notes)` — In `kb.review.refiner`. Uses regex-based frontmatter split (not YAML parser), rejects content starting with `---` to prevent corruption.
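The frontmatter handling described for `refine_page` can be sketched as follows. The regex, function names, and error messages here are illustrative assumptions, not the real implementation.

```python
import re

# Match an opening '---' fence, a frontmatter body, a closing '---',
# then the page body (hypothetical pattern; the real one may differ).
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n(.*)\Z", re.DOTALL)

def split_frontmatter(page_text: str):
    """Split a wiki page into (frontmatter, body) with a regex
    rather than a YAML parser."""
    m = FRONTMATTER_RE.match(page_text)
    if not m:
        raise ValueError("page has no frontmatter block")
    return m.group(1), m.group(2)

def safe_new_body(body: str) -> str:
    """Reject replacement content that starts with '---' so a stray
    frontmatter fence cannot corrupt the page on reassembly."""
    if body.lstrip().startswith("---"):
        raise ValueError("refined content must not start with '---'")
    return body
```

The guard is the important part: without it, refined content beginning with `---` would be re-parsed as a second frontmatter block on the next read.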
| File | Purpose |
|---|---|
| `wiki/index.md` | Master catalog, one line per page, under 500 lines |
| `wiki/_sources.md` | Raw source → wiki page mapping (traceability for lint) |
| `wiki/_categories.md` | Auto-maintained category tree |
| `wiki/log.md` | Append-only chronological record of all operations |
| `wiki/contradictions.md` | Explicit tracker for conflicting claims across sources |
Local-only directories (git-ignored): .claude/, .tools/, .memory/, .data/, openspec/, .mcp.json. The others/ directory holds misc files like screenshots.
| Tier | Model ID | Env Override | Use |
|---|---|---|---|
| `scan` | `claude-haiku-4-5-20251001` | `CLAUDE_SCAN_MODEL` | Index reads, link checks, file diffs — mechanical, low-reasoning |
| `write` | `claude-sonnet-4-6` | `CLAUDE_WRITE_MODEL` | Article writing, extraction, summaries — quality at reasonable cost |
| `orchestrate` | `claude-opus-4-6` | `CLAUDE_ORCHESTRATE_MODEL` | Orchestration, query answering, verification — highest reasoning |
10 YAML schemas (article, paper, video, repo, podcast, book, dataset, conversation, comparison, synthesis). Each defines extract: fields and wiki_outputs: mapping (documentation-only, not enforced in code). All follow the same output pattern: summaries → entities → concepts. Used by the ingest pipeline to drive consistent extraction via the extract: fields.
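A hypothetical schema file might take the following shape. Field names and output paths here are illustrative assumptions — read the real template files for the authoritative structure.

```yaml
# Hypothetical article template (illustrative only)
extract:
  - title
  - key_claims
  - entities
  - concepts
wiki_outputs:            # documentation-only, not enforced in code
  summaries: "wiki/summaries/{slug}.md"
  entities: "wiki/entities/{slug}.md"
  concepts: "wiki/concepts/{slug}.md"
```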
Pytest with testpaths = ["tests"], pythonpath = ["src"]. Fixtures in conftest.py:
- `project_root` / `raw_dir` / `wiki_dir` — point to real project directories (read-only use)
- `tmp_wiki(tmp_path)` — isolated wiki directory with all 5 subdirectories for tests that write wiki pages
- `tmp_project(tmp_path)` — full project directory with wiki/ (5 subdirs + log.md) and raw/ (4 subdirs) for Phase 2 tests
- `create_wiki_page` — factory fixture for creating wiki pages with proper frontmatter (parameterized: page_id, title, content, source_ref, page_type, confidence, updated, wiki_dir)
- `create_raw_source` — factory fixture for creating raw source files
613 tests across 38 test files — run `python -m pytest -v` to list all. New tests per phase go in versioned files (e.g., `test_v099_phase39.py`). Use the `tmp_wiki`/`tmp_project` fixtures for any test that writes files — never write to the real `wiki/` or `raw/` in tests.
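A minimal sketch of what the `create_wiki_page` factory produces — a page with YAML frontmatter. The defaults and flat file layout here are assumptions; the real fixture takes more parameters (source_ref, updated, wiki_dir) and writes into the typed subdirectories.

```python
import tempfile
from pathlib import Path

def create_wiki_page(wiki_dir: Path, page_id: str, title: str, content: str,
                     page_type: str = "concept",
                     confidence: str = "stated") -> Path:
    """Write a wiki page with minimal YAML frontmatter (hypothetical
    simplification of the conftest.py factory fixture)."""
    path = wiki_dir / f"{page_id}.md"
    path.write_text(
        "---\n"
        f'title: "{title}"\n'
        f"type: {page_type}\n"
        f"confidence: {confidence}\n"
        "---\n"
        f"{content}\n",
        encoding="utf-8",
    )
    return path

tmp = Path(tempfile.mkdtemp())          # stand-in for pytest's tmp_path
page = create_wiki_page(tmp, "bm25-search", "BM25 Search", "Ranking notes.")
```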
- MCP tools: Return `"Error: ..."` strings on failure — never raise exceptions to the MCP client. Page IDs validated via `_validate_page_id()` (rejects path traversal, verifies path within WIKI_DIR).
- LLM calls: Shared retry logic in `_make_api_call()` — 3 attempts with exponential backoff on rate limits/overload/timeout. `LLMError` on exhaustion. `call_llm_json()` uses forced tool_use for guaranteed structured JSON (no fence-stripping needed).
- Page loading loops: Catch specific exceptions (not broad `Exception`) and skip with warning — never abort a full scan for one bad file.
- JSON stores: All use `atomic_json_write()` (temp file + rename). Capped at 10,000 entries each (feedback, verdicts).
- Ingest limits: `MAX_ENTITIES_PER_INGEST=50`, `MAX_CONCEPTS_PER_INGEST=50` — prevents runaway page creation from hallucinated lists.
- Path traversal: Validated at MCP boundary (`_validate_page_id`, ingest path check), at library level (`refine_page`, `pair_page_with_sources`), and in reference extraction. `extract_citations()` and `extract_raw_refs()` reject `..` and leading `/`.
1. `kb_ingest(path)` — get extraction prompt
2. Extract JSON — `kb_ingest(path, extraction_json)`
3. For each created page: `kb_review_page(page_id)` — self-critique
4. If issues: `kb_refine_page(page_id, updated_content)` (max 2 rounds)
1-4. Same as Standard Ingest
5. Spawn wiki-reviewer agent with created page_ids
6. Review findings — fix or accept
7. `kb_affected_pages` — flag related pages
1. `kb_lint()` — mechanical report
2. For errors: `kb_lint_deep(page_id)` — evaluate fidelity
3. Fix issues via `kb_refine_page`
4. `kb_lint_consistency()` — contradiction check
5. Re-run `kb_lint()` to verify (max 3 rounds)
1. `kb_query(question)` — synthesize answer
2. After user reaction: `kb_query_feedback(question, rating, pages)`
- All wiki pages must link claims back to specific `raw/` source files. Unsourced claims should be flagged.
- Use `[[wikilinks]]` for inter-page links within `wiki/`.
- Distinguish stated facts (source says X) from inferences (based on A and B, we infer Y).
- When updating wiki pages, prefer proposing diffs over full rewrites for auditability.
- Keep `wiki/index.md` under 500 lines — use category groupings, one line per page.
- Always install Python packages into the project `.venv`, never globally.
- Architecture diagram sync (MANDATORY): Source: `others/architecture-diagram.html` → Rendered: `others/architecture-diagram.png` → Displayed: README.md line 9. Every time the HTML is modified, you MUST re-render the PNG and commit it. Render command:

```bash
# Run from project root with .venv activated
.venv/Scripts/python -c "
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page(viewport={'width': 1440, 'height': 900}, device_scale_factor=3)
        await page.goto('file:///D:/Projects/LLM-Knowledge-Base/others/architecture-diagram.html')
        await page.wait_for_timeout(1500)
        dim = await page.evaluate('() => ({ w: document.body.scrollWidth, h: document.body.scrollHeight })')
        await page.set_viewport_size({'width': dim['w'], 'height': dim['h']})
        await page.wait_for_timeout(500)
        await page.screenshot(path='others/architecture-diagram.png', full_page=True, type='png')
        await browser.close()

asyncio.run(main())
"
```
```yaml
---
title: "Page Title"
source:
  - "raw/articles/source-file.md"
created: 2026-04-05
updated: 2026-04-05
type: entity | concept | comparison | synthesis | summary
confidence: stated | inferred | speculative
---
```

```bash
# Web page → markdown (heavy, JavaScript-rendered pages)
crwl URL -o markdown > raw/articles/page-name.md

# Web page → markdown (lightweight, articles/blogs — faster)
trafilatura -u URL > raw/articles/page-name.md

# PDF/DOCX → markdown (simple documents)
markitdown file.pdf > raw/papers/paper-name.md

# PDF → markdown (complex documents with tables/figures)
docling file.pdf --output raw/papers/

# YouTube transcript
yt-dlp --write-auto-sub --skip-download URL -o raw/videos/video-name

# arXiv paper (Python)
# import arxiv; paper = next(arxiv.Client().results(arxiv.Search(id_list=["2401.12345"])))
```

Configured in .mcp.json (git-ignored, local only): kb, git-mcp, context7, fetch, memory, filesystem, git, arxiv, sqlite. See .mcp.json for connection details.
Key usage:
- kb — The knowledge base MCP server (`kb.mcp_server`, 25 tools). Start with `kb mcp` or `python -m kb.mcp_server`. Claude Code is the default LLM — no API key needed.
  - `kb_query(question)` — returns wiki context with trust scores; Claude Code synthesizes the answer. Add `use_api=true` for Anthropic API synthesis.
  - `kb_ingest(path, extraction_json=...)` — creates wiki pages from Claude Code's extraction. Omit `extraction_json` to get the extraction prompt. Add `use_api=true` for API extraction. Output includes `affected_pages` (cascade review list) and `wikilinks_injected` (pages updated with retroactive links). Shows "Duplicate content detected" with hash if source was already ingested.
  - `kb_ingest_content(content, filename, type, extraction_json)` — one-shot: saves content to `raw/` and creates wiki pages in one call.
  - `kb_save_source(content, filename, overwrite=false)` — save content to `raw/` for later ingestion. Returns error if file already exists unless `overwrite=true`.
  - `kb_compile_scan()` — find changed sources, then `kb_ingest` each.
  - `kb_compile(incremental=true)` — run full compilation (requires ANTHROPIC_API_KEY for LLM extraction).
  - Browse: `kb_search`, `kb_read_page`, `kb_list_pages`, `kb_list_sources`.
  - Health: `kb_stats`, `kb_lint` (includes flagged pages + stub detection, supports `--fix`), `kb_evolve` (includes coverage gaps + stub enrichment suggestions), `kb_detect_drift` (finds wiki pages stale due to raw source changes), `kb_graph_viz` (Mermaid graph export with auto-pruning), `kb_verdict_trends` (weekly quality dashboard).
  - Quality (Phase 2): `kb_review_page`, `kb_refine_page`, `kb_lint_deep`, `kb_lint_consistency`, `kb_query_feedback`, `kb_reliability_map`, `kb_affected_pages`.
- memory — Persistent knowledge graph in `.memory/memory.jsonl`. Track wiki entity relationships across sessions.
- arxiv — Search/download papers to `raw/papers/`.
- sqlite — Metadata DB at `.data/metadata.db`. For wikilink graph, ingestion history, lint results.
See CHANGELOG.md for the full phase history (v0.3.0 → v0.9.11). Format: Keep a Changelog with Added/Changed/Fixed/Removed categories per version.
Current: Phase 3.93 (v0.9.12) — 613 tests, 25 MCP tools, 12 modules.
Known issues: See BACKLOG.md for the Phase 3.94 backlog. Format guide is in the HTML comment at the top of that file. Severity levels: CRITICAL (blocks release), HIGH (silent wrong results / security), MEDIUM (quality gaps / missing coverage), LOW (style/naming). Items grouped by severity then by module area. Resolved items are deleted (fix recorded in CHANGELOG.md); resolved phases collapse to a one-liner under "Resolved Phases".
Phase 4 next:
- Two-phase compile pipeline (batch cross-source merging before writing)
- Pre-publish validation gate
- Iterative multi-hop retrieve/trace loop (BM25 + graph traversal)
- Answer trace enforcement (reject uncited claims)
- Conversation→KB promotion (positively-rated query answers → wiki pages)
- DSPy Teacher-Student optimization; RAGAS evaluation
- Layered context assembly (replace naive 80K truncation with tiered loading: summaries first → full pages on demand — inspired by MemPalace's 4-layer context stack)
- Temporal claim tracking (validity windows on individual claims within pages for staleness/contradiction resolution)
- Raw-source fallback retrieval (search raw/ directly alongside wiki/ during queries for verbatim context when summaries lose nuance)
- BM25 + LLM reranking (optional Haiku rerank pass for improved relevance at ~$0.001/query)
- Auto-contradiction detection on ingest (flag when new source conflicts with existing wiki claims — not just at lint time)
- Semantic deduplication pre-ingest (embedding similarity check to catch same-topic-different-wording duplicates beyond content hash)
- Exchange-pair chunking for conversation sources (preserve Q+A dialogue structure instead of flat text)
- Pattern-based source pre-classification (regex heuristics to classify content type before LLM extraction — reduce API cost)
No auto-commit hooks. Doc updates and commits are done manually when ready to push.
Resolved items are deleted from BACKLOG.md (the fix is recorded in CHANGELOG.md). When all items in a phase section are resolved, the section collapses to a one-liner under "Resolved Phases" (e.g., - **Phase 3.92** — all items resolved in v0.9.11). This keeps the backlog focused on open work only.
When asked to update docs, review git diff and update as needed:
- `CHANGELOG.md` — add entries under `[Unreleased]`
- `BACKLOG.md` — delete resolved items (never strikethrough); collapse empty phase sections
- `CLAUDE.md` — update version numbers, test counts, module/tool counts, API docs
- `README.md` — update if user-facing features or setup changed
- `others/architecture-diagram.html` + re-render PNG if architecture changed
All tools are auto-approved for this project (permissions in settings.local.json).