
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project

LLM Knowledge Base — a personal, LLM-maintained knowledge wiki inspired by Karpathy's LLM Knowledge Bases pattern. The system compiles raw sources into structured, interlinked markdown rather than using RAG/vector retrieval.

Philosophy: Human curates sources and approves decisions; LLM handles all compilation, querying, maintenance, and coding.

Implementation Status

Phase 3.93 complete (v0.9.12). 613 tests, 25 MCP tools, 12 modules. Milestones:

  • Phase 1 core — 5 operations + graph + CLI
  • Phase 2 quality system — feedback, review, semantic lint
  • v0.5.0 fixes; v0.6.0 DRY refactor
  • v0.7.0 S+++ upgrade — MCP server split into package, graph PageRank/centrality, entity enrichment on multi-source ingestion, persistent lint verdicts, case-insensitive wikilinks, trust threshold fix, template hash change detection, comparison/synthesis templates, 2 new MCP tools
  • v0.8.0 — BM25 search engine (replaces bag-of-words keyword matching with BM25 ranking: term frequency saturation, inverse document frequency, document length normalization)
  • v0.9.0 hardening — path traversal protection, citation regex fix, slug collision tracking, JSON fence hardening, MCP error handling, max_results bounds, MCP Phase 2 instructions

Phase 1 modules: kb.config, kb.models, kb.utils, kb.ingest, kb.compile, kb.query, kb.lint, kb.evolve, kb.graph, kb.mcp_server, CLI (6 commands: ingest, compile, query, lint, evolve, mcp). MCP server split into kb.mcp package (app, core, browse, health, quality).

Phase 2 modules:

  • kb.feedback — query feedback store (weighted Bayesian trust scoring — "wrong" penalized 2x vs "incomplete"), reliability analysis (flagged pages, coverage gaps)
  • kb.review — page-source pairing, review context/checklist builder, page refiner (frontmatter-preserving updates with audit trail)
  • kb.lint.semantic — fidelity, consistency, completeness context builders for LLM-powered evaluation
  • .claude/agents/wiki-reviewer.md — Actor-Critic reviewer agent definition
  • kb.lint.verdicts — persistent lint/review verdict storage (pass/fail/warning with audit trail)

Phase 3+ (200+ pages): DSPy Teacher-Student optimization, RAGAS evaluation, Reweave. Research in research/agent-architecture-research.md.

Development Commands

# Activate venv (ALWAYS use project .venv, never global Python)
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Unix

# Environment setup
cp .env.example .env          # then fill in:
# ANTHROPIC_API_KEY (required), FIRECRAWL_API_KEY (optional), OPENAI_API_KEY (optional)

# Install deps + editable package (enables `kb` CLI command)
# NOTE: `pip install -e .` must be run before `kb` CLI or `from kb import ...` works outside pytest
pip install -r requirements.txt && pip install -e .

# Run all tests
python -m pytest

# Run single test
python -m pytest tests/test_models.py::test_extract_wikilinks -v

# Lint, auto-fix, & format
ruff check src/ tests/
ruff check src/ tests/ --fix
ruff format src/ tests/

# CLI (after pip install -e .)
kb --version
kb ingest raw/articles/example.md --type article
kb compile [--full]
kb query "What is compile-not-retrieve?"
kb lint [--fix]
kb evolve
kb mcp                        # Start MCP server for Claude Code

# Playwright browser (needed by crawl4ai)
python -m playwright install chromium

Ruff config: line length 100, Python 3.12+, rules E/F/I/W/UP (see pyproject.toml).

Architecture

Three-Layer Content Structure

  • raw/ — Immutable source documents. The LLM reads but never modifies files here. Subdirs: articles/, papers/, repos/, videos/, podcasts/, books/, datasets/, conversations/, assets/. Use Obsidian Web Clipper for web→markdown; download images to raw/assets/.
  • wiki/ — LLM-generated and LLM-maintained markdown. Page subdirs: entities/, concepts/, comparisons/, summaries/, synthesis/. All pages use YAML frontmatter (see template below).
  • research/ — Human-authored analysis, project ideas, and meta-research about the knowledge base approach.

Five Operations Cycle: Ingest → Compile → Query → Lint → Evolve

  • Ingest: User adds source to raw/. LLM reads it, writes summary to wiki/summaries/, updates wiki/index.md, modifies relevant entity/concept pages, appends to wiki/log.md.
  • Compile: LLM builds/updates interlinked wiki pages. Uses hash-based incremental detection — only processes new/changed sources. Manifest saved after each source (crash-safe). Proposes diffs, not full rewrites.
  • Query: User asks questions. LLM searches wiki using BM25 ranking (term frequency saturation, IDF weighting, length normalization), synthesizes answers with inline citations to wiki pages and raw sources. Title tokens boosted by SEARCH_TITLE_WEIGHT (repeated N times before indexing). Context truncated to 80K chars. Good answers become new wiki pages.
  • Lint: Health check — orphan pages, dead links, staleness, wikilink cycles, coverage gaps, frontmatter validation. Produces a report, does not silently fix.
  • Evolve: Gap analysis — what topics lack coverage, what concepts should be linked, what raw sources would fill gaps. User picks suggestions, LLM executes.
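The BM25 ranking with title boosting that Query relies on can be sketched as follows. This is a simplified, self-contained scorer written for illustration; the real implementation lives in the kb package, and the function name and parameters here are assumptions:

```python
import math
from collections import Counter

def bm25_scores(query, docs, titles, title_weight=3, k1=1.5, b=0.75):
    """Score docs against a query with BM25. Title tokens are repeated
    title_weight times before indexing, mirroring SEARCH_TITLE_WEIGHT."""
    corpus = [
        (title.lower().split() * title_weight) + doc.lower().split()
        for title, doc in zip(titles, docs)
    ]
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    df = Counter()                      # document frequency per term
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # term-frequency saturation (k1) + document length normalization (b)
            score += idf * (tf[term] * (k1 + 1)) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores
```

Repeating title tokens before indexing is a cheap way to boost title matches without a separate field-weighted index.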

Python Package (src/kb/)

Entry point: kb = "kb.cli:cli" in pyproject.toml. Version in src/kb/__init__.py. MCP server entry: python -m kb.mcp_server or kb mcp. MCP package: kb.mcp (split from monolithic mcp_server.py).

All paths, model tiers, page types, and confidence levels are defined in kb.config — import from there, never hardcode. PROJECT_ROOT resolves from config.py's location, so it works regardless of working directory.
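The PROJECT_ROOT resolution pattern amounts to anchoring paths at config.py's own location. A sketch with an illustrative path (inside config.py the input would be `__file__`; the layout `src/kb/config.py` is assumed):

```python
from pathlib import Path

# Hypothetical location of config.py; in the real file this is __file__.
config_path = Path("/repo/src/kb/config.py")

# parents[2] climbs src/kb/config.py -> src/kb -> src -> repo root,
# so paths work regardless of the current working directory.
PROJECT_ROOT = config_path.resolve().parents[2]
RAW_DIR = PROJECT_ROOT / "raw"
WIKI_DIR = PROJECT_ROOT / "wiki"
```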

Key APIs (non-obvious behavior — for full signatures, read the source):

  • call_llm(prompt, tier="write") / call_llm_json(prompt, tier, schema) — In kb.utils.llm. Tiers: scan (Haiku), write (Sonnet), orchestrate (Opus). call_llm_json uses forced tool_use for guaranteed structured JSON — no fence-stripping needed. Raises LLMError on failure, ValueError on invalid tier.
  • ingest_source(path, source_type=None, extraction=None, *, defer_small=False) — In kb.ingest.pipeline. Returns dict with pages_created, pages_updated, pages_skipped, affected_pages, wikilinks_injected, and duplicate: True on hash match. Pass extraction dict to skip LLM call (Claude Code mode).
  • load_all_pages(wiki_dir=None) — In kb.utils.pages. Returns list of dicts. Gotcha: raw_content field is pre-lowercased (for BM25), not verbatim.
  • slugify(text) / yaml_escape(value) — In kb.utils.text. Single source of truth — imported everywhere, never duplicate.
  • build_extraction_schema(template) — In kb.ingest.extractors. Builds JSON Schema from template fields. load_template() is LRU-cached.
  • refine_page(page_id, content, notes) — In kb.review.refiner. Uses regex-based frontmatter split (not YAML parser), rejects content starting with --- to prevent corruption.
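The frontmatter guard in refine_page can be sketched like this. The function name, regex, and error messages are illustrative, not the actual source:

```python
import re

# Match an opening YAML frontmatter block at the very start of the page.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

def refine_body(page_text: str, new_body: str) -> str:
    """Replace a page's body while keeping its frontmatter verbatim:
    split with a regex (not a YAML parser) and reject replacement
    content that starts with '---' to prevent frontmatter corruption."""
    if new_body.lstrip().startswith("---"):
        raise ValueError("refusing content that starts with '---'")
    match = FRONTMATTER_RE.match(page_text)
    if not match:
        raise ValueError("page has no frontmatter block")
    return page_text[: match.end()] + new_body
```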

Wiki Index Files

| File | Purpose |
| --- | --- |
| wiki/index.md | Master catalog, one line per page, under 500 lines |
| wiki/_sources.md | Raw source → wiki page mapping (traceability for lint) |
| wiki/_categories.md | Auto-maintained category tree |
| wiki/log.md | Append-only chronological record of all operations |
| wiki/contradictions.md | Explicit tracker for conflicting claims across sources |

Local-only directories (git-ignored): .claude/, .tools/, .memory/, .data/, openspec/, .mcp.json. The others/ directory holds misc files like screenshots.

Model Tiering

| Tier | Model ID | Env Override | Use |
| --- | --- | --- | --- |
| scan | claude-haiku-4-5-20251001 | CLAUDE_SCAN_MODEL | Index reads, link checks, file diffs — mechanical, low-reasoning |
| write | claude-sonnet-4-6 | CLAUDE_WRITE_MODEL | Article writing, extraction, summaries — quality at reasonable cost |
| orchestrate | claude-opus-4-6 | CLAUDE_ORCHESTRATE_MODEL | Orchestration, query answering, verification — highest reasoning |
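Tier-to-model resolution with env overrides amounts to a dictionary lookup plus an environment check. A sketch (the real lookup lives in kb.config / kb.utils.llm; the function name is an assumption):

```python
import os

# Illustrative default tier table; authoritative values live in kb.config.
DEFAULT_MODELS = {
    "scan": "claude-haiku-4-5-20251001",
    "write": "claude-sonnet-4-6",
    "orchestrate": "claude-opus-4-6",
}

def resolve_model(tier: str) -> str:
    """Pick the model for a tier, honoring CLAUDE_<TIER>_MODEL overrides."""
    if tier not in DEFAULT_MODELS:
        raise ValueError(f"invalid tier: {tier}")
    return os.environ.get(f"CLAUDE_{tier.upper()}_MODEL", DEFAULT_MODELS[tier])
```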

Extraction Templates (templates/)

10 YAML schemas (article, paper, video, repo, podcast, book, dataset, conversation, comparison, synthesis). Each defines the extract: fields that the ingest pipeline uses to drive consistent extraction, plus a wiki_outputs: mapping (documentation-only, not enforced in code). All follow the same output pattern: summaries → entities → concepts.
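Schema construction from a template's extract: fields can be sketched as below. The template shape here is deliberately minimal and the function name differs from the real build_extraction_schema in kb.ingest.extractors, which handles richer field definitions:

```python
def build_schema(template: dict) -> dict:
    """Build a JSON Schema object from a template's extract: fields.
    Assumes a simplified template shape: {'extract': {field: description}}."""
    return {
        "type": "object",
        "properties": {
            name: {"type": "string", "description": desc}
            for name, desc in template["extract"].items()
        },
        "required": list(template["extract"]),
    }
```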

Testing

Pytest with testpaths = ["tests"], pythonpath = ["src"]. Fixtures in conftest.py:

  • project_root / raw_dir / wiki_dir — point to real project directories (read-only use)
  • tmp_wiki(tmp_path) — isolated wiki directory with all 5 subdirectories for tests that write wiki pages
  • tmp_project(tmp_path) — full project directory with wiki/ (5 subdirs + log.md) and raw/ (4 subdirs) for Phase 2 tests
  • create_wiki_page — factory fixture for creating wiki pages with proper frontmatter (parameterized: page_id, title, content, source_ref, page_type, confidence, updated, wiki_dir)
  • create_raw_source — factory fixture for creating raw source files

613 tests across the versioned test files — run python -m pytest -v to list all. New tests per phase go in versioned files (e.g., test_v099_phase39.py). Use the tmp_wiki/tmp_project fixtures for any test that writes files — never write to the real wiki/ or raw/ in tests.
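What the create_wiki_page factory produces can be sketched as a plain function (the real fixture is parameterized more fully, including source_ref and updated; the body and pluralization mapping here are illustrative):

```python
from pathlib import Path

# entity -> entities/; other page types pluralize by appending "s".
PLURALS = {"entity": "entities"}

def create_wiki_page(wiki_dir: Path, page_id: str, title: str,
                     content: str = "Body.", page_type: str = "concept",
                     confidence: str = "stated") -> Path:
    """Write a wiki page with the project's frontmatter template into
    the subdirectory matching its page type."""
    page = wiki_dir / PLURALS.get(page_type, page_type + "s") / f"{page_id}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(
        f'---\ntitle: "{title}"\ncreated: 2026-04-05\nupdated: 2026-04-05\n'
        f"type: {page_type}\nconfidence: {confidence}\n---\n{content}\n",
        encoding="utf-8",
    )
    return page
```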

Error Handling Conventions

  • MCP tools: Return "Error: ..." strings on failure — never raise exceptions to the MCP client. Page IDs validated via _validate_page_id() (rejects path traversal, verifies path within WIKI_DIR).
  • LLM calls: Shared retry logic in _make_api_call() — 3 attempts with exponential backoff on rate limits/overload/timeout. LLMError on exhaustion. call_llm_json() uses forced tool_use for guaranteed structured JSON (no fence-stripping needed).
  • Page loading loops: Catch specific exceptions (not broad Exception) and skip with warning — never abort a full scan for one bad file.
  • JSON stores: All use atomic_json_write() (temp file + rename). Capped at 10,000 entries each (feedback, verdicts).
  • Ingest limits: MAX_ENTITIES_PER_INGEST=50, MAX_CONCEPTS_PER_INGEST=50 — prevents runaway page creation from hallucinated lists.
  • Path traversal: Validated at MCP boundary (_validate_page_id, ingest path check), at library level (refine_page, pair_page_with_sources), and in reference extraction. extract_citations() and extract_raw_refs() reject .. and leading /.
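The path-traversal guard described above can be sketched like this. The function name mirrors _validate_page_id but the checks and messages are assumptions, not the actual source:

```python
from pathlib import Path

def validate_page_id(page_id: str, wiki_dir: Path) -> Path:
    """Reject traversal tokens up front, then confirm the resolved
    path still lies inside the wiki directory."""
    if ".." in page_id or page_id.startswith(("/", "\\")):
        raise ValueError(f"Error: invalid page id {page_id!r}")
    path = (wiki_dir / f"{page_id}.md").resolve()
    if not path.is_relative_to(wiki_dir.resolve()):
        raise ValueError(f"Error: path escapes wiki dir: {page_id!r}")
    return path
```

Checking both the raw string and the resolved path is deliberate: the string check catches obvious `..` attempts cheaply, while the resolve + is_relative_to check catches anything the string check misses.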

Phase 2 Workflows

Standard Ingest (with Self-Refine)

  1. kb_ingest(path) — get extraction prompt
  2. Extract JSON — kb_ingest(path, extraction_json)
  3. For each created page: kb_review_page(page_id) — self-critique
  4. If issues: kb_refine_page(page_id, updated_content) (max 2 rounds)

Thorough Ingest (with Actor-Critic)

  1–4. Same as Standard Ingest
  5. Spawn wiki-reviewer agent with created page_ids
  6. Review findings — fix or accept
  7. kb_affected_pages — flag related pages

Deep Lint

  1. kb_lint() — mechanical report
  2. For errors: kb_lint_deep(page_id) — evaluate fidelity
  3. Fix issues via kb_refine_page
  4. kb_lint_consistency() — contradiction check
  5. Re-run kb_lint() to verify (max 3 rounds)

Query with Feedback

  1. kb_query(question) — synthesize answer
  2. After user reaction: kb_query_feedback(question, rating, pages)

Conventions

  • All wiki pages must link claims back to specific raw/ source files. Unsourced claims should be flagged.
  • Use [[wikilinks]] for inter-page links within wiki/.
  • Distinguish stated facts (source says X) from inferences (based on A and B, we infer Y).
  • When updating wiki pages, prefer proposing diffs over full rewrites for auditability.
  • Keep wiki/index.md under 500 lines — use category groupings, one line per page.
  • Always install Python packages into the project .venv, never globally.
  • Architecture diagram sync (MANDATORY): Source: others/architecture-diagram.html → Rendered: others/architecture-diagram.png → Displayed: README.md line 9. Every time the HTML is modified, you MUST re-render the PNG and commit it. Render command:
    # Run from project root with .venv activated
    .venv/Scripts/python -c "
    import asyncio
    from playwright.async_api import async_playwright
    async def main():
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page(viewport={'width': 1440, 'height': 900}, device_scale_factor=3)
            await page.goto('file:///D:/Projects/LLM-Knowledge-Base/others/architecture-diagram.html')
            await page.wait_for_timeout(1500)
            dim = await page.evaluate('() => ({ w: document.body.scrollWidth, h: document.body.scrollHeight })')
            await page.set_viewport_size({'width': dim['w'], 'height': dim['h']})
            await page.wait_for_timeout(500)
            await page.screenshot(path='others/architecture-diagram.png', full_page=True, type='png')
            await browser.close()
    asyncio.run(main())
    "

Wiki Page Frontmatter Template

---
title: "Page Title"
source:
  - "raw/articles/source-file.md"
created: 2026-04-05
updated: 2026-04-05
type: entity | concept | comparison | synthesis | summary
confidence: stated | inferred | speculative
---

Ingestion Commands

# Web page → markdown (heavy, JavaScript-rendered pages)
crwl URL -o markdown > raw/articles/page-name.md

# Web page → markdown (lightweight, articles/blogs — faster)
trafilatura -u URL > raw/articles/page-name.md

# PDF/DOCX → markdown (simple documents)
markitdown file.pdf > raw/papers/paper-name.md

# PDF → markdown (complex documents with tables/figures)
docling file.pdf --output raw/papers/

# YouTube transcript
yt-dlp --write-auto-sub --skip-download URL -o raw/videos/video-name

# arXiv paper (Python)
# import arxiv; paper = next(arxiv.Client().results(arxiv.Search(id_list=["2401.12345"])))

MCP Servers

Configured in .mcp.json (git-ignored, local only): kb, git-mcp, context7, fetch, memory, filesystem, git, arxiv, sqlite. See .mcp.json for connection details.

Key usage:

  • kb — The knowledge base MCP server (kb.mcp_server, 25 tools). Start with kb mcp or python -m kb.mcp_server. Claude Code is the default LLM — no API key needed.
    • kb_query(question) — returns wiki context with trust scores; Claude Code synthesizes the answer. Add use_api=true for Anthropic API synthesis.
    • kb_ingest(path, extraction_json=...) — creates wiki pages from Claude Code's extraction. Omit extraction_json to get the extraction prompt. Add use_api=true for API extraction. Output includes affected_pages (cascade review list) and wikilinks_injected (pages updated with retroactive links). Shows "Duplicate content detected" with hash if source was already ingested.
    • kb_ingest_content(content, filename, type, extraction_json) — one-shot: saves content to raw/ and creates wiki pages in one call.
    • kb_save_source(content, filename, overwrite=false) — save content to raw/ for later ingestion. Returns error if file already exists unless overwrite=true.
    • kb_compile_scan() — find changed sources, then kb_ingest each.
    • kb_compile(incremental=true) — run full compilation (requires ANTHROPIC_API_KEY for LLM extraction).
    • Browse: kb_search, kb_read_page, kb_list_pages, kb_list_sources.
    • Health: kb_stats, kb_lint (includes flagged pages + stub detection, supports --fix), kb_evolve (includes coverage gaps + stub enrichment suggestions), kb_detect_drift (finds wiki pages stale due to raw source changes), kb_graph_viz (Mermaid graph export with auto-pruning), kb_verdict_trends (weekly quality dashboard).
    • Quality (Phase 2): kb_review_page, kb_refine_page, kb_lint_deep, kb_lint_consistency, kb_query_feedback, kb_reliability_map, kb_affected_pages.
  • memory — Persistent knowledge graph in .memory/memory.jsonl. Track wiki entity relationships across sessions.
  • arxiv — Search/download papers to raw/papers/.
  • sqlite — Metadata DB at .data/metadata.db. For wikilink graph, ingestion history, lint results.
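The duplicate detection that kb_ingest reports ("Duplicate content detected") can be sketched as a sha256 manifest check. The manifest layout and function name here are assumptions for illustration:

```python
import hashlib
import json
from pathlib import Path

def is_duplicate(content: str, manifest_path: Path) -> bool:
    """Compare the source's sha256 against a manifest of previously
    ingested hashes; record new hashes, report repeats."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    seen = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    if digest in seen:
        return True
    seen.append(digest)
    manifest_path.write_text(json.dumps(seen))
    return False
```

Hashing full content (rather than filenames) is what makes re-saving the same article under a new name a no-op.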

Implementation History

See CHANGELOG.md for the full phase history (v0.3.0 → v0.9.11). Format: Keep a Changelog with Added/Changed/Fixed/Removed categories per version.

Current: Phase 3.93 (v0.9.12) — 613 tests, 25 MCP tools, 12 modules.

Known issues: See BACKLOG.md for the Phase 3.94 backlog. Format guide is in the HTML comment at the top of that file. Severity levels: CRITICAL (blocks release), HIGH (silent wrong results / security), MEDIUM (quality gaps / missing coverage), LOW (style/naming). Items grouped by severity then by module area. Resolved items are deleted (fix recorded in CHANGELOG.md); resolved phases collapse to a one-liner under "Resolved Phases".

Phase 4 next:

  • Two-phase compile pipeline (batch cross-source merging before writing)
  • Pre-publish validation gate
  • Iterative multi-hop retrieve/trace loop (BM25 + graph traversal)
  • Answer trace enforcement (reject uncited claims)
  • Conversation→KB promotion (positively-rated query answers → wiki pages)
  • DSPy Teacher-Student optimization; RAGAS evaluation
  • Layered context assembly (replace naive 80K truncation with tiered loading: summaries first → full pages on demand — inspired by MemPalace's 4-layer context stack)
  • Temporal claim tracking (validity windows on individual claims within pages for staleness/contradiction resolution)
  • Raw-source fallback retrieval (search raw/ directly alongside wiki/ during queries for verbatim context when summaries lose nuance)
  • BM25 + LLM reranking (optional Haiku rerank pass for improved relevance at ~$0.001/query)
  • Auto-contradiction detection on ingest (flag when new source conflicts with existing wiki claims — not just at lint time)
  • Semantic deduplication pre-ingest (embedding similarity check to catch same-topic-different-wording duplicates beyond content hash)
  • Exchange-pair chunking for conversation sources (preserve Q+A dialogue structure instead of flat text)
  • Pattern-based source pre-classification (regex heuristics to classify content type before LLM extraction — reduce API cost)

Automation

No auto-commit hooks. Doc updates and commits are done manually when ready to push.

BACKLOG.md lifecycle

Resolved items are deleted from BACKLOG.md (the fix is recorded in CHANGELOG.md). When all items in a phase section are resolved, the section collapses to a one-liner under "Resolved Phases" (e.g., - **Phase 3.92** — all items resolved in v0.9.11). This keeps the backlog focused on open work only.

Doc update checklist (before push)

When asked to update docs, review git diff and update as needed:

  • CHANGELOG.md — add entries under [Unreleased]
  • BACKLOG.md — delete resolved items (never strikethrough); collapse empty phase sections
  • CLAUDE.md — update version numbers, test counts, module/tool counts, API docs
  • README.md — update if user-facing features or setup changed
  • others/architecture-diagram.html + re-render PNG if architecture changed

All tools are auto-approved for this project (permissions in settings.local.json).