Andrej Karpathy's viral post on LLM Knowledge Bases hit 1.7M views. This is the definitive resource list for the workflow he described.
"raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM."
— Andrej Karpathy, Apr 3, 2026
The workflow: Ingest → Compile → Lint → View → Query → Enhance → Repeat.
Part of the LLM KB Ecosystem: karpathy-kb-template | wiki-compiler | kb-lint
- Data Ingestion
- Wiki Compilation
- Knowledge Base Linting
- Obsidian Plugins
- IDE & Viewers
- RAG & Search
- LLM Agents & Frameworks
- Visualization & Output
- Synthetic Data & Fine-tuning
- Workflows & Guides
- Similar Projects
- Contributing
Tools for converting web pages, PDFs, papers, and other sources into clean markdown.
- Obsidian Web Clipper - Browser extension that clips web pages directly into your Obsidian vault as markdown.
- Markdownload - Browser extension to convert web pages to markdown files.
- Jina Reader - Converts any URL to LLM-friendly markdown via
r.jina.ai. Handles JavaScript-rendered pages. - Docling - IBM's document conversion library. Parses PDFs, DOCX, PPTX, HTML to markdown with table and figure extraction.
- Marker - Converts PDF, EPUB, and MOBI to markdown with high accuracy. Handles complex layouts, tables, and equations.
- Trafilatura - Python library for web scraping and text extraction. Focuses on main content extraction from web pages.
- Pandoc - Universal document converter. Converts between dozens of formats including markdown, LaTeX, DOCX, and HTML.
- pdf2md - Converts PDF files to markdown, preserving structure and formatting.
- Unstructured - General-purpose document parsing library. Handles PDFs, images, HTML, Word docs, and more.
- Firecrawl - Crawls websites and converts pages to clean markdown. Built for LLM consumption.
- Crawl4AI - Open-source LLM-friendly web crawler that outputs clean markdown with structured extraction.
- MarkItDown - Microsoft's Python tool for converting various file formats (PDF, DOCX, XLSX, PPTX, images, audio) to markdown.
- Zerox - Zero-shot PDF OCR to markdown using vision models. Simple API for document extraction.
- MinerU - High-quality document content extraction tool supporting PDF to markdown/JSON conversion.
LLM-powered tools that organize, compile, and structure knowledge into coherent wikis.
- wiki-compiler - LLM-driven compiler that organizes raw markdown fragments into a structured, interlinked wiki.
- Fabric - AI-powered framework for augmenting humans. Includes patterns for extracting and organizing knowledge.
- Khoj - Personal AI assistant that indexes your markdown notes and documents for natural language interaction.
- Quivr - Personal productivity assistant that ingests documents and builds a searchable knowledge base.
- Mem0 - Memory layer for AI applications. Persists and organizes knowledge across interactions.
Tools for checking consistency, finding gaps, and maintaining quality in markdown knowledge bases.
- kb-lint - Linter for markdown knowledge bases. Detects broken links, orphan pages, inconsistent terminology, and coverage gaps.
- Markdownlint - Style checker and linting tool for markdown files. Enforces consistent formatting.
- markdown-link-check - Checks all hyperlinks in markdown files for broken or dead links.
- Obsidian Linter - Obsidian plugin that enforces consistent markdown formatting and style rules across your vault.
- Vale - Prose linter that brings code-like linting to natural language. Supports custom style rules.
Obsidian plugins that enhance the knowledge base workflow.
- Obsidian Copilot - AI assistant inside Obsidian. Chat with your notes, generate content, and get suggestions using multiple LLM providers.
- Smart Connections - AI-powered note connections. Finds related notes using embeddings and enables chat with your vault.
- Dataview - Query engine for your vault. Treat your notes as a database with inline queries and JavaScript API.
- Marp Slides - Create presentation slides from markdown notes using the Marp framework.
- Templater - Advanced template engine for Obsidian. Create dynamic templates with JavaScript execution.
- Obsidian Git - Automatic backup and version control for your vault using Git.
- Local GPT - Run local LLMs directly within Obsidian for private AI-assisted note-taking.
- Text Generator - AI text generation plugin supporting multiple providers. Generates, rewrites, and summarizes within notes.
- Canvas - Built-in infinite canvas for visual knowledge mapping and spatial organization of notes.
Applications for viewing, editing, and navigating markdown knowledge bases.
- Obsidian - The gold standard for local-first markdown knowledge bases. Graph view, backlinks, plugins ecosystem.
- Logseq - Open-source outliner-style knowledge base with bidirectional linking and graph visualization.
- Notion - All-in-one workspace with databases, wikis, and AI features. Cloud-based.
- Foam - Personal knowledge management and sharing system built on VS Code and markdown.
- Dendron - Developer-focused knowledge management tool built on VS Code. Hierarchical note organization.
- SiYuan - Privacy-first personal knowledge management system with block-level references and sync.
- Zettlr - Markdown editor designed for academic writing and Zettelkasten-style note-taking.
- Marktext - Simple and elegant open-source markdown editor with real-time preview.
Retrieval-Augmented Generation frameworks and local search engines for querying knowledge bases.
- LlamaIndex - Data framework for connecting custom data sources to LLMs. Indexing, retrieval, and query engines.
- LangChain - Framework for developing LLM-powered applications with chains, agents, and retrieval.
- Haystack - End-to-end NLP framework for building RAG pipelines, search systems, and question answering.
- RAGFlow - Open-source RAG engine with deep document understanding and chunk-level citation.
- Chroma - Open-source embedding database. Simple API for storing and querying document embeddings.
- Qdrant - High-performance vector similarity search engine with filtering and payload support.
- Milvus - Cloud-native vector database for scalable similarity search and AI applications.
- txtai - All-in-one embeddings database for semantic search, LLM orchestration, and language model workflows.
- Vanna - RAG framework for SQL generation. Train on your database schema and documentation.
AI coding agents and frameworks for operating on knowledge bases via CLI.
- Claude Code - Anthropic's agentic CLI for Claude. Operates on files, runs commands, and iterates on codebases and knowledge bases.
- Cursor - AI code editor with deep codebase understanding. Chat, edit, and generate across files.
- Aider - AI pair programming in your terminal. Works with local Git repos and supports multiple LLM providers.
- OpenAI Codex CLI - OpenAI's lightweight coding agent that runs in the terminal with sandboxed execution.
- Continue - Open-source AI code assistant for VS Code and JetBrains. Supports multiple models.
- Open Interpreter - Natural language interface for your computer. Runs code locally to complete tasks.
- CrewAI - Framework for orchestrating role-playing AI agents that collaborate on complex tasks.
- AutoGen - Microsoft's framework for building multi-agent conversational AI systems.
- Pydantic AI - Agent framework built on Pydantic for type-safe, structured AI application development.
Tools for turning knowledge base content into presentations, diagrams, and visual outputs.
- Marp - Markdown presentation ecosystem. Convert markdown files to slides, PDFs, and HTML presentations.
- Mermaid - Generate diagrams and flowcharts from markdown-like text. Supported natively in GitHub markdown.
- D3.js - JavaScript library for data-driven visualizations. Create interactive charts and graphs from knowledge base data.
- Markmap - Visualize markdown documents as interactive mind maps.
- Matplotlib - Python plotting library for generating charts, graphs, and figures from data.
- Excalidraw - Virtual whiteboard for sketching hand-drawn-style diagrams. Has an Obsidian integration.
- Slidev - Presentation slides for developers using markdown and Vue components.
- reveal.js - HTML presentation framework with markdown support and a rich plugin ecosystem.
Tools for distilling knowledge bases into training data and fine-tuned model weights.
- Distilabel - Framework for synthetic data generation and AI feedback. Create training datasets from knowledge bases.
- Axolotl - Streamlined fine-tuning tool supporting multiple architectures. LoRA, QLoRA, full fine-tuning.
- Unsloth - Fast LLM fine-tuning with 2x speed and 60% less memory. Supports Llama, Mistral, and more.
- LitGPT - Hackable implementation of open-source LLMs for pretraining, fine-tuning, and deployment.
- MLX - Apple's array framework for machine learning on Apple silicon. Efficient local fine-tuning.
- Argilla - Collaboration platform for AI engineers and domain experts to build high-quality datasets.
Blog posts, tutorials, and videos about building LLM knowledge bases.
- Karpathy's Tweet Thread (Apr 2026) - The viral thread that crystallized the raw data → LLM → wiki → CLI workflow.
- Building a Second Brain (Tiago Forte) - The original methodology for personal knowledge management that inspired many tools in this list.
- Zettelkasten Method - The atomic note-taking method. Foundation for tools like Obsidian and Logseq.
- Obsidian Hub - Community-maintained documentation, guides, and resources for Obsidian workflows.
- LlamaIndex Documentation - Comprehensive guides on building RAG pipelines over document collections.
- Simon Willison's Blog - Prolific coverage of LLM tools, workflows, and practical AI engineering.
Existing knowledge base, second brain, and personal wiki projects.
- Awesome Knowledge Management - Curated list of knowledge management tools and resources.
- Awesome Obsidian - Curated list of Obsidian resources, plugins, and themes.
- Second Brain - Resources and tools for building a digital second brain.
- Awesome Zettelkasten - Curated list of Zettelkasten tools, guides, and resources.
- Project Memex - Vannevar Bush's 1945 vision of a personal knowledge device — the ancestor of this entire space.
Contributions welcome! Please read the contributing guidelines first.
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this work.
