Context Hub bridges the gap between rapidly evolving APIs and LLM knowledge cutoffs. It's a repository of curated, LLM-optimized documentation and skills that AI agents (and humans) can search and retrieve via a CLI.
There are two kinds of content, with a fundamental distinction:
- Docs ("what to know"): API/SDK reference documentation, factual knowledge that fills knowledge cutoff gaps. Large, detailed, fetched on-demand for a specific task.
- Skills ("how to do it"): Behavioral instructions, coding patterns, automation playbooks. Smaller, actionable, can be installed into agent skill directories for persistent availability.
Each entry also has a source field (official | maintainer | community) for trust/quality signaling. Users control which sources agents see via ~/.chub/config.yaml.
Content repo (source of truth)
    ↓ chub build → registry.json + content tree
CDN (serves registry + individual files + optional full bundle)    ← remote source
    ↓ CLI fetches from here
~/.chub/ (local cache)                                             ← cached remote data
    ↓ CLI reads from here
Agent/Human (consumes docs via stdout or -o file)
    ↑ CLI also reads directly from
Local folders (private/internal docs)                              ← local source
The CLI supports multiple sources: both remote CDNs and local folders. Entries from all sources are merged.
We initially treated all content uniformly: just tags, no rigid types. But docs and skills have fundamentally different access patterns:
| | Docs | Skills |
|---|---|---|
| Purpose | Reference knowledge ("what to know") | Behavioral instructions ("how to do it") |
| Size | Large (10K-50K+ tokens) | Small (<500 lines entry point) |
| Lifecycle | Ephemeral, fetched per-task | Can be persistent, installed into agent |
| Discovery | Agent explicitly searches and fetches | Agent can auto-discover from filesystem |
| Install target | .context/ or any file | .claude/skills/, .cursor/skills/, etc. |
| Language/version | Yes: per-language, per-version variants | No: skills are typically language-agnostic |
This distinction drives the registry format split into docs[] and skills[]. The CLI uses a single chub get <id> command that auto-detects the type.
The original format had a single entries[] array with a provides field to indicate doc/skill. We split it because:
- Different schemas: Docs need `languages[].versions[]` nesting. Skills are flat: no language or version, just `name`, `path`, `files`.
- Array membership IS the type: No need for a `provides` field. A doc is in `docs[]`, a skill is in `skills[]`.
- Bundled entries: When a topic has both DOC.md and SKILL.md, they appear as separate items in their respective arrays. Clean separation.
- CLI mapping: `chub get` searches both arrays and auto-detects type. `chub search` searches both.
Skills are behavioral instructions ("how to integrate Stripe", "how to write Playwright login flows"). They're typically language-agnostic or written for a single context. Adding language/version nesting would add complexity without value: a skill author who needs Python and TypeScript variants can create two separate skill entries.
Docs, on the other hand, have fundamentally different content per language (Python SDK vs JavaScript SDK) and evolve with API versions.
We originally had chub get docs <id> and chub get skills <id> as separate subcommands. We simplified to chub get <id> because the CLI can auto-detect the type from the registry (docs have languages[], skills don't). The user shouldn't need to know internal taxonomy to fetch content. --lang and --version flags apply when the entry is a doc and are silently ignored for skills.
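The auto-detection described above can be sketched as a lookup across both registry arrays. This is a minimal illustration, not the CLI's actual code: `findEntry` and its return shape are hypothetical, and only the `docs[]`/`skills[]` registry layout is taken from this document.

```javascript
// Hypothetical helper: look an id up in both registry arrays and report its type.
// Array membership is the type, so no `provides` field is consulted.
function findEntry(registry, id) {
  const doc = (registry.docs || []).find((entry) => entry.id === id);
  if (doc) return { type: "doc", entry: doc };

  const skill = (registry.skills || []).find((entry) => entry.id === id);
  if (skill) return { type: "skill", entry: skill };

  return null; // the caller would report: Entry "<id>" not found.
}

// Sample registry trimmed to the fields this sketch needs.
const registry = {
  docs: [{ id: "openai/chat-api", languages: [{ language: "python", versions: [] }] }],
  skills: [{ id: "playwright-community/login-flows", path: "playwright-community/skills/login-flows" }],
};

console.log(findEntry(registry, "openai/chat-api").type); // doc
console.log(findEntry(registry, "playwright-community/login-flows").type); // skill
```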
We considered using SKILL.md for everything since the Agent Skills spec is the format standard. But calling a 50K API reference "SKILL.md" is semantically misleading: agents that scan for skills would load doc descriptions into their system prompt (wasting ~100 tokens per doc entry), and might "activate" a doc when the user just wants to write code.
Originally: chub get openai-chat python. Changed to: chub get openai/chat-api --lang python.
Reasons:
- Multi-id support (`chub get openai/chat-api stripe/payments`) would make a positional language argument ambiguous
- Language can be auto-inferred when an entry has only one; the flag is only needed for disambiguation
- Flags are self-documenting; a bare `python` after an id is ambiguous to readers
Agents often need multiple entries in one operation. Rather than looping, chub get openai/chat-api stripe/payments fetches both. Output is concatenated with --- separators for stdout, or written as separate files when -o points to a directory.
We considered separate tools for docs and skills. Rejected because they share the same registry, config, sources, search, and cache infrastructure.
We started with 8 commands and trimmed to 5 core + 1 build: search, get, update, cache, feedback, and build. list and info were merged into search. get docs and get skills were merged into get with auto-detection.
Each entry has source: "official" | "maintainer" | "community". The human controls trust policy via ~/.chub/config.yaml. An enterprise can restrict agents to source: official,maintainer without the agent needing to know about quality tiers.
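As a rough sketch of how such a trust policy might be applied, the filter below keeps only entries whose `source` appears in a comma-separated policy string. The function name `filterBySource` is hypothetical; the policy format mirrors the `official,maintainer` example above.

```javascript
// Hypothetical trust filter: keep entries whose `source` is in the policy string.
// The comma-separated policy format follows the example in the text above.
function filterBySource(entries, policy) {
  const allowed = new Set(
    policy
      .split(",")
      .map((s) => s.trim())
      .filter(Boolean)
  );
  return entries.filter((entry) => allowed.has(entry.source));
}

const entries = [
  { id: "stripe/payments", source: "official" },
  { id: "openai/chat-api", source: "maintainer" },
  { id: "playwright-community/login-flows", source: "community" },
];

// An enterprise policy that hides community content from agents.
const visible = filterBySource(entries, "official,maintainer");
console.log(visible.map((e) => e.id)); // [ 'stripe/payments', 'openai/chat-api' ]
```

Because filtering happens before results reach the agent, the agent never needs to reason about quality tiers itself.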
Rather than rigid sub-types, entries use free-form tags. This is flexible: new categories emerge without schema changes.
A monolithic 50K-token doc file wastes context. Each entry is a directory with a small entry point (DOC.md or SKILL.md, ~500 lines max) that links to detailed reference files. The agent reads the overview first, then selectively loads only what it needs.
The --full flag exists for when you want everything. With -o <dir>, --full writes individual files preserving directory structure so relative links resolve on disk. Without -o, it concatenates to stdout with # FILE: headers.
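The stdout form of `--full` could look roughly like the sketch below. Only the `# FILE:` prefix comes from this document; the function name `concatFull`, the blank line after each header, and the spacing between files are assumptions.

```javascript
// Hypothetical formatter for `--full` without `-o`: all files of an entry are
// concatenated, each preceded by a "# FILE: <relative path>" header.
// Spacing around the header is an assumption, not the CLI's confirmed output.
function concatFull(files) {
  return files
    .map(({ path, content }) => `# FILE: ${path}\n\n${content.trimEnd()}\n`)
    .join("\n");
}

const output = concatFull([
  { path: "DOC.md", content: "# Chat API\nSee [Streaming](references/streaming.md).\n" },
  { path: "references/streaming.md", content: "# Streaming\n" },
]);

console.log(output);
```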
Three approaches were considered:
- Full bundle (download everything): simple but doesn't scale
- Index + on-demand (fetch individual docs): lightweight but needs network per doc
- Hybrid (chosen): registry-only by default, on-demand doc fetching, optional full bundle
IDs are always author/name, e.g., openai/chat-api, stripe/payments, playwright-community/login-flows. The author is the top-level directory name in the content repo; the name comes from frontmatter. This eliminates name collisions by construction: two authors can both have a chat entry, but their ids differ (openai/chat-api vs mycompany/chat). This is the same pattern as npm scopes, Docker images, and GitHub repos.
When multiple sources define the same id, the user disambiguates with a source: prefix: internal:openai/chat-api vs community:openai/chat-api. We use colon instead of slash because ids already contain slashes (author/name). Using source/author/name would be ambiguous: is internal the source or the author?
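To illustrate why the colon keeps parsing unambiguous, here is a minimal sketch of splitting a user-supplied id. `parseId` and its return fields are hypothetical names chosen for this example.

```javascript
// Hypothetical parser for "[source:]author/name".
// The colon can only introduce a source, and the slash only separates
// author from name, so the two parts never compete.
function parseId(raw) {
  const colon = raw.indexOf(":");
  const source = colon === -1 ? null : raw.slice(0, colon);
  const id = colon === -1 ? raw : raw.slice(colon + 1);

  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Invalid id "${raw}": expected author/name`);
  }
  return { source, id, author: id.slice(0, slash), name: id.slice(slash + 1) };
}

console.log(parseId("internal:openai/chat-api"));
// { source: 'internal', id: 'openai/chat-api', author: 'openai', name: 'chat-api' }
console.log(parseId("stripe/payments").source); // null
```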
Teams often have internal/proprietary docs alongside the public community registry. The CLI supports multiple sources: remote CDNs and local folders. Entries are merged, and IDs are namespaced with source: only when there's a collision across sources.
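The merge rule above (prefix only on collision) might be implemented along these lines. This is a sketch under assumptions: `mergeSources`, `displayId`, and `sourceName` are illustrative names, and ids are assumed to be unique within a single source.

```javascript
// Hypothetical merge across configured sources. An id gets a "source:" prefix
// only when more than one source defines it. `sourceName` is the configured
// source (e.g. "internal"), distinct from the entry's trust `source` field.
function mergeSources(sources) {
  const counts = new Map();
  for (const { entries } of sources) {
    for (const entry of entries) {
      counts.set(entry.id, (counts.get(entry.id) || 0) + 1);
    }
  }

  const merged = [];
  for (const { name, entries } of sources) {
    for (const entry of entries) {
      const collides = counts.get(entry.id) > 1;
      merged.push({
        ...entry,
        sourceName: name,
        displayId: collides ? `${name}:${entry.id}` : entry.id,
      });
    }
  }
  return merged;
}

const merged = mergeSources([
  { name: "community", entries: [{ id: "openai/chat-api" }, { id: "stripe/payments" }] },
  { name: "internal", entries: [{ id: "openai/chat-api" }] },
]);

console.log(merged.map((e) => e.displayId));
// [ 'community:openai/chat-api', 'stripe/payments', 'internal:openai/chat-api' ]
```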
Content is organized by author directories. Each author gets a top-level directory and organizes their docs and skills inside it:
content-repo/
├── stripe/                          # author directory
│   ├── registry.json                # OPTIONAL: author manages own index
│   ├── docs/
│   │   └── payments/                # entry directory
│   │       ├── DOC.md               # frontmatter: name, description, languages, versions
│   │       ├── references/
│   │       │   └── webhooks.md
│   │       └── examples/
│   │           └── checkout.py
│   └── skills/
│       └── integration/             # entry directory
│           ├── SKILL.md
│           └── scripts/
│               └── setup.sh
├── openai/                          # no registry.json: auto-discover
│   └── docs/
│       └── chat/
│           ├── DOC.md               # languages: "python,javascript", versions: "1.52.0"
│           └── references/
│               └── streaming.md
└── playwright-community/
    └── skills/
        └── login-flows/
            ├── SKILL.md
            └── helpers/
                └── login-util.ts
Convention: <author>/{docs,skills}/<entry-name>/ with DOC.md or SKILL.md at root.
Following the convention established by Anthropic and OpenAI skill repos:
- SKILL.md (or DOC.md) lives in a directory
- All other files in that directory are companions, installed together with `--full`
- References use relative paths (e.g., `[Auth](references/auth.md)`)
- `--full -o <dir>` writes individual files preserving directory structure, so relative links resolve on disk
- Without `--full`, only the entry point (DOC.md or SKILL.md) is fetched
If an author directory contains registry.json, the build uses it directly. Same schema as the top-level registry (with docs[] and skills[]). Paths are prefixed with the author directory name during merge.
This is for authors with complex organization who want full control over their index.
The build walks the author directory, finds all DOC.md and SKILL.md files, and parses frontmatter to generate registry entries.
DOC.md frontmatter:
---
name: chat-api
description: OpenAI Chat API - completions, streaming, function calling
metadata:
  languages: "python,javascript,typescript"  # comma-separated, multi-lang
  versions: "1.52.0"                         # comma-separated, multi-version
  updated-on: "2026-01-15"
  source: maintainer
  tags: "openai,chat,llm"
---
SKILL.md frontmatter (no language/version needed):
---
name: login-flows
description: Login flow automation patterns for Playwright
metadata:
  updated-on: "2026-01-15"
  source: community
  tags: "browser,playwright,automation"
---
A single DOC.md can declare multiple languages and versions (comma-separated strings). The build expands this into the registry schema: multiple languages[] entries, each with multiple versions[], all pointing to the same directory path.
This means one doc file can serve Python, JavaScript, and TypeScript users if the content is language-agnostic enough. When content differs per language, authors create separate DOC.md files in separate directories, and the build groups them by matching name.
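The expansion of comma-separated frontmatter into the nested registry shape can be sketched as follows. `expandDoc` is a hypothetical name, and the sketch omits the `files`, `size`, and `lastUpdated` fields that real registry entries carry.

```javascript
// Hypothetical expansion step: one DOC.md with comma-separated languages and
// versions becomes languages[] x versions[], all pointing at the same path.
function expandDoc(frontmatter, path) {
  const split = (value) =>
    String(value)
      .split(",")
      .map((s) => s.trim())
      .filter(Boolean);

  const versions = split(frontmatter.metadata.versions);
  return split(frontmatter.metadata.languages).map((language) => ({
    language,
    versions: versions.map((version) => ({ version, path })),
  }));
}

const frontmatter = {
  name: "chat-api",
  metadata: { languages: "python,javascript,typescript", versions: "1.52.0" },
};

const languages = expandDoc(frontmatter, "openai/docs/chat");
console.log(JSON.stringify(languages[0]));
// {"language":"python","versions":[{"version":"1.52.0","path":"openai/docs/chat"}]}
```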
When an API has breaking changes across versions, the author creates separate DOC.md files:
openai/
└── docs/
    └── chat/
        ├── v1/
        │   ├── DOC.md           # versions: "1.52.0,1.51.0", languages: "python,javascript"
        │   └── references/
        │       └── streaming.md
        └── v2/
            ├── DOC.md           # versions: "2.0.0", languages: "python,javascript"
            └── references/
                ├── streaming.md
                └── structured-outputs.md
Both DOC.md files have name: chat-api (under the openai/ author directory), so they are grouped under id: openai/chat-api as one docs[] entry with multiple versions pointing to different paths. recommendedVersion is the highest semver. chub get openai/chat-api gets the latest; --version 1.52.0 gets the older docs.
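A minimal sketch of picking the highest semver and resolving a version to its path is shown below. The helper names are hypothetical, and the comparison assumes plain `major.minor.patch` strings; pre-release and build tags are not handled.

```javascript
// Compare "major.minor.patch" strings numerically (assumption: no pre-release tags).
function compareSemver(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i += 1) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Hypothetical: the recommended version is the highest semver in the list.
function recommendedVersion(versions) {
  const sorted = versions.map((v) => v.version).sort(compareSemver);
  return sorted[sorted.length - 1];
}

// Hypothetical: resolve --version (or the recommended default) to a content path.
function resolvePath(languageEntry, requestedVersion) {
  const wanted = requestedVersion || recommendedVersion(languageEntry.versions);
  const match = languageEntry.versions.find((v) => v.version === wanted);
  return match ? match.path : null;
}

const python = {
  language: "python",
  versions: [
    { version: "1.52.0", path: "openai/docs/chat/v1" },
    { version: "1.51.0", path: "openai/docs/chat/v1" },
    { version: "2.0.0", path: "openai/docs/chat/v2" },
  ],
};

console.log(recommendedVersion(python.versions)); // 2.0.0
console.log(resolvePath(python, "1.52.0")); // openai/docs/chat/v1
```

Numeric comparison matters here: a plain string sort would rank `1.9.0` above `1.10.0`.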
For different content per language:
stripe/
└── docs/
    └── payments/
        ├── python/
        │   └── DOC.md           # languages: "python", versions: "14.0.0"
        └── javascript/
            └── DOC.md           # languages: "javascript", versions: "14.0.0"
Same name: payments under stripe/, so both contribute to id: stripe/payments. Different languages, different paths, different content.
`chub build <content-dir> [options]`
Options:
- `-o, --output <dir>`: output directory (default: `<content-dir>/dist`)
- `--base-url <url>`: set `base_url` in registry (for CDN deployment)
- `--validate-only`: check frontmatter and structure without writing output
- `--json`: output build summary as JSON
Build steps:
- List top-level directories in `<content-dir>` (author directories)
- For each author directory:
  - If `registry.json` exists: use it directly, prefix paths
  - Else: auto-discover DOC.md/SKILL.md, parse frontmatter, group by `name`
- Merge all author entries into one registry (ids are `author/name`, so collisions are rare)
- Write `registry.json` to output dir
- Copy content tree to output dir (preserving structure)
- Print summary: N docs, N skills, N warnings
Validation rules:
- DOC.md must have `name`, `description`, `metadata.languages`, `metadata.versions`
- SKILL.md must have `name`, `description` (no language/version required)
- Warn on missing `metadata.source` (default: "community")
- Warn on missing `metadata.tags`
- Error on duplicate id (rare since ids are `author/name`)
- If both DOC.md and SKILL.md exist in the same directory, `name` must match
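The per-file frontmatter checks listed above could be expressed roughly as follows. This is a sketch: `validateFrontmatter` and the message wording are hypothetical, and the cross-file checks (duplicate ids, matching names) are left out.

```javascript
// Hypothetical validator for one parsed frontmatter object.
// `kind` is "DOC.md" or "SKILL.md"; messages are illustrative only.
function validateFrontmatter(kind, fm) {
  const errors = [];
  const warnings = [];
  const meta = fm.metadata || {};

  for (const field of ["name", "description"]) {
    if (!fm[field]) errors.push(`${kind}: missing ${field}`);
  }
  if (kind === "DOC.md") {
    for (const field of ["languages", "versions"]) {
      if (!meta[field]) errors.push(`${kind}: missing metadata.${field}`);
    }
  }
  if (!meta.source) warnings.push(`${kind}: missing metadata.source (defaulting to "community")`);
  if (!meta.tags) warnings.push(`${kind}: missing metadata.tags`);

  return { errors, warnings };
}

const skillResult = validateFrontmatter("SKILL.md", {
  name: "login-flows",
  description: "Login flow automation patterns for Playwright",
});
const docResult = validateFrontmatter("DOC.md", {
  name: "chat-api",
  description: "OpenAI Chat API",
  metadata: { languages: "python", source: "maintainer", tags: "openai" },
});

console.log(skillResult.errors.length, skillResult.warnings.length); // 0 2
console.log(docResult.errors); // [ 'DOC.md: missing metadata.versions' ]
```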
The build output is a static directory ready to serve:
dist/
├── registry.json                                        # Generated index
├── stripe/docs/payments/DOC.md                          # Content files (copied)
├── stripe/docs/payments/references/...
├── openai/docs/chat-api/DOC.md
└── playwright-community/skills/login-flows/SKILL.md
Upload dist/ to any static file host (S3, Cloudflare R2, GitHub Pages). The CLI fetches registry.json first, then individual files on demand.
| Command | Purpose | Key Options |
|---|---|---|
| `chub search [query]` | Search (no query = list all, exact id = detail) | `--tags`, `--lang`, `--limit`, `--json` |
| `chub get <ids...>` | Fetch docs or skills (auto-detects type) | `--lang`, `--version`, `--full`, `-o <path>`, `--json` |
| `chub update` | Refresh cached registry | `--force`, `--full` |
| `chub cache status\|clear` | Manage local cache | |
| `chub build <content-dir>` | Build registry from content | `-o`, `--base-url`, `--validate-only`, `--json` |
- `chub search`: lists all entries (replaces `list`)
- `chub search openai/chat-api`: exact id match shows full detail (replaces `info`)
- `chub search "stripe"`: fuzzy search across id, name, description, tags
- `chub search --tags browser`: filtered listing
- Results show `[doc]` or `[skill]` type labels
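The three search modes above (list, exact-id detail, free-text search) can be sketched with one function. Note the simplifications: case-insensitive substring matching stands in for the CLI's fuzzy search, and the function name and `mode` values are hypothetical.

```javascript
// Hypothetical search over merged registry entries.
// Substring matching is a stand-in for real fuzzy matching.
function search(entries, query, { tags = [] } = {}) {
  const tagged = entries.filter((e) => tags.every((t) => (e.tags || []).includes(t)));

  if (!query) return { mode: "list", results: tagged };

  const exact = tagged.find((e) => e.id === query);
  if (exact) return { mode: "detail", results: [exact] };

  const needle = query.toLowerCase();
  const results = tagged.filter((e) =>
    [e.id, e.name, e.description, ...(e.tags || [])].some((field) =>
      String(field || "").toLowerCase().includes(needle)
    )
  );
  return { mode: "search", results };
}

// Sample data for illustration only.
const entries = [
  { id: "stripe/payments", name: "payments", description: "Payments API", tags: ["stripe"] },
  { id: "openai/chat-api", name: "chat-api", description: "Chat completions with GPT models", tags: ["openai", "chat", "llm"] },
  { id: "playwright-community/login-flows", name: "login-flows", description: "Login flow automation patterns for Playwright", tags: ["browser", "playwright"] },
];

console.log(search(entries).results.length); // 3
console.log(search(entries, "openai/chat-api").mode); // detail
console.log(search(entries, "stripe").results.map((e) => e.id)); // [ 'stripe/payments' ]
```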
- `chub get openai/chat-api --lang python`: auto-detects doc, fetches DOC.md
- `chub get openai/chat-api --full`: fetch all files in the entry
- `chub get openai/chat-api --full -o .context/openai/`: write individual files preserving structure
- `chub get openai/chat-api stripe/payments --lang js`: fetch multiple entries at once
- `chub get playwright-community/login-flows`: auto-detects skill, fetches SKILL.md
- `chub get nonexistent/thing`: error: `Entry "nonexistent/thing" not found.`
- Entry has one language: auto-selected, no `--lang` needed
- Entry has multiple languages, no `--lang`: error with suggestion
- `--lang` applies to all ids in a multi-id command
- `--lang` and `--version` apply to doc entries, silently ignored for skills
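The language-selection rules above might look like this in code. It is a sketch under assumptions: `selectLanguage` and the error wording are hypothetical, the alias table lists only the two aliases this document mentions, and rejecting an unavailable `--lang` is an assumed behavior.

```javascript
// Aliases mentioned in this document; the real table may contain more.
const LANGUAGE_ALIASES = { js: "javascript", py: "python" };

// Hypothetical selection logic for one entry and an optional --lang value.
function selectLanguage(entry, requested) {
  if (!entry.languages) return null; // skill: --lang is silently ignored

  const available = entry.languages.map((l) => l.language);
  if (requested) {
    const wanted = LANGUAGE_ALIASES[requested] || requested;
    if (!available.includes(wanted)) {
      // Assumption: an unavailable language is an error rather than a fallback.
      throw new Error(`${entry.id}: "${wanted}" not available (have: ${available.join(", ")})`);
    }
    return wanted;
  }
  if (available.length === 1) return available[0]; // auto-selected
  throw new Error(`${entry.id}: multiple languages, use --lang (${available.join(", ")})`);
}

const multi = { id: "openai/chat-api", languages: [{ language: "python" }, { language: "javascript" }] };
const single = { id: "mycompany/chat", languages: [{ language: "python" }] };
const skill = { id: "playwright-community/login-flows" };

console.log(selectLanguage(single)); // python
console.log(selectLanguage(multi, "js")); // javascript
console.log(selectLanguage(skill, "python")); // null
```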
- Default: Human-friendly, colored terminal output
- `--json`: Structured JSON to stdout (no color escapes)
- `-o <path>`: Write content to file, print short confirmation to stderr
- `-o <dir>/`: Write each entry as separate file when fetching multiple
- `--full -o <dir>`: Write individual files preserving directory structure
# Get the top search result's id
chub search "stripe payments" --json | jq -r '.results[0].id'
# Full pipeline: search → pick best → fetch → write to file
ID=$(chub search "stripe payments" --json | jq -r '.results[0].id')
chub get "$ID" --lang js -o .context/stripe.md
# Fetch top 3 results
chub search "stripe" --json | jq -r '.results[:3][].id' | xargs chub get -o .context/
# Fetch multiple at once
chub get openai/chat-api stripe/payments -o .context/
# Install a skill into Claude Code's skill directory
chub get playwright-community/login-flows -o .claude/skills/login-flows/SKILL.md
# Install a skill with all companion files
chub get playwright-community/login-flows --full -o .claude/skills/login-flows/
# Multi-source: disambiguate with source: prefix
chub get internal:openai/chat-api
All content follows the Agent Skills spec. Both DOC.md and SKILL.md use the standard's frontmatter format (name, description, optional metadata). This makes chub content interoperable with Claude Code, Cursor, Codex, OpenCode, and 30+ agents.
cdn.aichub.org/v1/
├── registry.json                                         # Index (~100KB)
├── bundle.tar.gz                                         # Full bundle (optional)
├── stripe/docs/payments/DOC.md                           # Entry point
├── stripe/docs/payments/references/webhooks.md           # Supporting file
└── playwright-community/skills/login-flows/SKILL.md      # Skill
- `chub update`: fetches `registry.json` only (~100KB), caches locally
- `chub search`: searches local registry (no network)
- `chub get <id>`: auto-detects type, fetches entry point (DOC.md or SKILL.md), checks cache first
- `chub get <id> --full`: fetches all files listed in registry
- `chub update --full`: downloads entire `bundle.tar.gz` for offline use
~/.chub/
├── config.yaml                  # User config (optional, created manually)
└── sources/                     # Per-source cache (remote sources only)
    ├── community/
    │   ├── registry.json        # Cached index for this source
    │   ├── meta.json            # { lastUpdated, registryHash }
    │   └── data/                # Cached content (on-demand or full bundle)
    └── another-remote/
        └── ...
Local path sources are not cached; the CLI reads directly from the configured path.
{
  "version": "1.0.0",
  "base_url": "https://cdn.aichub.org/v1",
  "generated": "2026-02-02T00:00:00.000Z",
  "docs": [
    {
      "id": "openai/chat-api",
      "name": "chat-api",
      "description": "Chat completions with GPT models",
      "source": "maintainer",
      "tags": ["openai", "chat", "llm"],
      "languages": [
        {
          "language": "python",
          "versions": [
            {
              "version": "1.52.0",
              "path": "openai/docs/chat-api/v1",
              "files": ["DOC.md", "references/streaming.md"],
              "size": 42000,
              "lastUpdated": "2026-01-15"
            }
          ],
          "recommendedVersion": "1.52.0"
        }
      ]
    }
  ],
  "skills": [
    {
      "id": "playwright-community/login-flows",
      "name": "login-flows",
      "description": "Login flow automation patterns for Playwright",
      "source": "community",
      "tags": ["browser", "playwright"],
      "path": "playwright-community/skills/login-flows",
      "files": ["SKILL.md", "helpers/login-util.ts"],
      "size": 12000,
      "lastUpdated": "2026-01-15"
    }
  ]
}
Doc entry fields:
- `id`: unique identifier in `author/name` format, used by `chub get <id>`
- `name`: short name from frontmatter (the part after the author prefix)
- `description`: short description for search results
- `source`: `official` (library author), `maintainer` (context-hub team), `community`
- `tags`: free-form tags for filtering
- `languages[]`: per-language grouping
  - `versions[]`: per-version, each with `path`, `files`, `size`, `lastUpdated`
  - `recommendedVersion`: default version to fetch
Skill entry fields:
- `name`, `description`, `source`, `tags`: same as docs
- `path`: directory path (relative to `base_url` or source root)
- `files`: all files in the entry directory
- `size`, `lastUpdated`: flat, no language/version nesting
# Multi-source (recommended)
sources:
  - name: community
    url: https://cdn.aichub.org/v1        # Remote CDN
  - name: internal
    path: /path/to/local/docs             # Local folder (build output)
# Trust policy: which entry sources to show
source: "official,maintainer,community"
# Optional
refresh_interval: 86400                   # Cache TTL in seconds (24h)
Backward compat: If no sources array, falls back to single cdn_url field (or CHUB_BUNDLE_URL env var) as a source named "default".
Local source: Can be either a raw content repo or a chub build output directory; both must contain registry.json at root with the standard schema.
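The backward-compatibility fallback described above can be sketched as a small resolver over the parsed config. `resolveSources` is a hypothetical name, and giving `cdn_url` precedence over the `CHUB_BUNDLE_URL` environment variable is an assumption; the document does not state which wins.

```javascript
// Hypothetical resolver: prefer the `sources` array, otherwise fall back to a
// single remote source named "default".
// Assumption: cdn_url takes precedence over CHUB_BUNDLE_URL.
function resolveSources(config, env = {}) {
  if (Array.isArray(config.sources) && config.sources.length > 0) {
    return config.sources;
  }
  const url = config.cdn_url || env.CHUB_BUNDLE_URL;
  return url ? [{ name: "default", url }] : [];
}

const multiSource = resolveSources({
  sources: [
    { name: "community", url: "https://cdn.aichub.org/v1" },
    { name: "internal", path: "/path/to/local/docs" },
  ],
});
const legacy = resolveSources({ cdn_url: "https://cdn.aichub.org/v1" });

console.log(multiSource.map((s) => s.name)); // [ 'community', 'internal' ]
console.log(legacy); // [ { name: 'default', url: 'https://cdn.aichub.org/v1' } ]
```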
Content follows the Agent Skills open standard, supported by Claude Code, Cursor, Codex, OpenCode, and 30+ agents.
| Layer | Agent Skills spec | npx skills (Vercel) | chub |
|---|---|---|---|
| Format | SKILL.md with frontmatter | SKILL.md | SKILL.md + DOC.md |
| Discovery | Local filesystem scan | `npx skills search` (git repos) | `chub search` (registry index) |
| Distribution | None (copy files) | Git repos | CDN + local folders |
| Versioning | None | None | Per-entry, per-language (docs) |
| Multi-language | None | None | Yes (docs) |
| Trust/quality | None | None | source field + config filtering |
| Build pipeline | None | None | chub build |
Following the spec makes chub content interoperable with the broader agent ecosystem. A skill fetched via chub get can be piped directly into any agent's skill directory and discovered natively.
- Registry-based search and discovery over network
- Multi-source aggregation (CDN + local folders)
- Trust/quality filtering via `source` field
- Progressive disclosure with `--full` flag
- DOC.md for reference knowledge (uses same frontmatter format)
- Build pipeline to generate registry from content directories
chub-first-draft/
├── cli/
│   ├── package.json              # npm package with bin entry
│   ├── bin/chub                  # #!/usr/bin/env node entry point
│   └── src/
│       ├── index.js              # Commander setup, global --json, preAction cache hook
│       ├── commands/
│       │   ├── search.js         # search / list / info (all in one)
│       │   ├── get.js            # get command (auto-detects doc/skill)
│       │   ├── build.js          # build registry from content directory
│       │   ├── update.js         # refresh registry / full bundle
│       │   └── cache.js          # cache status / clear
│       └── lib/
│           ├── config.js         # Load config.yaml, merge env vars, defaults
│           ├── cache.js          # Registry fetch, on-demand doc fetch, bundle extract
│           ├── registry.js       # Load registry, search/filter/query, resolve paths
│           ├── frontmatter.js    # YAML frontmatter parser
│           ├── output.js         # Dual-mode output (human with chalk / JSON)
│           └── normalize.js      # Language aliases (js→javascript, py→python)
├── plans-for-reference/          # Archived design plans
├── NARRATIVE.md                  # Product pitch
├── DESIGN.md                     # This file
├── .gitignore
└── package.json                  # Root workspace
- `commander` ^12: CLI framework
- `chalk` ^5: Terminal colors
- `yaml` ^2: Config + frontmatter parsing
- `tar` ^7: Bundle extraction (for `--full` mode)
- Node.js >= 18 (built-in `fetch`, no `node-fetch` needed)
- `skills_dir` / `docs_dir` config: default output directories for skills and docs
- Agent detection: auto-detect installed agents and write to the right skill directory
- `chub install`: dedicated install command if the piping pattern proves too verbose
- Usage telemetry: agents report which docs/skills they used, enabling quality signals
- CI/CD integration: GitHub Action that runs `chub build` and publishes to CDN on push
- Agent Skills specification: https://agentskills.io/specification
- Vercel Skills CLI: https://github.com/vercel-labs/skills