Skip to content

Commit 9b41856

Browse files
committed
feat: Implement optional version handling and improve CLI
This commit introduces support for storing and querying documentation without an explicit version, alongside significant improvements to version handling logic and CLI usability. **Core Changes:** * **Optional Versions:** Modified database schema and services (`DocumentManagementService`, `DocumentRetrieverService`) to handle optional versions. `null` or `undefined` versions are now treated internally as an empty string (`""`) to represent unversioned documents. * **Schema Update:** Changed `documents.version` column to `TEXT NOT NULL DEFAULT ''`. * **Service Layer:** Updated methods in services to accept optional version parameters and perform normalization (`null`/`undefined` -> `""`). `listLibraries` now filters out the internal empty string version. * **Scraping Validation:** Implemented stricter version validation in `ScrapeTool`. It now accepts full semver (`1.2.3`), pre-release (`1.2.3-beta`), and partial (`1`, `1.2` coerced to `1.0.0`, `1.2.0`) versions, or no version (for unversioned). Ranges (`1.x`) and invalid formats are rejected during scraping. * **Version Finding:** Refactored `DocumentManagementService.findBestVersion` to return both the `bestMatch` (semver, with fallback to older versions) and `hasUnversioned` (boolean). Updated `FindVersionTool` and the `find-version` CLI command to provide more informative output based on this result. * **Search Tool:** Fixed a bug in `SearchTool` where the entire result object from `findBestVersion` was incorrectly passed to `searchStore`. * **CLI Consistency:** Updated `scrape`, `search`, and `find-version` CLI commands to use a consistent optional `--version` flag instead of positional arguments for the version. * **Documentation:** Updated `README.md` with a project summary, detailed explanations of the new version handling logic, and updated CLI command examples. * **Testing:** Added and updated tests to cover optional version handling, fallback logic, and validation, including fixing previous test failures.
1 parent 7899231 commit 9b41856

12 files changed

+588
-142
lines changed

README.md

Lines changed: 110 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
# docs-mcp-server MCP Server
22

3+
This project provides a Model Context Protocol (MCP) server designed to scrape, process, index, and search documentation for various software libraries and packages. It fetches content from specified URLs, splits it into meaningful chunks using semantic splitting techniques, generates vector embeddings using OpenAI, and stores the data in an SQLite database. The server utilizes `sqlite-vec` for efficient vector similarity search and FTS5 for full-text search capabilities, combining them for hybrid search results. It supports versioning, allowing documentation for different library versions (including unversioned content) to be stored and queried distinctly. The server exposes MCP tools for scraping (`scrape_docs`), searching (`search_docs`), listing indexed libraries (`list_libraries`), and finding appropriate versions (`find_version`). A companion CLI (`docs-mcp`) is also included for local management and interaction.
4+
35
A MCP server for fetching and searching 3rd party package documentation
46

57
## CLI Usage
@@ -13,16 +15,118 @@ docs-mcp --help
1315
# Show help for a specific command
1416
docs-mcp scrape --help
1517
docs-mcp search --help
18+
docs-mcp find-version --help
19+
```
20+
21+
### Scraping Documentation (`scrape`)
22+
23+
Scrapes and indexes documentation from a given URL for a specific library.
24+
25+
```bash
26+
docs-mcp scrape <library> <url> [options]
27+
```
28+
29+
**Options:**
30+
31+
- `-v, --version <string>`: The specific version to associate with the scraped documents.
32+
- Accepts full versions (`1.2.3`), pre-release versions (`1.2.3-beta.1`), or partial versions (`1`, `1.2` which are expanded to `1.0.0`, `1.2.0`).
33+
- If omitted, the documentation is indexed as **unversioned**.
34+
- `-p, --max-pages <number>`: Maximum pages to scrape (default: 100).
35+
- `-d, --max-depth <number>`: Maximum navigation depth (default: 3).
36+
- `-c, --max-concurrency <number>`: Maximum concurrent requests (default: 3).
37+
- `--ignore-errors`: Ignore errors during scraping (default: true).
38+
39+
**Examples:**
40+
41+
```bash
42+
# Scrape React 18.2.0 docs
43+
docs-mcp scrape react --version 18.2.0 https://react.dev/
44+
45+
# Scrape React docs without a specific version (indexed as unversioned)
46+
docs-mcp scrape react https://react.dev/
47+
48+
# Scrape partial version (will be stored as 7.0.0)
49+
docs-mcp scrape semver --version 7 https://github.com/npm/node-semver
50+
51+
# Scrape pre-release version
52+
docs-mcp scrape mylib --version 2.0.0-rc.1 https://mylib.com/docs
53+
```
54+
55+
### Searching Documentation (`search`)
56+
57+
Searches the indexed documentation for a library, optionally filtering by version.
58+
59+
```bash
60+
docs-mcp search <library> <query> [options]
1661
```
1762

18-
### Version Handling
63+
**Options:**
64+
65+
- `-v, --version <string>`: The target version or range to search within.
66+
- Supports exact versions (`18.0.0`), partial versions (`18`), or ranges (`18.x`).
67+
- If omitted, searches the **latest** available indexed version.
68+
- If a specific version/range doesn't match, it falls back to the latest indexed version _older_ than the target.
69+
- To search **only unversioned** documents, explicitly pass an empty string: `--version ""`. (Note: Omitting `--version` searches latest, which _might_ be unversioned if no other versions exist).
70+
- `-l, --limit <number>`: Maximum number of results (default: 5).
71+
- `-e, --exact-match`: Only match the exact version specified (disables fallback and range matching) (default: false).
72+
73+
**Examples:**
74+
75+
```bash
76+
# Search latest React docs for 'hooks'
77+
docs-mcp search react 'hooks'
78+
79+
# Search React 18.x docs for 'hooks'
80+
docs-mcp search react --version 18.x 'hooks'
81+
82+
# Search React 17 docs (will match 17.x.x or older if 17.x.x not found)
83+
docs-mcp search react --version 17 'hooks'
84+
85+
# Search only React 18.0.0 docs
86+
docs-mcp search react --version 18.0.0 --exact-match 'hooks'
87+
88+
# Search only unversioned React docs
89+
docs-mcp search react --version "" 'hooks'
90+
```
91+
92+
### Finding Available Versions (`find-version`)
93+
94+
Checks the index for the best matching version for a library based on a target, and indicates if unversioned documents exist.
95+
96+
```bash
97+
docs-mcp find-version <library> [options]
98+
```
99+
100+
**Options:**
101+
102+
- `-v, --version <string>`: The target version or range. If omitted, finds the latest available version.
103+
104+
**Examples:**
105+
106+
```bash
107+
# Find the latest indexed version for react
108+
docs-mcp find-version react
109+
110+
# Find the best match for react version 17.x
111+
docs-mcp find-version react --version 17.x
112+
113+
# Find the best match for react version 17.0.0 (may fall back to older)
114+
docs-mcp find-version react --version 17.0.0
115+
```
116+
117+
### Listing Libraries (`list-libraries`)
118+
119+
Lists all libraries currently indexed in the store.
120+
121+
```bash
122+
docs-mcp list-libraries
123+
```
19124

20-
This server supports partial version matching, selecting the best available version based on these rules:
125+
### Version Handling Summary
21126

22-
- If no version is specified, the latest indexed version is used.
23-
- If a full version (e.g., `1.2.3`) is specified, that exact version is used, if available.
24-
- If a partial version (e.g., `1.2`) is specified, the latest matching version (e.g., `1.2.5`) is used.
25-
- If the specified version (full or partial) is not found, the server will attempt to find the closest preceding version.
127+
- **Scraping:** Requires a specific, valid version (`X.Y.Z`, `X.Y.Z-pre`, `X.Y`, `X`) or no version (for unversioned docs). Ranges (`X.x`) are invalid for scraping.
128+
- **Searching/Finding:** Accepts specific versions, partials, or ranges (`X.Y.Z`, `X.Y`, `X`, `X.x`). Falls back to the latest older version if the target doesn't match. Omitting the version targets the latest available. Explicitly searching `--version ""` targets unversioned documents.
129+
- **Unversioned Docs:** Libraries can have documentation stored without a specific version (by omitting `--version` during scrape). These can be searched explicitly using `--version ""`. The `find-version` command will also report if unversioned docs exist alongside any semver matches.
26130

27131
## Development
28132

src/cli.ts

Lines changed: 29 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -33,17 +33,19 @@ async function main() {
3333
.version("1.0.0");
3434

3535
program
36-
.command("scrape <library> <version> <url>")
36+
.command("scrape <library> <url>") // Remove <version> as positional
3737
.description("Scrape and index documentation from a URL")
38+
.option("-v, --version <string>", "Version of the library (optional)") // Add optional version flag
3839
.option("-p, --max-pages <number>", "Maximum pages to scrape", "100")
3940
.option("-d, --max-depth <number>", "Maximum navigation depth", "3")
4041
.option("-c, --max-concurrency <number>", "Maximum concurrent page requests", "3")
4142
.option("--ignore-errors", "Ignore errors during scraping", true)
42-
.action(async (library, version, url, options) => {
43+
.action(async (library, url, options) => {
44+
// Update action parameters
4345
const result = await tools.scrape.execute({
4446
url,
4547
library,
46-
version,
48+
version: options.version, // Get version from options
4749
options: {
4850
maxPages: Number.parseInt(options.maxPages),
4951
maxDepth: Number.parseInt(options.maxDepth),
@@ -55,24 +57,29 @@ async function main() {
5557
});
5658

5759
program
58-
.command("search <library> <version> <query>")
60+
.command("search <library> <query>") // Remove <version> as positional
5961
.description(
6062
"Search documents in a library. Version matching examples:\n" +
61-
" - search react 18.0.0 'hooks' -> matches docs for React 18.0.0 or earlier versions\n" +
62-
" - search react 18.0.0 'hooks' --exact-match -> only matches React 18.0.0\n" +
63-
" - search typescript 5.x 'types' -> matches any TypeScript 5.x.x version\n" +
64-
" - search typescript 5.2.x 'types' -> matches any TypeScript 5.2.x version",
63+
" - search react --version 18.0.0 'hooks' -> matches docs for React 18.0.0 or earlier versions\n" +
64+
" - search react --version 18.0.0 'hooks' --exact-match -> only matches React 18.0.0\n" +
65+
" - search typescript --version 5.x 'types' -> matches any TypeScript 5.x.x version\n" +
66+
" - search typescript --version 5.2.x 'types' -> matches any TypeScript 5.2.x version",
67+
)
68+
.option(
69+
"-v, --version <string>", // Add optional version flag
70+
"Version of the library (optional, supports ranges)",
6571
)
6672
.option("-l, --limit <number>", "Maximum number of results", "5")
6773
.option(
6874
"-e, --exact-match",
6975
"Only use exact version match (e.g., '18.0.0' matches only 18.0.0, not 17.x.x) (default: false)",
7076
false,
7177
)
72-
.action(async (library, version, query, options) => {
78+
.action(async (library, query, options) => {
79+
// Update action parameters
7380
const result = await tools.search.execute({
7481
library,
75-
version,
82+
version: options.version, // Get version from options
7683
query,
7784
limit: Number.parseInt(options.limit),
7885
exactMatch: options.exactMatch,
@@ -89,17 +96,22 @@ async function main() {
8996
});
9097

9198
program
92-
.command("find-version <library> [targetVersion]")
99+
.command("find-version <library>") // Remove [targetVersion] positional
93100
.description("Find the best matching version for a library")
94-
.action(async (library, targetVersion) => {
95-
const version = await tools.findVersion.execute({
101+
.option(
102+
"-v, --version <string>", // Add optional version flag
103+
"Target version to match (optional, supports ranges)",
104+
)
105+
.action(async (library, options) => { // Update action parameters
106+
const versionInfo = await tools.findVersion.execute({
96107
library,
97-
targetVersion,
108+
targetVersion: options.version, // Get version from options
98109
});
99-
if (!version) {
100-
throw new Error("No matching version found");
110+
// findVersion.execute now returns a string, handle potential error messages within it
111+
if (!versionInfo) { // Should not happen with current tool logic, but good practice
112+
throw new Error("Failed to get version information");
101113
}
102-
console.log(version);
114+
console.log(versionInfo); // Log the descriptive string from the tool
103115
});
104116

105117
await program.parseAsync();

src/mcp/index.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ export async function startServer() {
4747
{
4848
url: z.string().url().describe("URL of the documentation to scrape"),
4949
library: z.string().describe("Name of the library"),
50-
version: z.string().describe("Version of the library"),
50+
version: z.string().optional().describe("Version of the library"),
5151
maxPages: z
5252
.number()
5353
.optional()

0 commit comments

Comments
 (0)