Skip to content

Feat: add knowledge base, reranking, TXT/HTML support, and config merge#44

Merged
Helweg merged 22 commits intoHelweg:mainfrom
fsender:Resolve-conflict
Apr 11, 2026
Merged

Feat: add knowledge base, reranking, TXT/HTML support, and config merge#44
Helweg merged 22 commits intoHelweg:mainfrom
fsender:Resolve-conflict

Conversation

@fsender
Copy link
Copy Markdown
Contributor

@fsender fsender commented Apr 2, 2026

Summary

This PR adds several new features to enhance the plugin's capabilities:

1. Knowledge Base Support

Add the ability to index external documentation folders alongside project code:

  • New tools: add_knowledge_base, list_knowledge_bases, remove_knowledge_base
  • Configuration: Add knowledgeBases array in config to specify external directories
  • Use cases: Index API references, tutorials, example programs, hardware documentation
  • Same database: All knowledge base content is indexed into the same database as project code

Example config:

{
  "knowledgeBases": [
    "/home/user/docs/my-doc",
    "/home/user/docs/my-API-references"
  ]
}

2. Reranking API Integration

Add API-based reranking for improved search result quality:

  • New reranker config section
  • Support for SiliconFlow BAAI/bge-reranker-v2-m3 and other OpenAI-compatible endpoints
  • Configurable via topN, timeoutMs options

Example config:

{
  "reranker": {
    "enabled": true,
    "baseUrl": "https://api.siliconflow.cn/v1",
    "model": "BAAI/bge-reranker-v2-m3",
    "apiKey": "{env:SILICONFLOW_API_KEY}",
    "topN": 20
  }
}

3. Config Merging

Global and project configs are now merged:

  • Global config (~/.config/opencode/codebase-index.json) as base
  • Project config (.opencode/codebase-index.json) overrides
  • Arrays (knowledgeBases, additionalInclude) are merged (union, deduplicated)

4. File Pattern Improvements

  • New additionalInclude option to extend defaults without replacing
  • Added TXT/HTML/HTM to default include patterns
  • Exclude hidden files/folders (starting with .)
  • Exclude build folders (containing "build" in name)

5. Token Efficiency

  • Changed /index command default to verbose=false to reduce token consumption

Files Changed

  • src/config/schema.ts - Add reranker, knowledgeBases, additionalInclude
  • src/config/constants.ts - Update include/exclude patterns
  • src/index.ts - Config merging logic
  • src/indexer/index.ts - Integrate reranker and knowledge bases
  • src/utils/files.ts - Knowledge base file collection
  • src/rerank/index.ts - New reranker module
  • src/tools/index.ts - New knowledge base tools
  • skill/SKILL.md - Updated with new features
  • README.md - Documentation updates
  • tests/config.test.ts - Updated tests

Testing

All tests pass (492/492):

> opencode-codebase-index@0.6.1 test:run
> vitest run

Test Files  25 passed (25)
Tests  492 passed (492)

- Add knowledge base tools (add/list/remove_knowledge_base)
- Add SiliconFlow reranking with BAAI/bge-reranker-v2-m3
- Add TXT/HTML/HTM file format support
- Add global/project config merging for embedding provider
- Add additionalInclude config option
- Exclude hidden files and build folders from indexing
- Update SKILL.md and README.md documentation
- Fix test assertion for exclude patterns count
@github-actions github-actions bot added dependencies Dependency updates documentation Documentation changes test Test changes labels Apr 2, 2026
@Helweg
Copy link
Copy Markdown
Owner

Helweg commented Apr 2, 2026

Solid set of features — knowledge bases, reranking, config merging, and the file pattern improvements are well thought out.

A couple things I noticed:

Watcher doesn't account for additionalInclude patterns

The indexer merges additionalInclude into include patterns before collecting files:

const includePatterns = [...this.config.include, ...this.config.additionalInclude];

But FileWatcher.handleChange only passes this.config.include to shouldIncludeFile (line 70). If someone adds "**/*.csv" to additionalInclude, those files get indexed on full reindex but the watcher won't pick up changes to them.

add_knowledge_base / remove_knowledge_base reindex parameter is misleading

When reindex=true, the tool outputs:

Reindexing... (restart OpenCode to pick up the new config, then run /index)

It doesn't actually trigger reindexing — the reindex=true and reindex=false paths produce nearly identical outcomes. Might be cleaner to drop the parameter and just tell the user what to do, or note it as a TODO for when live config reload is supported.

Nit: rebuild exception in shouldIncludeFile is dead code

shouldIncludeFile carves out an exception for "rebuild" in the build-folder check, but walkDirectory and the watcher's ignored callback both exclude build-substring directories before shouldIncludeFile ever runs on their contents. The exception can never be reached.

User added 4 commits April 3, 2026 01:17
- Fix watcher to account for additionalInclude patterns
- Remove misleading reindex parameter from knowledge base tools
- Remove dead code in shouldIncludeFile
- Add error handling to plugin initialization to prevent opencode startup failure
- Add error handling to native module loading to prevent crashes
- Restore AVX-512 SIMD for production, disable only in tests via RUSTFLAGS
@Helweg
Copy link
Copy Markdown
Owner

Helweg commented Apr 2, 2026

Thanks for the updates — the recursion depth limit in extract_semantic_nodes is a good fix for the segfault, and expanding the semantic node types for JS/TS makes sense.

The three items from my earlier review still appear to be open in the current diff:

  1. Watcher not merging additionalInclude into include patterns
  2. reindex parameter on add_knowledge_base / remove_knowledge_base not actually triggering reindex
  3. Dead rebuild exception in shouldIncludeFile

Are these on your radar, or intentionally deferred?

- Add recursion depth detection (4096 limit) to prevent stack overflow in parser
- Skip custom knowledge base folders from file watcher to reduce watch overhead
- Optimize tool return formats by removing redundant prompt phrases
- Fix test expectations to match new tool output formats
- Add performance profiling for semantic node extraction
- Limit leading comment traversal to 5 siblings for better performance

This addresses issues reported by the repository author including:
1. File watcher intentionally excludes custom knowledge base folders
2. Iteration depth detection prevents segmentation faults
3. Other fixes for reported issues
4. Optimized tool call prompts for better LLM integration
@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 2, 2026

I have updated this PR and have fixed and tested everything except "Watcher doesn't account for additionalInclude patterns."
For performance and saving-RAM reasons, monitoring a large number of files wastes memory, and files imported as knowledgebases, APIs and documents are rarely updated dynamically during use, so monitoring them makes little sense.
I also found that the iteration depth of the scan might be too deep and cause the stack to overflow and crash. I have fixed it and set the iteration depth to a maximum of 4096.
In addition, modifiers included in the returned content of tool calls may cause redundancy. For unambiguous tool calls, modifiers such as "Index of" and "Found cases: " are no longer present in the returned results of tool calls.

@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 2, 2026

Some comments and commits are generated from OpenCode client. It created an unnecessary PR and I closed it manually. Now the current branch is latest.

@Helweg
Copy link
Copy Markdown
Owner

Helweg commented Apr 3, 2026

Thanks for the fixes on the reindex parameter and the rebuild dead code — those look good.

On the additionalInclude watcher issue — I think there might be a misunderstanding. This isn't about watching additional directories or knowledge base folders. additionalInclude matches files within the project root, which chokidar is already monitoring recursively. The include patterns don't control what chokidar watches — they only filter events after they're received in handleChange.

So right now, if someone adds "**/*.csv" to additionalInclude:

  • Full reindex picks them up ✅
  • Watcher receives change events for them (chokidar is already watching the project root) but silently drops them in shouldIncludeFile because it only checks config.include

The fix is just merging the patterns in handleChange — zero additional memory or watch scope, just one more glob check per event. Without it, the index goes stale for additionalInclude file types between full reindexes.

If the intent is "index these file types but never watch them," that's a valid design choice — but it should be a separate config option rather than an accidental mismatch between the indexer and watcher.

- Merge additionalInclude patterns with include patterns in handleChange
- Ensures files matching additionalInclude patterns are watched for changes
- Addresses PR review feedback from repository author
@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 3, 2026

Thanks for the detailed review and clarification on the watcher issue. You're absolutely right — the additionalInclude patterns should be merged in handleChange to keep the index fresh between full reindexes. I've implemented the fix:

const includePatterns = [...this.config.include, ...(this.config.additionalInclude ?? [])];

This ensures files matching additionalInclude patterns are properly watched with zero additional memory overhead (chokidar is already watching the project root recursively). The fix has been pushed to the main branch.

All three issues from your review are now addressed:

  1. ✅ Watcher now accounts for additionalInclude patterns
  2. reindex parameter clarified (removed misleading functionality)
  3. ✅ Dead rebuild exception removed from shouldIncludeFile

Tests pass (492/492). Let me know if you'd like any further adjustments.

Helweg
Helweg previously approved these changes Apr 3, 2026
Copy link
Copy Markdown
Owner

@Helweg Helweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three review items addressed:

  1. Watcher now merges additionalInclude patterns in handleChange
  2. Misleading reindex parameter cleaned up
  3. Dead rebuild exception removed

No supply-chain concerns — no new dependencies, all lockfile changes are routine semver bumps. LGTM.

@Helweg
Copy link
Copy Markdown
Owner

Helweg commented Apr 3, 2026

CI is failing on typecheck:

src/tools/utils.ts(138,61): error TS6133: 'query' is declared but its value is never read.

The query parameter should be removed from the function signature (and the call site) since it's no longer used after the output cleanup.

@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 3, 2026

fix CI typecheck

@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 3, 2026

These verified commits are from me instead of OpenCode client.

@Helweg
Copy link
Copy Markdown
Owner

Helweg commented Apr 4, 2026

Two more issues found during deeper review of the config merging:

KB management tools only read project config, not merged config

loadPluginConfig() in src/index.ts properly merges global + project configs (with array union for knowledgeBases), and the Indexer receives this merged config. But the KB tools (add_knowledge_base, list_knowledge_bases, remove_knowledge_base) have their own loadConfig() that reads only the project config file:

function getConfigPath(): string {
  return path.join(sharedProjectRoot, ".opencode", "codebase-index.json");
}

If a user adds KBs in global config (~/.config/opencode/codebase-index.json), the Indexer will index them, but list_knowledge_bases won't show them. The tools and the Indexer have different views of the config.

CLI/MCP entrypoint doesn't merge configs

src/cli.ts isn't modified in this PR — it still uses fallback logic (project config if exists, else global config). Only the plugin entry (src/index.ts) got the merge logic. So the PR claim "global and project configs are now merged" only holds when running through the plugin host, not through the MCP/CLI path.

Entrypoint Config behavior
Plugin (src/index.ts) Merge ✅
CLI/MCP (src/cli.ts) Fallback ❌

This means the same project can behave differently depending on whether it's accessed via OpenCode plugin vs MCP client (Cursor, Claude Code, etc.).

@fsender fsender closed this Apr 4, 2026
@fsender fsender deleted the Resolve-conflict branch April 4, 2026 21:16
@fsender fsender restored the Resolve-conflict branch April 4, 2026 21:19
@fsender fsender reopened this Apr 4, 2026
@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 4, 2026

Oops, OpenCode closed my PR branch while working on my Git workflow. It mistakenly thought that the new "Resolve-conflict" branch should be deleted after committing! I urgently restored the branch and reopened the PR. I have reopened the branch via GitHub on the webpage and apologize for the OpenCode operation. I have tried to solve compile and test for the two issues you raised.

@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 6, 2026

These commits have fixed an OOM issue during my usage.

@Helweg
Copy link
Copy Markdown
Owner

Helweg commented Apr 8, 2026

I pulled the latest PR head, rebuilt locally, and re-tested this in a scratch workspace.

I’m still seeing a blocking issue in the real parse/index path:

  • .html files are chunked and indexed
  • .txt files still produce 0 chunks
  • .md knowledge-base files still produce 0 chunks

So TXT/HTML support is still only partially working from end-to-end local verification.

@fsender
Copy link
Copy Markdown
Contributor Author

fsender commented Apr 9, 2026

Now I have fixed the TXT/MD chunking issue and commited into the PR branch. Branch fsender/opencode-codebase-index/tree/main is my working branch. Tested and worked on my knowledgebase. This feature may require a lot of changes to the underlying architecture code. Bugs and issues are still inevitable during real machine testing. Please test again.

@Helweg Helweg merged commit 363b7b0 into Helweg:main Apr 11, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Dependency updates documentation Documentation changes test Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants