Commit 363b7b0

Merge pull request #44 from fsender/Resolve-conflict
Feat: add knowledge base, reranking, TXT/HTML support, and config merge
2 parents 3a2387b + 0ffca37 commit 363b7b0

36 files changed: +3081 -1278 lines changed

ARCHITECTURE.md

Lines changed: 36 additions & 3 deletions
@@ -21,9 +21,10 @@ This document explains the architecture of opencode-codebase-index, including da
 │ OpenCode Agent │
 │ │
 │ Tools: codebase_search, codebase_peek, find_similar, call_graph, │
-│ index_codebase, index_status, index_health_check, index_metrics, │
-│ index_logs │
-│ Commands: /search, /find, /call-graph, /index, /status │
+│ index_codebase, index_status, index_health_check, index_metrics, │
+│ index_logs, add_knowledge_base, list_knowledge_bases, │
+│ remove_knowledge_base │
+│ Commands: /search, /find, /call-graph, /index, /status │
 └─────────────────────────────────────────────────────────────────────────────┘
 
 
@@ -170,6 +171,7 @@ File system observer using chokidar:
 - Watches for file changes → triggers incremental index
 - Watches `.git/HEAD` → detects branch switches
 - Debounces rapid changes (500ms window)
+- Merges `additionalInclude` patterns with `include` patterns for proper file filtering
 
 ## Design Decisions
 
@@ -223,6 +225,21 @@ BM25 hybrid provides:
 - Better results for technical queries
 - Configurable weighting (hybridWeight)
 
+### Why Optimized Tool Return Formats?
+
+Problem: Redundant prompt phrases in tool responses increase token usage and may cause LLMs to exit reasoning prematurely.
+
+Solution:
+- **Remove summary phrases**: e.g., "Found X results", "Index status:", "Health check complete:"
+- **Return raw data**: Direct result lists without introductory text
+- **Maintain clarity**: Keep essential context for unambiguous results
+
+Benefits:
+- Reduced token consumption for LLM tool calls
+- Faster LLM processing (less text to parse)
+- Better integration with LLM reasoning loops
+- Maintained functionality with cleaner output
+
 ### Why Branch-Aware Indexing?
 
 Problem: Switching branches changes code but embeddings are expensive.
@@ -284,6 +301,21 @@ Benefits:
 
 For a typical 500-file codebase (~5000 chunks): ~30MB total
 
+### Tool Call Performance
+
+Tool return formats are optimized to reduce token usage:
+
+| Tool | Before Optimization | After Optimization | Token Savings |
+|------|---------------------|-------------------|---------------|
+| `codebase_search` | "Found X results for 'query': ..." | Raw result list | ~15-20 tokens |
+| `codebase_peek` | "Found X locations for 'query': ..." | Raw result list | ~15-20 tokens |
+| `find_similar` | "Found X similar code blocks: ..." | Raw result list | ~15-20 tokens |
+| `call_graph` | "X calls Y function(s): ..." | Raw result list | ~10-15 tokens |
+| `index_status` | "Index status: ..." | Raw data | ~5-10 tokens |
+| `formatHealthCheck` | "Health check complete: ..." | Raw data | ~5-10 tokens |
+
+**Impact**: Reduces LLM context size, improves reasoning loop efficiency, and lowers API costs.
+
 ## Security Considerations
 
 ### What Gets Sent to Cloud
@@ -321,6 +353,7 @@ No credentials are stored by the plugin.
 - `ts_language()` match arm
 - `is_comment_node()` patterns
 - `is_semantic_node()` patterns
+- Note: Recursion depth is limited to 1024 levels to prevent stack overflow
 4. Add tests in `native/src/parser.rs`
 
 ### Adding a New Embedding Provider
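
Review note: the "Why Optimized Tool Return Formats?" hunk above describes dropping summary phrases such as "Found X results" from tool responses. A minimal TypeScript sketch of that before/after shape follows. The `SearchHit` interface, the function names, and the per-line layout are all hypothetical and chosen for illustration; the plugin's actual formatter may look quite different.

```typescript
// Hypothetical result shape; the plugin's real types are not shown in this diff.
interface SearchHit {
  file: string;
  startLine: number;
  endLine: number;
  score: number;
}

// One line per hit. The layout here is an assumption, not the plugin's format.
function formatHit(hit: SearchHit): string {
  return `${hit.file}:${hit.startLine}-${hit.endLine} (score ${hit.score.toFixed(2)})`;
}

// "Before" style: a summary phrase precedes the data.
function formatVerbose(query: string, hits: SearchHit[]): string {
  const lines = hits.map((hit) => formatHit(hit));
  return `Found ${hits.length} results for '${query}':\n${lines.join("\n")}`;
}

// "After" style: only the result list, with no introductory text.
function formatRaw(hits: SearchHit[]): string {
  return hits.map((hit) => formatHit(hit)).join("\n");
}
```

Each hit still carries its file path and line range, which is one way to read the hunk's "keep essential context" point: the data stays self-describing even without a preamble.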

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **Knowledge base support**: Added `add_knowledge_base`, `list_knowledge_bases`, and `remove_knowledge_base` tools to manage external document folders indexed alongside the project
+- **Reranking with SiliconFlow**: Added `BAAI/bge-reranker-v2-m3` reranking support via SiliconFlow API for improved search result quality
+- **TXT/HTML file support**: Added `*.txt`, `*.html`, `*.htm` to default include patterns for document indexing
+- **Config merging**: Global and project configs are now merged, allowing shared provider settings at global level and knowledge base paths at project level
+- **Hidden file exclusion**: Files and folders starting with `.` are now excluded from indexing and file watching
+- **Build folder exclusion**: Folders containing "build" in their name (e.g., `build`, `mingwBuildDebug`) are now excluded from indexing and file watching
+- **additionalInclude config**: Added new config option to extend default file patterns without replacing them
+
+### Changed
+- **Default verbose=false**: Changed `/index` command default to `verbose=false` to reduce token consumption
+
 ## [0.6.1] - 2026-03-29
 
 ### Added
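
Review note: the hidden-file and build-folder exclusion entries above can be read as a simple per-segment path check. The TypeScript sketch below is one possible interpretation, not the code from this commit. `isExcludedPath` is a hypothetical name. The sketch assumes the final path segment is a file name, so the "build" rule applies only to the segments before it. It also assumes the "build" match is case-insensitive, inferred from the `mingwBuildDebug` example.

```typescript
// Hypothetical helper; the name, signature, and matching rules are assumptions
// drawn from the changelog wording, not from the plugin's source.
function isExcludedPath(relativePath: string): boolean {
  // Split on either separator so Windows-style paths are handled too.
  const segments = relativePath.split(/[\\/]+/).filter((s) => s.length > 0);

  return segments.some((segment, index) => {
    // Hidden files and folders: any segment starting with ".".
    if (segment.charAt(0) === ".") {
      return true;
    }
    // Build folders: assumed to apply only to directory segments, i.e. every
    // segment except the last one, which is treated as the file name.
    const isDirectory = index < segments.length - 1;
    return isDirectory && segment.toLowerCase().indexOf("build") !== -1;
  });
}
```

Under these assumptions, `mingwBuildDebug/main.c` and `src/.env` are excluded, while `src/builder.ts` is kept because only folder names are checked for "build".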
