Commit a2708a4

feat: branch-aware indexing with SQLite storage

- Add SQLite database for storing embeddings and chunks by content hash
- Embeddings are deduplicated across branches - switching branches reuses existing embeddings
- Search results are filtered to only include chunks on current branch
- Auto-migrate legacy indexes on first run
- Add git branch detection and HEAD watcher for automatic re-indexing on branch switch
- Fix glob pattern to match root-level files (e.g., **/*.js now matches file.js)
- Add health check garbage collection for orphaned embeddings and chunks
- Fix lint errors: replace require() with ES imports
- Update README with accurate Quick Start instructions

Storage structure:
.opencode/index/
├── codebase.db          # SQLite: embeddings, chunks, branch catalog
├── vectors.usearch      # Vector index (uSearch)
├── inverted-index.json  # BM25 keyword index
└── file-hashes.json     # File change detection

1 parent 987a81b commit a2708a4
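
The glob fix in the commit message (`**/*.js` now matches `file.js`) comes down to letting `**/` match zero directory segments as well as one or more. The matcher itself is not part of this diff, so the following is only a minimal sketch of that rule; `globToRegExp` is a hypothetical helper covering a small glob subset, not the plugin's implementation.

```typescript
// Hypothetical helper, not the plugin's actual matcher: converts a small glob
// subset (`**/`, `**`, `*`, `?`) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (glob.startsWith("**/", i)) {
      // Zero or more directory segments, so "**/*.js" also matches "file.js".
      source += "(?:[^/]+/)*";
      i += 3;
    } else if (glob.startsWith("**", i)) {
      // A bare "**" matches anything, including path separators.
      source += ".*";
      i += 2;
    } else if (ch === "*") {
      // A single "*" stays within one path segment.
      source += "[^/]*";
      i += 1;
    } else if (ch === "?") {
      source += "[^/]";
      i += 1;
    } else {
      // Escape regex metacharacters in literal path characters.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}
```

With this rule, `globToRegExp("**/*.js")` accepts both `file.js` and `src/a/b.js`, while a pattern such as `src/*.ts` still rejects files in nested directories.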

15 files changed, +1965 -34 lines changed

README.md

Lines changed: 54 additions & 9 deletions
@@ -14,6 +14,7 @@
 
 - 🧠 **Semantic Search**: Finds "user authentication" logic even if the function is named `check_creds`.
 - **Blazing Fast Indexing**: Powered by a Rust native module using `tree-sitter` and `usearch`. Incremental updates take milliseconds.
+- 🌿 **Branch-Aware**: Seamlessly handles git branch switches — reuses embeddings, filters stale results.
 - 🔒 **Privacy Focused**: Your vector index is stored locally in your project.
 - 🔌 **Model Agnostic**: Works out-of-the-box with GitHub Copilot, OpenAI, Gemini, or local Ollama models.
 
@@ -31,11 +32,12 @@
 }
 ```
 
-3. **Start Searching**
-Load OpenCode and ask:
-> "Find the function that handles credit card validation errors"
+3. **Index your codebase**
+Run `/index` or ask the agent to index your codebase. This only needs to be done once — subsequent updates are incremental.
 
-*The plugin will automatically index your codebase on the first run.*
+4. **Start Searching**
+Ask:
+> "Find the function that handles credit card validation errors"
 
 ## 🔍 See It In Action
 
@@ -98,13 +100,16 @@ graph TD
 A[Source Code] -->|Tree-sitter| B[Semantic Chunks]
 B -->|Embedding Model| C[Vectors]
 C -->|uSearch| D[(Vector Store)]
+C -->|SQLite| G[(Embeddings DB)]
 B -->|BM25| E[(Inverted Index)]
+B -->|Branch Catalog| G
 end
 
 subgraph Searching
 Q[User Query] -->|Embedding Model| V[Query Vector]
 V -->|Cosine Similarity| D
 Q -->|BM25| E
+G -->|Branch Filter| F
 D --> F[Hybrid Fusion]
 E --> F
 F --> R[Ranked Results]
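
The diagram in this hunk feeds the vector ranking, the BM25 ranking, and the new branch filter into a single "Hybrid Fusion" node, but the diff does not say how the fusion is computed. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common rule for merging ranked lists, as a stand-in. `fuseRankings`, the `onBranch` set, and the constant `k = 60` are assumptions, not names or values from the plugin.

```typescript
// Stand-in fusion rule (reciprocal rank fusion); the plugin's real scoring is
// not shown in this diff. Each ranking is a list of chunk IDs, best first.
function fuseRankings(
  rankings: string[][],
  onBranch: Set<string>, // hypothetical: chunk IDs present on the current branch
  k: number = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Branch filter: skip chunks that do not exist on the current branch.
      if (!onBranch.has(id)) return;
      // Rank is 1-based, so earlier positions contribute larger scores.
      const previous = scores.get(id) || 0;
      scores.set(id, previous + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}
```

A chunk that appears in both the semantic and keyword rankings accumulates two contributions and so tends to rise above chunks found by only one of them.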
@@ -114,14 +119,52 @@ graph TD
 1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.
 2. **Chunking**: Large blocks are split with overlapping windows to preserve context across chunk boundaries.
 3. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
-4. **Storage**: Vectors are stored in a high-performance local index using `usearch` with F16 quantization for 50% memory savings.
-5. **Hybrid Search**: Combines semantic similarity (vectors) with BM25 keyword matching for best results.
+4. **Storage**: Embeddings are stored in SQLite (deduplicated by content hash) and vectors in `usearch` with F16 quantization for 50% memory savings. A branch catalog tracks which chunks exist on each branch.
+5. **Hybrid Search**: Combines semantic similarity (vectors) with BM25 keyword matching, filtered by current branch.
 
 **Performance characteristics:**
 - **Incremental indexing**: ~50ms check time — only re-embeds changed files
 - **Smart chunking**: Understands code structure to keep functions whole, with overlap for context
 - **Native speed**: Core logic written in Rust for maximum performance
 - **Memory efficient**: F16 vector quantization reduces index size by 50%
+- **Branch-aware**: Automatically tracks which chunks exist on each git branch
+
+## 🌿 Branch-Aware Indexing
+
+The plugin automatically detects git branches and optimizes indexing across branch switches.
+
+### How It Works
+
+When you switch branches, code changes but embeddings for unchanged content remain the same. The plugin:
+
+1. **Stores embeddings by content hash**: Embeddings are deduplicated across branches
+2. **Tracks branch membership**: A lightweight catalog tracks which chunks exist on each branch
+3. **Filters search results**: Queries only return results relevant to the current branch
+
+### Benefits
+
+| Scenario | Without Branch Awareness | With Branch Awareness |
+|----------|-------------------------|----------------------|
+| Switch to feature branch | Re-index everything | Instant — reuse existing embeddings |
+| Return to main | Re-index everything | Instant — catalog already exists |
+| Search on branch | May return stale results | Only returns current branch's code |
+
+### Automatic Behavior
+
+- **Branch detection**: Automatically reads from `.git/HEAD`
+- **Re-indexing on switch**: Triggers when you switch branches (via file watcher)
+- **Legacy migration**: Automatically migrates old indexes on first run
+- **Garbage collection**: Health check removes orphaned embeddings and chunks
+
+### Storage Structure
+
+```
+.opencode/index/
+├── codebase.db # SQLite: embeddings, chunks, branch catalog
+├── vectors.usearch # Vector index (uSearch)
+├── inverted-index.json # BM25 keyword index
+└── file-hashes.json # File change detection
+```
 
 ## 🧰 Tools Available
 
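
The three "How It Works" steps added in this hunk (store by content hash, track branch membership, filter results) can be sketched as a small in-memory model. This is a minimal sketch under stated assumptions, not the plugin's code: `BranchAwareStore`, `contentHash`, `indexBranch`, and `filterToBranch` are hypothetical names, FNV-1a stands in for xxhash, and two `Map`s stand in for the SQLite tables.

```typescript
// Minimal in-memory model of content-hash deduplication plus a branch catalog.
// All names are hypothetical; the real plugin keeps this data in SQLite.
type Vector = number[];

// FNV-1a (32-bit) stands in for xxhash so the sketch has no dependencies.
function contentHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

class BranchAwareStore {
  private embeddings = new Map<string, Vector>(); // content hash -> embedding
  private catalog = new Map<string, Set<string>>(); // branch -> content hashes
  private readonly embed: (text: string) => Vector;
  embedCalls = 0; // counts how often the (expensive) embedder actually runs

  constructor(embed: (text: string) => Vector) {
    this.embed = embed;
  }

  // Record the chunks present on `branch`, embedding only unseen content.
  indexBranch(branch: string, chunks: string[]): void {
    const hashes = new Set<string>();
    for (const chunk of chunks) {
      const hash = contentHash(chunk);
      hashes.add(hash);
      if (!this.embeddings.has(hash)) {
        this.embeddings.set(hash, this.embed(chunk));
        this.embedCalls++;
      }
    }
    this.catalog.set(branch, hashes);
  }

  // Keep only the candidate hashes that exist on `branch`.
  filterToBranch(branch: string, candidates: string[]): string[] {
    const onBranch = this.catalog.get(branch);
    if (!onBranch) return [];
    return candidates.filter((hash) => onBranch.has(hash));
  }
}
```

Indexing `main` with two chunks and then a feature branch that shares one of them calls the embedder three times rather than four, which is the reuse the Benefits table describes. Returning to `main` needs no embedding calls at all because its catalog entry already exists.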
@@ -151,7 +194,7 @@ Manually trigger indexing.
 Checks if the index is ready and healthy.
 
 ### `index_health_check`
-Maintenance tool to remove stale entries from deleted files.
+Maintenance tool to remove stale entries from deleted files and orphaned embeddings/chunks from the database.
 
 ## 🎮 Slash Commands
 
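
The updated `index_health_check` description mentions removing orphaned embeddings and chunks, but the diff does not define "orphaned". The sketch below assumes it means an embedding whose content hash is not referenced by any branch in the catalog. `collectOrphans` and its two `Map` parameters are hypothetical stand-ins for the SQLite tables.

```typescript
// Hypothetical sketch of orphan collection. Assumption: an embedding is
// orphaned when no branch in the catalog references its content hash.
function collectOrphans(
  embeddings: Map<string, number[]>, // content hash -> embedding
  catalog: Map<string, Set<string>>, // branch -> content hashes on that branch
): string[] {
  // Gather every hash that at least one branch still references.
  const referenced = new Set<string>();
  catalog.forEach((hashes) => {
    hashes.forEach((hash) => {
      referenced.add(hash);
    });
  });

  // Find unreferenced hashes first, then delete, to avoid mutating the map
  // while iterating over it.
  const removed: string[] = [];
  embeddings.forEach((_vector, hash) => {
    if (!referenced.has(hash)) removed.push(hash);
  });
  for (const hash of removed) {
    embeddings.delete(hash);
  }
  return removed;
}
```

An embedding shared by several branches survives until the last branch that references it is removed from the catalog.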
@@ -263,12 +306,13 @@ CI will automatically run tests and type checking on your PR.
 │ ├── config/ # Configuration schema
 │ ├── embeddings/ # Provider detection and API calls
 │ ├── indexer/ # Core indexing logic + inverted index
+│ ├── git/ # Git utilities (branch detection)
 │ ├── tools/ # OpenCode tool definitions
 │ ├── utils/ # File collection, cost estimation
 │ ├── native/ # Rust native module wrapper
-│ └── watcher/ # File change watcher
+│ └── watcher/ # File/git change watcher
 ├── native/
-│ └── src/ # Rust: tree-sitter, usearch, xxhash
+│ └── src/ # Rust: tree-sitter, usearch, xxhash, SQLite
 ├── tests/ # Unit tests (vitest)
 ├── commands/ # Slash command definitions
 ├── skill/ # Agent skill guidance
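
This hunk adds a `git/` directory for branch detection, and the README states that the branch is read from `.git/HEAD`. In a normal checkout that file holds either a symbolic ref such as `ref: refs/heads/main` or, when HEAD is detached, a raw commit hash. The sketch below only interprets the file's text; `parseGitHead` and `HeadState` are hypothetical names, and reading and watching the file are left out.

```typescript
// Hypothetical helper: interprets the text of a `.git/HEAD` file.
type HeadState =
  | { kind: "branch"; name: string }
  | { kind: "detached"; commit: string };

function parseGitHead(contents: string): HeadState {
  const text = contents.trim();
  const symbolic = "ref: ";
  if (text.startsWith(symbolic)) {
    const target = text.slice(symbolic.length).trim();
    const heads = "refs/heads/";
    // Strip the usual prefix; any other ref is kept as its full path.
    const name = target.startsWith(heads) ? target.slice(heads.length) : target;
    return { kind: "branch", name };
  }
  // No symbolic ref: HEAD points directly at a commit (detached HEAD).
  return { kind: "detached", commit: text };
}
```

Branch names may contain slashes, so `ref: refs/heads/feature/login` yields `feature/login` rather than `login`.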
@@ -280,6 +324,7 @@ CI will automatically run tests and type checking on your PR.
 The Rust native module handles performance-critical operations:
 - **tree-sitter**: Language-aware code parsing with JSDoc/docstring extraction
 - **usearch**: High-performance vector similarity search with F16 quantization
+- **SQLite**: Persistent storage for embeddings, chunks, and branch catalog
 - **BM25 inverted index**: Fast keyword search for hybrid retrieval
 - **xxhash**: Fast content hashing for change detection
 
native/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ tree-sitter-json = "0.20"
 
 usearch = "2.15"
 
+rusqlite = { version = "0.31", features = ["bundled"] }
+
 xxhash-rust = { version = "0.8", features = ["xxh3"] }
 
 serde = { version = "1.0", features = ["derive"] }
