# 09.2 Hybrid Search
This page documents ZeroClaw's hybrid search implementation, which combines vector similarity search with keyword-based full-text search to provide robust memory recall. The system uses SQLite as the backing store with custom implementations for both search modes, requiring no external dependencies like Pinecone, Elasticsearch, or LangChain.
For information about available memory backends and how to configure them, see Memory Backends. For memory export and backup functionality, see Memory Snapshot.
ZeroClaw's hybrid search operates as a two-stage retrieval system that merges results from both vector and keyword search paths using configurable weights.
```mermaid
flowchart TB
Query["User Query / Recall Request"]
subgraph "Parallel Search Execution"
VectorPath["Vector Search Path"]
KeywordPath["Keyword Search Path"]
end
subgraph "Vector Search"
Embed["Generate Query Embedding<br/>(EmbeddingProvider)"]
VectorDB["SQLite BLOB Storage<br/>Cosine Similarity Scan"]
VectorResults["Vector Results<br/>(similarity scores)"]
end
subgraph "Keyword Search"
Tokenize["Tokenize Query<br/>(FTS5 tokenizer)"]
FTS5["FTS5 Virtual Table<br/>BM25 Ranking"]
KeywordResults["Keyword Results<br/>(BM25 scores)"]
end
subgraph "Merge Layer"
Normalize["Normalize Scores<br/>(0.0 - 1.0 range)"]
Weight["Apply Weights<br/>vector_weight * vec_score +<br/>keyword_weight * kw_score"]
Dedupe["Deduplicate & Sort<br/>by Combined Score"]
end
FinalResults["Ranked Memory Chunks"]
Query --> VectorPath
Query --> KeywordPath
VectorPath --> Embed
Embed --> VectorDB
VectorDB --> VectorResults
KeywordPath --> Tokenize
Tokenize --> FTS5
FTS5 --> KeywordResults
VectorResults --> Normalize
KeywordResults --> Normalize
Normalize --> Weight
Weight --> Dedupe
Dedupe --> FinalResults
```
Sources: README.md:330-345
The vector search path begins by converting the query text into a high-dimensional embedding vector using an `EmbeddingProvider` implementation. The provider is configurable via `memory.embedding_provider`:
| Provider Value | Behavior |
|---|---|
| `"none"` | No-op provider; vector search disabled (keyword-only mode) |
| `"openai"` | Uses OpenAI's embedding API (requires `OPENAI_API_KEY`) |
| `"custom:https://..."` | Custom OpenAI-compatible embedding endpoint |
Sources: README.md:346-377
Embeddings are stored as BLOBs directly in SQLite tables, typically in a `memories` or similar table structure. Each row contains:
- `id`: Unique memory chunk identifier
- `content`: Original text content
- `embedding`: Binary-encoded vector (typically a float32 array)
- `metadata`: Timestamps, tags, source information
Vector retrieval performs a full table scan computing cosine similarity between the query embedding and each stored embedding:
similarity(A, B) = (A · B) / (||A|| * ||B||)
Results are ranked by similarity score (1.0 = identical, -1.0 = opposite, 0.0 = orthogonal).
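The scan above can be sketched as follows. This is a minimal illustration, assuming embeddings have already been decoded from their BLOBs into `Vec<f32>`; the function names and row layout are illustrative, not ZeroClaw's actual API:

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns 0.0 for zero-magnitude inputs to avoid division by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Full-table scan: score every stored embedding against the query,
/// then rank by similarity (highest first).
fn rank_by_similarity(query: &[f32], rows: &[(u64, Vec<f32>)]) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = rows
        .iter()
        .map(|(id, emb)| (*id, cosine_similarity(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}
```

Because this is a linear scan rather than an approximate-nearest-neighbor index, cost grows with the number of stored chunks; that trade-off is what keeps the implementation dependency-free.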
Sources: README.md:336-337
The keyword search path leverages SQLite's FTS5 (Full-Text Search 5) extension, which creates inverted indices for fast text retrieval. FTS5 tables mirror the structure of the main memory table but are optimized for tokenized search.
Sources: README.md:338-339
FTS5 uses the BM25 (Best Match 25) algorithm to score document relevance. BM25 improves on TF-IDF by:
- Term Frequency Saturation: Logarithmic scaling prevents repeated terms from dominating scores
- Document Length Normalization: Shorter documents aren't penalized unfairly
- Inverse Document Frequency: Rare terms score higher than common terms
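For reference, the standard BM25 scoring function from the information-retrieval literature (which FTS5's built-in ranking follows) is:

```latex
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{f(t, D)\,(k_1 + 1)}
       {f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

where f(t, D) is the frequency of term t in document D, |D| is the document length, avgdl is the average document length, and k1 and b are tuning constants (commonly 1.2 and 0.75). Note that SQLite's `bm25()` auxiliary function returns this value negated, so lower (more negative) values indicate better matches.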
The FTS5 query syntax supports:
- Boolean operators: `AND`, `OR`, `NOT`
- Phrase matching: `"exact phrase"`
- Proximity search: `NEAR(term1 term2, distance)`
- Prefix matching: `term*`
Sources: README.md:338-339
The merge layer combines vector and keyword results using a weighted linear combination. This is implemented in a custom merge function.
```mermaid
flowchart LR
VectorScores["Vector Scores<br/>(0.0 - 1.0)"]
KeywordScores["Keyword Scores<br/>(0.0 - 1.0)"]
VectorWeight["vector_weight<br/>(default: 0.7)"]
KeywordWeight["keyword_weight<br/>(default: 0.3)"]
Combine["Combined Score =<br/>vec_weight * vec_score +<br/>kw_weight * kw_score"]
Sort["Sort by Combined Score<br/>(descending)"]
VectorScores --> Combine
KeywordScores --> Combine
VectorWeight --> Combine
KeywordWeight --> Combine
Combine --> Sort
```
The custom weighted merge function in `vector.rs` performs:
- Score Normalization: Both vector similarity and BM25 scores are normalized to [0.0, 1.0]
- Weight Application: Applies `vector_weight` and `keyword_weight` from config
- Deduplication: Merges duplicate results (same memory chunk ID) by taking the max combined score
- Sorting: Orders final results by combined score, descending
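The steps above can be sketched as follows. This is illustrative only: the source does not specify the normalization method, so min-max scaling is assumed here, and the function names are not ZeroClaw's actual API:

```rust
use std::collections::HashMap;

/// Min-max normalize raw scores into the [0.0, 1.0] range (assumed method).
fn normalize(results: &[(u64, f32)]) -> HashMap<u64, f32> {
    let min = results.iter().map(|r| r.1).fold(f32::INFINITY, f32::min);
    let max = results.iter().map(|r| r.1).fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    results
        .iter()
        .map(|&(id, s)| (id, if range > 0.0 { (s - min) / range } else { 1.0 }))
        .collect()
}

/// Weighted linear merge of the two result sets, deduplicated by chunk id
/// (the HashMap key) and sorted by combined score, descending.
fn merge(
    vector: &[(u64, f32)],
    keyword: &[(u64, f32)],
    vector_weight: f32,
    keyword_weight: f32,
) -> Vec<(u64, f32)> {
    let mut combined: HashMap<u64, f32> = HashMap::new();
    for (&id, &s) in normalize(vector).iter() {
        *combined.entry(id).or_insert(0.0) += vector_weight * s;
    }
    for (&id, &s) in normalize(keyword).iter() {
        *combined.entry(id).or_insert(0.0) += keyword_weight * s;
    }
    let mut ranked: Vec<(u64, f32)> = combined.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```

A chunk that appears in only one result set simply contributes nothing from the other path, so keyword-only hits can still outrank weak vector matches when `keyword_weight` is high.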
Weights are configured in `~/.zeroclaw/config.toml`:

```toml
[memory]
vector_weight = 0.7   # Emphasize semantic similarity
keyword_weight = 0.3  # De-emphasize exact keyword matches
```

Tuning Guidelines:
| Use Case | vector_weight | keyword_weight | Rationale |
|---|---|---|---|
| Semantic search | 0.8 - 1.0 | 0.0 - 0.2 | Prioritize meaning over exact terms |
| Technical queries | 0.5 | 0.5 | Balance semantic and exact matches |
| Exact recall | 0.0 - 0.2 | 0.8 - 1.0 | Prioritize keyword precision |
Sources: README.md:340-341, README.md:346-377
```mermaid
classDiagram
class EmbeddingProvider {
<<trait>>
+embed(text: String) Result~Vec~f32~~
+embed_batch(texts: Vec~String~) Result~Vec~Vec~f32~~~
}
class OpenAIEmbedding {
-api_key: String
-model: String
+embed(text) Result~Vec~f32~~
+embed_batch(texts) Result~Vec~Vec~f32~~~
}
class CustomEmbedding {
-endpoint: String
-api_key: Option~String~
+embed(text) Result~Vec~f32~~
+embed_batch(texts) Result~Vec~Vec~f32~~~
}
class NoopEmbedding {
+embed(text) Result~Vec~f32~~
+embed_batch(texts) Result~Vec~Vec~f32~~~
}
EmbeddingProvider <|.. OpenAIEmbedding
EmbeddingProvider <|.. CustomEmbedding
EmbeddingProvider <|.. NoopEmbedding
```
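A sketch of the trait the diagram describes. The signatures here are simplified and synchronous; the real trait is presumably async and uses ZeroClaw's own error type, for which a plain `String` stands in:

```rust
/// Simplified sketch of the provider abstraction from the class diagram.
trait EmbeddingProvider {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;

    /// Default batch implementation: embed each text in turn.
    /// Real providers would override this with a single batched API call.
    fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}

/// The "none" provider: returns an empty vector, which callers
/// treat as "vector search disabled".
struct NoopEmbedding;

impl EmbeddingProvider for NoopEmbedding {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        Ok(Vec::new())
    }
}
```

Keeping the no-op case behind the same trait lets the recall path stay uniform: keyword-only mode is just a provider that produces nothing to scan.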
| Provider | Configuration | Behavior |
|---|---|---|
| OpenAI | `embedding_provider = "openai"` | Uses the `text-embedding-ada-002` model via the OpenAI API |
| Custom | `embedding_provider = "custom:https://..."` | OpenAI-compatible embedding endpoint |
| Noop | `embedding_provider = "none"` | Returns an empty vector; disables vector search |
Embedding providers resolve API keys using the same priority order as LLM providers:
1. Explicit `api_key` in the memory config
2. Provider-specific environment variable (e.g., `OPENAI_API_KEY`)
3. Generic `API_KEY` environment variable
4. Encrypted secret store (`~/.zeroclaw/.secret_key`)
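The priority chain above maps naturally onto `Option::or_else`. A minimal sketch, with the secret-store lookup stubbed out and the `provider_var` parameter and function names purely illustrative:

```rust
use std::env;

/// Resolve an embedding API key, trying each source in priority order.
/// `explicit` is the `api_key` value from the memory config, if set;
/// `provider_var` is the provider-specific variable name, e.g. "OPENAI_API_KEY".
fn resolve_api_key(explicit: Option<String>, provider_var: &str) -> Option<String> {
    explicit
        .or_else(|| env::var(provider_var).ok()) // provider-specific env var
        .or_else(|| env::var("API_KEY").ok())    // generic fallback
        .or_else(read_secret_store)              // ~/.zeroclaw/.secret_key
}

/// Stub for the encrypted secret store lookup (not implemented here).
fn read_secret_store() -> Option<String> {
    None
}
```

Each `or_else` closure only runs if every earlier source came up empty, so an explicit config value always wins.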
Sources: README.md:342-343
To minimize redundant API calls and improve recall latency, ZeroClaw maintains an LRU (Least Recently Used) cache for embeddings.
The `embedding_cache` table in SQLite stores:
| Column | Type | Description |
|---|---|---|
| `text_hash` | TEXT PRIMARY KEY | SHA-256 hash of input text |
| `embedding` | BLOB | Binary-encoded vector |
| `created_at` | TIMESTAMP | Cache entry creation time |
| `accessed_at` | TIMESTAMP | Last access time (for LRU) |
```mermaid
flowchart TD
Query["Memory Recall Query"]
Hash["Compute SHA-256<br/>of query text"]
CacheLookup["SELECT FROM<br/>embedding_cache<br/>WHERE text_hash = ?"]
CacheHit{Cache Hit?}
UpdateAccess["UPDATE accessed_at<br/>SET NOW()"]
ReturnCached["Return Cached<br/>Embedding"]
CallProvider["Call EmbeddingProvider<br/>embed(text)"]
InsertCache["INSERT INTO<br/>embedding_cache<br/>(or replace if full)"]
ReturnNew["Return New<br/>Embedding"]
Query --> Hash
Hash --> CacheLookup
CacheLookup --> CacheHit
CacheHit -->|Yes| UpdateAccess
UpdateAccess --> ReturnCached
CacheHit -->|No| CallProvider
CallProvider --> InsertCache
InsertCache --> ReturnNew
```
When the cache reaches capacity (configurable, typically 10,000 entries):
1. Identify the row with the oldest `accessed_at` timestamp
2. Delete that row
3. Insert the new embedding
This ensures frequently queried embeddings remain cached while stale entries are evicted.
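ZeroClaw implements this policy in SQL against the `embedding_cache` table; as an in-memory analogue, the same logic looks like this (a sketch with illustrative names, where a monotonic `tick` counter stands in for `accessed_at`):

```rust
use std::collections::HashMap;

/// In-memory analogue of the SQLite LRU cache.
/// Keys are the SHA-256 text hashes.
struct EmbeddingCache {
    capacity: usize,
    tick: u64,
    entries: HashMap<String, (Vec<f32>, u64)>, // hash -> (embedding, last access)
}

impl EmbeddingCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, tick: 0, entries: HashMap::new() }
    }

    /// Cache hit: bump the access time and return the stored embedding.
    fn get(&mut self, hash: &str) -> Option<Vec<f32>> {
        self.tick += 1;
        let tick = self.tick;
        self.entries.get_mut(hash).map(|(emb, at)| {
            *at = tick;
            emb.clone()
        })
    }

    /// Insert a new embedding, evicting the entry with the oldest
    /// access time when the cache is full.
    fn put(&mut self, hash: String, embedding: Vec<f32>) {
        if self.entries.len() >= self.capacity && !self.entries.contains_key(&hash) {
            if let Some(oldest) = self
                .entries
                .iter()
                .min_by_key(|(_, (_, at))| *at)
                .map(|(k, _)| k.clone())
            {
                self.entries.remove(&oldest);
            }
        }
        self.tick += 1;
        self.entries.insert(hash, (embedding, self.tick));
    }
}
```

In the SQLite version, the `min_by_key` step is simply `ORDER BY accessed_at LIMIT 1`, and the counter is a real timestamp.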
Sources: README.md:344-345
Before storing memories, long-form content is split into chunks to improve retrieval granularity and context relevance.
ZeroClaw uses a line-based chunking strategy that preserves markdown structure:
```mermaid
flowchart TD
Document["Input Document<br/>(Markdown)"]
ParseLines["Parse Lines<br/>(split by \\n)"]
DetectHeadings["Detect Headings<br/>(# ## ### etc.)"]
GroupContent["Group Content<br/>Under Headings"]
SplitOversize["Split Oversize Chunks<br/>(max_chunk_size)"]
PreserveContext["Preserve Heading Context<br/>in each chunk"]
Chunks["Output Chunks<br/>(with metadata)"]
Document --> ParseLines
ParseLines --> DetectHeadings
DetectHeadings --> GroupContent
GroupContent --> SplitOversize
SplitOversize --> PreserveContext
PreserveContext --> Chunks
```
- Heading Preservation: Each chunk retains its parent heading hierarchy for context
  - Example: `## Installation > ### Prerequisites` preserved in chunk metadata
- Size Limits: Configurable `max_chunk_size` (typically 1000-2000 characters)
  - Oversize paragraphs are split at sentence boundaries
- Overlap: Optional sliding-window overlap (default: 100 characters)
  - Ensures context isn't lost at chunk boundaries
- Code Block Handling: Code blocks are treated as atomic units
  - Never split mid-block, even if exceeding the size limit

This strategy has several benefits:
- Preserves Structure: Markdown headings provide natural semantic boundaries
- Context-Aware: Parent headings give additional relevance signals
- Fast: No complex NLP parsing required
- Deterministic: Same input always produces the same chunks
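The heading-tracking core of this strategy can be sketched as follows. This is a simplified illustration only (size limits, overlap, and code-block handling omitted; names are not ZeroClaw's actual API):

```rust
/// One output chunk: the heading path it falls under, plus its text.
#[derive(Debug, PartialEq)]
struct Chunk {
    heading_path: Vec<String>,
    content: String,
}

/// Split markdown into chunks at heading boundaries, tracking the
/// parent heading hierarchy for each chunk.
fn chunk_markdown(doc: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut buf = String::new();

    // Emit the buffered content under the current heading path.
    let flush = |buf: &mut String, path: &[String], chunks: &mut Vec<Chunk>| {
        if !buf.trim().is_empty() {
            chunks.push(Chunk {
                heading_path: path.to_vec(),
                content: buf.trim().to_string(),
            });
        }
        buf.clear();
    };

    for line in doc.lines() {
        let hashes = line.chars().take_while(|&c| c == '#').count();
        if hashes > 0 && line.chars().nth(hashes) == Some(' ') {
            flush(&mut buf, &path, &mut chunks);
            path.truncate(hashes - 1); // drop deeper/sibling headings
            path.push(line[hashes + 1..].trim().to_string());
        } else {
            buf.push_str(line);
            buf.push('\n');
        }
    }
    flush(&mut buf, &path, &mut chunks);
    chunks
}
```

Truncating the path to the heading's depth before pushing is what keeps sibling sections from inheriting each other's context.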
Sources: README.md:344-345
Over time, the memory system may require reindexing to:
- Rebuild corrupted FTS5 indices
- Re-embed content with a new embedding provider
- Update to a newer embedding model
- Compact database after many deletions
The reindex operation is designed to be safe and atomic:
```mermaid
sequenceDiagram
participant User
participant Memory as Memory Backend
participant FTS5 as FTS5 Virtual Table
participant Embeddings as Embedding Table
participant Provider as EmbeddingProvider
User->>Memory: Trigger Reindex
Memory->>Memory: BEGIN TRANSACTION
Memory->>FTS5: DROP VIRTUAL TABLE IF EXISTS memories_fts
Memory->>FTS5: CREATE VIRTUAL TABLE memories_fts
loop For each memory chunk
Memory->>Embeddings: Check if embedding exists
alt Missing Embedding
Memory->>Provider: embed(content)
Provider-->>Memory: embedding vector
Memory->>Embeddings: INSERT embedding
end
Memory->>FTS5: INSERT INTO memories_fts
end
Memory->>Memory: COMMIT TRANSACTION
Memory-->>User: Reindex Complete
```
- FTS5 Rebuild:
  - Drops and recreates the FTS5 virtual table
  - Re-inserts all content for fresh indexing
  - Updates BM25 term statistics
- Re-embedding:
  - Scans all memory chunks
  - Identifies missing or null embeddings
  - Calls `EmbeddingProvider.embed()` for each
  - Updates embedding BLOBs atomically
- Atomicity:
  - The entire operation runs within a SQLite transaction
  - If any step fails, all changes roll back
  - No partial or corrupt state
Reindex is typically triggered:
- Manually via the CLI: `zeroclaw memory reindex`
- Automatically on provider change (if configured)
- On database health-check failure
Sources: README.md:346-347
Complete hybrid search configuration options:
```toml
[memory]
# Backend type (must be "sqlite" for hybrid search)
backend = "sqlite"

# Auto-save conversation context to memory
auto_save = true

# Embedding provider for vector search
# Options: "none", "openai", "custom:https://..."
embedding_provider = "none"

# Hybrid merge weights (must sum to <= 1.0)
vector_weight = 0.7    # Weight for vector similarity scores
keyword_weight = 0.3   # Weight for keyword BM25 scores

# Optional: SQLite open timeout when file is locked
sqlite_open_timeout_secs = 30

# Optional: Override storage provider
# [storage.provider.config]
# provider = "postgres"  # Use PostgreSQL instead of SQLite
# db_url = "postgres://..."
```

| Parameter | Impact | Recommendation |
|---|---|---|
| `vector_weight` | Higher = more semantic | 0.7-0.9 for conversational queries |
| `keyword_weight` | Higher = more exact | 0.3-0.5 for technical/code queries |
| `embedding_provider` | API latency | Use caching; consider a local model |
| `sqlite_open_timeout_secs` | Lock contention | Increase for high concurrency |
To disable vector search entirely (zero API calls, lower latency):
```toml
[memory]
backend = "sqlite"
embedding_provider = "none"
vector_weight = 0.0
keyword_weight = 1.0
```

This mode uses FTS5 BM25 only, suitable for:
- Exact keyword recall scenarios
- No API key / offline environments
- Performance-critical deployments
Sources: README.md:346-377
The hybrid search system is invoked automatically during the agent turn cycle:
```mermaid
sequenceDiagram
participant Agent as Agent Turn Loop
participant Memory as Memory Backend
participant Hybrid as Hybrid Search
participant Provider as LLM Provider
Agent->>Memory: Recall relevant context<br/>recall(query, limit)
Memory->>Hybrid: Execute hybrid search
par Vector Search
Hybrid->>Hybrid: Generate query embedding
Hybrid->>Hybrid: Cosine similarity scan
and Keyword Search
Hybrid->>Hybrid: Tokenize query
Hybrid->>Hybrid: FTS5 BM25 search
end
Hybrid->>Hybrid: Merge & rank results
Hybrid-->>Memory: Ranked chunks
Memory-->>Agent: Recalled context
Agent->>Agent: Append context to<br/>system prompt
Agent->>Provider: chat(messages + context)
Provider-->>Agent: Response
alt auto_save enabled
Agent->>Memory: Store conversation<br/>store(content)
end
```
The agent can explicitly trigger memory recall via the `recall` tool:

```json
{
  "tool_calls": [{
    "id": "call_abc123",
    "function": {
      "name": "recall",
      "arguments": "{\"query\": \"authentication implementation\", \"limit\": 5}"
    }
  }]
}
```

The hybrid search executes and returns top-ranked results to the LLM context.
Sources: README.md:330-345
ZeroClaw's hybrid search provides production-grade memory retrieval with:
- Zero external dependencies: All components (vector storage, keyword search, merge logic) implemented in SQLite
- Configurable weights: Tune vector vs. keyword emphasis per deployment
- LRU caching: Minimize embedding API calls for frequent queries
- Atomic operations: Safe reindexing with full transaction support
- Multiple embedding providers: OpenAI, custom endpoints, or none (keyword-only mode)
The system balances semantic understanding (vector search) with exact matching (keyword search), delivering robust recall across diverse query types without requiring heavyweight infrastructure like Pinecone or Elasticsearch.
Sources: README.md:330-347