
09.2 Hybrid Search

Nikolay Vyahhi edited this page Feb 19, 2026 · 3 revisions

Hybrid Search


Purpose and Scope

This page documents ZeroClaw's hybrid search implementation, which combines vector similarity search with keyword-based full-text search to provide robust memory recall. The system uses SQLite as the backing store with custom implementations for both search modes, requiring no external dependencies like Pinecone, Elasticsearch, or LangChain.

For information about available memory backends and how to configure them, see Memory Backends. For memory export and backup functionality, see Memory Snapshot.


Architecture Overview

ZeroClaw's hybrid search operates as a two-stage retrieval system that merges results from both vector and keyword search paths using configurable weights.

flowchart TB
    Query["User Query / Recall Request"]
    
    subgraph "Parallel Search Execution"
        VectorPath["Vector Search Path"]
        KeywordPath["Keyword Search Path"]
    end
    
    subgraph "Vector Search"
        Embed["Generate Query Embedding<br/>(EmbeddingProvider)"]
        VectorDB["SQLite BLOB Storage<br/>Cosine Similarity Scan"]
        VectorResults["Vector Results<br/>(similarity scores)"]
    end
    
    subgraph "Keyword Search"
        Tokenize["Tokenize Query<br/>(FTS5 tokenizer)"]
        FTS5["FTS5 Virtual Table<br/>BM25 Ranking"]
        KeywordResults["Keyword Results<br/>(BM25 scores)"]
    end
    
    subgraph "Merge Layer"
        Normalize["Normalize Scores<br/>(0.0 - 1.0 range)"]
        Weight["Apply Weights<br/>vector_weight * vec_score +<br/>keyword_weight * kw_score"]
        Dedupe["Deduplicate & Sort<br/>by Combined Score"]
    end
    
    FinalResults["Ranked Memory Chunks"]
    
    Query --> VectorPath
    Query --> KeywordPath
    
    VectorPath --> Embed
    Embed --> VectorDB
    VectorDB --> VectorResults
    
    KeywordPath --> Tokenize
    Tokenize --> FTS5
    FTS5 --> KeywordResults
    
    VectorResults --> Normalize
    KeywordResults --> Normalize
    Normalize --> Weight
    Weight --> Dedupe
    Dedupe --> FinalResults

Sources: README.md:330-345


Vector Search Component

Embedding Generation

The vector search path begins by converting the query text into a high-dimensional embedding vector using an EmbeddingProvider implementation. The provider is configurable via memory.embedding_provider:

| Provider Value | Behavior |
|----------------|----------|
| `"none"` | No-op provider; vector search disabled (keyword-only mode) |
| `"openai"` | Uses OpenAI's embedding API (requires `OPENAI_API_KEY`) |
| `"custom:https://..."` | Custom OpenAI-compatible embedding endpoint |
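A minimal sketch of how this configuration value could be parsed; the enum and function names here are illustrative, not taken from the ZeroClaw source:

```rust
// Hypothetical parser for the `memory.embedding_provider` config value.
#[derive(Debug, PartialEq)]
enum EmbeddingProviderKind {
    None,
    OpenAi,
    Custom(String), // endpoint URL from "custom:https://..."
}

fn parse_provider(value: &str) -> EmbeddingProviderKind {
    match value {
        "none" => EmbeddingProviderKind::None,
        "openai" => EmbeddingProviderKind::OpenAi,
        other if other.starts_with("custom:") => {
            EmbeddingProviderKind::Custom(other["custom:".len()..].to_string())
        }
        // Unknown values fall back to keyword-only mode.
        _ => EmbeddingProviderKind::None,
    }
}

fn main() {
    assert_eq!(parse_provider("none"), EmbeddingProviderKind::None);
    assert_eq!(
        parse_provider("custom:https://example.com/v1/embeddings"),
        EmbeddingProviderKind::Custom("https://example.com/v1/embeddings".into())
    );
    println!("ok");
}
```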

Sources: README.md:346-377

Storage Format

Embeddings are stored as BLOBs directly in SQLite tables, typically in a memories or similar table structure. Each row contains:

  • id: Unique memory chunk identifier
  • content: Original text content
  • embedding: Binary-encoded vector (typically float32 array)
  • metadata: Timestamps, tags, source information
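The source does not pin down the exact byte layout of the BLOB; a common convention, sketched below, is to store each `f32` as four little-endian bytes:

```rust
// Sketch: encode/decode an embedding vector as a flat little-endian byte
// buffer suitable for a SQLite BLOB column. The layout is an assumption.
fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

fn decode_embedding(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

fn main() {
    let v = vec![0.1_f32, -2.5, 3.0];
    let blob = encode_embedding(&v);
    assert_eq!(blob.len(), 12); // 3 floats x 4 bytes each
    assert_eq!(decode_embedding(&blob), v);
    println!("ok");
}
```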

Cosine Similarity Search

Vector retrieval performs a full table scan computing cosine similarity between the query embedding and each stored embedding:

similarity(A, B) = (A · B) / (||A|| * ||B||)

Results are ranked by similarity score (1.0 = identical, -1.0 = opposite, 0.0 = orthogonal).
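The formula translates directly to Rust; this standalone sketch (not the actual ZeroClaw code) also guards against zero-length vectors:

```rust
// Cosine similarity: (A . B) / (||A|| * ||B||), returning 0.0 for a
// zero vector to avoid division by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Identical direction -> 1.0; orthogonal -> 0.0; opposite -> -1.0.
    assert!((cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-6);
    assert!((cosine_similarity(&[1.0, 0.0], &[-1.0, 0.0]) + 1.0).abs() < 1e-6);
    println!("ok");
}
```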

Sources: README.md:336-337


Keyword Search Component

FTS5 Virtual Tables

The keyword search path leverages SQLite's FTS5 (Full-Text Search 5) extension, which creates inverted indices for fast text retrieval. FTS5 tables mirror the structure of the main memory table but are optimized for tokenized search.

Sources: README.md:338-339

BM25 Ranking

FTS5 uses the BM25 (Best Match 25) algorithm to score document relevance. BM25 improves on TF-IDF by:

  1. Term Frequency Saturation: Logarithmic scaling prevents repeated terms from dominating scores
  2. Document Length Normalization: Long documents don't gain an unfair advantage from inflated raw term counts
  3. Inverse Document Frequency: Rare terms score higher than common terms
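The three factors above combine in the standard BM25 scoring formula (note that SQLite's FTS5 `bm25()` auxiliary function reports the negated score, so more relevant rows have numerically smaller values):

$$
\text{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
$$

where $f(q_i, D)$ is the frequency of term $q_i$ in document $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length in the index, and $k_1 = 1.2$, $b = 0.75$ are the defaults FTS5 uses.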

The FTS5 query syntax supports:

  • Boolean operators: AND, OR, NOT
  • Phrase matching: "exact phrase"
  • Proximity search: NEAR(term1 term2, distance)
  • Prefix matching: term*

Sources: README.md:338-339


Hybrid Merge Algorithm

The merge layer combines vector and keyword results using a weighted linear combination. This is implemented in a custom merge function.

flowchart LR
    VectorScores["Vector Scores<br/>(0.0 - 1.0)"]
    KeywordScores["Keyword Scores<br/>(0.0 - 1.0)"]
    
    VectorWeight["vector_weight<br/>(default: 0.7)"]
    KeywordWeight["keyword_weight<br/>(default: 0.3)"]
    
    Combine["Combined Score =<br/>vec_weight * vec_score +<br/>kw_weight * kw_score"]
    
    Sort["Sort by Combined Score<br/>(descending)"]
    
    VectorScores --> Combine
    KeywordScores --> Combine
    VectorWeight --> Combine
    KeywordWeight --> Combine
    
    Combine --> Sort

Merge Function (vector.rs)

The custom weighted merge function in vector.rs performs:

  1. Score Normalization: Both vector similarity and BM25 scores are normalized to [0.0, 1.0]
  2. Weight Application: Applies vector_weight and keyword_weight from config
  3. Deduplication: Merges duplicate results (same memory chunk ID) by taking max combined score
  4. Sorting: Orders final results by combined score descending
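The four steps above can be sketched as follows. This is an illustrative model, not the actual vector.rs code: scores are assumed to be pre-normalized to [0.0, 1.0], duplicates within a path are merged by taking the max, and a chunk absent from one path contributes zero on that side:

```rust
use std::collections::HashMap;

// (chunk id, normalized score) pairs from each search path.
fn merge(
    vector_hits: &[(u64, f32)],
    keyword_hits: &[(u64, f32)],
    vector_weight: f32,
    keyword_weight: f32,
) -> Vec<(u64, f32)> {
    // Best normalized score per chunk id on each path (dedupe within a path).
    let mut vec_best: HashMap<u64, f32> = HashMap::new();
    for &(id, s) in vector_hits {
        let e = vec_best.entry(id).or_insert(0.0);
        if s > *e { *e = s; }
    }
    let mut kw_best: HashMap<u64, f32> = HashMap::new();
    for &(id, s) in keyword_hits {
        let e = kw_best.entry(id).or_insert(0.0);
        if s > *e { *e = s; }
    }
    // Weighted linear combination across the union of chunk ids.
    let mut ids: Vec<u64> = vec_best.keys().chain(kw_best.keys()).copied().collect();
    ids.sort_unstable();
    ids.dedup();
    let mut out: Vec<(u64, f32)> = ids
        .into_iter()
        .map(|id| {
            let v = vec_best.get(&id).copied().unwrap_or(0.0);
            let k = kw_best.get(&id).copied().unwrap_or(0.0);
            (id, vector_weight * v + keyword_weight * k)
        })
        .collect();
    // Final ranking: combined score, descending.
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    // Chunk 1: 0.7 * 0.9 = 0.63; chunk 2: 0.7 * 0.4 + 0.3 * 1.0 = 0.58.
    let ranked = merge(&[(1, 0.9), (2, 0.4)], &[(2, 1.0), (3, 0.8)], 0.7, 0.3);
    assert_eq!(ranked[0].0, 1);
    assert_eq!(ranked[1].0, 2);
    println!("ok");
}
```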

Configuration

Weights are configured in ~/.zeroclaw/config.toml:

[memory]
vector_weight = 0.7    # Emphasize semantic similarity
keyword_weight = 0.3   # De-emphasize exact keyword matches

Tuning Guidelines:

| Use Case | vector_weight | keyword_weight | Rationale |
|----------|---------------|----------------|-----------|
| Semantic search | 0.8 - 1.0 | 0.0 - 0.2 | Prioritize meaning over exact terms |
| Technical queries | 0.5 | 0.5 | Balance semantic and exact matches |
| Exact recall | 0.0 - 0.2 | 0.8 - 1.0 | Prioritize keyword precision |

Sources: README.md:340-341, README.md:346-377


Embedding Provider System

classDiagram
    class EmbeddingProvider {
        <<trait>>
        +embed(text: String) Result~Vec~f32~~
        +embed_batch(texts: Vec~String~) Result~Vec~Vec~f32~~~
    }
    
    class OpenAIEmbedding {
        -api_key: String
        -model: String
        +embed(text) Result~Vec~f32~~
        +embed_batch(texts) Result~Vec~Vec~f32~~~
    }
    
    class CustomEmbedding {
        -endpoint: String
        -api_key: Option~String~
        +embed(text) Result~Vec~f32~~
        +embed_batch(texts) Result~Vec~Vec~f32~~~
    }
    
    class NoopEmbedding {
        +embed(text) Result~Vec~f32~~
        +embed_batch(texts) Result~Vec~Vec~f32~~~
    }
    
    EmbeddingProvider <|.. OpenAIEmbedding
    EmbeddingProvider <|.. CustomEmbedding
    EmbeddingProvider <|.. NoopEmbedding

Provider Implementations

| Provider | Configuration | Behavior |
|----------|---------------|----------|
| OpenAI | `embedding_provider = "openai"` | Uses text-embedding-ada-002 model via OpenAI API |
| Custom | `embedding_provider = "custom:https://..."` | OpenAI-compatible embedding endpoint |
| Noop | `embedding_provider = "none"` | Returns empty vector; disables vector search |
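The trait from the class diagram might look like the following sketch; the error type and the default `embed_batch` implementation are assumptions:

```rust
// Hypothetical rendering of the EmbeddingProvider trait; the real
// definition (error type, ownership of arguments) may differ.
trait EmbeddingProvider {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;

    // Default: embed each text individually; providers with a batch API
    // would override this.
    fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}

/// No-op provider: returns an empty vector, effectively disabling vector search.
struct NoopEmbedding;

impl EmbeddingProvider for NoopEmbedding {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        Ok(Vec::new())
    }
}

fn main() {
    let p = NoopEmbedding;
    assert!(p.embed("hello").unwrap().is_empty());
    let batch = p.embed_batch(&["a".to_string(), "b".to_string()]).unwrap();
    assert_eq!(batch.len(), 2);
    println!("ok");
}
```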

API Key Resolution

Embedding providers resolve API keys using the same priority order as LLM providers:

  1. Explicit api_key in memory config
  2. Provider-specific environment variable (e.g., OPENAI_API_KEY)
  3. Generic API_KEY environment variable
  4. Encrypted secret store (~/.zeroclaw/.secret_key)
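The priority chain can be sketched as follows; the function name and error handling are illustrative, and the encrypted-store step is stubbed out:

```rust
use std::env;

// Hypothetical sketch of the documented resolution order.
fn resolve_api_key(
    config_key: Option<&str>,
    provider_env_var: &str,
) -> Option<String> {
    // 1. Explicit api_key in memory config wins.
    if let Some(k) = config_key {
        return Some(k.to_string());
    }
    // 2. Provider-specific environment variable (e.g. OPENAI_API_KEY).
    if let Ok(k) = env::var(provider_env_var) {
        return Some(k);
    }
    // 3. Generic API_KEY environment variable.
    if let Ok(k) = env::var("API_KEY") {
        return Some(k);
    }
    // 4. Encrypted secret store (~/.zeroclaw/.secret_key) -- omitted here.
    None
}

fn main() {
    // Config value takes precedence regardless of environment state.
    assert_eq!(
        resolve_api_key(Some("sk-test"), "ZEROCLAW_EXAMPLE_UNSET_VAR"),
        Some("sk-test".to_string())
    );
    println!("ok");
}
```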

Sources: README.md:342-343


Embedding Cache

To minimize redundant API calls and improve recall latency, ZeroClaw maintains an LRU (Least Recently Used) cache for embeddings.

Cache Table Structure

The embedding_cache table in SQLite stores:

| Column | Type | Description |
|--------|------|-------------|
| text_hash | TEXT PRIMARY KEY | SHA-256 hash of input text |
| embedding | BLOB | Binary-encoded vector |
| created_at | TIMESTAMP | Cache entry creation time |
| accessed_at | TIMESTAMP | Last access time (for LRU) |

Cache Operations

flowchart TD
    Query["Memory Recall Query"]
    Hash["Compute SHA-256<br/>of query text"]
    CacheLookup["SELECT FROM<br/>embedding_cache<br/>WHERE text_hash = ?"]
    
    CacheHit{Cache Hit?}
    
    UpdateAccess["UPDATE accessed_at<br/>SET NOW()"]
    ReturnCached["Return Cached<br/>Embedding"]
    
    CallProvider["Call EmbeddingProvider<br/>embed(text)"]
    InsertCache["INSERT INTO<br/>embedding_cache<br/>(or replace if full)"]
    ReturnNew["Return New<br/>Embedding"]
    
    Query --> Hash
    Hash --> CacheLookup
    CacheLookup --> CacheHit
    
    CacheHit -->|Yes| UpdateAccess
    UpdateAccess --> ReturnCached
    
    CacheHit -->|No| CallProvider
    CallProvider --> InsertCache
    InsertCache --> ReturnNew

LRU Eviction

When the cache reaches capacity (configurable, typically 10,000 entries):

  1. Identify the row with the oldest accessed_at timestamp
  2. Delete that row
  3. Insert the new embedding

This ensures frequently queried embeddings remain cached while stale entries are evicted.
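The policy can be modeled in memory as follows; the real implementation operates on the SQLite embedding_cache table, and this sketch substitutes a logical clock for accessed_at timestamps:

```rust
use std::collections::HashMap;

// In-memory model of the LRU eviction policy described above.
struct EmbeddingCache {
    capacity: usize,
    // text_hash -> (embedding blob, accessed_at as a logical clock)
    entries: HashMap<String, (Vec<u8>, u64)>,
    clock: u64,
}

impl EmbeddingCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), clock: 0 }
    }

    fn get(&mut self, hash: &str) -> Option<Vec<u8>> {
        self.clock += 1;
        let clock = self.clock;
        self.entries.get_mut(hash).map(|(blob, accessed)| {
            *accessed = clock; // UPDATE accessed_at on hit
            blob.clone()
        })
    }

    fn insert(&mut self, hash: String, blob: Vec<u8>) {
        self.clock += 1;
        if self.entries.len() >= self.capacity && !self.entries.contains_key(&hash) {
            // Evict the entry with the oldest accessed_at.
            if let Some(oldest) = self
                .entries
                .iter()
                .min_by_key(|(_, (_, t))| *t)
                .map(|(k, _)| k.clone())
            {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(hash, (blob, self.clock));
    }
}

fn main() {
    let mut cache = EmbeddingCache::new(2);
    cache.insert("a".into(), vec![1]);
    cache.insert("b".into(), vec![2]);
    cache.get("a"); // refresh "a"
    cache.insert("c".into(), vec![3]); // evicts "b" (oldest accessed_at)
    assert!(cache.get("b").is_none());
    assert!(cache.get("a").is_some());
    assert!(cache.get("c").is_some());
    println!("ok");
}
```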

Sources: README.md:344-345


Text Chunking

Before storing memories, long-form content is split into chunks to improve retrieval granularity and context relevance.

Line-Based Markdown Chunker

ZeroClaw uses a line-based chunking strategy that preserves markdown structure:

flowchart TD
    Document["Input Document<br/>(Markdown)"]
    
    ParseLines["Parse Lines<br/>(split by \\n)"]
    
    DetectHeadings["Detect Headings<br/>(# ## ### etc.)"]
    
    GroupContent["Group Content<br/>Under Headings"]
    
    SplitOversize["Split Oversize Chunks<br/>(max_chunk_size)"]
    
    PreserveContext["Preserve Heading Context<br/>in each chunk"]
    
    Chunks["Output Chunks<br/>(with metadata)"]
    
    Document --> ParseLines
    ParseLines --> DetectHeadings
    DetectHeadings --> GroupContent
    GroupContent --> SplitOversize
    SplitOversize --> PreserveContext
    PreserveContext --> Chunks

Chunking Strategy

  1. Heading Preservation: Each chunk retains its parent heading hierarchy for context

    • Example: ## Installation > ### Prerequisites preserved in chunk metadata
  2. Size Limits: Configurable max_chunk_size (typically 1000-2000 characters)

    • Oversize paragraphs are split at sentence boundaries
  3. Overlap: Optional sliding window overlap (default: 100 characters)

    • Ensures context isn't lost at chunk boundaries
  4. Code Block Handling: Code blocks are treated as atomic units

    • Never split mid-block, even if exceeding size limit
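A minimal sketch of the heading-grouping core of such a chunker (size limits, overlap, and code-block handling are omitted; all names are illustrative, not from the ZeroClaw source):

```rust
// Group lines under the most recent heading and carry the heading path
// as each chunk's metadata.
#[derive(Debug)]
struct Chunk {
    heading_path: Vec<String>,
    content: String,
}

fn chunk_markdown(doc: &str) -> Vec<Chunk> {
    let mut chunks: Vec<Chunk> = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut current = String::new();

    // Emit the accumulated buffer as a chunk (if non-empty).
    let flush = |chunks: &mut Vec<Chunk>, path: &[String], buf: &mut String| {
        if !buf.trim().is_empty() {
            chunks.push(Chunk {
                heading_path: path.to_vec(),
                content: std::mem::take(buf).trim().to_string(),
            });
        } else {
            buf.clear();
        }
    };

    for line in doc.lines() {
        let level = line.chars().take_while(|&c| c == '#').count();
        if level > 0 && line.chars().nth(level) == Some(' ') {
            // A heading closes the previous chunk and updates the path:
            // "## X" truncates to depth 1, then pushes "X".
            flush(&mut chunks, &path, &mut current);
            path.truncate(level - 1);
            path.push(line[level + 1..].to_string());
        } else {
            current.push_str(line);
            current.push('\n');
        }
    }
    flush(&mut chunks, &path, &mut current);
    chunks
}

fn main() {
    let doc = "# Install\nRun setup.\n## Prerequisites\nNeeds Rust.\n";
    let chunks = chunk_markdown(doc);
    assert_eq!(chunks.len(), 2);
    assert_eq!(chunks[0].heading_path, vec!["Install"]);
    assert_eq!(chunks[1].heading_path, vec!["Install", "Prerequisites"]);
    assert_eq!(chunks[1].content, "Needs Rust.");
    println!("ok");
}
```

Because the walk is a single deterministic pass over lines, the same input always yields the same chunks, which is the property the next section relies on.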

Why Line-Based?

  • Preserves Structure: Markdown headings provide natural semantic boundaries
  • Context-Aware: Parent headings give additional relevance signals
  • Fast: No complex NLP parsing required
  • Deterministic: Same input always produces same chunks

Sources: README.md:344-345


Reindexing

Over time, the memory system may require reindexing to:

  • Rebuild corrupted FTS5 indices
  • Re-embed content with a new embedding provider
  • Update to a newer embedding model
  • Compact database after many deletions

Safe Atomic Reindex

The reindex operation is designed to be safe and atomic:

sequenceDiagram
    participant User
    participant Memory as Memory Backend
    participant FTS5 as FTS5 Virtual Table
    participant Embeddings as Embedding Table
    participant Provider as EmbeddingProvider
    
    User->>Memory: Trigger Reindex
    Memory->>Memory: BEGIN TRANSACTION
    
    Memory->>FTS5: DROP VIRTUAL TABLE IF EXISTS memories_fts
    Memory->>FTS5: CREATE VIRTUAL TABLE memories_fts
    
    loop For each memory chunk
        Memory->>Embeddings: Check if embedding exists
        alt Missing Embedding
            Memory->>Provider: embed(content)
            Provider-->>Memory: embedding vector
            Memory->>Embeddings: INSERT embedding
        end
        
        Memory->>FTS5: INSERT INTO memories_fts
    end
    
    Memory->>Memory: COMMIT TRANSACTION
    Memory-->>User: Reindex Complete

Reindex Operations

  1. FTS5 Rebuild:

    • Drops and recreates FTS5 virtual table
    • Re-inserts all content for fresh indexing
    • Updates BM25 term statistics
  2. Re-embedding:

    • Scans all memory chunks
    • Identifies missing or null embeddings
    • Calls EmbeddingProvider.embed() for each
    • Updates embedding BLOBs atomically
  3. Atomicity:

    • Entire operation runs within a SQLite transaction
    • If any step fails, all changes roll back
    • No partial or corrupt state

Triggering Reindex

Reindex is typically triggered:

  • Manually via CLI: zeroclaw memory reindex
  • Automatically on provider change (if configured)
  • On database health check failure

Sources: README.md:346-347


Configuration Reference

Complete hybrid search configuration options:

[memory]
# Backend type (must be "sqlite" for hybrid search)
backend = "sqlite"

# Auto-save conversation context to memory
auto_save = true

# Embedding provider for vector search
# Options: "none", "openai", "custom:https://..."
embedding_provider = "none"

# Hybrid merge weights (must sum to <= 1.0)
vector_weight = 0.7      # Weight for vector similarity scores
keyword_weight = 0.3     # Weight for keyword BM25 scores

# Optional: SQLite open timeout when file is locked
sqlite_open_timeout_secs = 30

# Optional: Override storage provider
# [storage.provider.config]
# provider = "postgres"    # Use PostgreSQL instead of SQLite
# db_url = "postgres://..."

Performance Tuning

| Parameter | Impact | Recommendation |
|-----------|--------|----------------|
| vector_weight | Higher = more semantic | 0.7-0.9 for conversational queries |
| keyword_weight | Higher = more exact | 0.3-0.5 for technical/code queries |
| embedding_provider | API latency | Use caching; consider a local model |
| sqlite_open_timeout_secs | Lock contention | Increase for high concurrency |

Keyword-Only Mode

To disable vector search entirely (zero API calls, lower latency):

[memory]
backend = "sqlite"
embedding_provider = "none"
vector_weight = 0.0
keyword_weight = 1.0

This mode uses FTS5 BM25 only, suitable for:

  • Exact keyword recall scenarios
  • No API key / offline environments
  • Performance-critical deployments

Sources: README.md:346-377


Integration with Agent Core

The hybrid search system is invoked automatically during the agent turn cycle:

sequenceDiagram
    participant Agent as Agent Turn Loop
    participant Memory as Memory Backend
    participant Hybrid as Hybrid Search
    participant Provider as LLM Provider
    
    Agent->>Memory: Recall relevant context<br/>recall(query, limit)
    Memory->>Hybrid: Execute hybrid search
    
    par Vector Search
        Hybrid->>Hybrid: Generate query embedding
        Hybrid->>Hybrid: Cosine similarity scan
    and Keyword Search
        Hybrid->>Hybrid: Tokenize query
        Hybrid->>Hybrid: FTS5 BM25 search
    end
    
    Hybrid->>Hybrid: Merge & rank results
    Hybrid-->>Memory: Ranked chunks
    Memory-->>Agent: Recalled context
    
    Agent->>Agent: Append context to<br/>system prompt
    Agent->>Provider: chat(messages + context)
    Provider-->>Agent: Response
    
    alt auto_save enabled
        Agent->>Memory: Store conversation<br/>store(content)
    end

Recall Tool

The agent can explicitly trigger memory recall via the recall tool:

{
  "tool_calls": [{
    "id": "call_abc123",
    "function": {
      "name": "recall",
      "arguments": "{\"query\": \"authentication implementation\", \"limit\": 5}"
    }
  }]
}

The hybrid search executes and returns top-ranked results to the LLM context.

Sources: README.md:330-345


Summary

ZeroClaw's hybrid search provides production-grade memory retrieval with:

  • Zero external dependencies: All components (vector storage, keyword search, merge logic) implemented in SQLite
  • Configurable weights: Tune vector vs. keyword emphasis per deployment
  • LRU caching: Minimize embedding API calls for frequent queries
  • Atomic operations: Safe reindexing with full transaction support
  • Multiple embedding providers: OpenAI, custom endpoints, or none (keyword-only mode)

The system balances semantic understanding (vector search) with exact matching (keyword search), delivering robust recall across diverse query types without requiring heavyweight infrastructure like Pinecone or Elasticsearch.

Sources: README.md:330-347

