Skip to content

feat: configurable AI Embedding Model#124

Open
bibhupradhanofficial wants to merge 5 commits into
aviralgarg05:mainfrom
bibhupradhanofficial:main
Open

feat: configurable AI Embedding Model#124
bibhupradhanofficial wants to merge 5 commits into
aviralgarg05:mainfrom
bibhupradhanofficial:main

Conversation

@bibhupradhanofficial

@bibhupradhanofficial bibhupradhanofficial commented Feb 10, 2026

Copy link
Copy Markdown

🚀 OSCG'26 Contributor Notice

NexumDB participates in the Open Source Contributor Games 2026! High-quality PRs earn you points, recognition, and networking opportunities. Please follow our contribution guidelines for maximum impact and ensure your submission meets OSCG quality standards.

Summary

Closes #(issue_number)

What Changed?

Why?

Type of Change

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update
  • Refactoring (code improvement without behavior change)
  • Style/UI changes
  • Performance improvement
  • Test addition or update
  • Configuration/build changes
  • Security fix

Testing

Local Testing Checklist

  • cargo fmt --all -- --check - Code formatting
  • cargo clippy --workspace --all-targets -- -D warnings - Linting
  • cargo test --workspace -- --test-threads=1 - All tests pass
  • cargo audit - No security vulnerabilities
  • python -m ruff check nexum_ai/ - Python linting (if applicable)
  • python -m compileall nexum_ai - Python compilation (if applicable)
  • Manual testing performed (describe below)

Testing Details

Screenshots/Examples

Performance Impact

  • No performance impact
  • Performance improvement (describe below)
  • Potential performance concern (describe below)

Security Considerations

  • No security concerns
  • Security improvement (describe below)
  • Requires security review (describe below)

Documentation

  • Updated inline code comments/documentation
  • Updated README.md (if applicable)
  • Updated CHANGELOG.md (will be auto-updated on release)
  • Updated TESTING.md (if test procedures changed)
  • Added/updated examples in demo.sh (if applicable)

Pre-Merge Checklist

  • Self-reviewed my own code thoroughly
  • Added comprehensive tests covering my changes
  • All existing tests pass locally
  • Updated relevant documentation
  • No uncommitted debug code or console logs
  • Commit messages follow Conventional Commits
  • Follows Code of Conduct
  • Branch is up to date with main

🎯 OSCG'26 Quality Standards

By submitting this PR, I confirm:

  • I have read and follow the CONTRIBUTING.md guidelines
  • My contribution meets OSCG'26 quality standards for production-ready code
  • I have tested my changes thoroughly across different scenarios
  • I understand the zero-tolerance policy for low-quality submissions
  • My code follows established patterns and maintains consistency
  • I have considered performance, security, and maintainability impacts
  • I acknowledge this project's participation in OSCG'26 and uphold its reputation

Additional Notes

Reviewers: Please check that all checkboxes are ticked before approving

Summary by CodeRabbit

  • New Features

    • SQLite-backed persistent cache with automatic migration from legacy files.
    • Optional HuggingFace embedding support with automatic selection.
  • Improvements

    • Mean-pooled HF embeddings, guarded fallback to lightweight vectorizer, and memory-lean cache loading.
    • Default cache now uses a .sqlite file.
  • Tests

    • Integration tests updated to verify SQLite-backed persistence.
  • Chores

    • Added a small demo script and updated examples to reflect SQLite usage.

@coderabbitai

coderabbitai Bot commented Feb 10, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Replaces the SemanticCache pickle/JSON backend with SQLite (default), adding automatic migration of legacy JSON/PKL caches. Adds optional HuggingFace AutoTokenizer/AutoModel embedding path alongside SentenceTransformer, with guarded fallbacks and in-memory fallback if SQLite is unavailable.

Changes

Cohort / File(s) Summary
Core optimizer & cache
nexum_ai/optimizer.py
Switches persistence default to SQLite, implements SQLite schema, WAL mode, meta table, legacy JSON/PKL migration, DB read/write/migration helpers, memory-lean loading, expiration/optimization, and in-memory fallback. Adds hf_model and hf_tokenizer attributes and HF embedding path with mean-pooling and guarded fallbacks.
Rust bridge default
nexum_core/src/bridge/mod.rs
Updates SemanticCache::new default cache filename from semantic_cache.pkl to semantic_cache.sqlite.
Examples & repro
examples/cache_persistence_demo.py, reproduce_issue.py
Example and new script updated to use .sqlite cache path and to instantiate/initialize the SemanticCache to exercise new behavior.
Tests
nexum_ai/tests/test_cache_integration.py
Adjusted expectations to assert .sqlite actual cache files exist/remove and to reflect DB-backed persistence while preserving existing test logic.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Cache as SemanticCache
    participant ST as SentenceTransformer
    participant HF as HuggingFace (AutoTokenizer/AutoModel)
    participant SV as SimpleVectorizer

    Caller->>Cache: initialize_model()
    Cache->>ST: attempt import & load (env-preferred)
    alt ST loads
        ST-->>Cache: ST model ready
    else ST fails
        Cache->>HF: attempt AutoTokenizer & AutoModel load
        alt HF loads
            HF-->>Cache: tokenizer + model ready
        else HF fails
            SV-->>Cache: fallback vectorizer ready
        end
    end

    Caller->>Cache: vectorize(text)
    alt using SentenceTransformer
        Cache->>ST: encode(text)
        ST-->>Caller: embedding
    else using HuggingFace
        Cache->>HF: tokenize -> model -> last_hidden_state + attention_mask
        HF-->>Cache: last_hidden_state + attention_mask
        Cache->>Cache: mean-pool -> embedding
        Cache-->>Caller: embedding
    else fallback
        Cache->>SV: vectorize(text)
        SV-->>Caller: embedding
    end
Loading
sequenceDiagram
    participant Caller as Caller
    participant Cache as SemanticCache
    participant DB as SQLite DB
    participant Legacy as Legacy JSON/PKL

    Caller->>Cache: initialize_cache(cache_file)
    Cache->>DB: attempt open/init (WAL, meta table)
    alt DB available
        DB-->>Cache: DB ready
        Cache->>Legacy: check legacy cache files
        alt legacy present
            Legacy-->>Cache: legacy entries
            Cache->>DB: migrate legacy entries -> insert as blobs/meta
        end
        Cache-->>Caller: cache initialized (SQLite)
    else DB unavailable
        Cache->>Cache: initialize in-memory cache
        Cache-->>Caller: cache initialized (in-memory)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

tests, rust, documentation, OSCG26, size/L

Suggested reviewers

  • aviralgarg05
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.84% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: configurable AI Embedding Model' is partially related to the changeset. While the PR does add optional HuggingFace embedding support alongside SentenceTransformer, the primary changes are a migration from pickle/JSON cache files to SQLite with automatic legacy migration. The title focuses on embedding configurability but overlooks the more substantial storage backend rework.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nexum_ai/optimizer.py (1)

114-119: ⚠️ Potential issue | 🔴 Critical

Fallback vector dimension (384) is hardcoded and may mismatch the configured model's output.

_fallback_vectorize always returns a 384-dimensional vector. With the new NEXUM_EMBEDDING_MODEL env var, users can select models that produce different dimensions (e.g., 768 for all-mpnet-base-v2). If the fallback is triggered after vectors from a different-dimensioned model are already cached, cosine_similarity will raise a ValueError from np.dot due to shape mismatch.

Consider dynamically setting the fallback dimension based on the configured model's known output size, or storing the expected dimension as an instance attribute during initialize_model.

♻️ Suggested approach
 def initialize_model(self) -> None:
     ...
+    self._embedding_dim = 384  # default
     try:
         from sentence_transformers import SentenceTransformer
         self.model = SentenceTransformer(model_name)
+        # Probe dimension
+        self._embedding_dim = self.model.get_sentence_embedding_dimension()
         ...
     ...

 def _fallback_vectorize(self, text: str) -> List[float]:
-    vec = [0.0] * 384
+    vec = [0.0] * self._embedding_dim
     for i, char in enumerate(text[:len(vec)]):
         vec[i] = float(ord(char)) / 128.0
     return vec
🤖 Fix all issues with AI agents
In `@nexum_ai/optimizer.py`:
- Around line 88-110: The HF vectorization branch in vectorize (the elif that
checks self.hf_model and self.hf_tokenizer) re-imports torch unnecessarily;
update initialize_model() to stash the imported module (e.g., assign torch to
self._torch when loading HF model) and in the vectorize branch use that cached
reference (self._torch) for tensor ops and no_grad() context, replace local
import torch and any torch.* calls with self._torch.* to avoid repeated imports
and to make the code rely on the already-initialized runtime.
- Around line 36-47: The current initialize_model path (used by SemanticCache
instances created per Rust call via SemanticCache.call0()) can re-load heavy
models (especially the AutoModel.from_pretrained fallback), causing large
overhead; change the design to reuse a single Python-side SemanticCache by
caching the instance or its loaded models across calls — e.g., implement a
module-level singleton (module-level variable) or attach/persist the
SemanticCache instance or its loaded attributes (self.model, self.hf_model,
self.hf_tokenizer) on the PythonBridge struct so initialize_model first checks
and returns early if models are already loaded; ensure the code paths
referencing SentenceTransformer, AutoModel.from_pretrained and
SemanticCache.call0() use the shared instance to avoid repeated heavyweight
loads.
- Around line 44-51: Replace the print-based diagnostics in the semantic cache
initializer with the logging module: add an import for logging and get a module
logger (logger = logging.getLogger(__name__)), then change the prints to
logger.info(...) for successful initialization (the line that logs "Semantic
cache initialized with SentenceTransformer: {model_name}") and to
logger.warning(...) for the fallback and exception branches (the messages around
"sentence-transformers not installed" and "Failed to load with
SentenceTransformer ({e})"); keep the assignments to self.hf_model and
self.hf_tokenizer and the early return behavior intact so functionality of the
initializer (including model_name, self.hf_model, self.hf_tokenizer) is
unchanged.
- Around line 82-112: The HF branch currently swallows errors and returns
self._fallback_vectorize, which corrupts cache semantics; instead, in the HF
vectorization path (the block that uses self.hf_model / self.hf_tokenizer and
torch in optimizer.py) change the error handling to either re-raise the
exception (so callers see vectorization failure) or return a distinguishable
sentinel/error object (not a character-hash) and log at error level;
additionally, augment cache write/read (the put() and get() paths that store
embeddings) to persist a backend identifier (e.g., "backend_id" derived from
self.model/hf_model package+version or a constant like "hf-v{model_name}")
alongside each vector and make get() validate the stored backend_id against the
current backend, evicting or refusing incompatible entries so HF-produced
vectors are never compared to fallback hashes; ensure references to
self._fallback_vectorize, initialize_model, self.hf_model, self.hf_tokenizer,
put(), and get() are updated accordingly.
- Around line 69-73: Fix the PEP8 indentation in the except ImportError block in
optimizer.py: the line with the print statement has one extra leading space (13
vs expected 12); align the print and subsequent assignments (self.model,
self.hf_model, self.hf_tokenizer) to the same 12-space indentation level as the
except block body so all lines inside that except block are consistently
indented.
- Around line 21-23: Add explicit Optional type annotations for the attributes
initialized to None: annotate self.model, self.hf_model, and self.hf_tokenizer
using typing.Optional with their expected types (e.g., Optional[YourModelClass]
or Optional[Any] / Optional[transformers.PreTrainedModel] and
Optional[transformers.PreTrainedTokenizerFast] as appropriate). Update the class
__init__ so the declarations read with the type hints (referencing the attribute
names self.model, self.hf_model, self.hf_tokenizer) and import Optional from
typing at the top of the module to enable static analysis and IDE assistance.
- Around line 62-68: Validate and restrict the environment-controlled model_name
before calling AutoTokenizer.from_pretrained and AutoModel.from_pretrained:
implement an allow-list of trusted model identifiers and check the incoming
model_name (and normalize entries like 'all-MiniLM-L6-v2' →
'sentence-transformers/all-MiniLM-L6-v2') against it, require or inject a pinned
revision parameter (revision="...") and pass use_safetensors=True (or
equivalent) to from_pretrained calls so only safetensors are loaded; if the
model_name is not on the allow-list, raise or fall back to a safe default
instead of calling AutoTokenizer.from_pretrained / AutoModel.from_pretrained.

Comment thread nexum_ai/optimizer.py
Comment on lines 21 to +23
self.model = None
self.hf_model = None
self.hf_tokenizer = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Missing type annotations for model attributes.

self.model, self.hf_model, and self.hf_tokenizer are all initialized to None without type hints. Adding Optional[...] annotations would clarify the expected types and improve IDE/static-analysis support.

♻️ Suggested type annotations
+        from typing import TYPE_CHECKING
+        # At module level or behind TYPE_CHECKING guard:
+        # from sentence_transformers import SentenceTransformer
+        # from transformers import AutoModel, AutoTokenizer
+
         self.model = None  # Optional[SentenceTransformer]
-        self.hf_model = None
-        self.hf_tokenizer = None
+        self.hf_model = None  # Optional[AutoModel]
+        self.hf_tokenizer = None  # Optional[AutoTokenizer]
🤖 Prompt for AI Agents
In `@nexum_ai/optimizer.py` around lines 21 - 23, Add explicit Optional type
annotations for the attributes initialized to None: annotate self.model,
self.hf_model, and self.hf_tokenizer using typing.Optional with their expected
types (e.g., Optional[YourModelClass] or Optional[Any] /
Optional[transformers.PreTrainedModel] and
Optional[transformers.PreTrainedTokenizerFast] as appropriate). Update the class
__init__ so the declarations read with the type hints (referencing the attribute
names self.model, self.hf_model, self.hf_tokenizer) and import Optional from
typing at the top of the module to enable static analysis and IDE assistance.

Comment thread nexum_ai/optimizer.py
Comment thread nexum_ai/optimizer.py
Comment on lines +62 to +68
if model_name == 'all-MiniLM-L6-v2':
model_name = 'sentence-transformers/all-MiniLM-L6-v2'

self.hf_tokenizer = AutoTokenizer.from_pretrained(model_name)
self.hf_model = AutoModel.from_pretrained(model_name)
self.model = None
print(f"Semantic cache initialized with HuggingFace transformers: {model_name}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Security: environment-controlled model name passed to from_pretrained enables arbitrary model loading.

An attacker who controls the NEXUM_EMBEDDING_MODEL environment variable can point AutoModel.from_pretrained / AutoTokenizer.from_pretrained at an arbitrary HuggingFace Hub repository. Models hosted on the Hub may contain malicious payloads (e.g., pickle-based weight files, custom code). Consider:

  1. Validating model_name against an allow-list of trusted models.
  2. Pinning model revisions (the revision parameter).
  3. Forcing safetensors-only loading (use_safetensors=True if your transformers version supports it).
🛡️ Example allow-list validation
+        ALLOWED_MODELS = {
+            'all-MiniLM-L6-v2',
+            'sentence-transformers/all-MiniLM-L6-v2',
+            'sentence-transformers/paraphrase-MiniLM-L6-v2',
+        }
+        if model_name not in ALLOWED_MODELS:
+            print(f"Warning: model '{model_name}' is not in the allow-list, falling back to default")
+            model_name = 'sentence-transformers/all-MiniLM-L6-v2'
+
         self.hf_tokenizer = AutoTokenizer.from_pretrained(model_name)
         self.hf_model = AutoModel.from_pretrained(model_name)
🤖 Prompt for AI Agents
In `@nexum_ai/optimizer.py` around lines 62 - 68, Validate and restrict the
environment-controlled model_name before calling AutoTokenizer.from_pretrained
and AutoModel.from_pretrained: implement an allow-list of trusted model
identifiers and check the incoming model_name (and normalize entries like
'all-MiniLM-L6-v2' → 'sentence-transformers/all-MiniLM-L6-v2') against it,
require or inject a pinned revision parameter (revision="...") and pass
use_safetensors=True (or equivalent) to from_pretrained calls so only
safetensors are loaded; if the model_name is not on the allow-list, raise or
fall back to a safe default instead of calling AutoTokenizer.from_pretrained /
AutoModel.from_pretrained.

Comment thread nexum_ai/optimizer.py
Comment on lines +69 to +73
except ImportError:
print("Warning: transformers not installed, using simple fallback")
self.model = None
self.hf_model = None
self.hf_tokenizer = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

PEP 8 indentation violation: extra leading space.

Line 70 appears to have 13 spaces of indentation instead of the expected 12 (three levels of 4-space indent). This will still run in Python, but it's inconsistent with the rest of the file.

         except ImportError:
-             print("Warning: transformers not installed, using simple fallback")
-             self.model = None
-             self.hf_model = None
-             self.hf_tokenizer = None
+            print("Warning: transformers not installed, using simple fallback")
+            self.model = None
+            self.hf_model = None
+            self.hf_tokenizer = None

As per coding guidelines, **/*.py: Check Python code quality, type hints usage, and PEP 8 compliance.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except ImportError:
print("Warning: transformers not installed, using simple fallback")
self.model = None
self.hf_model = None
self.hf_tokenizer = None
except ImportError:
print("Warning: transformers not installed, using simple fallback")
self.model = None
self.hf_model = None
self.hf_tokenizer = None
🤖 Prompt for AI Agents
In `@nexum_ai/optimizer.py` around lines 69 - 73, Fix the PEP8 indentation in the
except ImportError block in optimizer.py: the line with the print statement has
one extra leading space (13 vs expected 12); align the print and subsequent
assignments (self.model, self.hf_model, self.hf_tokenizer) to the same 12-space
indentation level as the except block body so all lines inside that except block
are consistently indented.

Comment thread nexum_ai/optimizer.py
Comment on lines +82 to 112
if self.model is None and self.hf_model is None:
self.initialize_model()

if self.model is not None:
embedding = self.model.encode(text)
return embedding.tolist()
elif self.hf_model is not None and self.hf_tokenizer is not None:
try:
import torch
# Tokenize and compute embedding
inputs = self.hf_tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
outputs = self.hf_model(**inputs)

# Mean pooling
# attention_mask shape: (batch, seq_len)
# last_hidden_state shape: (batch, seq_len, hidden_dim)
attention_mask = inputs['attention_mask']
token_embeddings = outputs.last_hidden_state

input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)

embedding = sum_embeddings / sum_mask
return embedding[0].tolist()
except Exception as e:
print(f"Error during HF vectorization: {e}, using fallback")
return self._fallback_vectorize(text)
else:
return self._fallback_vectorize(text)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Silent fallback to character-hash vectorizer can corrupt cache semantics.

If the HF model successfully vectorizes text during put() but later fails during get() (e.g., OOM, torch error), the code silently falls back to _fallback_vectorize (line 110). The fallback produces a character-hash vector whose geometry is completely unrelated to the HF embedding space. Cosine similarity between an HF-produced vector and a fallback vector is meaningless — leading to guaranteed cache misses (or, worse, spurious hits).

This same concern applies if the model backend changes between process restarts (e.g., sentence-transformers is installed, vectors are cached, then later only transformers is available). The cached vectors become incompatible with newly produced ones.

Consider:

  1. Raising the exception in the HF path instead of silently falling back, or at minimum logging at error level and returning a sentinel that the caller can distinguish.
  2. Storing the embedding backend identifier alongside cached vectors so incompatible entries can be detected or evicted.
🤖 Prompt for AI Agents
In `@nexum_ai/optimizer.py` around lines 82 - 112, The HF branch currently
swallows errors and returns self._fallback_vectorize, which corrupts cache
semantics; instead, in the HF vectorization path (the block that uses
self.hf_model / self.hf_tokenizer and torch in optimizer.py) change the error
handling to either re-raise the exception (so callers see vectorization failure)
or return a distinguishable sentinel/error object (not a character-hash) and log
at error level; additionally, augment cache write/read (the put() and get()
paths that store embeddings) to persist a backend identifier (e.g., "backend_id"
derived from self.model/hf_model package+version or a constant like
"hf-v{model_name}") alongside each vector and make get() validate the stored
backend_id against the current backend, evicting or refusing incompatible
entries so HF-produced vectors are never compared to fallback hashes; ensure
references to self._fallback_vectorize, initialize_model, self.hf_model,
self.hf_tokenizer, put(), and get() are updated accordingly.

Comment thread nexum_ai/optimizer.py
Comment on lines +88 to +110
elif self.hf_model is not None and self.hf_tokenizer is not None:
try:
import torch
# Tokenize and compute embedding
inputs = self.hf_tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
outputs = self.hf_model(**inputs)

# Mean pooling
# attention_mask shape: (batch, seq_len)
# last_hidden_state shape: (batch, seq_len, hidden_dim)
attention_mask = inputs['attention_mask']
token_embeddings = outputs.last_hidden_state

input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)

embedding = sum_embeddings / sum_mask
return embedding[0].tolist()
except Exception as e:
print(f"Error during HF vectorization: {e}, using fallback")
return self._fallback_vectorize(text)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Re-importing torch on every vectorize call is unnecessary.

torch was already successfully imported during initialize_model() (line 56) if the HF path was taken. The guarded import on line 90 is redundant in this branch since self.hf_model being non-None already implies torch was importable. You could capture the torch module reference as self._torch during init instead.

🤖 Prompt for AI Agents
In `@nexum_ai/optimizer.py` around lines 88 - 110, The HF vectorization branch in
vectorize (the elif that checks self.hf_model and self.hf_tokenizer) re-imports
torch unnecessarily; update initialize_model() to stash the imported module
(e.g., assign torch to self._torch when loading HF model) and in the vectorize
branch use that cached reference (self._torch) for tensor ops and no_grad()
context, replace local import torch and any torch.* calls with self._torch.* to
avoid repeated imports and to make the code rely on the already-initialized
runtime.

@github-actions

Copy link
Copy Markdown
Contributor

Merge Conflict Detected

This PR has merge conflicts with the main branch that must be resolved before it can be merged.

How to Resolve:

Option 1: Update from main

git checkout main
git pull origin main
git checkout main
git merge main
# Resolve conflicts in your editor
git add .
git commit -m "chore: resolve merge conflicts with main"
git push

Option 2: Rebase (advanced)

git checkout main
git fetch origin
git rebase origin/main
# Resolve conflicts in your editor
git add .
git rebase --continue
git push --force-with-lease

Need help? Check out GitHub's guide on resolving merge conflicts.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@reproduce_issue.py`:
- Around line 1-7: reproduce_issue.py is a temporary debug/demo script and
should be removed from the repo or converted into a proper automated test; if
you choose to keep it, eliminate the unused import os, wrap the top-level print
and cache.initialize_model() calls inside an if __name__ == "__main__": guard,
and replace bare side-effecting calls with a test that instantiates
SemanticCache and calls initialize_model() with assertions (or try/except
checks) to validate expected behavior instead of printing.

Comment thread reproduce_issue.py
Comment on lines +1 to +7
from nexum_ai.optimizer import SemanticCache
import os

print("--- Default Behavior ---")
cache = SemanticCache()
# Force initialization
cache.initialize_model()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This debug script should not be committed to the repository.

reproduce_issue.py appears to be a temporary debugging/demo artifact. It was likely used locally to exercise the new initialize_model() dual-path flow but adds no value as a permanent file in the repo. Either remove it entirely or convert it into a proper test under the test suite with assertions.

Additional issues if the file is kept:

  • os (line 2) is imported but never used.
  • Top-level side effects (printing, model initialization) should be guarded with if __name__ == "__main__":.
  • No error handling or assertions — it doesn't verify behavior.
🤖 Prompt for AI Agents
In `@reproduce_issue.py` around lines 1 - 7, reproduce_issue.py is a temporary
debug/demo script and should be removed from the repo or converted into a proper
automated test; if you choose to keep it, eliminate the unused import os, wrap
the top-level print and cache.initialize_model() calls inside an if __name__ ==
"__main__": guard, and replace bare side-effecting calls with a test that
instantiates SemanticCache and calls initialize_model() with assertions (or
try/except checks) to validate expected behavior instead of printing.

@coderabbitai coderabbitai Bot mentioned this pull request Feb 11, 2026
17 tasks
samarjeet818 pushed a commit to samarjeet818/NexumDB that referenced this pull request Feb 11, 2026
- Update PR reference from aviralgarg05#124 to aviralgarg05#131 in docs/EXPLAIN_FEATURE.md
- Fixes: Incorrect PR number reference causing confusion about feature provenance
- This is the correct PR for the EXPLAIN query plan command feature
aviralgarg05 pushed a commit that referenced this pull request Feb 11, 2026
* feat: Add EXPLAIN query plan command

Closes #66

## Implementation Details

### New Features
- Add EXPLAIN command to analyze and explain SQL query execution plans
- Parse SQL query structure with complexity analysis (0-10 scale)
- Detect query clauses (WHERE, JOIN, ORDER BY, GROUP BY)
- Show semantic cache analysis with similarity scoring (0.0-1.0)
- Display RL agent Q-values and optimization recommendations
- Provide execution strategy suggestions (CACHE_HIT, CACHE_MISS, INDEX_SCAN, FULL_SCAN)

### Changes Made

#### nexum_ai/optimizer.py
- Enhanced explain_query_plan() with:
  - Input validation for empty/None queries
  - Try-except error handling with graceful fallback
  - Comprehensive docstring with Args/Returns
  - Support for all 5 SQL query types (SELECT, INSERT, UPDATE, DELETE, CREATE)

- Enhanced format_explain_output() with:
  - Dictionary validation
  - Field width limits (60 chars for queries, 20 chars for actions)
  - Defensive rounding to 4 decimal places

- Enhanced SemanticCache.explain_query() with:
  - Input validation
  - Exception handling for vectorization
  - Fallback when transformer unavailable

- Enhanced QueryOptimizer.explain_action() with:
  - Parameter validation
  - Epsilon clamping to [0.0, 1.0]
  - Safe Q-value filtering and rounding

#### nexum_core/src/bridge/mod.rs
- Added 5 new test methods for EXPLAIN functionality:
  - test_query_explainer_select_queries()
  - test_query_explainer_mutation_queries()
  - test_query_explainer_raw_output()
  - test_query_explainer_q_values_present()

#### nexum_ai/tests/test_explain_integration.py
- Added 17 comprehensive integration tests covering:
  - Query type detection (SELECT, INSERT, UPDATE, DELETE, CREATE)
  - Cache scenarios (hit, miss, consistency)
  - Clause detection (WHERE, JOIN, ORDER BY, GROUP BY)
  - Output formatting completeness
  - Query complexity estimation
  - Long query truncation
  - RL agent Q-values display
  - Custom optimizer support

### Testing & Verification
- ✅ All 8 core verification tests PASSED
- ✅ All 17 integration tests PASSED
- ✅ All 40 existing regression tests PASSED
- ✅ Zero regressions detected
- ✅ 100% test pass rate (65/65 tests)

### Features
- Query parsing and analysis
- Complexity calculation
- Clause detection
- Cache analysis with similarity scoring
- RL agent integration
- Comprehensive error handling
- Graceful fallback for missing dependencies

### Quality Assurance
- Error handling for invalid input
- Defensive programming patterns
- Comprehensive docstrings
- Fallback mechanisms for optional dependencies
- All tests passing with no regressions

* fix: Correct PR number reference in documentation

- Update PR reference from #124 to #131 in docs/EXPLAIN_FEATURE.md
- Fixes: Incorrect PR number reference causing confusion about feature provenance
- This is the correct PR for the EXPLAIN query plan command feature

* fix: Address all 14 reviewer comments from PR #131

- Issue #1-4: Fix PR reference, markdown linting, test counts, stale dates
- Issue #5: Implement module-level SemanticCache for performance
- Issue #6: Fix epsilon mismatch in RL agent error fallback
- Issue #7: Standardize error handling to graceful fallback pattern
- Issue #8: Add defensive error message truncation for width constraint
- Issue #9: Add test setup/teardown for cache artifact cleanup
- Issue #10: Add boundary case test for query complexity clamping
- Issue #11: Add comprehensive medication query test validating RL Agent
- Issue #12: Remove unnecessary super:: prefixes from Rust bridge tests
- Issue #13: Strengthen raw output assertion with structural validation
- Issue #14: Add proper RL Agent and strategy section assertions

* fix: Correct action name truncation width in documentation

- Update docs to match actual implementation (15 chars, not 20)
- Fixes CodeRabbit review comment about documentation discrepancy
- Ensures consistency between docs and code behavior

* fix: Address all 6 CodeRabbit review issues

- Issue #1: Correct action name truncation width (20 → 15 chars) in docs
- Issue #2: Accept tuples in available_actions validation (list | tuple)
- Issue #3: Fix epsilon default mismatch (0.2 → 0.1) with QueryOptimizer
- Issue #4: Add implementation notes for cache-hit test fragility
- Issue #5: Call setup/teardown methods in run_all_tests for test isolation
- Issue #6: Check for actual cache_analysis key instead of 'optimization'

* fix: Resolve all CI test failures

- Remove unused 'json' import from test_explain_integration.py (Python linting)
- Reformat multi-line assert macro in Rust bridge/mod.rs for fmt compliance

All 18 Python integration tests passing locally. This completes fixes for all 6 CodeRabbit review issues plus CI failures.

* fix: resolve explain review issues and unblock CI checks

* fix: address latest coderabbit nits in explain formatting and tests

---------

Co-authored-by: unknown <010samarjeet@gamil.com>

@aviralgarg05 aviralgarg05 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve the conflicts pls

@bibhupradhanofficial bibhupradhanofficial changed the title feat: Configurable AI Embedding Model feat: configurable AI Embedding Model Feb 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@nexum_ai/optimizer.py`:
- Around line 78-89: Rust bridge creates a new SemanticCache for every vectorize
call causing redundant model reloads; add and expose a module-level getter
(e.g., get_or_create_cache) that returns the existing singleton (the same
instance used by _get_default_cache/explain_query_plan) so Rust can call the
getter instead of constructing SemanticCache each time; implement
get_or_create_cache() to return the module-level singleton and ensure it returns
the same object that initialize_model/configure uses so SentenceTransformer or
HF AutoModel weights are loaded once and reused by vectorize invoked from Rust.

Comment thread nexum_ai/optimizer.py
Comment on lines 78 to +89
def initialize_model(self) -> None:
"""Initialize local embedding model - deferred to avoid import errors"""
model_name = os.environ.get('NEXUM_EMBEDDING_MODEL', 'all-MiniLM-L6-v2')

# Try SentenceTransformer first (preferred for embedding models)
try:
from sentence_transformers import SentenceTransformer
self.model = SentenceTransformer('all-MiniLM-L6-v2')
logger.info("Semantic cache initialized with all-MiniLM-L6-v2")
self.model = SentenceTransformer(model_name)
print(f"Semantic cache initialized with SentenceTransformer: {model_name}")
self.hf_model = None
self.hf_tokenizer = None
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, explore the repository structure to understand the layout
find . -type f -name "mod.rs" -o -name "optimizer.py" | head -20

Repository: aviralgarg05/NexumDB

Length of output: 243


🏁 Script executed:

# Check the Rust bridge code at the mentioned location
cat -n nexum_core/src/bridge/mod.rs 2>/dev/null || echo "File not found, searching for it..."
fd "mod.rs" | grep bridge

Repository: aviralgarg05/NexumDB

Length of output: 18691


🏁 Script executed:

# Check Python optimizer.py file
cat -n nexum_ai/optimizer.py 2>/dev/null | head -100

Repository: aviralgarg05/NexumDB

Length of output: 4629


🏁 Script executed:

# Search for SemanticCache instantiation patterns in Rust code
rg -n 'SemanticCache' --type rust -C 3

Repository: aviralgarg05/NexumDB

Length of output: 4390


🏁 Script executed:

# Search for _get_default_cache in Python
rg -n '_get_default_cache' --type py -C 5

Repository: aviralgarg05/NexumDB

Length of output: 1467


🏁 Script executed:

# Check how vectorize is called in the codebase - frequency and usage patterns
rg -n 'vectorize' --type rust -B 2 -A 2 | head -40

Repository: aviralgarg05/NexumDB

Length of output: 1327


🏁 Script executed:

# Verify if PyModule.import caches or re-imports
rg -n 'explain_query_plan' nexum_ai/optimizer.py -A 20 | head -50

Repository: aviralgarg05/NexumDB

Length of output: 2216


Rust bridge creates new SemanticCache on every vectorize call, triggering redundant model loading.

The module-level _get_default_cache() singleton (line 23) only benefits explain_query_plan(). The Rust bridge (nexum_core/src/bridge/mod.rs:41) calls semantic_cache.call0() on every vectorize invocation, creating a fresh Python instance each time. With the HF AutoModel.from_pretrained() fallback, this means model weights are re-downloaded and re-loaded for every vectorization.

Expose the singleton to Rust by adding a module-level getter function (e.g., get_or_create_cache()) that returns the shared instance, and have the Rust bridge call it instead of instantiating SemanticCache() directly.

🤖 Prompt for AI Agents
In `@nexum_ai/optimizer.py` around lines 78 - 89, Rust bridge creates a new
SemanticCache for every vectorize call causing redundant model reloads; add and
expose a module-level getter (e.g., get_or_create_cache) that returns the
existing singleton (the same instance used by
_get_default_cache/explain_query_plan) so Rust can call the getter instead of
constructing SemanticCache each time; implement get_or_create_cache() to return
the module-level singleton and ensure it returns the same object that
initialize_model/configure uses so SentenceTransformer or HF AutoModel weights
are loaded once and reused by vectorize invoked from Rust.

@aviralgarg05 aviralgarg05 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the issue it solves

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nexum_core/src/bridge/mod.rs (1)

278-306: ⚠️ Potential issue | 🟡 Minor

Test uses .pkl path after SQLite migration — misleading naming

Both with_cache_file("test_rust_cache.pkl") calls rely on the Python backend silently redirecting to cache/test_rust_cache.sqlite. The test passes but the file name contradicts the PR's stated goal of switching to SQLite. Update to use .sqlite directly for clarity.

🔧 Suggested fix
-        let cache = SemanticCache::with_cache_file("test_rust_cache.pkl").unwrap();
+        let cache = SemanticCache::with_cache_file("test_rust_cache.sqlite").unwrap();
 ...
-        let cache2 = SemanticCache::with_cache_file("test_rust_cache.pkl").unwrap();
+        let cache2 = SemanticCache::with_cache_file("test_rust_cache.sqlite").unwrap();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_core/src/bridge/mod.rs` around lines 278 - 306, Update the test to use
a .sqlite filename instead of .pkl to reflect the SQLite backend change: replace
uses of SemanticCache::with_cache_file("test_rust_cache.pkl") with a .sqlite
name (e.g., "test_rust_cache.sqlite") so SemanticCache::with_cache_file,
cache.save_cache(), cache.clear_cache(), and the persistence verification logic
operate on a clearly named SQLite file and avoid misleading .pkl naming.
nexum_ai/optimizer.py (1)

568-587: ⚠️ Potential issue | 🟠 Major

load_cache_json() data is in-memory only — lost on restart

load_cache_json() populates self.cache but never writes into the SQLite backend. save_cache() is now a WAL checkpoint rather than a JSON-writer, so after calling load_cache_json() there is no code path that persists the imported entries to SQLite. A process restart will call load_cache() → reads from SQLite → empty → all imported data is silently discarded.

Either insert the imported entries into SQLite immediately, or add a prominent warning to the docstring that callers must follow with _migrate_legacy_cache_if_needed() / direct DB inserts.

🔧 Suggested fix — persist JSON entries into SQLite after loading
 def load_cache_json(self, filepath: Optional[str] = None) -> None:
     """Load cache from JSON format"""
     if filepath is None:
         filepath = str(self.cache_path.with_suffix('.json'))
     
     if os.path.exists(filepath):
         try:
             with open(filepath, 'r') as f:
                 data = json.load(f)
             
-            self.cache = data.get('cache', [])
-            self.similarity_threshold = data.get('similarity_threshold', self.similarity_threshold)
+            entries = data.get('cache', [])
+            self.similarity_threshold = data.get('similarity_threshold', self.similarity_threshold)
+
+            if self._db_ok and entries:
+                now = int(time.time())
+                with self._db() as conn:
+                    conn.executemany(
+                        "INSERT INTO cache_entries(query, vector, result, created_at) VALUES(?,?,?,?)",
+                        [
+                            (str(e['query']), self._vector_to_blob(list(e['vector'])), str(e['result']), now)
+                            for e in entries
+                            if isinstance(e, dict) and all(k in e for k in ('query', 'vector', 'result'))
+                        ],
+                    )
+                self._load_recent_entries_into_memory()
+            else:
+                self.cache = entries
             
             logger.info(f"Semantic cache loaded from JSON: {filepath} ({len(self.cache)} entries)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/optimizer.py` around lines 568 - 587, load_cache_json currently
loads entries into self.cache but does not persist them to the SQLite backend,
so imported data is lost on restart; update load_cache_json (or call a helper)
to insert or upsert the loaded entries into the SQLite store immediately after
setting self.cache (or call the existing _migrate_legacy_cache_if_needed() / the
same persistence path used by save_cache) so JSON-imported entries are written
into the DB and survive restarts; ensure you reference the same schema/columns
and use the module's DB insertion functions (or the logic used by save_cache)
and preserve similarity_threshold handling when persisting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nexum_ai/optimizer.py`:
- Around line 488-497: The save_cache method currently ignores its filepath
parameter; update save_cache (and its docstring) so callers aren't silently
misled: either remove the filepath parameter from save_cache signature,
docstring and any external calls, or (preferably for backward compatibility)
detect if filepath is not None and emit a DeprecationWarning via
warnings.warn("...deprecated...", DeprecationWarning) before proceeding with the
current SQLite logic; keep existing behavior using self._db(), self._db_ok,
self._write_meta('similarity_threshold', str(self.similarity_threshold)) and the
PRAGMA checkpoint, and ensure the docstring notes that filepath is deprecated
when you add the warning.
- Around line 470-486: The clear() method currently unlinks only the main SQLite
file but leaves the WAL/SHM sidecar files and leaves self._db_ok True; update
clear() to also attempt to remove the two sidecar files named
self.cache_path.name + "-wal" and self.cache_path.name + "-shm" (or build them
via self.cache_path.with_name/with_suffix pattern), wrapping each unlink in its
own try/except and logging success/failure via logger; after successfully
removing the files (or regardless if we fall back to deleting tables), ensure
self._db_ok is set to False so the next put/get will recreate a fresh DB instead
of replaying orphaned WAL content; reference the existing clear(),
self.cache_path, self._db(), self._db_ok, and logger symbols when making the
changes.
- Around line 698-703: The code reads cur.rowcount after the DB connection is
closed; move the access to cur.rowcount inside the with self._db() as conn:
block to avoid relying on cursor behavior after close. Specifically, inside the
block where cur = conn.execute("DELETE FROM cache_entries WHERE created_at < ?",
(cutoff,)), compute removed = int(cur.rowcount) if cur and cur.rowcount is not
None else 0 and then log the info (logger.info(...)) before exiting the with;
keep the subsequent call to self._load_recent_entries_into_memory() outside the
with as-is.
- Around line 141-143: The explicit index creation statement using
conn.execute("CREATE INDEX IF NOT EXISTS idx_cache_entries_id ON
cache_entries(id)") is redundant because id is defined as PRIMARY KEY (which
SQLite already indexes); remove that conn.execute call (the CREATE INDEX for
idx_cache_entries_id) from nexum_ai/optimizer.py so the schema setup no longer
emits this no-op DDL. Ensure any surrounding transaction or
schema-initialization logic (where conn.execute is called) remains intact after
deleting that single statement.
- Around line 343-344: The code currently re-invokes initialize_model() on every
vectorize() call when both self.model and self.hf_model remain None; add a
sentinel attribute (e.g., self._model_init_attempted or self._model_load_failed)
checked in vectorize before calling initialize_model so the import chain is only
attempted once, set that flag to True at the start or end of initialize_model()
(and update it to reflect success/failure as needed) and ensure initialize_model
still assigns self.model/self.hf_model on success so subsequent vectorize()
calls skip re-initialization.

In `@nexum_ai/tests/test_cache_integration.py`:
- Around line 51-52: Replace the brittle string replacement when computing
actual_cache_file with Path.with_suffix: instead of using
cache_file.replace('.pkl', '.sqlite'), convert cache_file to a Path and call
.with_suffix('.sqlite') (e.g., Path(cache_file).with_suffix('.sqlite')) so the
extension swap is safe; update both occurrences that currently set
actual_cache_file (and any similar replacements around lines referenced) to use
Path.with_suffix and keep the os.path.exists check as-is.

---

Outside diff comments:
In `@nexum_ai/optimizer.py`:
- Around line 568-587: load_cache_json currently loads entries into self.cache
but does not persist them to the SQLite backend, so imported data is lost on
restart; update load_cache_json (or call a helper) to insert or upsert the
loaded entries into the SQLite store immediately after setting self.cache (or
call the existing _migrate_legacy_cache_if_needed() / the same persistence path
used by save_cache) so JSON-imported entries are written into the DB and survive
restarts; ensure you reference the same schema/columns and use the module's DB
insertion functions (or the logic used by save_cache) and preserve
similarity_threshold handling when persisting.

In `@nexum_core/src/bridge/mod.rs`:
- Around line 278-306: Update the test to use a .sqlite filename instead of .pkl
to reflect the SQLite backend change: replace uses of
SemanticCache::with_cache_file("test_rust_cache.pkl") with a .sqlite name (e.g.,
"test_rust_cache.sqlite") so SemanticCache::with_cache_file, cache.save_cache(),
cache.clear_cache(), and the persistence verification logic operate on a clearly
named SQLite file and avoid misleading .pkl naming.

---

Duplicate comments:
In `@nexum_ai/optimizer.py`:
- Around line 297-339: Replace all print() calls in initialize_model with the
module logger (e.g., logger.info/logger.warning) so messages go through the
configured logging system and update the messages at the SentenceTransformer and
transformers branches (references: initialize_model, self.model, self.hf_model,
self.hf_tokenizer). Also fix the PEP8 indentation inside the second except
ImportError block (remove the extra leading space before the print/assignment
lines) so the except block aligns with the other exception handlers; ensure the
fallback assignments (self.model = None, self.hf_model = None, self.hf_tokenizer
= None) remain in the corrected block and use logger.warning for warning
messages.
- Around line 349-371: The HF branch in vectorize() re-imports torch every call
and silently falls back to _fallback_vectorize(), which corrupts embedding
space; instead, import and store the module once in initialize_model() as
self._torch and use self._torch here (replace local import torch and torch.*
references), and change the except block in vectorize() (around
self.hf_model/self.hf_tokenizer usage) to not return _fallback_vectorize(text)
on failure—log the error and re-raise the exception (or raise a specific
VectorizationError) so callers can decide how to handle a failed HF encoding
rather than mixing incompatible vector spaces.

Comment thread nexum_ai/optimizer.py
Comment on lines +141 to +143
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_cache_entries_id ON cache_entries(id)"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

idx_cache_entries_id index is redundant — SQLite already indexes the PRIMARY KEY

id INTEGER PRIMARY KEY AUTOINCREMENT implicitly creates a B-tree index in SQLite. The explicit CREATE INDEX … ON cache_entries(id) is a no-op that adds DDL noise.

♻️ Suggested fix
-                conn.execute(
-                    "CREATE INDEX IF NOT EXISTS idx_cache_entries_id ON cache_entries(id)"
-                )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_cache_entries_id ON cache_entries(id)"
)
# Removed redundant index creation - SQLite automatically indexes INTEGER PRIMARY KEY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/optimizer.py` around lines 141 - 143, The explicit index creation
statement using conn.execute("CREATE INDEX IF NOT EXISTS idx_cache_entries_id ON
cache_entries(id)") is redundant because id is defined as PRIMARY KEY (which
SQLite already indexes); remove that conn.execute call (the CREATE INDEX for
idx_cache_entries_id) from nexum_ai/optimizer.py so the schema setup no longer
emits this no-op DDL. Ensure any surrounding transaction or
schema-initialization logic (where conn.execute is called) remains intact after
deleting that single statement.

Comment thread nexum_ai/optimizer.py
Comment on lines +343 to 344
if self.model is None and self.hf_model is None:
self.initialize_model()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

initialize_model() re-attempted on every vectorize() call when all model loads failed

If both sentence-transformers and transformers are unavailable, initialize_model() leaves self.model = None and self.hf_model = None. Because the guard condition if self.model is None and self.hf_model is None is still True on the next call, every subsequent vectorize() invocation re-attempts the full import chain (two ImportErrors each time). Add a sentinel flag to short-circuit after the first failed attempt.

🔧 Suggested fix
+        self._model_init_attempted: bool = False
         ...

 def vectorize(self, text: str) -> List[float]:
     """Convert text to embedding vector"""
-    if self.model is None and self.hf_model is None:
+    if not self._model_init_attempted and self.model is None and self.hf_model is None:
         self.initialize_model()

 def initialize_model(self) -> None:
     """Initialize local embedding model - deferred to avoid import errors"""
+    self._model_init_attempted = True
     model_name = os.environ.get('NEXUM_EMBEDDING_MODEL', 'all-MiniLM-L6-v2')
     ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/optimizer.py` around lines 343 - 344, The code currently re-invokes
initialize_model() on every vectorize() call when both self.model and
self.hf_model remain None; add a sentinel attribute (e.g.,
self._model_init_attempted or self._model_load_failed) checked in vectorize
before calling initialize_model so the import chain is only attempted once, set
that flag to True at the start or end of initialize_model() (and update it to
reflect success/failure as needed) and ensure initialize_model still assigns
self.model/self.hf_model on success so subsequent vectorize() calls skip
re-initialization.

Comment thread nexum_ai/optimizer.py
Comment on lines 470 to +486
def clear(self) -> None:
"""Clear the cache"""
self.cache.clear()
# Remove cache file when clearing
if self.cache_path.exists():
self.cache_path.unlink()
logger.info("Cache file deleted")
try:
self.cache_path.unlink()
logger.info("Cache file deleted")
except Exception:
logger.exception("Failed to delete cache file; attempting to clear tables")
if self._db_ok:
try:
with self._db() as conn:
conn.execute("DELETE FROM cache_entries")
conn.execute("DELETE FROM meta")
except Exception:
logger.exception("Failed to clear SQLite cache tables")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

clear() leaves orphaned .sqlite-wal / .sqlite-shm files after deleting the main database

WAL mode creates {cache_path}-wal and {cache_path}-shm alongside the main file. self.cache_path.unlink() removes only the .sqlite file, leaving the auxiliary files behind. SQLite recovers gracefully on the next open, but the stale files waste disk space and can confuse monitoring/tooling. Additionally self._db_ok remains True after clear(), so the next put() / get() transparently re-creates the DB—potentially replaying data from an orphaned WAL.

🔧 Suggested fix
 if self.cache_path.exists():
     try:
         self.cache_path.unlink()
+        for aux_suffix in ('-wal', '-shm'):
+            aux = self.cache_path.parent / (self.cache_path.name + aux_suffix)
+            try:
+                aux.unlink(missing_ok=True)
+            except Exception:
+                logger.warning("Could not remove SQLite auxiliary file: %s", aux)
         logger.info("Cache file deleted")
+        self._db_ok = False
     except Exception:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/optimizer.py` around lines 470 - 486, The clear() method currently
unlinks only the main SQLite file but leaves the WAL/SHM sidecar files and
leaves self._db_ok True; update clear() to also attempt to remove the two
sidecar files named self.cache_path.name + "-wal" and self.cache_path.name +
"-shm" (or build them via self.cache_path.with_name/with_suffix pattern),
wrapping each unlink in its own try/except and logging success/failure via
logger; after successfully removing the files (or regardless if we fall back to
deleting tables), ensure self._db_ok is set to False so the next put/get will
recreate a fresh DB instead of replaying orphaned WAL content; reference the
existing clear(), self.cache_path, self._db(), self._db_ok, and logger symbols
when making the changes.

Comment thread nexum_ai/optimizer.py
Comment on lines 488 to +497
def save_cache(self, filepath: Optional[str] = None) -> None:
"""Save cache to disk using JSON format (secure default)"""
if filepath is None:
filepath = str(self.cache_path)

# Use JSON format by default for security
json_filepath = filepath.replace('.pkl', '.json') if filepath.endswith('.pkl') else filepath
self.save_cache_json(json_filepath)
"""Persist cache to SQLite backend (no-op for in-memory only mode)"""
if not self._db_ok:
return
self._write_meta('similarity_threshold', str(self.similarity_threshold))
try:
with self._db() as conn:
conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")
except Exception:
logger.exception("SQLite cache checkpoint failed")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

save_cache(filepath) silently ignores its filepath parameter

The method signature preserves filepath: Optional[str] = None from the old pickle-based API, but the new implementation ignores it entirely. Any external caller that passed a custom filepath expecting data to be saved there will see a silent no-op. Either remove the parameter or document clearly that it is deprecated and raise a DeprecationWarning.

🔧 Suggested fix
-def save_cache(self, filepath: Optional[str] = None) -> None:
-    """Persist cache to SQLite backend (no-op for in-memory only mode)"""
+def save_cache(self, filepath: Optional[str] = None) -> None:
+    """Persist cache to SQLite backend (no-op for in-memory only mode).
+
+    .. deprecated::
+        The ``filepath`` parameter is no longer used; the SQLite backend is
+        written incrementally. Pass ``filepath`` to :meth:`save_cache_json`
+        instead for a portable export.
+    """
+    if filepath is not None:
+        import warnings
+        warnings.warn(
+            "The filepath argument to save_cache() is ignored; use save_cache_json(filepath) instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
     if not self._db_ok:
         return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/optimizer.py` around lines 488 - 497, The save_cache method
currently ignores its filepath parameter; update save_cache (and its docstring)
so callers aren't silently misled: either remove the filepath parameter from
save_cache signature, docstring and any external calls, or (preferably for
backward compatibility) detect if filepath is not None and emit a
DeprecationWarning via warnings.warn("...deprecated...", DeprecationWarning)
before proceeding with the current SQLite logic; keep existing behavior using
self._db(), self._db_ok, self._write_meta('similarity_threshold',
str(self.similarity_threshold)) and the PRAGMA checkpoint, and ensure the
docstring notes that filepath is deprecated when you add the warning.

Comment thread nexum_ai/optimizer.py
Comment on lines +698 to +703
try:
with self._db() as conn:
cur = conn.execute("DELETE FROM cache_entries WHERE created_at < ?", (cutoff,))
removed = int(cur.rowcount) if cur and cur.rowcount is not None else 0
logger.info(f"Cache expiration removed {removed} entries older than {max_age_hours} hours")
self._load_recent_entries_into_memory()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

cur.rowcount accessed after the connection is closed — move it inside the with block

cur is obtained inside with self._db() as conn:, which calls conn.close() on exit. cur.rowcount is then read after conn.close(). While CPython's sqlite3 caches rowcount as a plain Python integer on the cursor object, reading it after connection close is an implementation-detail reliance. Move the read inside the with block.

🔧 Suggested fix
         try:
             with self._db() as conn:
                 cur = conn.execute("DELETE FROM cache_entries WHERE created_at < ?", (cutoff,))
-            removed = int(cur.rowcount) if cur and cur.rowcount is not None else 0
+                removed = cur.rowcount if cur.rowcount >= 0 else 0
             logger.info(f"Cache expiration removed {removed} entries older than {max_age_hours} hours")
             self._load_recent_entries_into_memory()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/optimizer.py` around lines 698 - 703, The code reads cur.rowcount
after the DB connection is closed; move the access to cur.rowcount inside the
with self._db() as conn: block to avoid relying on cursor behavior after close.
Specifically, inside the block where cur = conn.execute("DELETE FROM
cache_entries WHERE created_at < ?", (cutoff,)), compute removed =
int(cur.rowcount) if cur and cur.rowcount is not None else 0 and then log the
info (logger.info(...)) before exiting the with; keep the subsequent call to
self._load_recent_entries_into_memory() outside the with as-is.

Comment on lines +51 to 52
actual_cache_file = cache_file.replace('.pkl', '.sqlite')
assert os.path.exists(actual_cache_file), f"Cache file {actual_cache_file} should exist"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use Path.with_suffix() instead of str.replace() for the extension swap

str.replace('.pkl', '.sqlite') replaces the first occurrence, so if temp_dir ever contains .pkl somewhere in its path (e.g., a system-temp path that includes .pkl), the result will be incorrect. Use the idiomatic Path API instead.

🔧 Suggested fix
-        actual_cache_file = cache_file.replace('.pkl', '.sqlite')
+        actual_cache_file = str(Path(cache_file).with_suffix('.sqlite'))
-            actual_custom_cache = custom_cache.replace('.pkl', '.sqlite')
+            actual_custom_cache = str(Path(custom_cache).with_suffix('.sqlite'))

Path is already imported at line 11.

Also applies to: 123-124

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nexum_ai/tests/test_cache_integration.py` around lines 51 - 52, Replace the
brittle string replacement when computing actual_cache_file with
Path.with_suffix: instead of using cache_file.replace('.pkl', '.sqlite'),
convert cache_file to a Path and call .with_suffix('.sqlite') (e.g.,
Path(cache_file).with_suffix('.sqlite')) so the extension swap is safe; update
both occurrences that currently set actual_cache_file (and any similar
replacements around lines referenced) to use Path.with_suffix and keep the
os.path.exists check as-is.

@aviralgarg05

Copy link
Copy Markdown
Owner

Mention the issue it solves

Mention it in the description here

@aviralgarg05 aviralgarg05 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants