Semantic Search Implementation

The QueryWise application implements semantic search through several key components:

1. Vector Embeddings with Jina AI
   - Uses JinaEmbeddings with the jina-embeddings-v2-base-en model
   - Converts document text into high-dimensional vector representations that capture semantic meaning
   - Configured in backend/core/processing.py
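
To make "vector representations that capture semantic meaning" concrete, the sketch below compares toy vectors with cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors and the `cosine_similarity` helper are illustrative assumptions: real vectors come from the embedding model and have far more dimensions, and the distance measure QueryWise's index actually uses is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the values are made up for illustration.
refund_policy = [0.9, 0.1, 0.0]
money_back = [0.8, 0.2, 0.1]
server_uptime = [0.0, 0.2, 0.9]

related = cosine_similarity(refund_policy, money_back)
unrelated = cosine_similarity(refund_policy, server_uptime)
print(related > unrelated)
```

Texts with similar meaning should score closer to 1 than unrelated texts, even when they share no keywords.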
2. FAISS Vector Store
   - Implements Facebook AI Similarity Search (FAISS) for efficient vector similarity search
   - Stores and indexes document embeddings for fast retrieval
   - Supports persistent caching with SHA256 hashing for performance
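
One way a SHA256 hash can back a persistent cache is to derive the index location from the document's content, so an unchanged document maps to the same cached index. The following is a sketch under that assumption; `CACHE_ROOT`, `content_hash`, `cache_dir_for`, and `has_cached_index` are hypothetical names, and the directory layout is not taken from QueryWise.

```python
import hashlib
from pathlib import Path

# Hypothetical cache location; the real path used by QueryWise is not given.
CACHE_ROOT = Path("faiss_cache")


def content_hash(data: bytes) -> str:
    """Return the hex SHA256 digest of the raw document bytes."""
    return hashlib.sha256(data).hexdigest()


def cache_dir_for(data: bytes, root: Path = CACHE_ROOT) -> Path:
    """Map document content to a deterministic cache directory."""
    return root / content_hash(data)


def has_cached_index(data: bytes, root: Path = CACHE_ROOT) -> bool:
    """Report whether an index directory already exists for this content."""
    return cache_dir_for(data, root).is_dir()
```

With LangChain's FAISS wrapper, such a directory could plausibly be passed to `save_local` after building an index and to `load_local` on a cache hit, so embeddings are computed only once per distinct document.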
3. RAG (Retrieval-Augmented Generation) Pipeline
   - Combines semantic search with LLM generation
   - Uses a retriever that finds the top 7 most semantically similar document chunks (search_kwargs={"k": 7})
   - Integrates with Google Gemini 2.0 Flash Lite for answer generation
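
The retrieve-then-generate flow can be illustrated without any external service. This sketch ranks pre-embedded chunks by dot product against a query vector, keeps the top k, and assembles a prompt from them. `dot`, `retrieve_top_k`, `build_prompt`, and the prompt wording are hypothetical, the toy vectors stand in for real embeddings, and the LLM call itself is left out.

```python
import heapq


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query_vec, indexed_chunks, k=7):
    """Return the texts of the k chunks whose vectors score highest.

    indexed_chunks is a list of (text, vector) pairs.
    """
    ranked = heapq.nlargest(
        k, indexed_chunks, key=lambda item: dot(query_vec, item[1])
    )
    return [text for text, _ in ranked]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

A larger k gives the model more context at the cost of a longer prompt; the value 7 above simply mirrors the retriever setting described in this section.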
4. Document Processing Pipeline

Text is split into chunks

```python
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = text_splitter.split_documents(documents)
```
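
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally: that class prefers to break on separators such as paragraphs and line breaks, so its chunk boundaries will differ. `split_with_overlap` is a hypothetical helper written only for illustration.

```python
def split_with_overlap(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 150
) -> list[str]:
    """Cut text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.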

Chunks are converted to embeddings and stored in FAISS

```python
vector_store = FAISS.from_documents(chunks, embeddings_client)
```

Semantic retrieval setup

```python
retriever = vector_store.as_retriever(search_kwargs={"k": 7})
```

5. Key Features
   - Semantic Understanding: Goes beyond keyword matching to understand meaning and context
   - Persistent Indexes: Caches vector embeddings to disk for reuse
   - Concurrent Processing: Handles multiple questions simultaneously
   - Context-Aware Answers: Retrieves relevant document sections based on semantic similarity

Together, these components let users ask natural-language questions about documents and get contextually relevant answers rather than keyword-based search results.
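
The source does not say how concurrent processing is implemented (threads, asyncio, or something else). As one possibility, a thread pool can fan several questions out at once, which suits work that mostly waits on network calls. In this sketch `answer_question` is a hypothetical stand-in for a single retrieve-and-generate call, and `answer_all` and `max_workers=4` are likewise assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_question(question: str) -> str:
    """Hypothetical placeholder for one retrieval plus generation call."""
    return f"(answer to: {question})"


def answer_all(questions: list[str], max_workers: int = 4) -> list[str]:
    """Answer several questions concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(answer_question, questions))
```

Because results come back in input order, each answer can be matched to its question by position.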
