🔥(search) remove embedding/hybrid search, keep BM25 only #107

Merged
StephanMeijer merged 1 commit into main from feat/remove-embeddings
May 5, 2026

Conversation

@StephanMeijer
Collaborator

Summary

  • Remove all embedding/hybrid search functionality, keeping only BM25 full-text search
  • This is a permanent removal with breaking API changes (search_type parameter removed)
  • All 75 tests pass; Django check passes

Changes

Deleted files:

  • core/services/embedding.py
  • core/management/commands/create_search_pipeline.py
  • core/management/commands/reindex_with_embedding.py
  • 10 embedding-related test files

Modified files:

  • core/services/search.py - removed hybrid search branches
  • core/services/indexing.py - removed chunking logic
  • core/services/opensearch.py - removed check_hybrid_search_enabled()
  • core/services/opensearch_configuration.py - removed chunks/embedding mappings
  • core/enums.py - removed SearchTypeEnum
  • core/schemas.py - removed search_type field
  • core/views.py - removed search_type parameter
  • core/apps.py - simplified ready()
  • core/admin.py - removed pipeline admin action
  • find/settings.py - removed 9 embedding/hybrid settings
  • pyproject.toml - removed langchain-text-splitters
  • Documentation and env templates updated
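
For callers affected by the `core/schemas.py` and `core/views.py` changes above, a minimal client-side sketch of dropping the removed parameter before sending a request. The payload shape (a `q` field) and the old `"hybrid"` value are assumptions for illustration and are not taken from this PR; whether the server rejects or ignores a leftover `search_type` is also not stated here.

```python
from typing import Any, Mapping

# Parameters this PR removes from the search request schema.
REMOVED_SEARCH_PARAMS = frozenset({"search_type"})


def strip_removed_search_params(payload: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of `payload` without parameters the API no longer accepts."""
    return {key: value for key, value in payload.items() if key not in REMOVED_SEARCH_PARAMS}


# Hypothetical request body as an older client might have built it.
old_payload = {"q": "budget report", "search_type": "hybrid"}
new_payload = strip_removed_search_params(old_payload)
```

The helper returns a new dict rather than mutating its input, so it can sit in front of an existing request builder without side effects.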

Breaking Changes

  • The search_type API parameter has been removed
  • All searches now use BM25 full-text search exclusively
  • Environment variables EMBEDDING_* and HYBRID_SEARCH_* are no longer used
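
Since the `EMBEDDING_*` and `HYBRID_SEARCH_*` variables are no longer read, deployments may still carry them in env files. A small sketch for spotting leftovers, assuming only the two prefixes named above; the function name is hypothetical and not part of this repository.

```python
import os
from typing import Mapping

# Prefixes of the environment variables this PR stops reading.
STALE_PREFIXES = ("EMBEDDING_", "HYBRID_SEARCH_")


def find_stale_settings(environ: Mapping[str, str]) -> list[str]:
    """Return the sorted names of variables that match a removed prefix."""
    return sorted(name for name in environ if name.startswith(STALE_PREFIXES))


# Check the current process environment; prints nothing if it is clean.
for name in find_stale_settings(os.environ):
    print(f"{name} is set but no longer used")
```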

Preserved

  • BM25 full-text search (unchanged)
  • Language analyzers (fr, en, de, nl, und)
  • Trigram settings (TRIGRAMS_BOOST, TRIGRAMS_MINIMUM_SHOULD_MATCH)
  • py3langid dependency (used for language detection)
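
To illustrate how the preserved pieces could fit together, here is a rough sketch of a BM25-only OpenSearch query body that combines per-language fields with a trigram clause. This is not the code in `core/services/search.py`: the field names (`title.<lang>`, `content.<lang>`, `*.trigrams`), the function name, and the example values are assumptions. Only the language list and the two trigram setting names come from this PR.

```python
from typing import Any

# Language analyzers listed as preserved in this PR.
LANGUAGES = ("fr", "en", "de", "nl", "und")


def build_bm25_query(
    text: str,
    trigrams_boost: float,
    trigrams_minimum_should_match: str,
) -> dict[str, Any]:
    """Build a full-text query body; the field layout is hypothetical."""
    language_fields = [
        f"{field}.{lang}" for field in ("title", "content") for lang in LANGUAGES
    ]
    return {
        "query": {
            "bool": {
                "should": [
                    # Analyzed, language-specific matching scored with BM25.
                    {"multi_match": {"query": text, "fields": language_fields}},
                    # Trigram matching for partial or misspelled terms,
                    # weighted by the TRIGRAMS_* settings.
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title.trigrams", "content.trigrams"],
                            "boost": trigrams_boost,
                            "minimum_should_match": trigrams_minimum_should_match,
                        }
                    },
                ],
                # A document must satisfy at least one of the two clauses.
                "minimum_should_match": 1,
            }
        }
    }


# Example values are placeholders, not the project's defaults.
body = build_bm25_query("rapport annuel", trigrams_boost=0.25, trigrams_minimum_should_match="75%")
```

With no vector or pipeline branch left, the query is a plain `bool` of `multi_match` clauses, which is the shape a BM25-only search path would typically take.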

Copilot AI left a comment

Pull request overview

This PR removes the backend’s embedding/hybrid-search stack and standardizes the search subsystem on BM25 full-text search only. It touches the runtime search/indexing code, API schema, operational settings, docs, and a large set of embedding-related tests/helpers.

Changes:

  • Removed hybrid-search runtime pieces: embedding service, search pipeline/reindex commands, SearchTypeEnum, and search_type request handling.
  • Simplified OpenSearch search/indexing/mapping/configuration to BM25-only behavior and cleaned related settings/admin/docs.
  • Deleted embedding/hybrid-specific tests, mocks, and evaluation helpers.

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/backend/uv.lock | Lockfile cleanup for removed hybrid/embedding dependencies. |
| src/backend/pyproject.toml | Removed langchain-text-splitters dependency. |
| src/backend/find/settings.py | Dropped hybrid/embedding environment settings. |
| src/backend/evaluation/tests/test_evaluate_search_engine.py | Deleted evaluation command tests. |
| src/backend/evaluation/management/commands/evaluate_search_engine.py | Switched evaluation command to BM25-only search path. |
| src/backend/core/views.py | Removed search_type plumbing from search endpoint. |
| src/backend/core/utils.py | Removed search-pipeline deletion helper; kept index utilities. |
| src/backend/core/tests/utils.py | Removed hybrid-search test helpers/imports. |
| src/backend/core/tests/test_search.py | Deleted search-service hybrid/full-text tests. |
| src/backend/core/tests/test_indexing.py | Deleted indexing/analyzer tests. |
| src/backend/core/tests/test_embedding.py | Deleted embedding service tests. |
| src/backend/core/tests/test_api_documents_search.py | Deleted search endpoint tests. |
| src/backend/core/tests/test_api_documents_search_access_control.py | Deleted search access-control tests. |
| src/backend/core/tests/test_api_documents_index_single.py | Deleted single-document indexing tests. |
| src/backend/core/tests/mock/albert_embedding_response.py | Deleted embedding API mock payload. |
| src/backend/core/tests/commands/test_reindex_with_embedding.py | Deleted reindex-with-embedding command tests. |
| src/backend/core/tests/commands/test_create_search_pipeline.py | Deleted search-pipeline command tests. |
| src/backend/core/services/search.py | Removed hybrid/vector query branches; kept BM25 query building. |
| src/backend/core/services/opensearch.py | Removed hybrid-search configuration check helper. |
| src/backend/core/services/opensearch_configuration.py | Removed vector/chunk mappings from index schema. |
| src/backend/core/services/indexing.py | Removed chunking/embedding logic from document preparation. |
| src/backend/core/services/embedding.py | Deleted embedding client implementation. |
| src/backend/core/schemas.py | Removed search_type from request schema. |
| src/backend/core/management/commands/reindex_with_embedding.py | Deleted embedding reindex command. |
| src/backend/core/management/commands/create_search_pipeline.py | Deleted hybrid search-pipeline command. |
| src/backend/core/enums.py | Removed SearchTypeEnum. |
| src/backend/core/apps.py | Removed startup pipeline bootstrap. |
| src/backend/core/admin.py | Removed admin action for ensuring search pipeline. |
| env.d/development/common.dist | Removed hybrid/embedding dev env examples. |
| docs/setup-indexer.md | Removed semantic/hybrid-search setup docs. |
| docs/env.md | Removed hybrid/embedding env var documentation. |

Comment thread src/backend/core/services/opensearch_configuration.py
Comment thread src/backend/core/schemas.py
Comment thread src/backend/evaluation/management/commands/evaluate_search_engine.py Outdated
Comment thread src/backend/core/views.py
Comment thread src/backend/evaluation/management/commands/evaluate_search_engine.py Outdated
Comment thread src/backend/core/services/indexing.py
@StephanMeijer force-pushed the feat/remove-embeddings branch 2 times, most recently from bdc578f to 7619343 on May 5, 2026 10:44
Signed-off-by: Stephan Meijer <me@stephanmeijer.com>
@StephanMeijer force-pushed the feat/remove-embeddings branch from 7619343 to cb7d1e8 on May 5, 2026 13:24
@StephanMeijer merged commit cb7d1e8 into main on May 5, 2026
11 checks passed
@StephanMeijer deleted the feat/remove-embeddings branch on May 5, 2026 14:21
3 participants