🔥(search) remove embedding/hybrid search, keep BM25 only #107
Merged
StephanMeijer merged 1 commit into main on May 5, 2026
Pull request overview
This PR removes the backend’s embedding/hybrid-search stack and standardizes the search subsystem on BM25 full-text search only. It touches the runtime search/indexing code, API schema, operational settings, docs, and a large set of embedding-related tests/helpers.
Changes:
- Removed hybrid-search runtime pieces: embedding service, search pipeline/reindex commands, `SearchTypeEnum`, and `search_type` request handling.
- Simplified OpenSearch search/indexing/mapping/configuration to BM25-only behavior and cleaned related settings/admin/docs.
- Deleted embedding/hybrid-specific tests, mocks, and evaluation helpers.
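
To make "BM25-only" concrete at the query level, here is a minimal sketch of a full-text query body in the OpenSearch query DSL. BM25 is the default similarity for text fields in OpenSearch, so a plain `multi_match` query is scored with it and needs no k-NN clause or search pipeline. The function name `build_bm25_query`, the field names `title` and `content`, and the `^2` boost are assumptions for illustration; they are not taken from this PR's `core/services/search.py`.

```python
from typing import Any


def build_bm25_query(text: str, size: int = 10) -> dict[str, Any]:
    """Build a plain full-text OpenSearch query body (scored with BM25 by default).

    Hypothetical helper: the field names and boost are illustrative only.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # "title^2" doubles the weight of title matches; both field
                # names are assumed, not read from the project's mapping.
                "fields": ["title^2", "content"],
            }
        },
    }
```

The resulting dict could be passed as the request body of a search call; it contains no vector or hybrid clauses.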
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/backend/uv.lock` | Lockfile cleanup for removed hybrid/embedding dependencies. |
| `src/backend/pyproject.toml` | Removed `langchain-text-splitters` dependency. |
| `src/backend/find/settings.py` | Dropped hybrid/embedding environment settings. |
| `src/backend/evaluation/tests/test_evaluate_search_engine.py` | Deleted evaluation command tests. |
| `src/backend/evaluation/management/commands/evaluate_search_engine.py` | Switched evaluation command to BM25-only search path. |
| `src/backend/core/views.py` | Removed `search_type` plumbing from search endpoint. |
| `src/backend/core/utils.py` | Removed search-pipeline deletion helper; kept index utilities. |
| `src/backend/core/tests/utils.py` | Removed hybrid-search test helpers/imports. |
| `src/backend/core/tests/test_search.py` | Deleted search-service hybrid/full-text tests. |
| `src/backend/core/tests/test_indexing.py` | Deleted indexing/analyzer tests. |
| `src/backend/core/tests/test_embedding.py` | Deleted embedding service tests. |
| `src/backend/core/tests/test_api_documents_search.py` | Deleted search endpoint tests. |
| `src/backend/core/tests/test_api_documents_search_access_control.py` | Deleted search access-control tests. |
| `src/backend/core/tests/test_api_documents_index_single.py` | Deleted single-document indexing tests. |
| `src/backend/core/tests/mock/albert_embedding_response.py` | Deleted embedding API mock payload. |
| `src/backend/core/tests/commands/test_reindex_with_embedding.py` | Deleted reindex-with-embedding command tests. |
| `src/backend/core/tests/commands/test_create_search_pipeline.py` | Deleted search-pipeline command tests. |
| `src/backend/core/services/search.py` | Removed hybrid/vector query branches; kept BM25 query building. |
| `src/backend/core/services/opensearch.py` | Removed hybrid-search configuration check helper. |
| `src/backend/core/services/opensearch_configuration.py` | Removed vector/chunk mappings from index schema. |
| `src/backend/core/services/indexing.py` | Removed chunking/embedding logic from document preparation. |
| `src/backend/core/services/embedding.py` | Deleted embedding client implementation. |
| `src/backend/core/schemas.py` | Removed `search_type` from request schema. |
| `src/backend/core/management/commands/reindex_with_embedding.py` | Deleted embedding reindex command. |
| `src/backend/core/management/commands/create_search_pipeline.py` | Deleted hybrid search-pipeline command. |
| `src/backend/core/enums.py` | Removed `SearchTypeEnum`. |
| `src/backend/core/apps.py` | Removed startup pipeline bootstrap. |
| `src/backend/core/admin.py` | Removed admin action for ensuring search pipeline. |
| `env.d/development/common.dist` | Removed hybrid/embedding dev env examples. |
| `docs/setup-indexer.md` | Removed semantic/hybrid-search setup docs. |
| `docs/env.md` | Removed hybrid/embedding env var documentation. |
Signed-off-by: Stephan Meijer <me@stephanmeijer.com>
jmaupetit approved these changes on May 5, 2026
Summary
Changes
Deleted files:
- `core/services/embedding.py`
- `core/management/commands/create_search_pipeline.py`
- `core/management/commands/reindex_with_embedding.py`

Modified files:
- `core/services/search.py` - removed hybrid search branches
- `core/services/indexing.py` - removed chunking logic
- `core/services/opensearch.py` - removed `check_hybrid_search_enabled()`
- `core/services/opensearch_configuration.py` - removed chunks/embedding mappings
- `core/enums.py` - removed `SearchTypeEnum`
- `core/schemas.py` - removed `search_type` field
- `core/views.py` - removed `search_type` parameter
- `core/apps.py` - simplified `ready()`
- `core/admin.py` - removed pipeline admin action
- `find/settings.py` - removed 9 embedding/hybrid settings
- `pyproject.toml` - removed `langchain-text-splitters`

Breaking Changes
- `search_type` API parameter has been removed
- `EMBEDDING_*` and `HYBRID_SEARCH_*` are no longer used

Preserved
- `TRIGRAMS_BOOST`, `TRIGRAMS_MINIMUM_SHOULD_MATCH`
- `py3langid` dependency (used for language detection)
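
For callers and operators affected by the breaking changes listed above, a small migration aid might look like the sketch below. It drops `search_type` from an outgoing request payload and reports environment variables that match the removed `EMBEDDING_*` and `HYBRID_SEARCH_*` prefixes. The helper names `strip_search_type` and `stale_settings` are hypothetical, and whether the API now rejects or silently ignores a leftover `search_type` field is not stated in this PR.

```python
import os
from typing import Any, Mapping, Optional

# Prefixes named in the PR description as no longer used.
REMOVED_PREFIXES = ("EMBEDDING_", "HYBRID_SEARCH_")


def strip_search_type(payload: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of a search request payload without the removed field."""
    return {key: value for key, value in payload.items() if key != "search_type"}


def stale_settings(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """List variable names that match the removed setting prefixes.

    Hypothetical helper: it only inspects names, it does not unset anything.
    """
    env = os.environ if environ is None else environ
    return sorted(name for name in env if name.startswith(REMOVED_PREFIXES))
```

Running `stale_settings()` against a deployment's environment gives a quick list of variables that can be cleaned up after upgrading.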