[Cosmos] Hybrid Search query pipeline #38275

Merged: simorenoh merged 39 commits into Azure:main from simorenoh:fts-query on Nov 19, 2024

@simorenoh (Member) commented Nov 1, 2024

Adds support for full text search queries by introducing the hybrid search query pipeline. The pipeline consists of the newly added hybrid_search_aggregator, which performs the query steps needed to obtain the results.

With these changes, the SDK can now interpret queries that use functions such as FullTextContains(), FullTextContainsAll(), FullTextContainsAny(), Order By Rank <FullTextFunction>(), and Order By Rank RRF().
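
As a rough illustration of the query shapes named above, the sketch below defines one example string per function and a small helper that runs a query through `query_items`. The property names (`c.text`, `c.vector`), the search terms, the vector literal, and the helper `run_query` are assumptions made for illustration, not taken from this PR. The exact argument form of `FullTextScore` may differ between service versions, so treat the strings as approximate and check the README in this PR before relying on them.

```python
from typing import Any, Dict, List

# Example query strings. Property names, terms, and the vector are hypothetical.
QUERIES: Dict[str, str] = {
    # Filter: the text property must contain the given term.
    "contains": "SELECT TOP 10 c.id FROM c WHERE FullTextContains(c.text, 'bicycle')",
    # Filter: the text property must contain every listed term.
    "contains_all": "SELECT TOP 10 c.id FROM c WHERE FullTextContainsAll(c.text, 'red', 'bicycle')",
    # Filter: the text property must contain at least one listed term.
    "contains_any": "SELECT TOP 10 c.id FROM c WHERE FullTextContainsAny(c.text, 'red', 'bicycle')",
    # Ranking by a single full text scoring function (argument form is an assumption).
    "rank_full_text": "SELECT TOP 10 c.id FROM c ORDER BY RANK FullTextScore(c.text, ['bicycle', 'mountain'])",
    # Hybrid ranking: fuse a full text score with a vector distance via RRF.
    "rank_rrf": (
        "SELECT TOP 10 c.id FROM c ORDER BY RANK RRF("
        "FullTextScore(c.text, ['bicycle']), "
        "VectorDistance(c.vector, [0.1, 0.2, 0.3]))"
    ),
}


def run_query(container: Any, query: str) -> List[dict]:
    """Run a query and collect all results into a list.

    `container` is assumed to be an azure.cosmos ContainerProxy, or any
    object exposing a compatible `query_items` method.
    """
    items = container.query_items(query=query, enable_cross_partition_query=True)
    return list(items)
```

With a real container client, a call would look like `run_query(container, QUERIES["rank_rrf"])`.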

The design doc for the implementation can be found here: Hybrid Search Doc.
The new README in this PR also has additional information.
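
For readers unfamiliar with the RRF ranking mentioned in the description, here is a minimal sketch of reciprocal rank fusion in general. It is not the code in hybrid_search_aggregator, which is not shown in this conversation. The function name `reciprocal_rank_fusion` and the constant `k = 60` (a conventional default for RRF) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents are returned by descending total score.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorted() is stable, so ties keep the order in which ids were first seen.
    return sorted(scores, key=lambda doc_id: -scores[doc_id])


# Hypothetical ranked results from a full text score and a vector distance.
text_ranking = ["a", "b", "c"]
vector_ranking = ["c", "a", "d"]
fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
```

Documents that rank well in both lists ("a" and "c" here) end up ahead of documents that appear in only one.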

Still missing in this PR at the moment:

  • Samples for both sync and async
  • Tests for both sync and async
  • Additional README information on how to run these queries.
  • Cleaning up and consolidating some of the repeated logic.
  • Possible optimizations to query plan fetching - will probably be addressed in a separate PR. Issue: [Cosmos] make queries fetch query plan in every query #38577

@github-actions (bot) added the Cosmos label Nov 1, 2024
@azure-sdk (Collaborator) commented

API change check

API changes are not detected in this pull request.

@simorenoh marked this pull request as ready for review November 5, 2024 21:25
@simorenoh requested review from a team and annatisch as code owners November 5, 2024 21:25
Comment thread sdk/cosmos/azure-cosmos/README.md Outdated
Comment thread sdk/cosmos/azure-cosmos/README.md Outdated
Comment thread sdk/cosmos/azure-cosmos/test/test_query_hybrid_search.py
@FabianMeiswinkel (Member) left a comment

Looks good overall - main comment is about the test matrix/coverage.

@xinlian12 (Member) left a comment

LGTM, thanks

@FabianMeiswinkel (Member) commented

LGTM Thanks

@FabianMeiswinkel (Member) left a comment

LGTM now - thanks!

@simorenoh merged commit 3de7a4c into Azure:main Nov 19, 2024
@simorenoh deleted the fts-query branch November 19, 2024 03:04
l0lawrence pushed a commit to l0lawrence/azure-sdk-for-python that referenced this pull request Feb 19, 2025
* Create hybrid_search_aggregator.py

* others

* Update execution_dispatcher.py

* Update execution_dispatcher.py

* sync changes, need to look at vector + FTS/ skip + take

* async pipeline

* account for skip/take and simplify logics

* small hack for now

* fixing top/limit logic

* return only payload

* fix hack

* pylint

* simplifying further

* small changes

* adds readme, buffer limit, simplifies

* simplify async, CI green

* Update hybrid_search_aggregator.py

* Update sdk/cosmos/azure-cosmos/README.md

Co-authored-by: Anna Tisch <antisch@microsoft.com>

* update variable name

* add sync and async tests

* Update README.md

* simplifications, test fixes

* add wrong query tests

* pylint/cspell

* Update CHANGELOG.md

* small changes

* test updates

* Update hybrid_search_data.py

* cspell, samples

* change tops

* address comments

* Update hybrid_search_aggregator.py

* update pipeline description

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Anna Tisch <antisch@microsoft.com>