Skip to content

feat: support search aggregation API(#3437)#3438

Open
MrPresent-Han wants to merge 1 commit intomilvus-io:masterfrom
MrPresent-Han:support-search-agg-master
Open

feat: support search aggregation API(#3437)#3438
MrPresent-Han wants to merge 1 commit intomilvus-io:masterfrom
MrPresent-Han:support-search-agg-master

Conversation

@MrPresent-Han
Copy link
Copy Markdown
Contributor

support search aggregation on pymilvus

issue: #3437

@sre-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MrPresent-Han
To complete the pull request process, please assign tedxu after the PR has been reviewed.
You can assign the PR to them by writing /assign @tedxu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MrPresent-Han
Copy link
Copy Markdown
Contributor Author

/hold

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 95.37037% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.92%. Comparing base (1a94cc5) to head (7ae73d0).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
pymilvus/client/search_aggregation.py 94.47% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3438      +/-   ##
==========================================
+ Coverage   90.81%   90.92%   +0.11%     
==========================================
  Files          64       65       +1     
  Lines       13969    14194     +225     
==========================================
+ Hits        12686    12906     +220     
- Misses       1283     1288       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mergify mergify Bot added needs-dco and removed dco-passed labels Apr 23, 2026
@MrPresent-Han MrPresent-Han force-pushed the support-search-agg-master branch from 61cc355 to 3d997b7 Compare April 24, 2026 10:00
@mergify mergify Bot added dco-passed and removed needs-dco labels Apr 24, 2026
@MrPresent-Han MrPresent-Han force-pushed the support-search-agg-master branch from 3d997b7 to 12ff621 Compare April 24, 2026 10:11
@MrPresent-Han
Copy link
Copy Markdown
Contributor Author

/unhold

support search aggregation on pymilvus

issue: milvus-io#3437

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the support-search-agg-master branch from 47a6394 to 7ae73d0 Compare April 24, 2026 10:20
@mergify mergify Bot added the ci-passed label Apr 24, 2026
Copy link
Copy Markdown
Contributor

@XuanYang-cn XuanYang-cn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase-1 search-aggregation API is well-scoped, mirrors issue #3437, and has strong unit coverage for request-side validation and response-side parsing. Public API changes are default-safe. Three blockers before a clean LGTM — see inline comments.

Blocking

  1. schema_pb2.pyi — silent rename of SearchResultData.order_by_field_valuesgroup_by_field_values from the proto submodule bump. Undocumented breaking surface change. Call out in PR body or split proto bump into its own PR so downstream consumers aren't surprised.
  2. prepare.pyaggregation_safe_factor has two ingestion paths (kwargs vs inside the search_params dict) that send different wire encodings and only the kwargs path is validated. The example itself uses the un-validated search_params dict path (examples/search_aggregation.py:240). Invalid values (floats/bools/negatives) ship silently. Unify on one canonical form or validate both.
  3. search_result.py:_split_agg_buckets — never checks sum(topks) == len(parsed) and collapses multi-query responses (nq > 1) into an nq=1 shape when agg_topks is absent. Add at least a warning + nq-aware fallback so bucket drops or shape collapses don't go silent.

Nits (inline): JSON-path rejection catches [...] only (not dotted meta.region); _MAX_NESTING_DEPTH = 4 magic; unused _oneof_value helper; bucket-key vs hit-field missing-name fallback inconsistency; docstring claims limit is ignored with no in-client warn; no end-to-end test through MilvusClient.search / AsyncMilvusClient.search / Collection.search — only Prepare is exercised.

HIGHLIGHT_RESULTS_FIELD_NUMBER: _ClassVar[int]
ELEMENT_INDICES_FIELD_NUMBER: _ClassVar[int]
ORDER_BY_FIELD_VALUES_FIELD_NUMBER: _ClassVar[int]
GROUP_BY_FIELD_VALUES_FIELD_NUMBER: _ClassVar[int]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking — silent breaking change. The proto submodule bump renames SearchResultData.order_by_field_valuesgroup_by_field_values. Any downstream consumer that handcrafted against this .pyi breaks with no warning. Either call this rename out explicitly in the PR description, or split the proto bump into its own PR so it lands with its own changelog entry.

)
)
search_params[AGGREGATION_SAFE_FACTOR] = aggregation_safe_factor

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking — dual ingestion path, one validated. aggregation_safe_factor passed as a kwarg hits this validator and becomes a top-level search_params KV. But users can also put it inside search_params["aggregation_safe_factor"] — which is what examples/search_aggregation.py:240 does — and it then flows through get_params (utils.py:278-293) into the serialized params blob, unvalidated, with a different wire shape. Invalid values (floats, negatives, bools, strings) ship silently via that path. Pick one canonical form and either reject the other or validate both.

# If the server didn't set agg_topks (older responses or nq==1), fall
# back to putting everything under a single query.
topks = list(res.agg_topks) if len(res.agg_topks) > 0 else [len(parsed)]
out: List[List[AggregationBucket]] = []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking — silent bucket drops / shape collapse.

  1. No check that sum(topks) == len(parsed). A buggy/future server that sends agg_topks=[2,1] with 4 buckets silently loses the trailing bucket.
  2. The else [len(parsed)] fallback triggers whenever agg_topks is empty — including an nq > 1 query where middleware drops the field. A 3-vector query then gets a one-element outer list shaped like nq=1.

Add at minimum a logger.warning + assert offset == len(parsed) after the loop, and an nq-aware fallback (e.g. use res.num_queries to emit the right number of empty inner lists).


_VALID_METRIC_OPS = {"avg", "sum", "count", "min", "max"}
_VALID_DIRECTIONS = {"asc", "desc"}
_SPECIAL_ORDER_KEYS = {"_count", "_key"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit — _MAX_NESTING_DEPTH = 4 is a client-side magic number. Does it mirror a server cap, a proto constraint, or is it arbitrary? Either source it from proto/config or add a one-line comment with the rationale so it's not brittle when the server limit changes.

message=(
f"SearchAggregation.fields does not yet support JSON path "
f"expressions, got {f!r}"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit — bracket-only JSON-path rejection. This catches meta['region'] and a[0] but lets meta.region through silently. Either widen the regex to cover dotted JSON paths, or narrow the error message to say "bracketed JSON path expressions not yet supported" so the rejection surface is explicit.

)
if self._sub_aggregation is not None:
self._sub_aggregation._check_depth(current + 1)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit — _oneof_value is dead code. Neither AggregationHit nor AggregationBucket calls it; each inlines its own WhichOneof loop. Either use it for consistency across the two call sites, or drop it.

"""Resolve the populated oneof branch on a proto message; returns None if none set."""
which = msg.WhichOneof("value") if msg.DESCRIPTOR.oneofs_by_name.get("value") else None
return getattr(msg, which) if which else None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit — inconsistent missing-name fallback. AggregationHit falls back to str(field_id) when the proto has no field_name, but AggregationBucket.key falls back to empty string (verified in tests/test_search_aggregation.py:394 vs :450-454). Pick one behavior — the asymmetry will trip users who iterate both sides of the response.

ids (Optional[Union[List[int], List[str], str, int]]): The ids to use for the search.
Defaults to None.
search_aggregation (SearchAggregation, optional): Hierarchical bucket aggregation spec.
Mutually exclusive with group_by_field. When set, `limit` is ignored and the root
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit — docstring says "limit is ignored" when search_aggregation is set, but nothing in the client enforces or warns. A user passing both will silently have limit dropped at the server. Consider at least logger.warning when a non-default limit collides with search_aggregation, so the silent drop isn't a debugging black hole.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants