
[Feature Request] Coordinator-level shard pruning using timestamp range metadata #21148

@aallawala


Problem

When a search request targets multiple indices (via wildcard patterns, aliases, or data streams), the coordinating node fans out a can_match probe to every shard across all matching indices, even when the query contains a narrow @timestamp range filter that clearly excludes most of them.

For time-series workloads with high index counts (e.g., daily indices with months of history, or data streams with many backing indices), this means:

  • Unnecessary network round-trips to shards that cannot possibly match the query's time range
  • Increased coordinator overhead assembling and waiting on responses from shards that will all return "no match"
  • Higher tail latency on time-bounded queries, especially when older indices are on slower storage tiers (e.g., warm/UltraWarm nodes backed by S3)

The can_match phase mitigates this at the shard level using BKD tree min/max stats, but the coordinating node still has to send a probe to, and wait on a response from, every shard. For a query spanning 1 hour across 365 days of daily indices, that's ~364 indices whose shards each cost an unnecessary network round-trip.

Proposed Solution

Store per-index @timestamp min/max bounds in IndexMetadata within cluster state. Since cluster state is already replicated to every node, the coordinating node can check these bounds locally — a pure memory read — and rewrite range queries on @timestamp into MatchNone for indices whose bounds don't overlap the query range. These shards are skipped entirely without any network probe.
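The core check is a pure in-memory range-disjointness test. A minimal sketch of that predicate, using hypothetical names (`TimestampBounds`, `canPrune` are illustrative stand-ins, not OpenSearch APIs):

```java
// Illustrative sketch only: TimestampBounds and canPrune are hypothetical
// names, not the actual OpenSearch/IndexMetadata API.
public class TimestampPruning {
    /** Per-index @timestamp min/max, as would be carried in cluster state (epoch millis). */
    record TimestampBounds(long min, long max) {}

    /**
     * True when the index's bounds cannot overlap [queryMin, queryMax],
     * so its shards can be rewritten to match-none without any network probe.
     */
    static boolean canPrune(TimestampBounds bounds, long queryMin, long queryMax) {
        if (bounds == null) return false; // bounds not tracked: fall back to can_match
        return bounds.max() < queryMin || bounds.min() > queryMax;
    }

    public static void main(String[] args) {
        // A January 2024 index vs. a 1-hour query window on 2024-06-01.
        TimestampBounds jan2024 = new TimestampBounds(1704067200000L, 1706745599999L);
        long qMin = 1717200000000L;    // 2024-06-01T00:00:00Z
        long qMax = qMin + 3_600_000L; // one hour later
        System.out.println(canPrune(jan2024, qMin, qMax)); // prints "true": skipped locally
        System.out.println(canPrune(null, qMin, qMax));    // prints "false": unknown bounds
    }
}
```

Because the bounds live in replicated cluster state, this check costs a memory read on the coordinator rather than a round-trip per shard.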

Implementation outline

  1. Track @timestamp range in IndexMetadata — maintain a min/max field range (e.g., an IndexLongFieldRange structure) that is updated as documents are indexed. Propagate via cluster state updates.

  2. Coordinator rewrite phase in CanMatchPreFilterSearchPhase — before fanning out can_match probes, iterate over candidate shards and check the owning index's timestamp bounds against the query's range filter. Mark non-overlapping shards as pre-filtered and skip them.

  3. Query rewrite integration — for shards that can be pruned, rewrite the query into MatchNoneQueryBuilder so existing search infrastructure handles them cleanly.
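Putting steps 2 and 3 together, the coordinator-side pass could partition candidate shards before fan-out. A minimal sketch under hypothetical stand-in types (`Shard`, `Bounds`, `shardsToProbe` are illustrative; the real implementation would operate on shard iterators and IndexMetadata inside CanMatchPreFilterSearchPhase):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration, not OpenSearch classes.
public class CoordinatorPruneSketch {
    record Shard(String index, int id) {}
    record Bounds(long min, long max) {}

    /** Keep only shards whose index bounds overlap the query range, or whose
     *  bounds are untracked; disjoint shards are pruned without a probe. */
    static List<Shard> shardsToProbe(List<Shard> candidates,
                                     Map<String, Bounds> boundsByIndex,
                                     long qMin, long qMax) {
        List<Shard> probe = new ArrayList<>();
        for (Shard s : candidates) {
            Bounds b = boundsByIndex.get(s.index());
            boolean disjoint = b != null && (b.max() < qMin || b.min() > qMax);
            if (!disjoint) {
                probe.add(s); // overlapping or untracked bounds: keep for can_match
            }
            // disjoint shards would be rewritten to a match-none query and
            // never receive a network probe
        }
        return probe;
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
            new Shard("logs-2024.01.01", 0),
            new Shard("logs-2024.01.02", 0),
            new Shard("logs-2024.06.01", 0),
            new Shard("metrics-misc", 0)); // no bounds tracked for this index
        Map<String, Bounds> bounds = Map.of(
            "logs-2024.01.01", new Bounds(100L, 199L),
            "logs-2024.01.02", new Bounds(200L, 299L),
            "logs-2024.06.01", new Bounds(10_000L, 10_999L));
        // Query range overlaps only the June index.
        List<Shard> probe = shardsToProbe(shards, bounds, 10_500L, 10_600L);
        System.out.println(probe.size()); // prints 2: June index + untracked index
    }
}
```

Note the untracked index is deliberately kept in the probe set, matching the scope consideration below: absent bounds mean no behavioral change.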

Scope considerations

  • Indices without populated timestamp bounds (e.g., non-time-series indices) would fall through to the existing can_match path — no behavioral change.
  • The event.ingested field could also benefit from the same treatment.
  • This optimization is complementary to existing shard-level can_match and the Context Aware Segments RFC ([RFC] Context Aware Segments #18576).

Impact

This would significantly reduce search latency for the most common time-series query pattern: a narrow time range against a large index history. The benefit scales with index count — the more backing indices or daily indices that exist, the more shards are pruned without network cost.
