Skip to content

HNSW range_search iterator stuck at ~50% recall due to directed level-0 graph reachability #1565

@foxspy

Description

@foxspy

Description

HNSW range_search's iterator path has a hard recall ceiling of ~50% on
certain structured datasets, caused by directed reachability of the
level-0 graph — forward BFS from entry_point along out-edges covers only
about half of the nodes, and no parameter tuning (ef, range_search_level,
seed choice) changes this.

Measurements

On the dataset attached to milvus-io/milvus#44854 (15000 vec, 40 dim,
M=128, efConstruction=10000):

ntotal               = 15000
fwd_reach_from_entry = 7695        # iterator reaches ~51% of nodes
bwd_reach_to_entry   = 15000       # reverse BFS covers every node
scc_count            = 7306
largest_scc          = 7695        # only SCC with size > 1
scc_size_eq_1        = 7305        # 7305 singleton SCCs

The level-0 graph decomposes into one SCC of 7695 nodes (containing
entry_point) plus 7305 source-only singletons. Each singleton has
out-edges pointing into the big SCC, but no node in the big SCC has an
out-edge pointing back, so the iterator cannot visit them.

Sampled forward BFS from 20 random seed nodes: min=7695, max=7696 —
the ceiling is invariant to starting seed.

Related

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions