Description
HNSW range_search's iterator path has a hard recall ceiling of ~50% on
certain structured datasets, caused by directed reachability of the
level-0 graph — forward BFS from entry_point along out-edges covers only
about half of the nodes, and no parameter tuning (ef, range_search_level,
seed choice) changes this.
Measurements
On the dataset attached to milvus-io/milvus#44854 (15000 vec, 40 dim,
M=128, efConstruction=10000):
ntotal = 15000
fwd_reach_from_entry = 7695 # iterator reaches ~51% of nodes
bwd_reach_to_entry = 15000 # reverse BFS covers every node
scc_count = 7306
largest_scc = 7695 # only SCC with size > 1
scc_size_eq_1 = 7305 # 7305 singleton SCCs
The level-0 graph decomposes into one SCC of 7695 nodes (containing
entry_point) plus 7305 source-only singletons. Each singleton has
out-edges pointing into the big SCC, but no node in the big SCC has an
out-edge pointing back, so the iterator cannot visit them.
Sampled forward BFS from 20 random seed nodes: min=7695, max=7696 —
the ceiling is invariant to starting seed.
Related
Description
HNSW
range_search's iterator path has a hard recall ceiling of ~50% oncertain structured datasets, caused by directed reachability of the
level-0 graph — forward BFS from
entry_pointalong out-edges covers onlyabout half of the nodes, and no parameter tuning (
ef,range_search_level,seed choice) changes this.
Measurements
On the dataset attached to milvus-io/milvus#44854 (15000 vec, 40 dim,
M=128, efConstruction=10000):The level-0 graph decomposes into one SCC of 7695 nodes (containing
entry_point) plus 7305 source-only singletons. Each singleton hasout-edges pointing into the big SCC, but no node in the big SCC has an
out-edge pointing back, so the iterator cannot visit them.
Sampled forward BFS from 20 random seed nodes: min=7695, max=7696 —
the ceiling is invariant to starting seed.
Related
ef >= ntotal * 0.5