
Feature request: TurboQuant distance metric for HNSW vector search #36327

@andrewluetgers

Description


Is your feature request related to a problem? Please describe.
At high vector dimensionality (1536–5120+), float32 storage is expensive and existing quantization options (int8, binary/hamming) either sacrifice too much recall or offer limited compression. There is no native sub-int8 quantization option with correctness guarantees for HNSW traversal.

Describe the solution you'd like
Native support for TurboQuant as a distance-metric option.

TurboQuant is a new online vector quantization algorithm from Google Research that compresses vectors to 3–4 bits with provably near-optimal distortion, no training phase, and superior recall vs. Product Quantization in nearest neighbor search benchmarks. For a 5120-dim float32 vector this means ~6x memory reduction with near-lossless retrieval quality.
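For intuition, here is a minimal sketch of the data-oblivious recipe this family of quantizers shares: a seeded random rotation followed by uniform low-bit scalar quantization. This is not the TurboQuant algorithm itself (its estimator and distortion guarantees are in the paper); it only illustrates why no training phase or dataset calibration is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
dims, bits = 128, 4
levels = 2 ** bits

# Data-oblivious random rotation: derived from a seed, not from the dataset.
rotation, _ = np.linalg.qr(rng.standard_normal((dims, dims)))

def quantize(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Rotate, then map each coordinate to one of 16 uniform levels."""
    r = rotation @ v
    scale = float(np.abs(r).max()) or 1.0
    codes = np.clip(np.round((r / scale + 1) / 2 * (levels - 1)), 0, levels - 1)
    return codes.astype(np.uint8), scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return (codes.astype(np.float32) / (levels - 1) * 2 - 1) * scale

v = rng.standard_normal(dims).astype(np.float32)
codes, scale = quantize(v)
err = np.linalg.norm(v - rotation.T @ dequantize(codes, scale)) / np.linalg.norm(v)
print(f"4-bit relative reconstruction error: {err:.3f}")
```

Distances can then be estimated directly on the compact codes during HNSW traversal, with the rotation shared across all vectors.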

More details: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

Describe alternatives you've considered
int8 and binary (hamming) quantization are the best options Vespa currently offers, but at high dimensionality int8 still carries substantial storage overhead and binary quantization degrades recall significantly, compared with the results TurboQuant demonstrates.
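For context, the existing options are configured via the `distance-metric` attribute setting in the schema; the second field below shows a hypothetical `turboquant` value purely to illustrate where the proposed option would plug in (the name and placement are an assumption, not existing Vespa syntax):

```
schema doc {
    document doc {
        # Current option: int8 cells with hamming distance
        field embedding_binary type tensor<int8>(x[192]) {
            indexing: attribute | index
            attribute {
                distance-metric: hamming
            }
        }
        # Hypothetical: sub-int8 TurboQuant codes (illustrative only)
        field embedding_tq type tensor<float>(x[1536]) {
            indexing: attribute | index
            attribute {
                distance-metric: turboquant   # proposed, does not exist today
            }
        }
    }
}
```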

Additional context
The algorithm is data-oblivious (no dataset-specific calibration), making it well-suited to Vespa's real-time indexing model. A reference implementation of the QJL component — the 1-bit residual correction stage that makes the inner product estimator unbiased — is available at https://github.com/amirzandieh/QJL (Apache 2.0). The paper is arXiv:2504.19874, to be presented at ICLR 2026.
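To illustrate the principle behind 1-bit inner-product estimation (QJL's actual estimator differs in detail — see the repository — this is the classic sign-random-projection identity the idea builds on):

```python
import numpy as np

rng = np.random.default_rng(1)
dims, m = 256, 20000  # m = number of 1-bit measurements per vector

x = rng.standard_normal(dims)
y = rng.standard_normal(dims)
proj = rng.standard_normal((m, dims))  # shared random projection

# Store only one bit per measurement: the sign of the projection.
agree = np.mean(np.sign(proj @ x) == np.sign(proj @ y))

# For Gaussian projections, P[sign agreement] = 1 - theta/pi,
# so the angle (and hence the inner product) is recoverable from bits alone.
theta_hat = np.pi * (1 - agree)
est = np.linalg.norm(x) * np.linalg.norm(y) * np.cos(theta_hat)
true = float(x @ y)
print(f"true={true:.1f}  1-bit estimate={est:.1f}")
```

The estimate converges to the true inner product as m grows, with no data-dependent calibration, which is the property that matters for online indexing.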
