Commit 77d76ab

feat: add VectorChord benchmark support (#745)
* feat: add VectorChord support and VCHORDRQ index type
* feat: add VectorChordRQ command to CLI
* feat: add VectorChord support to README
* feat: add VectorChordGraph support and configuration
* feat: add max_scan_tuples parameter to VectorChordGraph
* feat: enhance VectorChord with improved type safety and search functionality
* feat: add vectorchord extension creation on connection

Co-authored-by: edgar-p <edgar.p@kakaocorp.com>
1 parent f0a8d03 commit 77d76ab

8 files changed: +838 -2 lines changed

README.md

Lines changed: 31 additions & 2 deletions
@@ -41,7 +41,7 @@ All the database client supported
 | pinecone | `pip install vectordb-bench[pinecone]` |
 | weaviate | `pip install vectordb-bench[weaviate]` |
 | elastic, aliyun_elasticsearch| `pip install vectordb-bench[elastic]` |
-| pgvector, pgvectorscale, pgdiskann, alloydb | `pip install vectordb-bench[pgvector]` |
+| pgvector, pgvectorscale, pgdiskann, alloydb, vectorchord | `pip install vectordb-bench[pgvector]` |
 | pgvecto.rs | `pip install vectordb-bench[pgvecto_rs]` |
 | redis | `pip install vectordb-bench[redis]` |
 | memorydb | `pip install vectordb-bench[memorydb]` |
@@ -86,6 +86,7 @@ Options:
 Commands:
 pgvectorhnsw
 pgvectorivfflat
+vectorchordrq
 test
 weaviate
 ```
@@ -179,6 +180,34 @@ Options:
 --help Show this message and exit.
 ```

+### Run VectorChord (vchordrq) from command line
+
+VectorChord is a PostgreSQL extension for scalable vector similarity search using IVF + RaBitQ indexing.
+It is fully compatible with pgvector data types and provides faster queries and index builds.
+
+```shell
+vectordbbench vectorchordrq \
+--user-name postgres --password '<password>' \
+--host localhost --port 5432 --db-name vectordb \
+--case-type Performance1536D50K \
+--lists 1000 --probes 10 --epsilon 1.9 \
+--spherical-centroids --build-threads 8 \
+--max-parallel-workers 15
+```
+
+Key VectorChord-specific options:
+| Option | Description |
+|--------|-------------|
+| `--lists` | Number of IVF lists for vchordrq index |
+| `--probes` | Number of probes during search (default: 10) |
+| `--epsilon` | Reranking precision factor, 0.0-4.0 (default: 1.9) |
+| `--residual-quantization` | Enable residual quantization |
+| `--spherical-centroids` | L2-normalize centroids (recommended for cosine/IP) |
+| `--build-threads` | Number of threads for index building (1-255) |
+| `--degree-of-parallelism` | Degree of parallelism for index build (1-256) |
+| `--max-parallel-workers` | Sets max_parallel_workers & max_parallel_maintenance_workers |
+| `--max-scan-tuples` | Max tuples to scan before stopping (-1 for unlimited) |
+
 ### Run awsopensearch from command line

 ```shell
@@ -756,7 +785,7 @@ Now we can only run one task at the same time.
 ### Code Structure
 ![image](https://github.com/zilliztech/VectorDBBench/assets/105927039/8c06512e-5419-4381-b084-9c93aed59639)
 ### Client
-Our client module is designed with flexibility and extensibility in mind, aiming to integrate APIs from different systems seamlessly. As of now, it supports Milvus, Zilliz Cloud, Elastic Search, Pinecone, Qdrant Cloud, Weaviate Cloud, PgVector, Redis, Chroma, CockroachDB, etc. Stay tuned for more options, as we are consistently working on extending our reach to other systems.
+Our client module is designed with flexibility and extensibility in mind, aiming to integrate APIs from different systems seamlessly. As of now, it supports Milvus, Zilliz Cloud, Elastic Search, Pinecone, Qdrant Cloud, Weaviate Cloud, PgVector, VectorChord, Redis, Chroma, CockroachDB, etc. Stay tuned for more options, as we are consistently working on extending our reach to other systems.
 ### Benchmark Cases
 We've developed lots of comprehensive benchmark cases to test vector databases' various capabilities, each designed to give you a different piece of the puzzle. These cases are categorized into four main types:
 #### Capacity Case
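
Review note on the README hunk: the option table documents value ranges (`--epsilon` 0.0-4.0, `--build-threads` 1-255, `--degree-of-parallelism` 1-256), but the click options in this commit are declared with plain `type=int` / `type=float`, so the diff does not show where those ranges are enforced. Below is a minimal sketch of a pre-flight check, assuming a hypothetical helper `check_vchordrq_options` that is not part of this commit. The ranges are copied from the README table, and the function only illustrates the idea.

```python
def check_vchordrq_options(epsilon=1.9, build_threads=None, degree_of_parallelism=None):
    """Hypothetical helper: return a list of range problems (empty if all values are valid).

    The ranges come from the README option table in this commit; the helper
    itself is an illustration and does not exist in the repository.
    """
    problems = []
    if not 0.0 <= epsilon <= 4.0:
        problems.append(f"--epsilon {epsilon} is outside 0.0-4.0")
    # None means the flag was not passed, so there is nothing to check.
    if build_threads is not None and not 1 <= build_threads <= 255:
        problems.append(f"--build-threads {build_threads} is outside 1-255")
    if degree_of_parallelism is not None and not 1 <= degree_of_parallelism <= 256:
        problems.append(f"--degree-of-parallelism {degree_of_parallelism} is outside 1-256")
    return problems


# The values from the README example pass the check.
print(check_vchordrq_options(epsilon=1.9, build_threads=8))
```

Returning a list rather than raising on the first problem lets a caller report every out-of-range flag in one pass.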

vectordb_bench/backend/clients/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -59,6 +59,7 @@ class DB(Enum):
     Zvec = "Zvec"
     Endee = "Endee"
     Lindorm = "Lindorm"
+    VectorChord = "VectorChord"
     PolarDB = "PolarDB"

     @property
@@ -247,6 +248,10 @@ def init_cls(self) -> type[VectorDB]: # noqa: PLR0911, PLR0912, C901, PLR0915

             return LindormVector

+        if self == DB.VectorChord:
+            from .vectorchord.vectorchord import VectorChord
+
+            return VectorChord
         if self == DB.PolarDB:
             from .polardb.polardb import PolarDB

@@ -441,6 +446,10 @@ def config_cls(self) -> type[DBConfig]: # noqa: PLR0911, PLR0912, C901, PLR0915

             return LindormConfig

+        if self == DB.VectorChord:
+            from .vectorchord.config import VectorChordConfig
+
+            return VectorChordConfig
         if self == DB.PolarDB:
             from .polardb.config import PolarDBConfig

@@ -617,6 +626,11 @@ def case_config_cls( # noqa: C901, PLR0911, PLR0912, PLR0915

             return _lindorm_vector_case_config.get(index_type)

+        if self == DB.VectorChord:
+            from .vectorchord.config import _vectorchord_case_config
+
+            return _vectorchord_case_config.get(index_type)
+
         # DB.Pinecone, DB.Redis
         return EmptyDBCaseConfig
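
Review note on the `DB` enum hunks: each branch imports its client class inside the `if`, presumably so that a backend's dependencies are loaded only when that backend is selected. A minimal sketch of the same pattern follows, using standard-library modules as stand-ins. The `Codec` enum and its members are hypothetical and only mirror the shape of `DB.init_cls`.

```python
from enum import Enum


class Codec(Enum):
    """Hypothetical stand-in for the DB enum; members map to lazily imported classes."""

    Json = "Json"
    Pickle = "Pickle"

    @property
    def init_cls(self) -> type:
        # Each import runs only when its branch is taken, so an unused
        # backend never has to be importable.
        if self == Codec.Json:
            from json import JSONEncoder

            return JSONEncoder
        if self == Codec.Pickle:
            from pickle import Pickler

            return Pickler
        msg = f"Unknown codec: {self}"
        raise ValueError(msg)


encoder_cls = Codec.Json.init_cls
print(encoder_cls.__name__)
```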

vectordb_bench/backend/clients/api.py

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@ class IndexType(StrEnum):
     GPU_IVF_PQ = "GPU_IVF_PQ"
     GPU_CAGRA = "GPU_CAGRA"
     SCANN = "scann"
+    VCHORDRQ = "vchordrq"
+    VCHORDG = "vchordg"
     SCANN_MILVUS = "SCANN_MILVUS"
     SVS_VAMANA = "SVS_VAMANA"
     SVS_VAMANA_LVQ = "SVS_VAMANA_LVQ"
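
Review note on the `IndexType` hunk: the two new members are what `case_config_cls` passes to `_vectorchord_case_config.get(index_type)`. The contents of that mapping are not shown in this diff, so the sketch below uses a hypothetical mapping and placeholder config classes. It also uses `class IndexType(str, Enum)` as a stand-in for `StrEnum` so that it runs on older Python versions.

```python
from enum import Enum


class IndexType(str, Enum):
    """Stand-in for the StrEnum in api.py; only a few members are reproduced."""

    SCANN = "scann"
    VCHORDRQ = "vchordrq"
    VCHORDG = "vchordg"


class RQCaseConfig:
    """Placeholder for the vchordrq case config class (hypothetical name)."""


class GraphCaseConfig:
    """Placeholder for the vchordg case config class (hypothetical name)."""


# Hypothetical mapping; the real _vectorchord_case_config is not part of this diff.
_case_config = {
    IndexType.VCHORDRQ: RQCaseConfig,
    IndexType.VCHORDG: GraphCaseConfig,
}


def case_config_cls(index_type: IndexType):
    # dict.get returns None for an index type this backend does not map.
    return _case_config.get(index_type)


print(case_config_cls(IndexType("vchordrq")).__name__)
```

One consequence of using `.get` is that an unmapped index type yields `None` rather than raising, so callers need to handle that case.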

vectordb_bench/backend/clients/vectorchord/__init__.py

Whitespace-only changes.
Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
+import os
+from typing import Annotated, Unpack
+
+import click
+from pydantic import SecretStr
+
+from vectordb_bench.backend.clients import DB
+
+from ....cli.cli import (
+    CommonTypedDict,
+    cli,
+    click_parameter_decorators_from_typed_dict,
+    run,
+)
+
+
+class VectorChordTypedDict(CommonTypedDict):
+    user_name: Annotated[
+        str,
+        click.option("--user-name", type=str, help="Db username", required=True),
+    ]
+    password: Annotated[
+        str,
+        click.option(
+            "--password",
+            type=str,
+            help="Postgres database password",
+            default=lambda: os.environ.get("POSTGRES_PASSWORD", ""),
+            show_default="$POSTGRES_PASSWORD",
+        ),
+    ]
+
+    host: Annotated[str, click.option("--host", type=str, help="Db host", required=True)]
+    port: Annotated[
+        int,
+        click.option(
+            "--port",
+            type=int,
+            help="Postgres database port",
+            default=5432,
+            show_default=True,
+            required=False,
+        ),
+    ]
+    db_name: Annotated[str, click.option("--db-name", type=str, help="Db name", required=True)]
+    max_parallel_workers: Annotated[
+        int | None,
+        click.option(
+            "--max-parallel-workers",
+            type=int,
+            help="Sets the maximum number of parallel workers for index creation",
+            required=False,
+        ),
+    ]
+    quantization_type: Annotated[
+        str | None,
+        click.option(
+            "--quantization-type",
+            type=click.Choice(["vector", "halfvec", "rabitq8", "rabitq4"]),
+            help="Quantization type for vectors",
+            default="vector",
+            show_default=True,
+        ),
+    ]
+
+
+class VectorChordRQTypedDict(VectorChordTypedDict):
+    lists: Annotated[
+        int | None,
+        click.option(
+            "--lists",
+            type=int,
+            help="Number of IVF lists for vchordrq index",
+        ),
+    ]
+    probes: Annotated[
+        int | None,
+        click.option(
+            "--probes",
+            type=int,
+            help="Number of probes during search",
+            default=10,
+            show_default=True,
+        ),
+    ]
+    epsilon: Annotated[
+        float | None,
+        click.option(
+            "--epsilon",
+            type=float,
+            help="Reranking precision factor (0.0-4.0, higher is more accurate but slower)",
+            default=1.9,
+            show_default=True,
+        ),
+    ]
+    residual_quantization: Annotated[
+        bool,
+        click.option(
+            "--residual-quantization/--no-residual-quantization",
+            type=bool,
+            help="Enable residual quantization for improved accuracy",
+            default=False,
+            show_default=True,
+        ),
+    ]
+    rerank_in_table: Annotated[
+        bool,
+        click.option(
+            "--rerank-in-table/--no-rerank-in-table",
+            type=bool,
+            help="Read vectors from table instead of storing in index (saves storage, degrades query performance)",
+            default=False,
+            show_default=True,
+        ),
+    ]
+    spherical_centroids: Annotated[
+        bool,
+        click.option(
+            "--spherical-centroids/--no-spherical-centroids",
+            type=bool,
+            help="L2-normalize centroids during K-means (recommended for cosine/IP)",
+            default=False,
+            show_default=True,
+        ),
+    ]
+    build_threads: Annotated[
+        int | None,
+        click.option(
+            "--build-threads",
+            type=int,
+            help="Number of threads for index building (range: 1-255)",
+        ),
+    ]
+    degree_of_parallelism: Annotated[
+        int | None,
+        click.option(
+            "--degree-of-parallelism",
+            type=int,
+            help="Degree of parallelism for index build (range: 1-256, default: 32)",
+        ),
+    ]
+    max_scan_tuples: Annotated[
+        int | None,
+        click.option(
+            "--max-scan-tuples",
+            type=int,
+            help="Max tuples to scan before stopping (-1 for unlimited)",
+        ),
+    ]
+
+
+@cli.command()
+@click_parameter_decorators_from_typed_dict(VectorChordRQTypedDict)
+def VectorChordRQ(
+    **parameters: Unpack[VectorChordRQTypedDict],
+):
+    from .config import VectorChordConfig, VectorChordRQConfig
+
+    run(
+        db=DB.VectorChord,
+        db_config=VectorChordConfig(
+            db_label=parameters["db_label"],
+            user_name=SecretStr(parameters["user_name"]),
+            password=SecretStr(parameters["password"]),
+            host=parameters["host"],
+            port=parameters["port"],
+            db_name=parameters["db_name"],
+        ),
+        db_case_config=VectorChordRQConfig(
+            quantization_type=parameters["quantization_type"],
+            lists=parameters["lists"],
+            probes=parameters["probes"],
+            epsilon=parameters["epsilon"],
+            residual_quantization=parameters["residual_quantization"],
+            rerank_in_table=parameters["rerank_in_table"],
+            spherical_centroids=parameters["spherical_centroids"],
+            build_threads=parameters["build_threads"],
+            degree_of_parallelism=parameters["degree_of_parallelism"],
+            max_scan_tuples=parameters["max_scan_tuples"],
+            max_parallel_workers=parameters["max_parallel_workers"],
+        ),
+        **parameters,
+    )
+
+
+class VectorChordGraphTypedDict(VectorChordTypedDict):
+    m: Annotated[
+        int | None,
+        click.option(
+            "--m",
+            type=int,
+            help="Max neighbors per vertex (default: 32)",
+        ),
+    ]
+    ef_construction: Annotated[
+        int | None,
+        click.option(
+            "--ef-construction",
+            type=int,
+            help="Dynamic list size during insertion (default: 64)",
+        ),
+    ]
+    bits: Annotated[
+        int | None,
+        click.option(
+            "--bits",
+            type=int,
+            help="RaBitQ quantization ratio (1 or 2, default: 2)",
+        ),
+    ]
+    ef_search: Annotated[
+        int | None,
+        click.option(
+            "--ef-search",
+            type=int,
+            help="Dynamic list size for search (default: 64)",
+            default=64,
+            show_default=True,
+        ),
+    ]
+    beam_search: Annotated[
+        int | None,
+        click.option(
+            "--beam-search",
+            type=int,
+            help="Batch vertex access width during search (default: 1)",
+        ),
+    ]
+    max_scan_tuples: Annotated[
+        int | None,
+        click.option(
+            "--max-scan-tuples",
+            type=int,
+            help="Max tuples to scan before stopping (-1 for unlimited)",
+        ),
+    ]
+
+
+@cli.command()
+@click_parameter_decorators_from_typed_dict(VectorChordGraphTypedDict)
+def VectorChordGraph(
+    **parameters: Unpack[VectorChordGraphTypedDict],
+):
+    from .config import VectorChordConfig, VectorChordGraphConfig
+
+    run(
+        db=DB.VectorChord,
+        db_config=VectorChordConfig(
+            db_label=parameters["db_label"],
+            user_name=SecretStr(parameters["user_name"]),
+            password=SecretStr(parameters["password"]),
+            host=parameters["host"],
+            port=parameters["port"],
+            db_name=parameters["db_name"],
+        ),
+        db_case_config=VectorChordGraphConfig(
+            quantization_type=parameters["quantization_type"],
+            m=parameters["m"],
+            ef_construction=parameters["ef_construction"],
+            bits=parameters["bits"],
+            ef_search=parameters["ef_search"],
+            beam_search=parameters["beam_search"],
+            max_parallel_workers=parameters["max_parallel_workers"],
+            max_scan_tuples=parameters["max_scan_tuples"],
+        ),
+        **parameters,
+    )
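
Review note on the CLI file: both commands get their options from `click_parameter_decorators_from_typed_dict`, whose implementation lives in `cli/cli.py` and is not part of this diff. The sketch below shows one way such a helper could work, by collecting the callables stored in each field's `Annotated` metadata and applying them as decorators. The names `option` and `decorators_from_typed_dict` are hypothetical, `option` is a stdlib-only stand-in for `click.option`, and the real helper may differ.

```python
from typing import Annotated, TypedDict, get_type_hints


def option(flag):
    """Stand-in for click.option: returns a decorator that records the flag name."""

    def decorator(func):
        func.flags = [flag, *getattr(func, "flags", [])]
        return func

    return decorator


class BaseOptions(TypedDict):
    host: Annotated[str, option("--host")]


class RQOptions(BaseOptions):
    lists: Annotated[int, option("--lists")]
    probes: Annotated[int, option("--probes")]


def decorators_from_typed_dict(typed_dict):
    """Hypothetical helper: combine every decorator found in Annotated metadata."""
    hints = get_type_hints(typed_dict, include_extras=True)
    decorators = [
        meta
        for hint in hints.values()
        for meta in getattr(hint, "__metadata__", ())
        if callable(meta)
    ]

    def apply(func):
        # Apply in reverse so the first declared field ends up outermost,
        # matching how stacked decorators are written top to bottom.
        for decorator in reversed(decorators):
            func = decorator(func)
        return func

    return apply


@decorators_from_typed_dict(RQOptions)
def command(**parameters):
    return parameters


print(command.flags)
```

Because a `TypedDict` subclass carries its parent's fields, options declared on a shared base (here `BaseOptions`, in the commit `VectorChordTypedDict`) are picked up by every subcommand without being repeated.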
