Hi, I'm currently researching vector performance across various PostgreSQL vector extensions.
There are several extensions that support vector types in PostgreSQL, such as pgvector, pgvectorscale, and vectorchord (which I'm currently working on).
However, Citus is not a vector extension itself; it is an extension that distributes (shards) PostgreSQL tables across nodes. It therefore operates at a higher layer than the vector extensions rather than inside any one of them.
To integrate Citus into the current benchmark, I think it would be better to handle it at a higher abstraction layer, which I'm calling PostgreSQLConfig.
This layer would contain the basic PostgreSQL configuration, along with optional distribution-related settings.
vectordb_bench/backend/clients/
├── pg_base/            # NEW: common layer for PostgreSQL
│   ├── config.py       # PostgreSQLConfig (host, port, user, password, db_name)
│   └── pg_base.py      # PostgreSQLVectorDB base class
│
├── pgvector/           # pgvector
│   ├── config.py       # PgVectorHNSWConfig, PgVectorIVFFlatConfig (index parameters)
│   ├── pgvector.py
│   └── cli.py
├── pgdiskann/          # pgdiskann
│   ├── ...
# pg_base/config.py
from pydantic import SecretStr

class PostgreSQLConfig(DBConfig):
    """Common configuration for all PostgreSQL-based clients."""

    # Connection settings shared by every PostgreSQL-based client
    user_name: SecretStr = SecretStr("postgres")
    password: SecretStr
    host: str = "localhost"
    port: int = 5432
    db_name: str = "vectordb"
    table_name: str = "vdbbench_table_test"

    # Optional Citus distribution settings
    distributed: bool = False
    distribution_column: str = "id"
    shard_count: int | None = None
# pg_base/pg_base.py
class PostgreSQLVectorDB(VectorDB):
    """Shared PostgreSQL logic."""

    def __init__(self, ..., extensions: list[str] | None = None):
        # Enable every extension the concrete client asks for
        for ext in extensions or []:
            self.cursor.execute(f"CREATE EXTENSION IF NOT EXISTS {ext}")

    def _create_table(self, dim):
        # Table creation itself is omitted here; once the table exists,
        # optionally hand it over to Citus for sharding.
        if self.distributed:
            self.cursor.execute(
                f"SELECT create_distributed_table('{self.table_name}', '{self.distribution_column}')"
            )
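
To make the distribution step a bit more concrete, here is a rough sketch of how the statements could be built, including the `shard_count` setting that the snippet above does not use yet. The helper name `build_distribution_statements` is hypothetical and not part of the benchmark; it assumes the shard count is applied through the `citus.shard_count` setting before calling `create_distributed_table`, and that the driver uses `%s` placeholders as psycopg does. I have not run this against a real Citus cluster.

```python
from __future__ import annotations


def build_distribution_statements(
    table_name: str,
    distribution_column: str = "id",
    shard_count: int | None = None,
) -> list[tuple[str, tuple]]:
    """Return (sql, params) pairs to run after CREATE TABLE (hypothetical helper)."""
    statements: list[tuple[str, tuple]] = []

    if shard_count is not None:
        if shard_count <= 0:
            raise ValueError("shard_count must be positive")
        # Assumption: the shard count is set via the citus.shard_count setting,
        # which Citus reads when the table is distributed.
        statements.append((f"SET citus.shard_count = {int(shard_count)}", ()))

    # Pass the names as bound parameters instead of formatting them into the SQL.
    statements.append(
        (
            "SELECT create_distributed_table(%s::regclass, %s)",
            (table_name, distribution_column),
        )
    )
    return statements
```

Inside `_create_table`, this could then be used as `for sql, params in build_distribution_statements(...): self.cursor.execute(sql, params)`, which keeps the Citus-specific SQL in one place and easy to unit test without a database.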
At the moment, I'm thinking of supporting only Citus for sharding.
If you have any suggestions or additional ideas, I'd really appreciate your feedback.
Thanks!