Add Citus support for PostgreSQL vector extensions #752

Hi, I'm currently researching vector performance across various PostgreSQL vector extensions.

There are several extensions that support vector types in PostgreSQL, such as pgvector, pgvectorscale, and vectorchord (which I'm currently working on).

Citus, however, is not a vector extension itself; it is a PostgreSQL extension that distributes (shards) tables across nodes. It therefore operates at a layer above the vector extensions rather than within them.

To integrate Citus into the current benchmark, I think it would be better to handle it at a higher abstraction layer, which I'm calling `PostgreSQLConfig`.

This layer would contain the basic PostgreSQL configuration, along with optional distribution-related settings.

```
vectordb_bench/backend/clients/
├── pg_base/                          # NEW: common layer for PostgreSQL
│   ├── config.py                     # PostgreSQLConfig (host, port, user, password, db_name)
│   └── pg_base.py                    # PostgreSQLVectorDB base class
│
├── pgvector/                         # pgvector
│   ├── config.py                     # PgVectorHNSWConfig, PgVectorIVFFlatConfig (index parameters)
│   ├── pgvector.py                 
│   └── cli.py                      
├── pgdiskann/                        # pgdiskann
│   ├── ...
```

```python
# pg_base/config.py
from pydantic import SecretStr

from ..api import DBConfig


class PostgreSQLConfig(DBConfig):
    """Common configuration for all PostgreSQL-based clients"""
    user_name: SecretStr = SecretStr("postgres")
    password: SecretStr
    host: str = "localhost"
    port: int = 5432
    db_name: str = "vectordb"
    table_name: str = "vdbbench_table_test"

    # Optional Citus distribution settings
    distributed: bool = False            # distribute the table with Citus when True
    distribution_column: str = "id"      # column the table is sharded on
    shard_count: int | None = None       # None leaves the shard count to Citus

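# pg_base/pg_base.py
from psycopg import sql

from ..api import VectorDB

class PostgreSQLVectorDB(VectorDB):
    """Shared PostgreSQL logic"""

    # `...` stands for the existing constructor arguments, elided in this sketch.
    def __init__(self, ..., extensions: list[str] | None = None):
        # Assumes self.cursor has already been opened by this point.
        for ext in (extensions or []):
            # Quote the extension name as an identifier rather than interpolating it.
            self.cursor.execute(
                sql.SQL("CREATE EXTENSION IF NOT EXISTS {}").format(sql.Identifier(ext))
            )

    def _create_table(self, dim):
        # The table itself must exist before it can be distributed.
        if self.distributed:
            # Pass the names as query parameters instead of building the SQL with an f-string.
            self.cursor.execute(
                "SELECT create_distributed_table(%s, %s)",
                (self.table_name, self.distribution_column),
            )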
```
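
The sketch above declares `shard_count` but never uses it. One possible way to wire it in is a small helper that builds the `create_distributed_table` call and only adds the shard count when it is set. This is a rough sketch, not a finished design: `build_distribute_call` is a hypothetical name, it assumes a psycopg-style cursor with `%s` placeholders, and it assumes a Citus release whose `create_distributed_table()` accepts a `shard_count` argument.

```python
def build_distribute_call(table_name, distribution_column="id", shard_count=None):
    """Return (query, params) for a create_distributed_table() call.

    Hypothetical helper for illustration; not part of VectorDBBench or Citus.
    """
    if shard_count is not None and shard_count <= 0:
        raise ValueError("shard_count must be a positive integer")

    query = "SELECT create_distributed_table(%s, %s"
    params = [table_name, distribution_column]
    if shard_count is not None:
        # Named-argument notation; assumes the installed Citus version
        # supports the shard_count parameter.
        query += ", shard_count => %s"
        params.append(shard_count)
    query += ")"
    return query, tuple(params)


# Build the call for the default table with 32 shards.
query, params = build_distribute_call("vdbbench_table_test", "id", 32)
```

Inside `_create_table`, the result could then be passed straight to `self.cursor.execute(query, params)` once the table exists. With `shard_count=None`, the call is the same two-argument form as in the sketch above.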

At the moment, I'm thinking of supporting only Citus for sharding.
If you have any suggestions or additional ideas, I'd really appreciate your feedback.

Thanks!
