Add Citus support for PostgreSQL vector extensions #752

@R3gardless

Hi, I'm currently researching vector performance across various PostgreSQL vector extensions.

Several extensions add vector types to PostgreSQL, such as pgvector, pgvectorscale, and vectorchord (currently being worked on).

Citus, however, is not a vector extension itself; it is a PostgreSQL extension that distributes tables across nodes. It therefore operates at a layer above the vector extensions rather than within them.

To integrate Citus into the current benchmark, I think it would be better to handle it at a higher abstraction layer, which I'm calling PostgreSQLConfig.

This layer would contain the basic PostgreSQL configuration, along with optional distribution-related settings.

vectordb_bench/backend/clients/
├── pg_base/                          # NEW: common layer for PostgreSQL
│   ├── config.py                     # PostgreSQLConfig (host, port, user, password, db_name)
│   ├── pg_base.py                    # PostgreSQLVectorDB base class
│
├── pgvector/                         # pgvector
│   ├── config.py                     # PgVectorHNSWConfig, PgVectorIVFFlatConfig (index parameters)
│   ├── pgvector.py
│   └── cli.py
├── pgdiskann/                        # pgdiskann
│   ├── ...

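# pg_base/config.py
from pydantic import SecretStr

from ..api import DBConfig  # assumes the existing clients/api.py module


class PostgreSQLConfig(DBConfig):
    """Common configuration for all PostgreSQL-based clients"""
    user_name: SecretStr = SecretStr("postgres")
    password: SecretStr
    host: str = "localhost"
    port: int = 5432
    db_name: str = "vectordb"
    table_name: str = "vdbbench_table_test"

    # Optional Citus distribution settings
    distributed: bool = False
    distribution_column: str = "id"
    shard_count: int | None = None

# pg_base/pg_base.py
from psycopg import sql

from ..api import VectorDB  # assumes the existing clients/api.py module


class PostgreSQLVectorDB(VectorDB):
    """Shared PostgreSQL logic"""

    def __init__(self, ..., extensions: list[str] | None = None):  # other parameters elided
        for ext in (extensions or []):
            # Quote the extension name as an identifier rather than interpolating it raw
            self.cursor.execute(
                sql.SQL("CREATE EXTENSION IF NOT EXISTS {}").format(sql.Identifier(ext))
            )

    def _create_table(self, dim):
        if self.distributed:
            # Must run after the table exists; names are passed as bound parameters
            self.cursor.execute(
                "SELECT create_distributed_table(%s, %s)",
                (self.table_name, self.distribution_column),
            )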
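
One gap in the sketch above: `shard_count` is declared in `PostgreSQLConfig` but never used. A minimal way to wire it in, assuming the shard count is applied through the `citus.shard_count` setting before the table is distributed, might look like the following. The helper name `citus_distribution_statements` is hypothetical and not part of the benchmark; it only builds `(query, params)` pairs so the logic can be checked without a database.

```python
from __future__ import annotations


def citus_distribution_statements(
    table_name: str,
    distribution_column: str = "id",
    shard_count: int | None = None,
) -> list[tuple[str, tuple[str, ...]]]:
    """Build the (query, params) pairs needed to distribute an existing table.

    Hypothetical helper: the caller is expected to run each pair with
    cursor.execute(query, params) after CREATE TABLE has succeeded.
    """
    statements: list[tuple[str, tuple[str, ...]]] = []

    if shard_count is not None:
        if shard_count <= 0:
            raise ValueError("shard_count must be a positive integer")
        # set_config() accepts bound parameters, unlike a plain SET statement.
        # The third argument (false) makes the setting last for the session.
        statements.append(
            ("SELECT set_config('citus.shard_count', %s, false)", (str(shard_count),))
        )

    # Table and column names are passed as parameters, not interpolated.
    statements.append(
        ("SELECT create_distributed_table(%s, %s)", (table_name, distribution_column))
    )
    return statements
```

Inside `_create_table`, the distributed branch could then loop over these pairs and call `self.cursor.execute(query, params)` for each one. When `shard_count` is left as `None`, the Citus default applies and only the `create_distributed_table` call is issued.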

At the moment, I'm thinking of supporting only Citus for sharding.
If you have any suggestions or additional ideas, I'd really appreciate your feedback.

Thanks!
