Skip to content

refactor(clickhouse): harden identifier quoting and audit SQL command paths #512

@yodakanohoshi

Description

@yodakanohoshi

Summary

ClickHouseDestination could use a quoting pass on its SQL command paths:

  1. get_row_count builds a backtick-quoted identifier in a way that doesn't render correctly for schema-qualified table names — the join string is .` where it should be `.`.
  2. Most client.command(...) paths interpolate config.table without any quoting. This works fine for plain lowercase names but is fragile for reserved words, mixed case, or database-qualified names.

Surfaced during review of #442 / PR #498. The Postgres equivalent was completed in #498; the ClickHouse audit was listed in #442's "Proposed Approach" but out of scope there.

get_row_count: quoting doesn't render as intended

drt/destinations/clickhouse.py L205-L210:

escaped_table = (
    ".`".join(config.table.split("."))
    if "." in config.table
    else config.table
)
result = client.query(f"SELECT COUNT(*) FROM `{escaped_table}`")

For config.table = "db.scores":

>>> ".`".join("db.scores".split("."))
'db.`scores'
>>> f"`{_}`"
'`db.`scores`'

→ The intended output is `db`.`scores` (4 backticks, two quoted parts). The actual output has 3 backticks and isn't a valid identifier reference.

There is currently no test coverage for get_row_count with a schema-qualified table, which is why this hasn't surfaced in practice.

Audit scope (all command(...) call sites)

These interpolate config.table / shadow / etc. with no quoting:

Method Line SQL
load (non-swap replace) L79 TRUNCATE TABLE {config.table}
_load_replace_swap L124 DROP TABLE IF EXISTS {shadow}
_load_replace_swap L125 CREATE TABLE {shadow} AS {table}
_load_replace_swap L150 DROP TABLE IF EXISTS {shadow} (cleanup)
finalize_sync L179 EXCHANGE TABLES {table} AND {shadow}
finalize_sync L181 DROP TABLE {shadow}
get_row_count L210 (see above)

client.insert(table, ...) (L86, L132) goes through the driver's typed path and is probably safer — worth confirming whether clickhouse-connect quotes table names internally.

Design questions to settle before coding

ClickHouse identifier handling differs enough from Postgres / MySQL that this benefits from some design discussion first:

  1. Quoting policy: ClickHouse accepts both `name` and "name" and unquoted lowercase names. Always-quote is safer and matches what we do elsewhere — confirm this is the direction we want.
  2. Helper location: extract a _quote_ident(table) similar to MySQL's, or inline?
  3. client.insert quoting: does clickhouse-connect quote the table parameter? If so, client.insert(shadow, ...) is already safe and can be left alone.
  4. Server-side parameter binding: clickhouse-connect supports parameters= for values but (as far as we know) not for identifiers. Worth confirming.

These open questions are why this isn't flagged good first issue — it benefits from someone with ClickHouse familiarity to make the call.

Acceptance Criteria

  • get_row_count produces the intended SQL for schema-qualified table names; regression test covers db.scores
  • All client.command(...) interpolations of identifiers go through a consistent quoting path
  • Decision on client.insert documented (either confirmed safe, or quoted)
  • Unit tests cover schema-qualified table names for replace, swap (including EXCHANGE TABLES), and get_row_count
  • No new type: ignore; ruff check and mypy drt pass

Out of Scope

Reference

Metadata

Metadata

Assignees

No one assigned

    Labels

    destinationDestination connectors (REST API, Slack, HubSpot…)refactorCode change that neither fixes a bug nor adds a feature

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions