Summary
ClickHouseDestination could use a quoting pass on its SQL command paths:
get_row_count builds a backtick-quoted identifier in a way that doesn't render correctly for schema-qualified table names — the join string is .` where it should be `.`.
- Most
client.command(...) paths interpolate config.table without any quoting. This works fine for plain lowercase names but is fragile for reserved words, mixed case, or database-qualified names.
Surfaced during review of #442 / PR #498. The Postgres equivalent was completed in #498; the ClickHouse audit was listed in #442's "Proposed Approach" but out of scope there.
get_row_count: quoting doesn't render as intended
drt/destinations/clickhouse.py L205-L210:
escaped_table = (
".`".join(config.table.split("."))
if "." in config.table
else config.table
)
result = client.query(f"SELECT COUNT(*) FROM `{escaped_table}`")
For config.table = "db.scores":
>>> ".`".join("db.scores".split("."))
'db.`scores'
>>> f"`{_}`"
'`db.`scores`'
→ The intended output is `db`.`scores` (4 backticks, two quoted parts). The actual output has 3 backticks and isn't a valid identifier reference.
There is currently no test coverage for get_row_count with a schema-qualified table, which is why this hasn't surfaced in practice.
Audit scope (all command(...) call sites)
These interpolate config.table / shadow / etc. with no quoting:
| Method |
Line |
SQL |
load (non-swap replace) |
L79 |
TRUNCATE TABLE {config.table} |
_load_replace_swap |
L124 |
DROP TABLE IF EXISTS {shadow} |
_load_replace_swap |
L125 |
CREATE TABLE {shadow} AS {table} |
_load_replace_swap |
L150 |
DROP TABLE IF EXISTS {shadow} (cleanup) |
finalize_sync |
L179 |
EXCHANGE TABLES {table} AND {shadow} |
finalize_sync |
L181 |
DROP TABLE {shadow} |
get_row_count |
L210 |
(see above) |
client.insert(table, ...) (L86, L132) goes through the driver's typed path and is probably safer — worth confirming whether clickhouse-connect quotes table names internally.
Design questions to settle before coding
ClickHouse identifier handling differs enough from Postgres / MySQL that this benefits from some design discussion first:
- Quoting policy: ClickHouse accepts both
`name` and "name" and unquoted lowercase names. Always-quote is safer and matches what we do elsewhere — confirm this is the direction we want.
- Helper location: extract a
_quote_ident(table) similar to MySQL's, or inline?
client.insert quoting: does clickhouse-connect quote the table parameter? If so, client.insert(shadow, ...) is already safe and can be left alone.
- Server-side parameter binding:
clickhouse-connect supports parameters= for values but (as far as we know) not for identifiers. Worth confirming.
These open questions are why this isn't flagged good first issue — it benefits from someone with ClickHouse familiarity to make the call.
Acceptance Criteria
Out of Scope
Reference
Summary
ClickHouseDestinationcould use a quoting pass on its SQL command paths:get_row_countbuilds a backtick-quoted identifier in a way that doesn't render correctly for schema-qualified table names — the join string is.`where it should be`.`.client.command(...)paths interpolateconfig.tablewithout any quoting. This works fine for plain lowercase names but is fragile for reserved words, mixed case, or database-qualified names.Surfaced during review of #442 / PR #498. The Postgres equivalent was completed in #498; the ClickHouse audit was listed in #442's "Proposed Approach" but out of scope there.
get_row_count: quoting doesn't render as intended
drt/destinations/clickhouse.pyL205-L210:For
config.table = "db.scores":→ The intended output is
`db`.`scores`(4 backticks, two quoted parts). The actual output has 3 backticks and isn't a valid identifier reference.There is currently no test coverage for
get_row_countwith a schema-qualified table, which is why this hasn't surfaced in practice.Audit scope (all
command(...)call sites)These interpolate
config.table/shadow/ etc. with no quoting:load(non-swap replace)TRUNCATE TABLE {config.table}_load_replace_swapDROP TABLE IF EXISTS {shadow}_load_replace_swapCREATE TABLE {shadow} AS {table}_load_replace_swapDROP TABLE IF EXISTS {shadow}(cleanup)finalize_syncEXCHANGE TABLES {table} AND {shadow}finalize_syncDROP TABLE {shadow}get_row_countclient.insert(table, ...)(L86, L132) goes through the driver's typed path and is probably safer — worth confirming whetherclickhouse-connectquotes table names internally.Design questions to settle before coding
ClickHouse identifier handling differs enough from Postgres / MySQL that this benefits from some design discussion first:
`name`and"name"and unquoted lowercase names. Always-quote is safer and matches what we do elsewhere — confirm this is the direction we want._quote_ident(table)similar to MySQL's, or inline?client.insertquoting: doesclickhouse-connectquote thetableparameter? If so,client.insert(shadow, ...)is already safe and can be left alone.clickhouse-connectsupportsparameters=for values but (as far as we know) not for identifiers. Worth confirming.These open questions are why this isn't flagged
good first issue— it benefits from someone with ClickHouse familiarity to make the call.Acceptance Criteria
get_row_countproduces the intended SQL for schema-qualified table names; regression test coversdb.scoresclient.command(...)interpolations of identifiers go through a consistent quoting pathclient.insertdocumented (either confirmed safe, or quoted)EXCHANGE TABLES), andget_row_counttype: ignore;ruff checkandmypy drtpassOut of Scope
.,`, etc.)Reference