feat(clickhouse): sync.mode: mirror — differential delete (#340 Step 3) #598
Merged
ClickHouse counterpart to the Postgres / MySQL mirror mode shipped in #596 / #597. Same application-side diff semantics: accumulate upsert_key tuples across batches in load(), then issue a single ALTER TABLE ... DELETE WHERE key NOT IN (collected) mutation from finalize_sync() with mutations_sync=1 so the call blocks until the DELETE completes.

clickhouse_connect's native {name:Type} parameter substitution accepts Array(String) (single column) and Array(Tuple(String, ...)) (composite) directly, so unlike Postgres / MySQL, where we assembled placeholders manually, the call site is one parameter dict.

Both column references and parameter values are coerced via toString() so the comparison works regardless of source column type, at the cost of not using any index on upsert_key. Mirror mode is intended for small/medium reference tables; high-volume fact tables should use the temp-table strategy follow-up (#340 follow-up).

ClickHouseDestinationConfig.upsert_key is list[str] | None (it's informational only for the existing INSERT path where dedup is handled by ReplacingMergeTree at merge time), so the runtime guard in load() raises ValueError early when mirror mode is requested without a populated key: fail-fast before any INSERT touches the table.

New _quote_ident helper added for db-qualified table identifiers (`db`.`table`), matching the v0.7.4-hardened MySQL pattern.

12 unit tests in tests/unit/test_clickhouse_mirror_mode.py cover key accumulation, dedupe across overlapping batches, database-qualified DELETE shape, single + composite key DELETE structure, the empty-source safety path, state reset, the missing-upsert_key ValueError, row-error skip path, and coexistence with the existing EXCHANGE TABLES swap-finalize path.

Snowflake follows in the next PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
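For reviewers who want the flow described above in one place, here is a minimal sketch of it, not the code from this PR. The names `MirrorKeyCollector`, `build_mirror_delete`, `finalize_mirror`, `quote_ident` and `qualified_table` are hypothetical, and the exact SQL shape is an assumption based on the description. The sketch also assumes that the empty-source safety path means "skip the DELETE when no keys were collected", and that the client exposes `command(cmd, parameters=..., settings=...)` as clickhouse_connect clients do.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping, Sequence


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier; reject embedded backticks outright."""
    if not name or "`" in name:
        raise ValueError(f"unsafe identifier: {name!r}")
    return f"`{name}`"


def qualified_table(table: str, database: str | None = None) -> str:
    """Return `table`, or `db`.`table` when a database is given."""
    parts = [database, table] if database else [table]
    return ".".join(quote_ident(p) for p in parts)


class MirrorKeyCollector:
    """Accumulates stringified upsert_key tuples across load() batches."""

    def __init__(self, upsert_key: Sequence[str] | None) -> None:
        self.upsert_key = list(upsert_key or [])
        self.seen: set[tuple[str, ...]] = set()

    def add_batch(self, rows: Iterable[Mapping[str, Any]]) -> None:
        # Fail fast: mirror mode cannot diff without a populated key.
        if not self.upsert_key:
            raise ValueError("sync.mode: mirror requires a non-empty upsert_key")
        for row in rows:
            # A set dedupes keys that appear in overlapping batches.
            self.seen.add(tuple(str(row[col]) for col in self.upsert_key))


def build_mirror_delete(
    table: str,
    upsert_key: Sequence[str],
    keys: Iterable[tuple[str, ...]],
    database: str | None = None,
) -> tuple[str, dict[str, Any]]:
    """Build the DELETE mutation text and its single parameter dict."""
    if not upsert_key:
        raise ValueError("sync.mode: mirror requires a non-empty upsert_key")
    casts = [f"toString({quote_ident(col)})" for col in upsert_key]
    if len(casts) == 1:
        lhs = casts[0]
        param_type = "Array(String)"
        values: list[Any] = sorted(k[0] for k in keys)
    else:
        lhs = "(" + ", ".join(casts) + ")"
        param_type = "Array(Tuple(" + ", ".join(["String"] * len(casts)) + "))"
        values = sorted(keys)
    sql = (
        f"ALTER TABLE {qualified_table(table, database)} "
        f"DELETE WHERE {lhs} NOT IN {{keys:{param_type}}}"
    )
    return sql, {"keys": values}


def finalize_mirror(
    client: Any,
    collector: MirrorKeyCollector,
    table: str,
    database: str | None = None,
) -> bool:
    """Issue the mutation once; return False when it was skipped."""
    if not collector.seen:
        # Assumed safety behaviour: an empty source must not wipe the table.
        return False
    sql, params = build_mirror_delete(
        table, collector.upsert_key, collector.seen, database
    )
    # mutations_sync=1 makes the call block until the mutation completes.
    client.command(sql, parameters=params, settings={"mutations_sync": 1})
    collector.seen.clear()  # reset state for the next sync
    return True
```

One caveat with this sketch: Python's `str()` and ClickHouse's `toString()` can render some types differently (floats and datetimes, for example), so a real implementation has to make sure both sides stringify keys the same way.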
Codecov Report
✅ All modified and coverable lines are covered by tests.
Summary
- `sync.mode: mirror` for ClickHouse: Step 3 of #340, follow-up to #596 (Postgres) + #597 (MySQL)
- Accumulate `upsert_key` tuples in `load()`, issue a single mutation from `finalize_sync()`
- Runtime guard for a missing `upsert_key` (`ValueError`)
- 12 unit tests in `tests/unit/test_clickhouse_mirror_mode.py`

clickhouse_connect translation notes
ClickHouse-specific differences from the Postgres / MySQL pattern:

- `DELETE FROM ... WHERE` becomes an `ALTER TABLE ... DELETE WHERE` mutation
- `mutations_sync=1` in `settings=` makes the call block until the mutation completes
- `%s` placeholders become `{name:Type}` with `Array(...)` types
- `toString()` cast skips column index

The `toString()` cast on both sides of the comparison means the same code path works for any column type, but loses index hits. Mirror mode is intended for small/medium reference tables, not high-volume fact tables; the docstring + CHANGELOG entry call this out explicitly so misuse is hard.

upsert_key handling
`ClickHouseDestinationConfig.upsert_key` is `list[str] | None` (informational only for the INSERT path, where dedup is handled by `ReplacingMergeTree` at merge time). Unlike Postgres / MySQL, where the field is required at the config layer, ClickHouse mirror mode does its own runtime guard in `load()` that raises `ValueError` early, before any INSERT touches the table, when mirror mode is requested without a populated key.

Test plan
- `tests/unit/test_clickhouse_mirror_mode.py`: 12 tests pass locally
- `tests/unit/test_clickhouse_destination.py`: 38 existing tests still pass (swap path coexistence verified)
- `tests/unit/test_mysql_mirror_mode.py` + `test_postgres_mirror_mode.py`: Step 1 + 2 unchanged, all 23 tests pass
- `ruff check`: clean on `drt/destinations/clickhouse.py` + new test file
- `mypy`: clean on `drt/destinations/clickhouse.py`

Follow-ups (this issue)
🤖 Generated with Claude Code