This file gives AI agents (Claude Code, Cursor, etc.) the context needed to work effectively in this codebase.
drt (data reverse tool) is a CLI tool that syncs data from a data warehouse (BigQuery) to external services via declarative YAML configuration. Think of it as the reverse of dlt: dlt loads data into a DWH; drt activates data out of a DWH.
Tagline: "Reverse ETL for the code-first data stack."
Config Parser → Source (BigQuery) → Sync Engine → Destination (REST API)
                                         ↓
                                   State Manager
Key design principle: module boundaries are drawn for a future Rust rewrite (PyO3). The `engine/sync.py` module is the primary Rust candidate, so keep it pure (no I/O side effects beyond protocol calls). Logging, state persistence, OTel spans, and any other observability/persistence side effect MUST flow through `drt.engine.observer.SyncObserver`. Direct `logger.*`, `state_manager.save_sync(...)`, or `watermark_storage.save(...)` calls inside `engine/sync.py` are caught by the boundary checks in `tests/unit/test_engine_observer.py` and will fail CI.
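The observer boundary can be pictured with a minimal sketch. Only the name `SyncObserver` comes from this file; the hook names (`on_batch_loaded`, `on_sync_finished`), the `RecordingObserver` test double, and the `run_sync` function are hypothetical, and the real interface in `drt.engine.observer` may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


class SyncObserver(Protocol):
    """Hypothetical shape; the real drt.engine.observer.SyncObserver may differ."""

    def on_batch_loaded(self, sync_name: str, rows: int) -> None: ...

    def on_sync_finished(self, sync_name: str, total_rows: int) -> None: ...


@dataclass
class RecordingObserver:
    """Test double that records events instead of logging or saving state."""

    events: list = field(default_factory=list)

    def on_batch_loaded(self, sync_name: str, rows: int) -> None:
        self.events.append(("batch", sync_name, rows))

    def on_sync_finished(self, sync_name: str, total_rows: int) -> None:
        self.events.append(("finished", sync_name, total_rows))


def run_sync(
    sync_name: str,
    batches: Iterable[list],
    load: Callable[[list], int],
    observer: SyncObserver,
) -> int:
    """Engine-style loop: no logger or state-manager calls, only observer hooks."""
    total = 0
    for batch in batches:
        loaded = load(batch)  # protocol call, the only I/O the loop performs
        total += loaded
        observer.on_batch_loaded(sync_name, loaded)
    observer.on_sync_finished(sync_name, total)
    return total
```

Because every side effect goes through the injected observer, a loop of this shape can be tested with a recording double and, in principle, ported to another language without dragging logging or persistence along.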
drt/
├── cli/ # Typer CLI commands
├── config/ # Pydantic models + YAML parser
├── connectors/ # Connector registry — auto-discovery of sources/destinations
├── sources/ # Source Protocol + BigQuery impl
├── destinations/ # Destination Protocol + REST API impl
├── engine/ # Sync orchestration (future Rust core)
├── state/ # Local JSON state persistence
└── templates/ # Jinja2 renderer (future MiniJinja/Rust)
- `Source.extract(query: str, config: ProfileConfig) -> Iterator[dict]`
- `Destination.load(records: list[dict], config: DestinationConfig, sync_options: SyncOptions) -> SyncResult`
- `StateManager.get_last_sync` / `save_sync`
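As a rough illustration, these interfaces could be expressed with `typing.Protocol`. The method names and the `extract` / `load` signatures come from this file; the fields of `ProfileConfig`, `DestinationConfig`, `SyncOptions`, and `SyncResult`, the arguments of `get_last_sync` / `save_sync`, and the two toy implementations are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


# Stand-in types: the field names are assumptions, not drt's real models.
@dataclass
class ProfileConfig:
    project: str = "demo"


@dataclass
class DestinationConfig:
    url: str = "https://example.invalid"


@dataclass
class SyncOptions:
    batch_size: int = 100


@dataclass
class SyncResult:
    success: int = 0
    failed: int = 0


class Source(Protocol):
    def extract(self, query: str, config: ProfileConfig) -> Iterator[dict]: ...


class Destination(Protocol):
    def load(
        self, records: list, config: DestinationConfig, sync_options: SyncOptions
    ) -> SyncResult: ...


class StateManager(Protocol):
    # Argument lists are guessed; this file only names the two methods.
    def get_last_sync(self, sync_name: str) -> Optional[dict]: ...

    def save_sync(self, sync_name: str, state: dict) -> None: ...


class InMemorySource:
    """Toy source that ignores the query and yields fixed rows."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def extract(self, query: str, config: ProfileConfig) -> Iterator[dict]:
        yield from self.rows


class CountingDestination:
    """Toy destination that counts records instead of sending them."""

    def load(
        self, records: list, config: DestinationConfig, sync_options: SyncOptions
    ) -> SyncResult:
        return SyncResult(success=len(records))
```

Protocols are structural, so `InMemorySource` and `CountingDestination` satisfy the interfaces without inheriting from them.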
Connector dispatch uses a centralized registry (drt/connectors/registry.py) — adding a new connector requires registering it there, not editing main.py. Implementations use assert isinstance(config, SpecificConfig) for type narrowing. type: ignore is only allowed for external library issues.
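A minimal sketch of what a registry plus `assert isinstance` narrowing might look like. This is not the content of `drt/connectors/registry.py`: `register_destination`, `get_destination`, `RestApiConfig`, the `kind` field, and `RestApiDestination` are all hypothetical names used only to show the pattern.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DestinationConfig:
    kind: str  # hypothetical discriminator field


@dataclass
class RestApiConfig(DestinationConfig):
    url: str = ""


# Central lookup table: connector name -> implementation class.
_DESTINATIONS: dict = {}


def register_destination(name: str) -> Callable[[type], type]:
    """Class decorator that adds a destination to the registry."""

    def decorator(cls: type) -> type:
        if name in _DESTINATIONS:
            raise ValueError(f"destination {name!r} already registered")
        _DESTINATIONS[name] = cls
        return cls

    return decorator


def get_destination(name: str) -> object:
    """Instantiate a registered destination, or raise KeyError if unknown."""
    if name not in _DESTINATIONS:
        raise KeyError(f"unknown destination type: {name!r}")
    return _DESTINATIONS[name]()


@register_destination("rest_api")
class RestApiDestination:
    def load(self, records: list, config: DestinationConfig) -> int:
        # Narrow the base config to the concrete type rather than
        # silencing the type checker with `type: ignore`.
        assert isinstance(config, RestApiConfig)
        # A real implementation would send records to config.url;
        # this sketch only counts them.
        return len(records) if config.url else 0
```

With this shape, dispatch code looks a connector up by name and never needs a per-connector branch.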
make dev # install with dev + bigquery extras
make test # pytest
make lint # ruff + mypy
make fmt # ruff format + fix
- v0.7.8 released: community-driven follow-up patch. Two contributor PRs accumulated since v0.7.7: new Mixpanel destination (PR #608 by @Pawansingh3889: `people_set` (`/engage`) + `import_events` (`/import`) endpoints, EU residency via `region: eu` → `api-eu.mixpanel.com`, deterministic `$insert_id` for idempotent re-runs, closes #417) and ClickHouse `_quote_ident` identifier fix (PR #610 by @yodakanohoshi; closes the ClickHouse leg of the qualified-identifier fix family alongside Postgres #498 / MySQL #514; v0.7.7 users with `database.table` ClickHouse syntax were hitting a server-side `Code: 62` from `get_row_count`'s malformed identifier rendering). Also completes the empty-batch contract suite (PRs #604–#606, 25 of 25 registered destinations), which surfaced + fixed a real bug in `staged_upload.finalize()` (it ran the full upload/trigger/poll lifecycle on empty input). Ships `sync.mode: mirror` user-facing documentation (PR #607: `docs/connectors/postgres.md` section + runnable `examples/postgres_to_postgres_mirror/` + skill option). BigQuery is in flight via contributor PR #584 and will trigger v0.7.9. No breaking changes; drop-in upgrade from v0.7.7.
- v0.7.7: `sync.mode: mirror` across the SQL destination set. New differential-delete sync mode (#340) that upserts source rows and DELETEs destination rows whose `upsert_key` was not observed in the source, without the TRUNCATE / re-insert overhead of `replace` mode. Lands across Postgres (PR #596), MySQL (PR #597), ClickHouse (PR #598: `ALTER TABLE ... DELETE` mutation w/ `mutations_sync=1`), Snowflake (PR #599: MERGE-path forcing + first-ever `finalize_sync` on Snowflake). Also lands the `cli/main.py` split completion: Phase 2b PR (a) + PR (b) + tighten (PRs #579 / #587 / #591) finish the 1706 → 164 LOC (-90%) split begun in v0.7.5. Plus `FakeSource` + destination contract test framework (#592–#595), CI `check-changelog-required` warn-only guard (#590), GCS storage import mypy fix (#588), and CI install line extension that unlocked ~102 silently-skipped SQL destination tests (raised total coverage 82.68 → 85.29). No breaking changes; drop-in upgrade from v0.7.6.
- v0.7.6: Small follow-up. Adds the Amplitude destination (#574, Identify API + HTTP V2 events API) and the `tojson_safe` Jinja2 filter (#580 / PR #581) that unblocks `datetime` / `Decimal` / `UUID` columns flowing into REST API `body_template` rendering without `CAST(... AS STRING)` workarounds in model SQL. Also lands a CLI `--log-format` typer 0.26.1 compatibility fix (#577 / PR #578), a retrofit of `ErrorFormatter` stage detection to an engine-emitted attribute (PR #571, supersedes #544's traceback-walk heuristic), and Phase 2a of the `cli/main.py` split (PR #572, continues #565's Phase 1). No breaking changes; drop-in upgrade from v0.7.5.
- v0.7.5: Production Ready follow-up #3 + Tech Foundation Hardening (Epic #538 closed, 11 child issues). CI hardened (nightly + publish gate + CodeQL + pip-audit + SBOM); functional reverse-ETL E2E coverage established via DuckDB harness + boundary tests; CLI/UX polished (`ErrorFormatter`, `drt sources/destinations --detailed`, `drt init --template`); load-bearing refactors landed (`SyncObserver` engine I/O boundary, destinations serializer consolidation, `BaseSqlDestinationConfig`, `cli/main.py` split Phase 1). Also ships the accumulated work since v0.7.4: REST API source polish, sync catalog (#499 P1+P2), `drt_run_test` MCP tool, OpenTelemetry Phase 1 config, hardcoded secret detection, lookup ambiguity warning, orphan shadow cleanup. No new connectors, no breaking changes; drop-in upgrade from v0.7.2 / v0.7.3 / v0.7.4.
- v0.7.4: Patch release for MySQL schema-qualified identifier handling (#511, PR #514). MySQL counterpart to the Postgres `Identifier()` fix that shipped in v0.7.3; the `_quote_ident` helper is now applied consistently across replace / insert / upsert / row-count paths so `mydb.scores` correctly quotes as `` `mydb`.`scores` ``. PR #514 actually landed on `main` two days after the v0.7.3 tag was cut, so the wheel published as `drt-core==0.7.3` did not contain it; v0.7.4 is the release that actually delivers it.
- v0.7.3: Patch release for Postgres schema-qualified identifier handling (#442, PR #498). Cherry-pick of the qualified `Identifier()` composition fix on top of the v0.7.2 line; `marketing.events` and similar `schema.table` configs no longer fail at SQL execution. No new features, no breaking changes.
- v0.7.2: Production Ready follow-up #2: opt-in anonymous telemetry (#263, PostHog Cloud EU), deprecation warnings in `drt validate` (#467), Postgres `psycopg2.sql` SQL composition hardening (#442). Telemetry is off by default + `DO_NOT_TRACK` honored; release-time API key injection workflow (#481) ships with the wheel.
- v0.7.1: Production Ready follow-up: `drt run --dry-run --diff` for record-level preview (#413), tz-aware cursor stringification fix (#475), `on_error=fail` alignment for Notion / REST API / Email SMTP (#463), `VERSIONING.md` policy doc (#457).
- v0.7.0: Production Ready theme: graceful shutdown on SIGTERM/SIGINT (#279), per-destination retry override (#277), sync execution history (#276), zero-downtime replace via staging table swap (#338), FK existence check via `lookups.check_only` (#354), `json_columns` config (#316), `drt doctor` (#264), `--quiet` flag (#265), Slack/webhook failure alerts (#414). Plus first DWH destination (Snowflake #353), Codespaces playground (#407), and `OPEN_CORE.md`.
- v0.6.2: `watermark.default_value` + `--cursor-value` CLI + watermark observability (#390, #391)
- v0.6.1: `${VAR}` env substitution in all sync YAML string fields (#385)
- v0.6.0: Notion/Twilio/Intercom/Email SMTP/Salesforce Bulk/Google Ads destinations, `--threads` parallel execution, `--log-format json`, `--select tag:`, JSON Schema validation, freshness/unique/accepted_values tests, `drt sources` / `drt destinations`, `--dry-run` row count diff, StagedDestination Protocol, destination_lookup, GOVERNANCE.md
- CLI fully wired: `init`, `run`, `list`, `validate`, `status`, `test`, `mcp run`, `serve`, `sources`, `destinations`, `doctor`, `cloud push` (stub)
- Sources: BigQuery, DuckDB, PostgreSQL, Redshift, SQLite, ClickHouse, Snowflake, MySQL, Databricks, SQL Server
- Destinations: REST API, Slack, Discord, Microsoft Teams, GitHub Actions, HubSpot, Google Sheets, PostgreSQL, MySQL, ClickHouse, Snowflake, Parquet, CSV/JSON/JSONL, Jira, Linear, SendGrid, Notion, Twilio, Intercom, Email SMTP, Salesforce Bulk, Google Ads, Staged Upload, Amplitude
- Integrations: MCP Server (`drt-core[mcp]`), dagster-drt, Airflow, Prefect, dbt manifest reader
- 833+ tests, integration tests use `pytest-httpserver`
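The mirror sync mode from the v0.7.7 entry (upsert source rows, then delete destination rows whose upsert key was not observed in the source) can be illustrated with an in-memory sketch. The function `mirror_sync` and the dict-backed "table" are hypothetical; the SQL destinations implement the same idea with database statements, and their actual code is not shown here.

```python
def mirror_sync(destination: dict, source_rows: list, upsert_key: str) -> tuple:
    """Upsert source rows, then delete destination rows not seen in the source.

    `destination` maps key value -> row and is modified in place, standing in
    for a SQL table. Returns (distinct keys upserted, rows deleted).
    """
    seen = set()
    for row in source_rows:
        key = str(row[upsert_key])
        destination[key] = dict(row)  # insert or overwrite
        seen.add(key)

    # Differential delete: remove only the keys missing from the source,
    # rather than truncating and re-inserting everything.
    stale = [key for key in destination if key not in seen]
    for key in stale:
        del destination[key]
    return len(seen), len(stale)


table = {"1": {"id": 1, "score": 10}, "2": {"id": 2, "score": 20}}
upserted, deleted = mirror_sync(
    table, [{"id": 1, "score": 11}, {"id": 3, "score": 30}], "id"
)
print(upserted, deleted)  # 2 1
print(sorted(table))      # ['1', '3']
```

Note that in this sketch an empty source deletes every destination row; whether drt guards against that case is not stated in this file.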
- Do not add a GUI or web UI — this is a CLI-first tool
- Do not add RBAC or multi-tenancy — small team / personal use
- Do not add `type: ignore`; it is only allowed for external library issues (`no-untyped-call`, `import-untyped`)
- Do not add heavy dependencies to core; extras (`[bigquery]`, `[mcp]`) exist for a reason
SSoT for upcoming releases: ROADMAP.md — each version has Theme / Scope / Out of scope / Target / Progress link.
- Shipped releases: see CHANGELOG.md or GitHub Releases
- Issue-level tracking: GitHub Milestones
- Good First Issues: https://github.com/drt-hub/drt/issues?q=is%3Aopen+label%3A%22good+first+issue%22
When scope shifts between versions, update ROADMAP.md first, then re-label issues to match.