Summary
Extend the Databricks destination with a Zerobus load job. This is similar to BigQuery streaming load (code, docs).
Context
Zerobus is a "no-infra" product for streaming / near-real-time ingestion. When toggled on, it opens an HTTP endpoint where data can be pushed, and Databricks ingests it into the associated table.
see Python SDK
see guide
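To illustrate the push model described above, here is a minimal sketch. The class and method names (ZerobusStream, ingest_record) are hypothetical stand-ins, not the real Zerobus SDK API; a real stream would POST each record to the table's Zerobus HTTP endpoint.

```python
# Illustrative sketch only; the real Zerobus Python SDK API may differ.
import json


class ZerobusStream:
    """Hypothetical stand-in for a Zerobus ingest stream bound to one table."""

    def __init__(self, table: str):
        self.table = table
        self.buffer: list[str] = []

    def ingest_record(self, record: dict) -> None:
        # The real SDK would push this to the table's Zerobus HTTP endpoint;
        # here we only serialize it, to show the per-record push model.
        self.buffer.append(json.dumps(record))


stream = ZerobusStream("main.events.clicks")
stream.ingest_record({"user_id": 1, "event": "click"})
```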
Scope
- extend Databricks config to allow Zerobus settings (try to align with BigQuery streaming)
- support passing an acknowledgement callback to dlt.destinations.databricks() (see guide)
- proper error handling: Zerobus will throw errors on invalid schemas; dlt is responsible for evolving the schema via DDL (see guide)
- Add unit tests for the Zerobus logic
- Add the Databricks destination with Zerobus to the parameterized destination tests
- allow the destination_client to grant Zerobus permission on the table (see here)
- this would be a great quality-of-life improvement; it's optional for v1 if it becomes very complicated (we could encounter issues because toggling Zerobus requires more elevated privileges than using it)
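The error-handling item above implies dlt must diff the incoming schema against the existing table and issue DDL before retrying the push. A minimal sketch of that step, assuming a simplified column-name-to-type schema representation (dlt's real schema diffing is more involved):

```python
# Sketch: derive ALTER TABLE DDL for columns Zerobus rejected as unknown.
# The schema representation (name -> Databricks SQL type) is illustrative.

def evolution_ddl(
    table: str, existing: dict[str, str], incoming: dict[str, str]
) -> list[str]:
    """Return ADD COLUMN statements for columns in incoming but not existing."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {dtype}"
        for name, dtype in incoming.items()
        if name not in existing
    ]


ddl = evolution_ddl(
    "main.events.clicks",
    existing={"user_id": "BIGINT"},
    incoming={"user_id": "BIGINT", "referrer": "STRING"},
)
# ddl == ["ALTER TABLE main.events.clicks ADD COLUMN referrer STRING"]
```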
Out of scope (for v1)
- asynchronous Zerobus SDK support
- Protobuf Zerobus SDK support