feat(databricks): Support ingestion via Zerobus #3874

@zilto

Summary

Extend the Databricks destination with a Zerobus load job. This is similar to the BigQuery streaming load (code, docs).

Context

Zerobus is a "no-infra" product for streaming / near-real-time ingestion. When toggled on, it opens an HTTP endpoint to which data can be pushed, and Databricks ingests it into the associated table.

see Python SDK
see guide

Scope

  • extend Databricks config to allow Zerobus settings (try to align with BigQuery streaming)
  • support passing an acknowledgement callback to dlt.destinations.databricks() (see guide)
  • proper error handling: Zerobus will throw errors on invalid schemas; dlt is responsible for evolving the schema via DDL (see guide)
  • add unit tests for the Zerobus logic
  • add the Databricks destination with Zerobus to the parameterized destination tests
  • allow the destination_client to grant permission for Zerobus for the table (see here)
    • this would be a great quality-of-life improvement; it's optional for v1 if it becomes very complicated (we could encounter issues because toggling Zerobus requires more elevated privileges than using it)
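A minimal sketch of how the acknowledgement callback and schema-error handling from the list above could fit together. Every name here (`ZerobusSettings`, `SchemaMismatchError`, `load_batch`, `FakeTable`) is hypothetical and invented for illustration; none of them comes from dlt or the Zerobus SDK, and the in-memory table only stands in for a real endpoint.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class SchemaMismatchError(Exception):
    """Hypothetical stand-in for the error Zerobus raises on an invalid schema."""


@dataclass
class ZerobusSettings:
    """Hypothetical settings block; field names are illustrative only."""
    max_schema_retries: int = 1
    # Called with the number of records acknowledged for a batch.
    on_ack: Optional[Callable[[int], None]] = None


def load_batch(
    records: List[Record],
    send: Callable[[List[Record]], int],
    evolve_schema: Callable[[List[Record]], None],
    settings: ZerobusSettings,
) -> int:
    """Send a batch; on a schema error, evolve the schema (DDL) and retry."""
    attempts = 0
    while True:
        try:
            acked = send(records)
        except SchemaMismatchError:
            if attempts >= settings.max_schema_retries:
                raise
            attempts += 1
            evolve_schema(records)
            continue
        if settings.on_ack is not None:
            settings.on_ack(acked)
        return acked


class FakeTable:
    """In-memory stand-in for a Zerobus-enabled table, used only for the demo."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = set(columns)
        self.rows: List[Record] = []

    def send(self, records: List[Record]) -> int:
        # Validate the whole batch before storing anything.
        for record in records:
            missing = set(record) - self.columns
            if missing:
                raise SchemaMismatchError(sorted(missing))
        self.rows.extend(records)
        return len(records)

    def evolve(self, records: List[Record]) -> None:
        # Plays the role of the DDL that dlt would issue to add columns.
        for record in records:
            self.columns |= set(record)


table = FakeTable(columns=["id"])
acks: List[int] = []
settings = ZerobusSettings(on_ack=acks.append)
batch = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
loaded = load_batch(batch, table.send, table.evolve, settings)
```

In this sketch the first send fails because the `name` column is missing, the schema is evolved once, and the retry succeeds and triggers the callback. Capping the retries keeps a persistently invalid batch from looping forever.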

Out of scope (for v1)

  • asynchronous Zerobus SDK support
  • Protobuf Zerobus SDK support

Metadata

Labels

destination (Issue with a specific destination), enhancement (New feature or request)

Projects

Status

In Progress
