[Enhancement] – Add native BigQuery GEOGRAPHY data type support via destination adapter #3855
Open
ugbotueferhire wants to merge 2 commits into dlt-hub:devel
Author:
Hi @rudolfix, whenever you have a moment, could you please take a look at this PR for a review? Let me know if you need any changes. Thanks!
Author:
Hey @rudolfix, gentle ping on this PR, it's been sitting without a review for about 3 weeks. Would appreciate your eyes on it when you get a chance.
Description
Context:
When loading geospatial data (e.g. from PostGIS) into BigQuery, dlt mapped geometry/geography columns to `STRING`. Users who needed BigQuery's native `GEOGRAPHY` type, which enables spatial queries via `ST_DISTANCE`, `ST_CONTAINS`, etc., had to manually cast columns post-load with custom scripting. This was a friction point for geo-heavy workloads and a barrier to adoption (ref: #3847).
dlt already solved this exact problem for the Postgres destination via `postgres_adapter(data, geometry="col")`, which uses an `x-postgres-geometry` column hint to emit `geometry(Geometry, <srid>)` at DDL time. No equivalent existed for BigQuery.
Approach:
Replicated the proven `x-hint` adapter pattern for BigQuery. The data travels through the pipeline as `text` (WKT/GeoJSON strings), and only at the BigQuery destination is it materialized as `GEOGRAPHY`. This avoids any changes to the core type system, normalizer, coercion logic, or schema engine version.
- `bigquery_adapter.py`: Added the `GEOGRAPHY_HINT` (`"x-bigquery-geography"`) constant and a new `geography: TColumnNames` parameter to `bigquery_adapter()`. Accepts a single column name or a list. Validation and hint-setting logic follows the established pattern used by `cluster` and `partition`.
- `factory.py`: Overrode `BigQueryTypeMapper.to_destination_type()` to check for `GEOGRAPHY_HINT` and return `"GEOGRAPHY"`. Added `"GEOGRAPHY": "text"` to the `dbt_to_sct` reverse mapping and explicit handling in `from_destination_type()` for round-trip completeness.
- `bigquery.py`: Imported `GEOGRAPHY_HINT` alongside existing adapter hints for consistency and future use.
- `test_bigquery_table_builder.py`: Added 5 focused unit tests covering adapter configuration, DDL generation, and reverse type mapping.
Impact:
Calling `bigquery_adapter(data, geography="location")` makes dlt create the column as `GEOGRAPHY` in BigQuery. Supports WKT (`POINT(-118.4 33.9)`) and GeoJSON string inputs in WGS84. Enables native spatial queries immediately after load.
Usage
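As a rough illustration of what the `geography` argument does, the sketch below mimics the hint-setting step in plain Python without importing dlt. Only the hint key `x-bigquery-geography` comes from this PR; the helper `set_geography_hints` and the dict-based column schema are hypothetical stand-ins, and forcing `data_type` to `text` is an assumption based on the description above.

```python
from typing import Dict, List, Union

# Hint key named in this PR; everything else here is a hypothetical stand-in.
GEOGRAPHY_HINT = "x-bigquery-geography"

ColumnSchema = Dict[str, object]


def set_geography_hints(
    columns: Dict[str, ColumnSchema], geography: Union[str, List[str]]
) -> Dict[str, ColumnSchema]:
    """Return a copy of `columns` with the geography hint set on the named columns."""
    # Accept either a single column name or a list of names.
    names = [geography] if isinstance(geography, str) else geography
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("`geography` must be a column name or a list of column names.")

    # Copy each column schema so the caller's dict is left untouched.
    hinted = {name: dict(schema) for name, schema in columns.items()}
    for name in names:
        column = hinted.setdefault(name, {"name": name})
        # Assumption: the value stays text (WKT or GeoJSON) inside the pipeline.
        column["data_type"] = "text"
        column[GEOGRAPHY_HINT] = True
    return hinted


columns = {"id": {"name": "id", "data_type": "bigint"}}
single = set_geography_hints(columns, "location")
multiple = set_geography_hints(columns, ["origin", "destination"])

# A row as it would travel through the pipeline: WKT text, longitude first.
row = {"id": 1, "location": "POINT(-118.4 33.9)"}
```

Only the hinted columns carry the extra key, so every other column keeps its usual type mapping.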
Tests
Unit Tests Summary:
- `test_adapter_geography_hint_config`: `x-bigquery-geography` hint
- `test_adapter_geography_hint_multiple_columns`
- `test_geography_column_sql_create`: `CREATE TABLE` generates `GEOGRAPHY` column type
- `test_geography_column_sql_alter`: `ALTER TABLE` generates `ADD COLUMN ... GEOGRAPHY`
- `test_geography_from_destination_type`: `GEOGRAPHY` db type maps back to `text` data type
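The DDL and round-trip checks listed above can be sketched without dlt or a BigQuery connection. This is a simplified stand-in, not the code in `factory.py`: the two mapping function names mirror the ones in the description, but the tiny type tables and the helpers `column_sql`, `create_table_sql`, and `add_column_sql` are assumptions made for illustration.

```python
from typing import Dict, List

GEOGRAPHY_HINT = "x-bigquery-geography"  # hint key named in this PR

# Deliberately tiny type tables (assumed); a real mapper covers many more types.
SC_TO_DB = {"text": "STRING", "bigint": "INT64", "double": "FLOAT64"}
DB_TO_SC = {"STRING": "text", "INT64": "bigint", "FLOAT64": "double", "GEOGRAPHY": "text"}

Column = Dict[str, object]


def to_destination_type(column: Column) -> str:
    """The hint wins over the regular data type mapping."""
    if column.get(GEOGRAPHY_HINT):
        return "GEOGRAPHY"
    return SC_TO_DB[str(column["data_type"])]


def from_destination_type(db_type: str) -> str:
    """Reverse mapping: GEOGRAPHY reads back as text."""
    return DB_TO_SC[db_type.upper()]


def column_sql(column: Column) -> str:
    return f"`{column['name']}` {to_destination_type(column)}"


def create_table_sql(table: str, columns: List[Column]) -> str:
    body = ",\n  ".join(column_sql(c) for c in columns)
    return f"CREATE TABLE `{table}` (\n  {body}\n)"


def add_column_sql(table: str, column: Column) -> str:
    return f"ALTER TABLE `{table}` ADD COLUMN {column_sql(column)}"


location = {"name": "location", "data_type": "text", GEOGRAPHY_HINT: True}
note = {"name": "note", "data_type": "text"}

create_sql = create_table_sql("places", [note, location])
alter_sql = add_column_sql("places", location)
print(create_sql)
print(alter_sql)
```

A `text` column without the hint still maps to `STRING`, which is why the hint can be opt-in per column.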