Description
When using write_disposition: append (or merge), if a column is initially inferred as bigint (e.g., value 5) and a subsequent load sends a double (e.g., value 3.7), dlt creates a variant column (column__v_double) rather than promoting the column type from bigint to double.
Expected behavior
For safe type promotions (e.g., bigint → double), dlt could evolve the column type in-place (where the destination supports it) or at minimum provide a configuration option to prefer type promotion over variant columns.
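To make the requested behavior concrete, here is a minimal sketch of the decision rule being asked for. It is not dlt's implementation or API: `SAFE_PROMOTIONS`, `resolve_column_type`, and the `prefer_promotion` flag are hypothetical names used only for illustration, and bigint → double is the only widening taken from this issue.

```python
from typing import Optional

# Hypothetical table of widenings treated as "safe".
# Only bigint -> double comes from this issue; anything else would be an assumption.
SAFE_PROMOTIONS = {("bigint", "double"): "double"}


def resolve_column_type(existing: str, incoming: str, prefer_promotion: bool) -> Optional[str]:
    """Return the type the column should end up with.

    None means "no safe promotion applies", i.e. fall back to a variant column.
    """
    if existing == incoming:
        return existing
    if prefer_promotion:
        return SAFE_PROMOTIONS.get((existing, incoming))
    return None


promoted = resolve_column_type("bigint", "double", prefer_promotion=True)
fallback = resolve_column_type("bigint", "double", prefer_promotion=False)
```

With the flag on, `promoted` is the widened type; with it off, `fallback` is `None` and the current variant-column behavior would apply. One caveat worth keeping in mind: a 64-bit float represents integers exactly only up to 2^53, so even this promotion can lose precision for very large bigint values.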
Current behavior
A new variant column is created (e.g., score__v_double), leaving the original score column as bigint. This means downstream consumers need to handle two columns for what is logically one field.
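As an illustration of the extra work this puts on consumers, the sketch below coalesces the two columns back into one logical value. It assumes rows arrive as plain dicts with both `score` and `score__v_double` keys; `unified_score` is a hypothetical helper, not part of dlt.

```python
from typing import Any, Mapping, Optional


def unified_score(row: Mapping[str, Any]) -> Optional[float]:
    """Return the logical score from a row that may carry it in either column."""
    for column in ("score", "score__v_double"):
        value = row.get(column)
        if value is not None:
            return float(value)
    return None


# Rows shaped like the table after both loads in the reproduction.
rows = [
    {"team": "Alpha", "score": 5, "score__v_double": None},
    {"team": "Beta", "score": None, "score__v_double": 3.7},
]
scores = [unified_score(row) for row in rows]
```

Every query or job that reads the field needs an equivalent of this coalescing step, which is the overhead in-place promotion would remove.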
Reproduction
import dlt


@dlt.resource(write_disposition="append")
def test_metrics():
    yield {"team": "Alpha", "score": 5}


@dlt.resource(write_disposition="append", name="test_metrics")
def test_metrics_float():
    yield {"team": "Beta", "score": 3.7}


pipeline = dlt.pipeline(
    pipeline_name="schema_evolution_test",
    destination="bigquery",
    dataset_name="test_schema_evolution",
)

# Load 1: score inferred as bigint
pipeline.run(test_metrics())

# Load 2: score is now double → creates variant column
pipeline.run(test_metrics_float())
Workaround
Explicitly define columns with data_type: double in the resource config to prevent incorrect initial inference.
Environment
- Destination: BigQuery (but the question applies to any destination that supports safe type widening)
Question
Is there a plan to support safe type promotions (bigint → double) as part of schema evolution, instead of creating variant columns? Or is there a recommended configuration to handle this beyond pre-defining column types?