Skip to content

Coercion from double to decimal can produce unexpected values #3917

@rooperuu

Description

@rooperuu

dlt version

1.26.0

Describe the problem

The coercion logic converts Python float to decimal.Decimal directly using the constructor. This can convert seemingly acceptable values to ones that do not adhere to the precision and scale constraints:

from decimal import Decimal
Decimal(34.7)  # Decimal('34.7000000000000028421709430404007434844970703125')

In some destinations, such as DuckDB, this does not cause any issue as the values are rounded during loading. It seems that DuckDB loads only fail if a value has too large magnitude to fit in the given precision, so in the above case, the number would be rounded back to the original.

In BigQuery, however, the loads are strict, and the above case is rejected causing an error. The only way to prevent the error is rounding the values before loading.

Expected behavior

No response

Steps to reproduce

The error can be reproduced by trying to load the resource defined below to any destination that does not automatically round the values during a load, such as BigQuery.

Operating system

macOS, Linux

Runtime environment

Local

Python version

3.12

dlt data source

import dlt

@dlt.resource(columns={"some_number": {"data_type": "decimal"}})
def example():
    yield {"some_number": 34.7}

dlt destination

Google BigQuery

Other deployment details

No response

Additional information

This could be fixed in multiple ways.

One is to do the conversion via Decimal(str(float_value)), which fixes the example with 34.7 by giving Decimal('34.7'). This works because str(float_value) gives the shortest decimal representation that corresponds to the binary representation of the float value rather than the closest decimal value. This would still never round the values, so inputs like 0.1234567899 would still cause an error rather than using Decimal('0.123456790'). In some sense an error for invalid values is good, but these currently depend on the destination.

More advanced solutions could utilize the precision and scale hints. These could be enforced after the basic coercion by rounding to at most scale decimal places. Values that do not fit in the precision at all should maybe continue to fail.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdestinationIssue with a specific destination

    Type

    No type

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions