dlt version
1.26.0
Describe the problem
The coercion logic converts a Python float to decimal.Decimal directly via the constructor. This can turn seemingly acceptable values into ones that do not adhere to the column's precision and scale constraints:
from decimal import Decimal
Decimal(34.7) # Decimal('34.7000000000000028421709430404007434844970703125')
In some destinations, such as DuckDB, this does not cause any issue because the values are rounded during loading. DuckDB loads appear to fail only if a value's magnitude is too large for the given precision, so in the case above the number is simply rounded back to the original.
In BigQuery, however, the loads are strict, and the above case is rejected with an error. The only way to prevent the error is to round the values before loading.
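Until this is fixed, values can be rounded on the producer side before yielding. A minimal workaround sketch; the helper name and the scale of 9 (matching BigQuery's NUMERIC type) are assumptions, not part of dlt:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def to_numeric(value: float) -> Decimal:
    # Convert via str() to get the shortest decimal representation,
    # then round to 9 fractional digits (BigQuery NUMERIC scale).
    return Decimal(str(value)).quantize(Decimal("1E-9"), rounding=ROUND_HALF_EVEN)

to_numeric(34.7)  # Decimal('34.700000000')
```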
Expected behavior
No response
Steps to reproduce
The error can be reproduced by loading the resource defined below into any destination that does not automatically round values during loading, such as BigQuery.
Operating system
macOS, Linux
Runtime environment
Local
Python version
3.12
dlt data source
import dlt

@dlt.resource(columns={"some_number": {"data_type": "decimal"}})
def example():
    yield {"some_number": 34.7}
dlt destination
Google BigQuery
Other deployment details
No response
Additional information
This could be fixed in multiple ways.
One is to do the conversion via Decimal(str(float_value)), which fixes the example with 34.7 by giving Decimal('34.7'). This works because str(float_value) gives the shortest decimal representation that round-trips to the float's binary value, rather than the exact decimal expansion of that binary value. This would still never round the values, so inputs like 0.1234567899 would still cause an error rather than being rounded to Decimal('0.123456790'). In some sense an error for invalid values is good, but whether an error occurs currently depends on the destination.
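For reference, the difference between the two constructions:

```python
from decimal import Decimal

# Direct construction exposes the float's full binary expansion:
Decimal(34.7)       # Decimal('34.7000000000000028421709430404007434844970703125')

# Going through str() uses Python's shortest-repr algorithm instead:
Decimal(str(34.7))  # Decimal('34.7')
```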
More advanced solutions could utilize the precision and scale hints. After the basic coercion, these could be enforced by rounding to at most scale decimal places. Values that do not fit within the precision at all should probably continue to fail.
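A sketch of that hint-aware approach; the helper below is hypothetical and not dlt's actual coercion code. It converts via str(), quantizes to the scale, and lets values that exceed the precision raise:

```python
from decimal import Context, Decimal, InvalidOperation, ROUND_HALF_EVEN

def coerce_float(value: float, precision: int, scale: int) -> Decimal:
    # Hypothetical helper: shortest-repr conversion, then round to `scale`
    # fractional digits; a result wider than `precision` digits raises.
    ctx = Context(prec=precision, rounding=ROUND_HALF_EVEN)
    try:
        return Decimal(str(value)).quantize(Decimal(1).scaleb(-scale), context=ctx)
    except InvalidOperation:
        raise ValueError(f"{value!r} does not fit DECIMAL({precision}, {scale})") from None

coerce_float(34.7, 38, 9)          # Decimal('34.700000000')
coerce_float(0.1234567899, 38, 9)  # Decimal('0.123456790')
```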