
chore: complete coverage for primitive to string casts #4009

Closed
parthchandra wants to merge 1 commit into apache:main from parthchandra:cast-primitive-string-2

Conversation

@parthchandra
Contributor

Which issue does this PR close?

Part of #286
Fixes: #1068

Rationale for this change

Comet had gaps in Decimal-to-String cast coverage:

Decimal -> String (LEGACY mode): cast_array was falling through to DataFusion's built-in cast for Decimal128 -> Utf8, which produces plain notation. Spark's LEGACY mode uses Java BigDecimal.toString(), which produces scientific notation when the adjusted exponent is less than -6. For example, Decimal(38,18) zero was cast as "0.000000000000000000" by Comet but "0E-18" by Spark.

Decimal with negative scale: When spark.sql.legacy.allowNegativeScaleOfDecimal=true, Spark formats negative-scale decimals in scientific notation (e.g. Decimal(7,-2) unscaled=123 -> "1.23E+4"). Comet now matches this. When the config is disabled, CometCast.isSupported returns Incompatible, since negative-scale decimals cannot be created via normal SQL paths.
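
The negative-scale case needs no separate rule: scale < 0 always fails the plain-notation test, so it takes the scientific branch. Using the hypothetical helper sketched above:

```rust
// Decimal(7,-2), unscaled=123: adj_exp = 3 - 1 - (-2) = 4 -> "1.23E+4"
assert_eq!(java_style_decimal_string(123, -2), "1.23E+4");
```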

Other primitive types -> String: Boolean, integer, float, double, date, timestamp, and binary casts to String were already handled correctly by DataFusion's built-in cast. The existing castTest harness already exercises LEGACY, TRY, and ANSI modes for all of these types, so no new tests were needed.

What changes are included in this PR?

  • native/spark-expr/src/conversion_funcs/numeric.rs: adds cast_decimal128_to_utf8 and decimal128_to_java_string that replicate Java BigDecimal.toString() behavior (scientific notation when adj_exp < -6); a rough sketch follows this list.
  • native/spark-expr/src/conversion_funcs/cast.rs: adds a LEGACY-only match arm for Decimal128 -> Utf8 that calls cast_decimal128_to_utf8; TRY and ANSI fall through to DataFusion's plain-notation cast, which already matches Spark.
  • spark/src/main/scala/org/apache/comet/expressions/CometCast.scala: marks all DecimalType -> StringType casts as Compatible across all eval modes, with a special Incompatible case for negative-scale decimals when the legacy config is disabled.
  • Tests: adds "cast DecimalType(38,18) to StringType" and "cast DecimalType with negative scale to StringType" tests, and extends generateDecimalsPrecision38Scale18 with small-magnitude values that trigger the scientific-notation path.
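
As a rough sketch of how the pieces above fit together (names and structure are illustrative assumptions built on the arrow crate, not the PR's actual cast_decimal128_to_utf8):

```rust
use std::sync::Arc;
use arrow::array::{Array, ArrayRef, Decimal128Array, StringBuilder};

/// Illustrative Decimal128 -> Utf8 kernel. Per the PR description, such a path
/// is taken only in LEGACY mode; TRY and ANSI fall through to DataFusion's
/// plain-notation cast.
fn cast_decimal128_to_utf8_sketch(array: &Decimal128Array) -> ArrayRef {
    let scale = array.scale() as i32;
    let mut builder = StringBuilder::with_capacity(array.len(), array.len() * 8);
    for i in 0..array.len() {
        if array.is_null(i) {
            builder.append_null();
        } else {
            // java_style_decimal_string is the formatter sketched earlier.
            builder.append_value(java_style_decimal_string(array.value(i), scale));
        }
    }
    Arc::new(builder.finish())
}
```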

How are these changes tested?

  • Unit tests in CometCastSuite covering Decimal(38,18), negative-scale decimals (with localTableScan.enabled to avoid Parquet), and the CometCast.isSupported boundary for negative-scale with config disabled.
  • Rust unit test test_decimal128_to_java_string covering all branches of the formatting logic (plain, leading zeros, scientific notation, positive/negative exponent, zero, negatives); an illustrative version follows this list.
  • Existing castTest harness already exercises LEGACY, TRY, and ANSI modes for all other primitive-to-String casts (boolean, byte, short, int, long, float, double, date, timestamp, binary), confirming compatibility.
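
For flavor, an illustrative branch-coverage test against the hypothetical formatter sketched earlier (the PR's real test is test_decimal128_to_java_string):

```rust
#[test]
fn decimal_to_java_string_branches() {
    assert_eq!(java_style_decimal_string(12345, 2), "123.45");      // plain notation
    assert_eq!(java_style_decimal_string(5, 3), "0.005");           // leading zeros
    assert_eq!(java_style_decimal_string(1, 8), "1E-8");            // scientific, negative exponent
    assert_eq!(java_style_decimal_string(123, -2), "1.23E+4");      // scientific, positive exponent
    assert_eq!(java_style_decimal_string(0, 18), "0E-18");          // zero
    assert_eq!(java_style_decimal_string(-12345, 7), "-0.0012345"); // negative value
}
```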

@parthchandra parthchandra marked this pull request as draft April 20, 2026 20:05
@parthchandra
Contributor Author

This is already done in #3939

