feat: add support for timestamp_seconds expression#3146

Open
andygrove wants to merge 12 commits into apache:main from andygrove:feature/seconds-to-timestamp

Conversation

@andygrove
Member

Summary

  • Adds native Comet support for Spark timestamp_seconds function (also known as seconds_to_timestamp)
  • Converts seconds since Unix epoch to a timestamp
  • Supports Int32, Int64, and Float64 input types

Test Plan

  • Added unit tests in CometTemporalExpressionSuite
  • Tests include random data, literal values, nulls, negative values (before epoch), and fractional seconds
  • All existing tests pass

Note: This PR was generated with AI assistance.

Closes #3111

Adds native Comet support for Spark's timestamp_seconds (SecondsToTimestamp)
function, which converts seconds since Unix epoch to a timestamp.

Supports Int32, Int64, and Float64 inputs. NaN and infinite float values
return null, matching Spark's behavior.

Closes apache#3111

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as draft January 14, 2026 23:49
@codecov-commenter

codecov-commenter commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.96%. Comparing base (f09f8af) to head (3a7d8d9).
⚠️ Report is 907 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3146      +/-   ##
============================================
+ Coverage     56.12%   59.96%   +3.83%     
- Complexity      976     1473     +497     
============================================
  Files           119      175      +56     
  Lines         11743    16170    +4427     
  Branches       2251     2682     +431     
============================================
+ Hits           6591     9696    +3105     
- Misses         4012     5126    +1114     
- Partials       1140     1348     +208     

☔ View full report in Codecov by Sentry.

@andygrove andygrove marked this pull request as ready for review January 15, 2026 03:41
…mestamp

# Conflicts:
#	docs/source/user-guide/latest/configs.md
#	native/spark-expr/src/comet_scalar_funcs.rs
#	native/spark-expr/src/lib.rs
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
#	spark/src/test/scala/org/apache/comet/CometTemporalExpressionSuite.scala
@andygrove andygrove marked this pull request as draft January 30, 2026 01:48
@andygrove
Member Author

Moving this to draft until #3328 is merged

andygrove and others added 2 commits February 10, 2026 12:11
…mestamp

# Conflicts:
#	native/spark-expr/src/comet_scalar_funcs.rs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as ready for review February 11, 2026 01:03
@mbutrovich
Contributor

Thanks @andygrove! Claude summarized my notes for me. Hopefully it didn't transcribe anything wrong or hallucinate :)

PR #3146: timestamp_seconds

Overflow behavior mismatch with Spark

This is the main concern. Spark's SecondsToTimestamp for integer inputs uses Math.multiplyExact, which throws ArithmeticException("long overflow") on overflow. The Rust implementation uses checked_mul and maps overflow to None (null). So Comet silently returns null where Spark would throw.

Either return an error on overflow to match Spark, or document the deviation and mark it Incompatible in the serde's getSupportLevel.

NULL buffer handling

All three input type paths build the output via .iter().map().collect(), which constructs the null buffer row by row through Arrow's builder. For Int32 and Int64 no new nulls are introduced, so the output null buffer is identical to the input's: it could be cloned directly and the values computed with Arrow's unary kernel or direct buffer arithmetic, avoiding the per-row null tracking overhead. Float64 is trickier since NaN/Infinity introduce new nulls, so iterative construction is more justifiable there.

Missing input types

Spark accepts NumericType (Byte, Short, Int, Long, Float, Double, Decimal). The PR only handles Int32, Int64, Float64. Float32 would be trivial to add (cast to f64 and use the Float64 path). Decimal is more complex and could be deferred.

Test gaps

A few cases worth adding:

  • NaN input: SELECT timestamp_seconds(CAST('NaN' AS DOUBLE)) (should return null)
  • Infinity input: SELECT timestamp_seconds(CAST('Infinity' AS DOUBLE)) (should return null)
  • Overflow: SELECT timestamp_seconds(9223372036854775807) (Spark throws, what does Comet do?)
  • Int32 column reference (currently only bigint column in the test table)
  • Float64 column reference (currently only literal casts)

Docs

docs/spark_expressions_support.md has timestamp_seconds marked [ ]. Should be updated to [x].

- Error on Int64 overflow instead of returning null, matching Spark's
  Math.multiplyExact which throws ArithmeticException
- Use try_unary kernel for Int32/Int64 paths to avoid per-row null
  buffer tracking overhead
- Add Float32 input type support
- Properly handle scalar inputs (scalar-in/scalar-out contract)
- Add test coverage for NaN, Infinity, Int32 column, Float64 column
- Mark timestamp_seconds as supported in spark_expressions_support.md
…mestamp

# Conflicts:
#	native/spark-expr/src/comet_scalar_funcs.rs
#	native/spark-expr/src/lib.rs
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
@andygrove
Member Author

@mbutrovich this is ready for another look when you have time

@mbutrovich
Contributor

This was the only nit:

Byte/Short/Decimal are still unsupported; Comet will fall back to Spark for these, which is fine, but getSupportLevel doesn't check child types. If someone writes timestamp_seconds(CAST(x AS TINYINT)), it would hit a DataFusion planning error rather than a clean Comet fallback. An override like the one in CometArrayPosition.getSupportLevel that checks input types would be more robust.


Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: seconds_to_timestamp
