What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark sort_array function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The SortArray expression sorts the elements of an array in either ascending or descending order. It returns a new array with the same elements sorted according to the specified ordering, with null values handled according to a consistent null-first or null-last policy.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
sort_array(array[, ascendingOrder])
sort_array(col("array_column"))
sort_array(col("array_column"), lit(false)) // descending order
Arguments:
| Argument |
Type |
Description |
base |
ArrayType |
The input array to be sorted |
ascendingOrder |
BooleanType |
Optional. If true (default), sorts in ascending order; if false, sorts in descending order. Must be a foldable expression (constant). |
Return Type: Returns an ArrayType with the same element type and nullability as the input array.
Supported Data Types:
Supports arrays containing any orderable data types:
- Numeric types (IntegerType, LongType, DoubleType, FloatType, etc.)
- String types (StringType)
- Date and timestamp types
- Binary types
- Does NOT support arrays containing non-orderable types like MapType or complex nested structures
Edge Cases:
- Null arrays: Returns null if the input array is null
- Empty arrays: Returns an empty array of the same type
- Arrays with all nulls: Returns an array with all nulls in the same positions (nulls are equal in comparison)
- Mixed null and non-null elements: Nulls are consistently placed at the beginning (ascending) or end (descending)
- Non-foldable ascendingOrder: Throws DataTypeMismatch error - the ordering parameter must be a constant
- Non-orderable element types: Throws DataTypeMismatch error during type checking
Examples:
-- Basic ascending sort (default)
SELECT sort_array(array(3, 1, 4, 1, 5)) AS sorted;
-- Result: [1, 1, 3, 4, 5]
-- Descending sort
SELECT sort_array(array('d', 'c', 'b', 'a', null), false) AS sorted_desc;
-- Result: ['d', 'c', 'b', 'a', null]
-- With null values (ascending)
SELECT sort_array(array(3, null, 1, null, 2)) AS sorted_with_nulls;
-- Result: [null, null, 1, 2, 3]
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(sort_array(col("numbers"))).show()
df.select(sort_array(col("strings"), lit(false))).show()
// With column reference for array
df.withColumn("sorted_values", sort_array(col("value_array"))).show()
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in
spark/src/main/scala/org/apache/comet/serde/
- Register: Add to appropriate map in
QueryPlanSerde.scala
- Protobuf: Add message type in
native/proto/src/proto/expr.proto if needed
- Rust: Implement in
native/spark-expr/src/ (check if DataFusion has built-in support first)
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.SortArray
Related:
array_sort - Alternative function name in some Spark versions
array_max, array_min - For finding extremes without full sorting
shuffle - For randomizing array element order
reverse - For reversing array element order
This issue was auto-generated from Spark reference documentation.
What is the problem the feature request solves?
Comet does not currently support the Spark
sort_arrayfunction, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.The
SortArrayexpression sorts the elements of an array in either ascending or descending order. It returns a new array with the same elements sorted according to the specified ordering, with null values handled according to a consistent null-first or null-last policy.Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
Arguments:
baseascendingOrderReturn Type: Returns an
ArrayTypewith the same element type and nullability as the input array.Supported Data Types:
Supports arrays containing any orderable data types:
Edge Cases:
Examples:
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scalanative/proto/src/proto/expr.protoif needednative/spark-expr/src/(check if DataFusion has built-in support first)Additional context
Difficulty: Medium
Spark Expression Class:
org.apache.spark.sql.catalyst.expressions.SortArrayRelated:
array_sort- Alternative function name in some Spark versionsarray_max,array_min- For finding extremes without full sortingshuffle- For randomizing array element orderreverse- For reversing array element orderThis issue was auto-generated from Spark reference documentation.