Commit f7495da
[SPARK-51501][SQL] Disable ObjectHashAggregate for group by on collated columns
### What changes were proposed in this pull request?
Disabling `ObjectHashAggregate` when grouping on columns with collations.
### Why are the changes needed?
#45290 added support for sort based aggregation on collated columns and explicitly forbade the use of hash aggregate for collated columns. However, it did not consider the third type of aggregate, the object hash aggregate, which is only used when there are also TypedImperativeAggregate expressions present ([source](https://github.com/apache/spark/blob/f3b081066393e1568c364b6d3bc0bceabd1e7e9f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L1204)).
That means that if we group by a collated column and also have a TypedImperativeAggregate we will end up using the object has aggregate which can lead to incorrect results like in the example below:
```code
CREATE TABLE tbl(c1 STRING COLLATE UTF8_LCASE, c2 INT) USING PARQUET;
INSERT INTO tbl VALUES ('HELLO', 1), ('hello', 2), ('HeLlO', 3);
SELECT COLLECT_LIST(c2) as list FROM tbl GROUP BY c1;
```
where the result would have three rows with values [1], [2] and [3] instead of one row with value [1, 2, 3].
For this reason we should do the same thing as we did for the regular hash aggregate, make it so that it doesn't support grouping expressions on collated columns.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #50267 from stefankandic/fixObjectHashAgg.
Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>1 parent 84b9848 commit f7495da
File tree
4 files changed
+104
-4
lines changed- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst
- optimizer
- plans/logical
- core/src
- main/scala/org/apache/spark/sql/execution/aggregate
- test/scala/org/apache/spark/sql/collation
4 files changed
+104
-4
lines changedLines changed: 4 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
366 | 366 | | |
367 | 367 | | |
368 | 368 | | |
369 | | - | |
370 | | - | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
371 | 373 | | |
372 | 374 | | |
373 | 375 | | |
| |||
Lines changed: 8 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1199 | 1199 | | |
1200 | 1200 | | |
1201 | 1201 | | |
1202 | | - | |
| 1202 | + | |
| 1203 | + | |
| 1204 | + | |
| 1205 | + | |
| 1206 | + | |
| 1207 | + | |
| 1208 | + | |
| 1209 | + | |
1203 | 1210 | | |
1204 | 1211 | | |
1205 | 1212 | | |
| |||
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
94 | 94 | | |
95 | 95 | | |
96 | 96 | | |
97 | | - | |
| 97 | + | |
| 98 | + | |
98 | 99 | | |
99 | 100 | | |
100 | 101 | | |
| |||
Lines changed: 90 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
0 commit comments