Coerce Dictionary types for scalar functions by viirya · Pull Request #10077 · apache/datafusion

viirya · 2024-04-14T07:35:54Z

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jayzhan211 · 2024-04-14T08:06:35Z

-        _ => comparison_binary_numeric_coercion(type_into, type_from).and_then(
-            |coerced_type| {
+        _ => comparison_binary_numeric_coercion(type_into, type_from)
+            .or_else(|| dictionary_coercion(type_into, type_from, true))


Should we check the inner data type with coerced_from instead of comparison_coercion 🤔 ?

I couldn't quite tell the difference. What would be the benefit?

I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.

Hmm, I'm not sure I understand the comment either. What do you mean by "check inner data type with coerced_from"?

"check inner data type with coerced_from"

I mean something similar to the current implementation for Dict:
https://github.com/apache/arrow-datafusion/blob/671cef85c550969ab2c86d644968a048cb181c0c/datafusion/expr/src/type_coercion/functions.rs#L316-L321
The code above checks whether the inner type of the Dict is coercible using the coerced_from function.
But dictionary_coercion checks the inner type of the Dict with comparison_coercion.

coerced_from and comparison_coercion are slightly different.
comparison_coercion targets comparisons, so a lossy result is allowed. For example, for i64 and u64 it returns i64, whereas coerced_from returns None for casting u64 to i64.
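
To make the distinction concrete, here is a minimal toy sketch. ToyType, toy_comparison_coercion and toy_coerced_from are hypothetical names invented for illustration; they are not DataFusion's types or functions, and the rules cover only the i64/u64 case described above plus one lossless widening (u32) for contrast.

```rust
// Toy model only: these are NOT DataFusion's real types or functions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int64,
    UInt64,
    UInt32,
}

/// Hypothetical comparison-style rule: yields a common type for the pair,
/// even when some values of one side cannot be represented in it.
fn toy_comparison_coercion(lhs: ToyType, rhs: ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        // Lossy but accepted here: u64 values above i64::MAX do not fit in i64.
        (Int64, UInt64) | (UInt64, Int64) => Some(Int64),
        (Int64, UInt32) | (UInt32, Int64) => Some(Int64),
        (UInt64, UInt32) | (UInt32, UInt64) => Some(UInt64),
        _ => None,
    }
}

/// Hypothetical cast-style rule: succeeds only when every value of `from`
/// fits in `into`, so it is directional and rejects u64 -> i64.
fn toy_coerced_from(into: ToyType, from: ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (into, from) {
        (a, b) if a == b => Some(a),
        (Int64, UInt32) | (UInt64, UInt32) => Some(into),
        _ => None,
    }
}
```

With these toy rules, the (Int64, UInt64) pair gets Some(Int64) from the comparison-style function and None from the cast-style one, which is the difference being discussed.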

I tried to find a single coercion for all cases but concluded that we should keep these two coercion functions; see #8302.

I suggest we don't mix the logic of coerced_from and comparison_coercion. It would also be nice to avoid using comparison_binary_numeric_coercion in coerced_from.

Ah, I didn't notice that coerced_from had been updated to coerce dictionary types. I was working on a branch that didn't have that update yet. The current dictionary coercion looks good and fixes the issue I encountered.

But I notice that the current implementation has a small issue.

alamb

Thanks @viirya and @jayzhan211 -- this looks like an improvement to me.

I don't fully understand the comment https://github.com/apache/arrow-datafusion/pull/10077/files#r1564548305 but it seems like something we could refine in a follow-on PR as well.

alamb · 2024-04-14T12:36:34Z

    }
+
+    #[test]
+    fn test_coalesce_return_types_dictionary() {


👍 we saw something similar in #9925

alamb · 2024-04-14T15:19:55Z

-        _ => comparison_binary_numeric_coercion(type_into, type_from).and_then(
-            |coerced_type| {
+        _ => comparison_binary_numeric_coercion(type_into, type_from)
+            .or_else(|| dictionary_coercion(type_into, type_from, true))


I couldn't quite tell the difference. What would be the benefit?

I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.

viirya · 2024-04-15T02:34:09Z

    match (type_into, type_from) {
        // coerced dictionary first
-        (cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type)
-            if coerced_from(cur_type, value_type).is_some() =>


When coercing into a dictionary type, the type_into and type_from parameters are passed in the wrong order.
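
A toy reconstruction of how this kind of ordering bug can arise from an or-pattern that covers both orientations. All names here (ToyType, toy_can_widen, merged_arms, split_arms) are hypothetical and stand in for the real code; this is a sketch of the pattern, not the actual fix made in this PR.

```rust
// Toy model only: not DataFusion's real types or functions.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    UInt32,
    // A dictionary-encoded type, modelled only by its value type.
    Dictionary(Box<ToyType>),
}

/// Hypothetical directional rule: can `from` be widened into `into` without loss?
fn toy_can_widen(into: &ToyType, from: &ToyType) -> bool {
    matches!((into, from), (ToyType::Int64, ToyType::UInt32)) || into == from
}

/// Or-pattern version: both orientations bind the same names, so the call
/// always treats the plain type as the target and the value type as the source.
fn merged_arms(into: &ToyType, from: &ToyType) -> bool {
    match (into, from) {
        (cur, ToyType::Dictionary(value)) | (ToyType::Dictionary(value), cur) => {
            toy_can_widen(cur, value)
        }
        _ => toy_can_widen(into, from),
    }
}

/// Split version: each orientation passes (target, source) in that order.
fn split_arms(into: &ToyType, from: &ToyType) -> bool {
    match (into, from) {
        // Target is a dictionary: the source must widen into its value type.
        (ToyType::Dictionary(value), cur) => toy_can_widen(value, cur),
        // Source is a dictionary: its value type must widen into the target.
        (cur, ToyType::Dictionary(value)) => toy_can_widen(cur, value),
        _ => toy_can_widen(into, from),
    }
}
```

In this toy model, coercing Int64 into Dictionary(UInt32) is accepted by merged_arms, because it ends up checking UInt32 into Int64, while split_arms rejects it. The opposite direction gives the same answer in both versions, which is why the problem only shows up when the dictionary is the target.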

That is an excellent find. Thanks @jayzhan211 for pointing that out

alamb · 2024-04-15T10:28:14Z

    match (type_into, type_from) {
        // coerced dictionary first
-        (cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type)
-            if coerced_from(cur_type, value_type).is_some() =>


That is an excellent find. Thanks @jayzhan211 for pointing that out

jayzhan211

👍

andygrove

Thanks @viirya

viirya · 2024-04-15T16:57:41Z

Thank you @alamb @jayzhan211

viirya · 2024-04-15T16:58:04Z

Thank you @andygrove

* Coerce Dictionary types for scalar functions
* Fix
* Fix format
* Add test

github-actions bot added the logical-expr (Logical plan and expressions) label Apr 14, 2024

jayzhan211 reviewed Apr 14, 2024

alamb approved these changes Apr 14, 2024

viirya added 2 commits April 14, 2024 19:00

Coerce Dictionary types for scalar functions

da9dc8a

Fix

4a3a6fd

viirya force-pushed the coercion_dict_scalar_func branch from 531bd1a to 4a3a6fd on April 15, 2024 02:26

Fix format

819b88c

viirya commented Apr 15, 2024

Add test

e5dcde3

alamb approved these changes Apr 15, 2024

jayzhan211 approved these changes Apr 15, 2024

andygrove approved these changes Apr 15, 2024

alamb merged commit 6ca9d10 into apache:main Apr 15, 2024

Omega359 pushed a commit to Omega359/arrow-datafusion that referenced this pull request Apr 16, 2024

Coerce Dictionary types for scalar functions (apache#10077)

d181e23

* Coerce Dictionary types for scalar functions
* Fix
* Fix format
* Add test

appletreeisyellow pushed a commit to influxdata/arrow-datafusion that referenced this pull request Apr 22, 2024

Coerce Dictionary types for scalar functions (apache#10077)

00eb2d0

* Coerce Dictionary types for scalar functions
* Fix
* Fix format
* Add test

appletreeisyellow mentioned this pull request Apr 22, 2024

test: non_negative_derivative influxdata/arrow-datafusion#9

Closed
