Coerce Dictionary types for scalar functions#10077

Merged
alamb merged 4 commits into apache:main from viirya:coercion_dict_scalar_func
Apr 15, 2024

Member

@viirya viirya commented Apr 14, 2024

Which issue does this PR close?

Closes #10076.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 14, 2024
-        _ => comparison_binary_numeric_coercion(type_into, type_from).and_then(
-            |coerced_type| {
+        _ => comparison_binary_numeric_coercion(type_into, type_from)
+            .or_else(|| dictionary_coercion(type_into, type_from, true))
Contributor

Should we check the inner data type with coerced_from instead of comparison_coercion 🤔 ?

Contributor

I couldn't quite tell the difference. What would be the benefit?

I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.

Member Author

@viirya viirya Apr 14, 2024

Hmm, I'm not sure I understand the comment either. What do you mean by "check inner data type with coerced_from"?

Contributor

@jayzhan211 jayzhan211 Apr 15, 2024

"check inner data type with coerced_from"

This is similar to the current implementation for Dict:
https://github.com/apache/arrow-datafusion/blob/671cef85c550969ab2c86d644968a048cb181c0c/datafusion/expr/src/type_coercion/functions.rs#L316-L321
That code checks whether the inner type of the Dict is coercible using the coerced_from function,
whereas dictionary_coercion checks the inner type of the Dict with comparison_coercion.

coerced_from and comparison_coercion are slightly different.
comparison_coercion targets comparisons, so a lossy coercion is allowed. For example, for i64 and u64 it returns i64, whereas coerced_from returns None when casting u64 to i64.

I tried to find a single coercion for all cases but concluded that we should keep these two coercion functions; see #8302.

I suggest we don't mix the logic of coerced_from and comparison_coercion. It would also be nice to avoid using comparison_binary_numeric_coercion in coerced_from.
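
To make the i64/u64 distinction above concrete, here is a minimal Rust sketch. The `Num` enum and both `toy_*` functions are hypothetical stand-ins written for this example; they are not DataFusion's comparison_coercion or coerced_from, and they model only the single case described in the comment.

```rust
// Toy model for illustration only; not DataFusion's actual types or rules.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Num {
    Int64,
    UInt64,
}

/// Comparison-style coercion: pick one common type for both sides,
/// accepting a possibly lossy choice for mixed signedness.
fn toy_comparison_coercion(lhs: Num, rhs: Num) -> Option<Num> {
    match (lhs, rhs) {
        (Num::Int64, Num::Int64) => Some(Num::Int64),
        (Num::UInt64, Num::UInt64) => Some(Num::UInt64),
        // Large u64 values do not fit in i64, but a comparison still needs
        // a single type, so this sketch settles on Int64.
        (Num::Int64, Num::UInt64) | (Num::UInt64, Num::Int64) => Some(Num::Int64),
    }
}

/// Cast-style coercion: may `type_from` be passed where `type_into` is expected?
fn toy_coerced_from(type_into: Num, type_from: Num) -> Option<Num> {
    match (type_into, type_from) {
        (Num::Int64, Num::Int64) => Some(Num::Int64),
        (Num::UInt64, Num::UInt64) => Some(Num::UInt64),
        // The thread only states that u64 -> i64 yields None; rejecting the
        // reverse direction as well is an assumption of this sketch.
        _ => None,
    }
}
```

With these toy rules, `toy_comparison_coercion(Num::Int64, Num::UInt64)` yields `Some(Num::Int64)`, while `toy_coerced_from(Num::Int64, Num::UInt64)` yields `None`.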

Member Author

Ah, I didn't notice that coerced_from had been updated to coerce dictionary types. I was working on a branch that didn't have that update yet. The current code with dictionary coercion looks good enough to fix the issue I encountered.

Member Author

But I notice that the current implementation has a small issue.

Contributor

@alamb alamb left a comment

Thanks @viirya and @jayzhan211 -- this looks like an improvement to me.

I don't fully understand the comment https://github.com/apache/arrow-datafusion/pull/10077/files#r1564548305 but it seems like something we could refine in a follow-on PR as well.

}

#[test]
fn test_coalesce_return_types_dictionary() {
Contributor

👍 we saw something similar in #9925

@viirya viirya force-pushed the coercion_dict_scalar_func branch from 531bd1a to 4a3a6fd Compare April 15, 2024 02:26
    match (type_into, type_from) {
        // coerced dictionary first
        (cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type)
            if coerced_from(cur_type, value_type).is_some() =>
Member Author

When coercing into a dictionary type, the type_into and type_from arguments are passed in the wrong order.
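
The ordering problem can be reproduced in a small self-contained sketch. Everything below is a toy model: the `Ty` enum, the widening-only `base_rule`, and both function names are assumptions made for illustration, not DataFusion's code. The first function mirrors the or-pattern from the hunk above, where `cur_type` can be either side; the second shows one possible way to split the arms so the recursive call keeps the (into, from) order. It is not necessarily the fix that was merged.

```rust
// Toy model for illustration only; not arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Int64,
    // Dictionary(key type, value type)
    Dictionary(Box<Ty>, Box<Ty>),
}

// Hypothetical base rule: identical types, or widening Int32 into Int64.
fn base_rule(type_into: &Ty, type_from: &Ty) -> Option<Ty> {
    match (type_into, type_from) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int64, Ty::Int32) => Some(Ty::Int64),
        _ => None,
    }
}

// Or-pattern version: in the second alternative `cur_type` is `type_from`,
// so the guard asks "can the dictionary values be coerced into type_from?",
// which is the reverse of the intended question.
fn coerced_from_or_pattern(type_into: &Ty, type_from: &Ty) -> Option<Ty> {
    match (type_into, type_from) {
        (cur_type, Ty::Dictionary(_, value_type))
        | (Ty::Dictionary(_, value_type), cur_type)
            if coerced_from_or_pattern(cur_type, value_type).is_some() =>
        {
            Some(type_into.clone())
        }
        _ => base_rule(type_into, type_from),
    }
}

// Split version: each arm passes (into, from) in the declared order.
fn coerced_from_split(type_into: &Ty, type_from: &Ty) -> Option<Ty> {
    match (type_into, type_from) {
        // Source is a dictionary: can its values be coerced into type_into?
        (_, Ty::Dictionary(_, value_type))
            if coerced_from_split(type_into, value_type).is_some() =>
        {
            Some(type_into.clone())
        }
        // Target is a dictionary: can type_from be coerced into its values?
        (Ty::Dictionary(_, value_type), _)
            if coerced_from_split(value_type, type_from).is_some() =>
        {
            Some(type_into.clone())
        }
        _ => base_rule(type_into, type_from),
    }
}
```

Under these toy rules, with a dictionary target whose values are Int64 and an Int32 source, the or-pattern version returns `None` while the split version accepts the coercion. With Int32 dictionary values and an Int64 source, the or-pattern version accepts a narrowing that the split version rejects.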

Contributor

That is an excellent find. Thanks @jayzhan211 for pointing that out

Contributor

@jayzhan211 jayzhan211 left a comment

👍

Member

@andygrove andygrove left a comment

Thanks @viirya

Member Author

viirya commented Apr 15, 2024

Thank you @alamb @jayzhan211

Member Author

viirya commented Apr 15, 2024

Thank you @andygrove

@alamb alamb merged commit 6ca9d10 into apache:main Apr 15, 2024
Omega359 pushed a commit to Omega359/arrow-datafusion that referenced this pull request Apr 16, 2024
* Coerce Dictionary types for scalar functions

* Fix

* Fix format

* Add test
appletreeisyellow pushed a commit to influxdata/arrow-datafusion that referenced this pull request Apr 22, 2024
* Coerce Dictionary types for scalar functions

* Fix

* Fix format

* Add test
Labels

logical-expr Logical plan and expressions

Development

Successfully merging this pull request may close these issues.

Coerce dictionary types for scalar functions

4 participants