Coerce Dictionary types for scalar functions#10077
Conversation
| _ => comparison_binary_numeric_coercion(type_into, type_from).and_then( | ||
| |coerced_type| { | ||
| _ => comparison_binary_numeric_coercion(type_into, type_from) | ||
| .or_else(|| dictionary_coercion(type_into, type_from, true)) |
There was a problem hiding this comment.
Should we check the inner data type with coerced_from instead of comparison_coercion 🤔 ?
There was a problem hiding this comment.
I couldn't quite tell the difference. What would be the benefit?
I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.
There was a problem hiding this comment.
Hmm, not sure I understand the comment too. What you mean "check inner data type with coerced_from"?
There was a problem hiding this comment.
"check inner data type with coerced_from"
Similar to the current implementation of Dict
https://github.com/apache/arrow-datafusion/blob/671cef85c550969ab2c86d644968a048cb181c0c/datafusion/expr/src/type_coercion/functions.rs#L316-L321
The above once checks if the inner type in Dict is coercible by the coerced_from function.
But dictionary_coercion checks the inner type of Dict with comparison_coercion.
The coerced_from and comparison_coercion are slightly different.
comparison_coercion cares about the scenario in comparison, so loss is allowed. For example, i64 and u64, we return i64, while we get None in coerced_from for casting u64 to i64.
I had tried to find one coercion for all but ended up with the conclusion that we keep these two coercion functions. #8302.
I suggest we don't mix the logic for coerce_from and comparison_coercion. It would be nice to avoid using comparison_binary_numeric_coercion in coerced_from too.
There was a problem hiding this comment.
Ah, I don't notice coerced_from is updated to coerce dictionary type. I was working on a branch without update yet. The current update with dictionary coercion looks good now to fix the issue I encountered.
There was a problem hiding this comment.
But I notice that the current implementation has a small issue.
alamb
left a comment
There was a problem hiding this comment.
Thanks @viirya and @jayzhan211 -- this looks like an improvement to me.
I don't fully understand the comment https://github.com/apache/arrow-datafusion/pull/10077/files#r1564548305 but it seems like something we could refine in a follow on PR as well
| } | ||
|
|
||
| #[test] | ||
| fn test_coalesce_return_types_dictionary() { |
| _ => comparison_binary_numeric_coercion(type_into, type_from).and_then( | ||
| |coerced_type| { | ||
| _ => comparison_binary_numeric_coercion(type_into, type_from) | ||
| .or_else(|| dictionary_coercion(type_into, type_from, true)) |
There was a problem hiding this comment.
I couldn't quite tell the difference. What would be the benefit?
I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.
531bd1a to
4a3a6fd
Compare
| match (type_into, type_from) { | ||
| // coerced dictionary first | ||
| (cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type) | ||
| if coerced_from(cur_type, value_type).is_some() => |
There was a problem hiding this comment.
When coercing into dictionary type, the type_into and type_from parameters are in incorrect order.
There was a problem hiding this comment.
That is an excellent find. Thanks @jayzhan211 for pointing that out
| match (type_into, type_from) { | ||
| // coerced dictionary first | ||
| (cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type) | ||
| if coerced_from(cur_type, value_type).is_some() => |
There was a problem hiding this comment.
That is an excellent find. Thanks @jayzhan211 for pointing that out
|
Thank you @alamb @jayzhan211 |
|
Thank you @andygrove |
* Coerce Dictionary types for scalar functions * Fix * Fix format * Add test
* Coerce Dictionary types for scalar functions * Fix * Fix format * Add test
Which issue does this PR close?
Closes #10076.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?