Skip to content

Add Coalesce function#1969

Merged
yjshen merged 11 commits intoapache:masterfrom
msathis:coalesce_fn
Apr 6, 2022
Merged

Add Coalesce function#1969
yjshen merged 11 commits intoapache:masterfrom
msathis:coalesce_fn

Conversation

@msathis
Copy link
Copy Markdown
Contributor

@msathis msathis commented Mar 9, 2022

Which issue does this PR close?

Closes #1890.

Rationale for this change

To support coalesce function in SQL

What changes are included in this PR?

Basic coalesce support. Now everything is treated as string, will have to work to support other data types properly.

Are there any user-facing changes?

Users can use coalesce function in SQL now

@msathis msathis marked this pull request as ready for review March 15, 2022 09:45
@msathis msathis changed the title Coalesce function - WIP Coalesce function Mar 15, 2022
@msathis
Copy link
Copy Markdown
Contributor Author

msathis commented Mar 15, 2022

@andygrove @xudong963 Please take a look & review when you find time.

for column_value in args {
match column_value {
ColumnarValue::Array(array_ref) => {
if array_ref.is_valid(i) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation can likely be optimized (I think it should be >10x faster) by not converting values to ScalarValue and using is_valid, matching per value, but operating directly on Arrays. That would also allow optimizations like skipping running the coalesce althogether if all items in the array already are non null, or skipping handling the array if every value is null, etc.

This will involve some more work / code though, but maybe a comment that there are some big optimizations possible could be useful.

The current implementation also could likely already be optimized a bit by moving the for loop to the innermost place instead of the outermost place and/or implementing one of the rules like mentioned.

@yjshen yjshen added the api change Changes the API exposed to users of the crate label Mar 16, 2022
@yjshen
Copy link
Copy Markdown
Member

yjshen commented Mar 16, 2022

@msathis Could you please also add this to Ballista, as #2008 did?

@msathis
Copy link
Copy Markdown
Contributor Author

msathis commented Mar 16, 2022

@yjshen Yup. Thanks for pointing out. I'll add it to Ballista with the optimization @Dandandan suggested

@xudong963 xudong963 added the enhancement New feature or request label Mar 17, 2022
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 21, 2022

@yjshen Yup. Thanks for pointing out. I'll add it to Ballista with the optimization @Dandandan suggested

@msathis do you plan to do this as a follow on PR or this PR?

In other words is this PR ready for review again or are you still working on it?

BuiltinScalarFunction::Chr => Ok(DataType::Utf8),
BuiltinScalarFunction::Coalesce => {
// COALESCE has multiple args and they might get coerced, get a preview of this
let coerced_types = data_types(input_expr_types, &signature(fun));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any doc or reference to guide for the common result type or the coercion type?

}

#[tokio::test]
async fn coalesce_sum_with_default_value() -> Result<()> {
Copy link
Copy Markdown
Contributor

@liukun4515 liukun4515 Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add other case with agg function from #2067 (comment)

select COALESCE(sum(c3),0) from table

All the value of c3 is null.
@msathis

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. i'll update the test case!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can try this case

select COALESCE(sum(c3),0) from table group by xxx

@msathis
Copy link
Copy Markdown
Contributor Author

msathis commented Mar 24, 2022

R ready for review again or are you still working on it?

Hi @alamb , Sorry, i got busy with some office work. I'll try to revise the PR in the next couple of days. I was planning to address all the performance optimisations @Dandandan suggested.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 24, 2022

Thank you @msathis -- makes total sense. Let us know if you need any help!

}
ColumnarValue::Scalar(scalar) => {
if !scalar.is_null() {
// TODO: Figure out how to set value at index
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alamb, I need your help here. I couldn't figure out how to set value to particular index of arrow::Array

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may want to call scalar. to_array_of_size(size) here to create the appropriate array:

https://github.com/alamb/arrow-datafusion/blob/41b4e49/datafusion/common/src/scalar.rs#L1075-L1076

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb I think that worked.

@liukun4515
Copy link
Copy Markdown
Contributor

@msathis If the pr is ready to review, please at me.
Thanks.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @msathis -- this is a great PR. I think it could be merged as is, though I have some small comments / suggestions.

I also updated this PR description as it said "everything was treated as a string" which I do not think applies any longer

"| coalesce(test.c1,test.c2) |",
"+---------------------------+",
"| 0 |",
"| 1 |",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread datafusion/core/tests/sql/functions.rs Outdated
}

#[tokio::test]
async fn coalesce_static_empty_value() -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test seems to be the same as coalesce_plan -- is that intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its an overlook, you're right. i'll remove coalesce_plan

Comment thread datafusion/core/tests/sql/functions.rs Outdated
}

#[tokio::test]
async fn coalesce_static_value_with_null() -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

#[tokio::test]
async fn coalesce_result_with_default_value() -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stylistically this might be better if it were right next to coalesce_result given its similarity.

For that matter, since it uses the same data setup, it might be cool to combine the two tests (and run two different queries in them)

"+---------------------------------------------+",
];
assert_batches_eq!(expected, &actual);
Ok(())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice testing 👍

Comment thread datafusion/expr/src/expr_fn.rs Outdated
Comment thread datafusion/physical-expr/src/conditional_expressions.rs
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 2, 2022

I will wait for @liukun4515 to review as well prior to merging

@msathis
Copy link
Copy Markdown
Contributor Author

msathis commented Apr 2, 2022

@alamb Updated the PR to address the review comments

@msathis
Copy link
Copy Markdown
Contributor Author

msathis commented Apr 2, 2022

@msathis If the pr is ready to review, please at me. Thanks.

@liukun4515 Your review will be great! I think the PR is ready. My knowledge is limited, i feel more optimisation is possible.

@alamb alamb changed the title Coalesce function Add Coalesce function Apr 3, 2022
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 3, 2022

Thanks @msathis -- I think this is looking great

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 5, 2022

@liukun4515 would you still like to review this prior to merging?

Copy link
Copy Markdown
Member

@yjshen yjshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @msathis!

Copy link
Copy Markdown
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yjshen yjshen merged commit b890190 into apache:master Apr 6, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 10, 2022
* feat: coalesce function. WIP

* fix: add license header

* fix: support all primitive data types

* fix: add test case and remove commented code

* fix: refactor code and support coalesce from ballista

* fix: support scalar values in coalesce

* fix: clippy

* fix: clippy by removing unneeded mut

* fix: clippy by removing unneeded mut

* fix: address review comments

Co-authored-by: Sathis Kumar <sathis.kumar@udemy.com>
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 10, 2022
* feat: coalesce function. WIP

* fix: add license header

* fix: support all primitive data types

* fix: add test case and remove commented code

* fix: refactor code and support coalesce from ballista

* fix: support scalar values in coalesce

* fix: clippy

* fix: clippy by removing unneeded mut

* fix: clippy by removing unneeded mut

* fix: address review comments

Co-authored-by: Sathis Kumar <sathis.kumar@udemy.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for SQL coalesce function

6 participants