Skip to content

Support arithmetic scalar operation with DictionaryArray#5151

Merged
liukun4515 merged 3 commits intoapache:masterfrom
viirya:arithmetic_dyn_scalar
Feb 3, 2023
Merged

Support arithmetic scalar operation with DictionaryArray#5151
liukun4515 merged 3 commits intoapache:masterfrom
viirya:arithmetic_dyn_scalar

Conversation

@viirya
Copy link
Copy Markdown
Member

@viirya viirya commented Feb 1, 2023

Which issue does this PR close?

Closes #5150.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates labels Feb 1, 2023
@viirya viirya force-pushed the arithmetic_dyn_scalar branch from 2a33923 to 8b7fcc7 Compare February 1, 2023 19:56
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Feb 1, 2023

Thanks @viirya -- it is great to see someone else using Dictionary arrays ❤️

Looks like there are some compilation errors on this PR. Marking as draft until those are resolved.

error[E0425]: cannot find function `divide_decimal_scalar` in this scope
   --> datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs:465:19
    |
[231](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:232) | / pub(crate) fn divide_decimal_dyn_scalar(
[232](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:233) | |     left: &dyn Array,
[233](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:234) | |     right: i128,
[234](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:235) | | ) -> Result<Decimal128Array> {
...   |
242 | |     Ok(decimal_array)
243 | | }
    | |_- similarly named function `divide_decimal_dyn_scalar` defined here
...
465 |           let err = divide_decimal_scalar(&left_decimal_array, 0).unwrap_err();
    |                     ^^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `divide_decimal_dyn_scalar`

@alamb alamb marked this pull request as draft February 1, 2023 22:16
@viirya
Copy link
Copy Markdown
Member Author

viirya commented Feb 1, 2023

Thanks @alamb . Yea, I'm trying to fix them locally.

@viirya viirya force-pushed the arithmetic_dyn_scalar branch 2 times, most recently from e8669d4 to 699d7c2 Compare February 1, 2023 23:07
@viirya viirya marked this pull request as ready for review February 1, 2023 23:57
@viirya viirya force-pushed the arithmetic_dyn_scalar branch from 3a0d313 to 67cd4a7 Compare February 2, 2023 00:42
@liukun4515 liukun4515 self-requested a review February 2, 2023 04:19
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- it would be great to eventually be able to write sql level tests for this feature, but there is no good way to make dictionary arrays yet from SQL (but I am hoping to get #5016 which would allow it).

Thank you for the contribution @viirya

cc @liukun4515 as this also impacts Decimals (though I don't expect any behavior change)

(DataType::Dictionary(_, value_type), _) => {
is_numeric(value_type) && is_numeric(rhs_type)
}
(_, DataType::Dictionary(_, value_type)) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this need to handle the case when both lhs and rhs are Dictionaries 🤔

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm wondering what Dictionary for ScalaValue(rhs here) means although I have seen DataFusion has it.

It sounds reasonable to add the following:

(DataType::Dictionary(_, lhs_value_type), DataType::Dictionary(_, rhs_value_type2)) => {
  is_numeric(lhs_value_type) && is_numeric(rhs_value_type2)
}

Comment thread datafusion/physical-expr/src/expressions/binary.rs Outdated

let mut dict_builder = PrimitiveDictionaryBuilder::<Int8Type, Int32Type>::new();

dict_builder.append(2)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

)
}
Operator::Modulo => {
// todo: change to binary_primitive_array_op_dyn_scalar! once modulo is implemented
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo change: apache/arrow-rs#3649

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove binary_primitive_array_op_scalar once this is done?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, once modulo is ready to do.

@viirya
Copy link
Copy Markdown
Member Author

viirya commented Feb 2, 2023

Looks good to me -- it would be great to eventually be able to write sql level tests for this feature, but there is no good way to make dictionary arrays yet from SQL (but I am hoping to get #5016 which would allow it).

Thank you @alamb. Yea, we have internal e2e tests that use dictionary arrays but seems I don't find similar stuff on DataFusion so far. We can definitely expand test coverage for this in the future.

@viirya
Copy link
Copy Markdown
Member Author

viirya commented Feb 2, 2023

BTW, I think array op array is also not supported yet on dictionary arrays. I will work on it after this.

Comment thread datafusion/physical-expr/src/expressions/binary.rs Outdated
Comment thread datafusion/physical-expr/src/expressions/binary.rs
macro_rules! binary_primitive_array_op_dyn_scalar {
($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
// unwrap underlying (non dictionary) value
let right = unwrap_dict_value($RIGHT);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that we have "scalar dictionary":

    /// Dictionary type: index type and value
    Dictionary(Box<DataType>, Box<ScalarValue>),

wonder how useful this is ...

)
}
Operator::Modulo => {
// todo: change to binary_primitive_array_op_dyn_scalar! once modulo is implemented
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove binary_primitive_array_op_scalar once this is done?

Comment thread datafusion/physical-expr/src/expressions/binary.rs Outdated
Copy link
Copy Markdown
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liukun4515 liukun4515 merged commit c759865 into apache:master Feb 3, 2023
@ursabot
Copy link
Copy Markdown

ursabot commented Feb 3, 2023

Benchmark runs are scheduled for baseline = 224c682 and contender = c759865. c759865 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Arithmetic scalar operation doesn't work with DictionaryArray

5 participants