Skip to content

Move logical expression type-coercion code from physical-expr crate to expr crate#2257

Merged
andygrove merged 7 commits intoapache:masterfrom
andygrove:move-logical-type-logic
Apr 18, 2022
Merged

Move logical expression type-coercion code from physical-expr crate to expr crate#2257
andygrove merged 7 commits intoapache:masterfrom
andygrove:move-logical-type-logic

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented Apr 17, 2022

Which issue does this PR close?

Closes #2251.

Rationale for this change

We have logic to determine the return type of various logical expressions and this logic is called from expr_schema.rs in the logical_plan module, but the code is located in the physical-expr crate and it does not need to be there.

We could move this logic into the datafusion-expr crate instead and remove many logical plan dependencies on the datafusion-physical-expr crate, as a step towards to moving the logical plan itself to datafusion-expr, which would allow us to have logical expressions that reference logical plans (needed for subquery support).

What changes are included in this PR?

This PR moves logical type coercion code from datafusion-physical-expr to datafusion-expr. There are no functional changes.

Note that physical type coercion logic remains in the physical-expr crate.

  • Move aggregate type code (0b7327d)
  • Move window function type code (7bc25ae)
  • Move scalar function type code (6f1eadb)

Are there any user-facing changes?

Yes - code is moved between crates, so users will need to update their imports.

@andygrove andygrove added the api change Changes the API exposed to users of the crate label Apr 17, 2022
@andygrove andygrove self-assigned this Apr 17, 2022
@andygrove andygrove changed the title WIP: Move logical expression type-coercion code to datafuision-expr WIP: Move logical expression type-coercion code to datafusion-expr Apr 17, 2022
@andygrove andygrove changed the title WIP: Move logical expression type-coercion code to datafusion-expr WIP: Move logical expression type-coercion code from physical-expr crate to expr crate Apr 17, 2022
@andygrove andygrove marked this pull request as ready for review April 17, 2022 15:57
@andygrove andygrove changed the title WIP: Move logical expression type-coercion code from physical-expr crate to expr crate Move logical expression type-coercion code from physical-expr crate to expr crate Apr 17, 2022
@andygrove andygrove requested review from alamb and jimexist April 17, 2022 15:57
@andygrove
Copy link
Copy Markdown
Member Author

@alamb @jimexist Apologies for the monster PR. I am happy to try and break this down into smaller changes if that will help with reviews. There are no functional changes, just moving code around to fix one of the dependency issues that is blocking me from making progress with subquery support.

I'm not sure who else I should be @ mentioning here so feel free to tag others that you think should review this.


/// Return `true` if `arg_type` is of a [`DataType`] that the
/// [`ApproxPercentileCont`] aggregation can operate on.
pub fn is_approx_percentile_cont_supported_arg_type(arg_type: &DataType) -> bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought of a trait with supported_arg_type and return_type methods before but seems not much difference.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving more coercion / expr logic into datafusion-expr makes sense -- thank you @andygrove

FWIW I think there was some discussion of keeping the Expr -> PhysicalExpr mapping isolated (including the type coercion) to allow different implementations of PhysicalExprs for each Expr (I think @jorgecarleitao discussed this 1+ year ago).

I haven't heard this idea mentioned for a while and I think it would be quite challenging to implement given the current DataFusion codebase, and I think coercion logic close to the exprs is 👍

cc @liukun4515 who I think has also had some thoughts in this area

@jorgecarleitao
Copy link
Copy Markdown
Member

The background of my thoughts back then was: there are two types of "coercion rules":

  • coercion rules that exist as a consequence of the supported ops from the physical engine (e.g. i32 + i64 -> cast(i32 AS i64) + i64 when the engine does not support i32 + i64).

  • coercion rules related to the semantic meaning of what the user wants. For example, a user usually wants sum(Vec<u8>) to be u64, not u8, to avoid numeric overflows.

imo the former is part of the step logical -> physical, the latter is part of the (run-time) type checking done when performing logical planning for schema resolution (i.e. what schema will the logical plan have)

@andygrove andygrove merged commit 5f0b61b into apache:master Apr 18, 2022
@andygrove andygrove deleted the move-logical-type-logic branch April 18, 2022 20:41
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jun 9, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jun 9, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jan 19, 2024
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Feb 5, 2024
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move logical expression type coercion logic from physical-expr crate to expr crate

4 participants