Introduce FunctionRegistry dependency to optimize and rewrite rule #10714
jayzhan211 merged 6 commits into apache:main
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
pub mod unwrap_cast_in_comparison;
pub mod utils;

#[cfg(test)]
Not only integration test but also datafusion-example
}

#[cfg(test)]
mod tests {
alamb left a comment
Thanks @jayzhan211 -- this looks very cool. Thank you for breaking the dependency.
Rather than adding a new argument to all optimizer sites, which will cause API churn, what do you think about adding a new (backwards compatible) API to OptimizerConfig?
Perhaps something like
trait OptimizerConfig {
    ...
    /// return the function registry for looking up functions
    /// returns None by default.
    /// Used for optimizations like `distinct ON` --> first_value
    fn function_registry(&self) -> Option<&dyn FunctionRegistry> {
        None
    }
}
The only downside I can see to providing the entire registry is that it then adds an implicit dependency on whatever is registered as "first_value", which should probably be documented.
A way to make that more explicit might be something like
trait OptimizerConfig {
    ...
    /// return the function for first_value, if known. This function will be
    /// used to rewrite DISTINCT ON if present
    fn first_value(&self) -> Option<AggregateUDF> {
        None
    }
}
🤔
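The first suggestion above relies on a trait method with a default body, so existing `OptimizerConfig` implementors keep compiling while new ones can opt in. The following is a minimal sketch of that idea only. `AggregateUdf`, `MapRegistry`, `LegacyConfig`, `RegistryConfig`, and `skip_failing_rules` are simplified, hypothetical stand-ins written for this illustration; they are not DataFusion's real types or signatures.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an aggregate UDF (not DataFusion's type).
#[derive(Debug, PartialEq)]
pub struct AggregateUdf {
    pub name: String,
}

/// Hypothetical, simplified registry trait: look up an aggregate by name.
pub trait FunctionRegistry {
    fn udaf(&self, name: &str) -> Option<Arc<AggregateUdf>>;
}

/// A registry backed by a plain map, for illustration.
#[derive(Default)]
pub struct MapRegistry {
    pub udafs: HashMap<String, Arc<AggregateUdf>>,
}

impl MapRegistry {
    /// Register an aggregate under `name`.
    pub fn register(&mut self, name: &str) {
        let udaf = AggregateUdf { name: name.to_string() };
        self.udafs.insert(name.to_string(), Arc::new(udaf));
    }
}

impl FunctionRegistry for MapRegistry {
    fn udaf(&self, name: &str) -> Option<Arc<AggregateUdf>> {
        self.udafs.get(name).cloned()
    }
}

pub trait OptimizerConfig {
    /// Stand-in for a method the trait already required before the change.
    fn skip_failing_rules(&self) -> bool;

    /// New method with a default body. Implementors written before this
    /// method existed do not need to change.
    fn function_registry(&self) -> Option<&dyn FunctionRegistry> {
        None
    }
}

/// An implementor written before the new method existed; left untouched.
pub struct LegacyConfig;

impl OptimizerConfig for LegacyConfig {
    fn skip_failing_rules(&self) -> bool {
        false
    }
}

/// An implementor that opts in by overriding the default.
pub struct RegistryConfig {
    pub registry: MapRegistry,
}

impl OptimizerConfig for RegistryConfig {
    fn skip_failing_rules(&self) -> bool {
        false
    }

    fn function_registry(&self) -> Option<&dyn FunctionRegistry> {
        Some(&self.registry as &dyn FunctionRegistry)
    }
}
```

In this sketch, `LegacyConfig` returns `None` without any code change, which is the backwards-compatibility property the comment is after. The trade-off noted in the comment still applies: a rule that looks up `"first_value"` by name depends on whatever happens to be registered under that name.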
let aggr_expr = select_expr
    .into_iter()
    .map(|e| first_value(vec![e], false, None, sort_expr.clone(), None));
let aggr_expr = select_expr.into_iter().map(|e| {
this change is the reason for this PR I think -- to avoid the hard coded dependency on first_value...
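To illustrate the point of this comment, here is a rough sketch of a rewrite that resolves `first_value` by name at optimization time instead of calling a hard-coded function. Everything in it (`Expr`, `AggregateUdf::call`, `Registry`, `rewrite_distinct_on`, and the choice to return an error when the function is missing) is a hypothetical simplification for this example, not the code merged in this PR.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, heavily simplified expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    AggregateCall {
        func: String,
        args: Vec<Expr>,
        order_by: Vec<Expr>,
    },
}

/// Hypothetical stand-in for an aggregate UDF.
pub struct AggregateUdf {
    pub name: String,
}

impl AggregateUdf {
    /// Build an aggregate call expression for this function.
    pub fn call(&self, args: Vec<Expr>, order_by: Vec<Expr>) -> Expr {
        Expr::AggregateCall {
            func: self.name.clone(),
            args,
            order_by,
        }
    }
}

/// Hypothetical registry keyed by function name.
pub struct Registry {
    pub udafs: HashMap<String, Arc<AggregateUdf>>,
}

impl Registry {
    pub fn udaf(&self, name: &str) -> Option<Arc<AggregateUdf>> {
        self.udafs.get(name).cloned()
    }
}

/// Wrap each selected expression in `first_value(... ORDER BY sort_expr)`.
/// The function is looked up by name, so this code has no compile-time
/// dependency on the crate that defines `first_value`.
/// Assumption: a missing registry or function is reported as an error.
pub fn rewrite_distinct_on(
    select_expr: Vec<Expr>,
    sort_expr: Vec<Expr>,
    registry: Option<&Registry>,
) -> Result<Vec<Expr>, String> {
    let first_value = registry
        .and_then(|r| r.udaf("first_value"))
        .ok_or_else(|| "first_value is not registered".to_string())?;

    Ok(select_expr
        .into_iter()
        .map(|e| first_value.call(vec![e], sort_expr.clone()))
        .collect())
}
```

The design choice this sketch highlights is that the optimizer only needs the registry interface, so the crate that implements the aggregate can be dropped from the optimizer's dependencies.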
45 15673
-72 -11122

# test distinct on
I would prefer the former, to avoid adding too many functions to the OptimizerConfig trait.
Force-pushed from 1414b15 to 62d59a2
Looks very nice 🆗 I think we can also remove the dependency from cargo now as well:
I totally forgot my main purpose of this PR :(
chrono = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
datafusion-functions-aggregate = { workspace = true }
Thanks @alamb
…pache#10714)

* mv function registry to expr

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* registry move to config trait

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* rm dependency

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix cli cargo lock

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Which issue does this PR close?
Closes #10703.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?