feat: Enhance percentile_cont to support array argument#18663
feat: Enhance percentile_cont to support array argument#18663Tim-53 wants to merge 10 commits intoapache:mainfrom
Conversation
Signed-off-by: Tim Trost <82676248+Tim-53@users.noreply.github.com>
| ) -> Result<ApproxPercentileAccumulator> { | ||
| let percentile = | ||
| validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?; | ||
| validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?[0]; |
There was a problem hiding this comment.
This change was made purely to make percentile_cont compile and behave correctly.
I'm not yet sure whether approx_percentile_cont should be enhanced in the same way; that can be addressed in a separate issue.
| let list_f64 = | ||
| DataType::List(Arc::new(Field::new("item", DataType::Float64, true))); | ||
|
|
||
| for num in NUMERICS { | ||
| // Accept any numeric value paired with a float64 percentile | ||
| variants.push(TypeSignature::Exact(vec![num.clone(), DataType::Float64])); | ||
| //Accept any numeric value paired with a list of float64 percentiles | ||
| variants.push(TypeSignature::Exact(vec![num.clone(), list_f64.clone()])); | ||
| } | ||
|
|
There was a problem hiding this comment.
Hmm this might not be ideal, but I don't think our current signature API does have a clean way of representing what we want here 🤔
Might need some further work on the signature API to make this nicer (not to mention hopefully cleaning up the usage of NUMERICS here, which I've been slowly working on #18092)
Perhaps we go with the user_defined approach for now 🤔
There was a problem hiding this comment.
Ye this part felt a little hacky to implement, but i diden't find a better way to express this.
Just to confirm: you're suggesting using TypeSignature::UserDefined instead of enumerating the Exact variants, right?
There was a problem hiding this comment.
Yep; in future can look into refactoring away from user_defined once signature API has better support, but for now I think it'll be easier to use user_defined here
| Some(values[0]) | ||
| } else if percentile == 0.0 { | ||
| vec![Some(values[0]); percentiles.len()] | ||
| } else if percentiles[0] == 0.0 { |
There was a problem hiding this comment.
This needs to consider length of percentiles
| percentiles | ||
| .iter() | ||
| .map(|percentile| { |
There was a problem hiding this comment.
This logic might be inefficient; I don't know how postgres does it but perhaps they just sort the values once up front allowing each percentile to be calculated O(1) on the sorted output.
Maybe we need a separate path for when we have a single percentile (do it the old way) and multiple percentiles (sort input then calculate all percentiles at once)
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Since this is still a draft, I’d love feedback on whether I’m going in the right direction here and if the operator overloading is implemented in a way that aligns with the project’s conventions, or if there’s a better pattern to use.
Which issue does this PR close?
percentile_contaggregate function #18600.Rationale for this change
What changes are included in this PR?
percentile_contAre these changes tested?
Tests will be added before finalizing the PR
Are there any user-facing changes?
Yes:
percentile_contnow accepts arrays of percentile values.Documentation will be updated once the API is finalized.
This change is backwards compatible and should not(?) be considered a API change.