Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you!
@testset "sorting API" begin
    # simple tests
    df = DataFrame(x=["b", "c", "b", "a", "c"])
    @test getindex.(keys(groupby(df, :x)), 1) == ["b", "c", "a"]
I think these can use `only` instead of `getindex(_, 1)`.
Indeed it could. I initially implemented this test on groups rather than on keys, and `getindex` works on both.
    @test getindex.(keys(groupby(df, :x, sort=true)), 1) == [1, 2, 100]
    @test getindex.(keys(groupby(df, :x, sort=NamedTuple())), 1) == [1, 2, 100]
    @test getindex.(keys(groupby(df, :x, sort=false)), 1) == [2, 100, 1]
    @test getindex.(keys(groupby(df, order(:x))), 1) == [1, 2, 100]
Having so many equivalent ways to specify sorting does seem a bit much? Not sure if it's worth doing anything about.
The issue is that `sort` etc. provide that many ways to specify sort order, so we cannot do anything about it.
The rationale behind it is:
- normally people will use the "global" settings like `(rev=true,)`, which apply to all columns
- however, there are cases when you want to specify the sorting order per column, e.g. `[order(:x, rev=true), :y]`, where you reverse `:x` but sort on `:y` in ascending order. Therefore a "per column" `order` is needed.
In general, this complexity is needed when one has several columns.
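To make the difference between the "global" and the "per column" settings concrete, here is a minimal sketch using `sort` on a small made-up data frame (the data and the variable names are only for illustration; the options for `groupby` in this PR are meant to mirror these):

```julia
using DataFrames

# Illustrative data, not taken from the test suite.
df = DataFrame(x=[1, 2, 1, 2], y=[4, 3, 2, 1])

# "Global" setting: rev=true applies to every sorting column,
# so both :x and :y are sorted in descending order.
global_rev = sort(df, [:x, :y], rev=true)

# "Per column" setting: reverse :x only, keep :y ascending.
per_column = sort(df, [order(:x, rev=true), :y])
```

With a single sorting column the two forms give the same result, which is why they look redundant in the tests above; they only diverge once there are several columns.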
@jariji - if you would be willing, improving the https://dataframes.juliadata.org/stable/man/sorting/ section of the manual would actually be welcome. I planned to do it at some point, but maybe you could give it a shot and provide more in-depth coverage of all sorting options (this PR just inherits the complexity we allow for there).
Is there an advantage to sorting during the groupby versus sorting the groups afterwards?
There is no convenient way to sort the groups afterwards AFAICT. To get a desired order you would need to sort the data frame you
It is expensive (in terms of time and memory)
Sorting while grouping is faster and more convenient if the user wants the groups sorted (and this is most likely the typical use case, where the user knows upfront how the groups should be sorted).
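As a rough illustration of sorting while grouping, the sketch below reuses the data frame from the test above and compares `sort=true` with `sort=false`. It only uses the `sort` keyword of `groupby` with `Bool` values; the variable names are made up for this example:

```julia
using DataFrames

df = DataFrame(x=["b", "c", "b", "a", "c"])

# sort=false: groups are kept in order of first appearance in df.
gdf_unsorted = groupby(df, :x, sort=false)
unsorted_keys = getindex.(keys(gdf_unsorted), 1)

# sort=true: groups are ordered by the value of the grouping column,
# without sorting the parent data frame first.
gdf_sorted = groupby(df, :x, sort=true)
sorted_keys = getindex.(keys(gdf_sorted), 1)
```

Here `unsorted_keys` is `["b", "c", "a"]` and `sorted_keys` is `["a", "b", "c"]`, while `df` itself keeps its original row order in both cases.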
Fixes #3251