Fix type instability in sort for few columns case and fix issorted bug#2746
Fix type instability in sort for few columns case and fix issorted bug#2746
Conversation
|
We have a bad design of sorting in general: I will push an update (and then the code will be simplified with recursion) |
|
@nalimilan - this should be good to have a look at. The only issue is that in general sorting is expensive to compile. I will have to think how to reduce this cost (though maybe it is not easy to do). |
|
Timings after this PR: and so all is OK (the penalty of |
Yeah, that little overhead is fantastic. Thank you! |
nalimilan
left a comment
There was a problem hiding this comment.
Thanks! I wonder how this went unnoticed for so long.
Maybe one way to limit the compilation cost would be to sort column-wise rather than row-wise (starting with the last column)? Not sure whether that would be fast. Anyway that would require a deeper refactoring so this PR is useful even if we later change the approach.
| # sort the original (presumably larger) DataFrame | ||
|
|
||
| struct DFPerm{O<:Union{Ordering, AbstractVector}, T<:Tuple} <: Ordering | ||
| struct DFPerm{O<:Union{Ordering, Tuple{Vararg{Ordering}}}, |
There was a problem hiding this comment.
Would it make sense to make O always a Tuple{Vararg{Ordering}}? That would avoid the need for ord isa Ordering below.
There was a problem hiding this comment.
The problem is that if you do sort(df) you want a single Ordering that is reused to avoid excessive compilation. I would assume that ord isa Ordering check should be optimized out by the compiler so it should have no performance penalty.
There was a problem hiding this comment.
I tried simplifying it but I always ended up with compiler allocating.
There was a problem hiding this comment.
I don't get it. Why would wrapping the Ordering in a one-element tuple trigger more compilation or allocations?
There was a problem hiding this comment.
Ah - one element Tuple is not a problem, but in this case we would anyway have to branch if the tuple is one element or matches length of the column vector.
What allocates is if we wanted to have one tuple where each element would hold a tuple consisting of order and column.
There was a problem hiding this comment.
OK. Anyway always wrapping Ordering in a tuple sounds simpler conceptually.
There was a problem hiding this comment.
I will change it if I can make work it fast.
There was a problem hiding this comment.
What's the conclusion regarding this?
There was a problem hiding this comment.
I decided to leave the union (as in the original design) as in the end the logic was simpler (through dispatch).
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
This would be slower AFAICT, because now we are short circuting (so mostly only one comparison is needed on the first column in typical cases) |
|
We cannot use recursion, as it will break on very wide data frames. I will fix it tomorrow. |
Good point. But I guess that depends on the types of sorted columns. When sorting on bitstype columns which can use optimized algorithms, I imagine that sorting one column at a time could be faster. I'm thinking about integer columns with small ranges which use counting sort in Base, floating point columns which use a special quick sort IIRC, or other types which could use radix sort (only via SortingAlgorithms currently). But we would have to be very careful to do that only for cases known to be fast anyway and that's tricky to get right. |
I agree, but currently we have: so I think that the first step should be to make sorting on single column fast (which we do at some point - in general sorting and reshaping are things to work on in the near future as these were the areas here no new things were added for a long time). But for this PR I would concentrate the design on fixing type instability issues. |
|
Here are the benchmarks after the fix of recursion. Additionally I have discovered a bug in this PR current |
|
@nalimilan - no rush, but it would be good to review it and merge, as it is fixing a bug in |
src/abstractdataframe/sort.jl
Outdated
| function Sort.lt(o::DFPerm{<:Any, <:Tuple}, a, b) | ||
| ord = o.ord | ||
| cols = o.cols | ||
| length(cols) > 16 && return unstable_lt(ord, cols, a, b) |
There was a problem hiding this comment.
Add a comment explaining how the 16 threshold was chosen?
| length(cols) > 16 && return unstable_lt(ord, cols, a, b) | |
| # if there are too many columns fall back to type unstable mode to avoid high compilation cost | |
| # it is expected that in practice users sort data frames on only few columns | |
| length(cols) > 16 && return unstable_lt(ord, cols, a, b) |
There was a problem hiding this comment.
added. I have not tuned 16 specifically. I just assume that 16 is a safe threshold. Probably we could pass some higher number here, but I think that normally one does not sort on more than something like 4 columns.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
Thank you! |
Fixes #2745 by loop unrolling.
Probably is is possible to do it in a smarter way using metaprogramming (but maybe not) so I am marking it as a draft.