add ==, isequal <, and isless for DataFrameRow and GroupKey#2669
add ==, isequal <, and isless for DataFrameRow and GroupKey#2669
Conversation
|
Ah - and it is mildly breaking as we now define This means in particular that this hash is different than |
We probably don't care about this since it's internal, right? |
Yes - I have checked that we use rowhash only in one place in the code so it should be OK. |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
Also |
|
OK - I hope I fixed everything :). This was hard (but hopefully now Julia Base will be also more consistent with |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
@pdeffebach - this is what we concluded is a good design. Could you please comment on it before merging (as you were against it)? |
|
Sounds good! I'm fine with this. |
src/dataframerow/utils.jl
Outdated
| # table columns are passed as a tuple of vectors to ensure type specialization | ||
| rowhash(cols::Tuple{AbstractVector}, r::Int, h::UInt = zero(UInt))::UInt = | ||
| hash(cols[1][r], h) | ||
| function rowhash(cols::Tuple{Vararg{AbstractVector}}, r::Int, h::UInt = zero(UInt))::UInt |
There was a problem hiding this comment.
it is interesting that we do not have this covered by tests. I will add some.
There was a problem hiding this comment.
it turns out we do not use these functions. I have removed them. @nalimilan - could you please have a look if we would ever need them? (I have never used them)
There was a problem hiding this comment.
AFAICT one guy removed the last use of findrow and group_rows in a recent PR. :-D #2641
It's weird that coverage didn't notice this. Maybe we only looked at the changed code and not at unrelated parts?
There was a problem hiding this comment.
There were tests that checked correctness of internal functions.
| ngroups, rhashes, gslots, sorted = | ||
| row_group_slots(ntuple(i -> df[!, i], ncol(df)), Val(true), groups, false, false) | ||
| rperm, starts, stops = compute_indices(groups, ngroups) | ||
| return RowGroupDict(df, rhashes, gslots, groups, rperm, starts, stops) |
There was a problem hiding this comment.
I think RowGroupDict can also be removed.
There was a problem hiding this comment.
Indeed - I am running tests to double check. It is astonishing how much we have reworked internally in this release.
There was a problem hiding this comment.
OK - all seems clean. I will merge the PR after CI passes.
There was a problem hiding this comment.
Now that this file contains only grouping code, we will be able to move it to the corresponding folder and rename it without touching anything else. :-)
There was a problem hiding this comment.
OK will move it in this PR as otherwise we will forget. Also nonunique uses it, but I think it is not a problem.
|
Thank you! |
|
This allows right? |
|
Right, and even more e.g.: |
Fixes #2639.
As usual, before I add tests and update documentation please have a look at the implementation.
It would be good to have a decision on JuliaLang/julia#40142 before finalizing this PR.
In particular we fix a bug according to which two
DataFrameRowweretruein==if they were the same row from the same data frame even if they containedmissing(missingshould be produced then).