[BREAKING] Handle zero groups #2324
CC @pdeffebach - you might want to test it, as the cases are tricky.
Thanks! I just played around with it and I think this is good. It basically just adds new columns so that the returned data frame has the correct names and types. I think this is convenient behavior since it requires less data validation on the user's side.
Thank you for looking into this. I will re-read the whole code before @nalimilan comes back online, to make sure we can merge this when he is available.
nalimilan left a comment:
Thanks. Looks mostly good. I have to trust you regarding the places where you added checks for zero groups as the code is really tricky...
    collect(axes(df, 1)), [1], [nrow(df)], 1, nothing,
    Threads.ReentrantLock())
return GroupedDataFrame(df, Symbol[], ones(Int, nrow(df)),
    nothing, nothing, nothing, nrow(df) == 0 ? 0 : 1,
Why not continue filling fields with vectors instead of nothing?
Because they can have 0 or 1 elements (filling them unconditionally, as before, was a bug). We could now fill them conditionally, as we do for the number of groups, but since filling them later is very cheap anyway, I felt that setting them to nothing is OK.
If computing the actual value here is trivial I'd do it, otherwise I agree it's cheap to compute later.
I would leave it for later; this way the code is more modular (otherwise we hardcode something here and could forget to update it if we change the default way of computing them five years from now).
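To make the trade-off in this thread concrete, here is a minimal Julia sketch of the lazy-filling pattern. `ToyGroups`, `toy_groupby_nocols` and `group_bounds!` are hypothetical names, not DataFrames.jl internals; the sketch only assumes, as the diff above shows, that grouping on no columns gives 0 groups for an empty table and 1 group otherwise, and that the per-group fields are left as `nothing` until first needed.

```julia
# Hypothetical, simplified stand-in for GroupedDataFrame: only the fields
# needed to illustrate lazy filling are modelled.
mutable struct ToyGroups
    groups::Vector{Int}                  # group index of each row
    ngroups::Int                         # number of groups (may be 0)
    starts::Union{Vector{Int}, Nothing}  # first row of each group, filled lazily
    ends::Union{Vector{Int}, Nothing}    # last row of each group, filled lazily
end

# Grouping on zero columns: every row belongs to group 1,
# but a table with no rows has no groups at all.
function toy_groupby_nocols(nrows::Integer)
    return ToyGroups(ones(Int, nrows), nrows == 0 ? 0 : 1, nothing, nothing)
end

# Compute `starts`/`ends` on first use. Both vectors have `ngroups`
# elements, so they are automatically empty when there are zero groups
# (hardcoding `[1]` and `[nrow(df)]` would be wrong in that case).
# Assumes rows of each group are contiguous, as with grouping on no columns.
function group_bounds!(g::ToyGroups)
    if g.starts === nothing
        starts = zeros(Int, g.ngroups)
        ends = zeros(Int, g.ngroups)
        for (row, grp) in enumerate(g.groups)
            if starts[grp] == 0
                starts[grp] = row   # first time we see this group
            end
            ends[grp] = row         # keep overwriting until the last row
        end
        g.starts, g.ends = starts, ends
    end
    return g.starts, g.ends
end
```

Keeping the computation in one function means the empty and non-empty cases share a single code path, which is the modularity argument made in the review.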
Thank you for the review.
I hope I did it right. The changes touch a mix of very old and new code, so I tried to cover everything in tests.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Only coverage fails.
I have added the test. Only coverage fails, as usual.
No problem - thank you for looking into it!
Thank you!
Fix #2322
Fix #2297
This is a major fix to split-apply-combine that introduces many internal changes and some breaking user visible changes.
What is chiefly changed:
- `cols` field holds `Symbol` not `Int`; this was not strictly needed, but as `select!` can mutate a parent of a `GroupedDataFrame`, it is better to keep `Symbol`s to avoid invalidating the `GroupedDataFrame`
- `transform!` and `transform` […] `combine(arg, ::DataFrame)` when the data frame passed has 0 rows, which I leave for later as it is tricky to implement, would only obfuscate the code, and has a very limited use case)

This is breaking, so it will require a minor release to go in.
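The reason for storing column names rather than positions in `cols` can be illustrated with a minimal Julia sketch that does not use DataFrames.jl. `ToyTable` and `toy_select!` are hypothetical stand-ins for a data frame and an in-place `select!`; the only point shown is that dropping a column from the parent shifts column positions but leaves column names valid.

```julia
# Hypothetical toy table: ordered column names plus a name => data map.
mutable struct ToyTable
    names::Vector{Symbol}
    data::Dict{Symbol, Vector{Int}}
end

# Hypothetical analogue of an in-place select!: keep only the columns
# listed in `keep`, in that order, mutating `t`.
function toy_select!(t::ToyTable, keep::Vector{Symbol})
    t.names = copy(keep)
    for k in collect(keys(t.data))   # collect first so deleting is safe
        k in keep || delete!(t.data, k)
    end
    return t
end

t = ToyTable([:a, :b, :c], Dict(:a => [1], :b => [2], :c => [3]))
key_pos = 3      # grouping key remembered as a column position
key_name = :c    # grouping key remembered as a column name

toy_select!(t, [:b, :c])
# After dropping :a, position 3 is out of bounds (t.names has 2 entries),
# while the name :c still resolves to the same data.
```

A position-based key would need to be rewritten every time the parent is mutated; a name-based key only breaks if that specific column is removed.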