[BREAKING] Multicolumn transformations for GroupedDataFrame #2481
bkamins merged 28 commits into JuliaData:master from
OK - this PR is not yet cleaned up, tested, or documented, but the promised functionality seems to work (remember - not tested 😄), so please feel free to experiment with it and comment if you catch something surprising. Thank you!
@pdeffebach - with this PR we ensure the following invariant (which I think is relevant for the DataFramesMeta.jl design). The result of: is always the same as the result of (including errors: if one errors, the other also errors) if
Wait, why would we want that? What if I want to perform the operation

Right - I forgot to add that

Oh good. You scared me! Yes, having the same with
I still have a lot of tests to write (but the PR should now pass the existing tests) and documentation to update. (Everyone else: if you are interested, feel free to have a look at docs/src/man/split_apply_combine.md, as it specifies how the new rules work.)
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@nalimilan - the second (and hopefully final) part of the documentation updates is pushed here. I adjusted the manual according to the comments and refactored the docstrings, so that they are now consistent with the manual and reuse the same template. I believe this has three benefits:
Another review of the documentation would be appreciated (after it is done I will review the tests of the functionality to make sure we cover everything, and then the PR will be ready for a final review).
I have added additional tests of correctness of the functionality we expose. This should be good for a whole code review.
This PR is holding up the implementation of

EDIT: I managed to get it working without needing this, see #2496.
@nalimilan I have incorporated all the recommended changes apart from syncing the manual and the docstrings - I will do that once the manual is finalized. It should be ready for another round of reviews. Thank you!
is undefined (a typical case is that they follow the order of appearance of
respective values in the grouping columns, but a notable exception is when the
columns are `PooledVector`s, in which case they are ordered according to the `pool`
field in these vectors)
I don't think that's true: in general the order is really undefined due to the dict-like grouping fallback. And the pool of PooledDataArray isn't exposed to users, so it's not really useful to give them this information.
EDIT: I was wrong; for some reason I hadn't realized that the fallback grouping method uses the order of appearance.
I have updated the description to make it more precise.
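Since the default group order is only loosely specified, callers who need a deterministic order can ask for it explicitly. A minimal sketch, assuming DataFrames.jl 0.22 or later; the data frame and the `x_sum` column name are hypothetical, and the `sort = true` keyword of `groupby` is used to request groups ordered by the grouping values.

```julia
using DataFrames

# hypothetical data: groups appear in the order "b", "a"
df = DataFrame(g = ["b", "a", "b", "a"], x = 1:4)

# sort = true asks groupby to order groups by the values of :g
gdf_sorted = groupby(df, :g; sort = true)

# one row per group, in sorted key order
res = combine(gdf_sorted, :x => sum => :x_sum)
```

Without `sort = true`, the order of the groups in `res` should not be relied upon beyond what the docstring describes.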
src/abstractdataframe/selection.jl
(not all keyword arguments are supported in all cases; in general they are allowed
in situations when they are meaningful, see the documentation of the specific functions
for details):
But how about making this part specific to each function's docstring so that we never mention an argument that a function doesn't support?
(BTW I don't see the change to "signature".)
@nalimilan - I have incorporated the comments. Let me know if it is OK to move the descriptions from the manual to the docstrings. Thank you!
TODO: add a NEWS.md entry (when @nalimilan approves moving the manual text into the docstrings).
All suggestions are applied, the manual and docstrings are synchronized, and NEWS.md is updated.
Thank you!
Fixes #2410
I am opening this PR now to make sure that the 0.22 branch of a13db50 does not get lost.
This PR is not finished. I have implemented everything (in particular, allowing functions to return multiple columns) except handling `AsTable` and multiple `Symbol`s as destination columns (this still needs to be implemented, as it requires new logic).
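The two destination forms mentioned above can be illustrated with a small sketch, assuming DataFrames.jl 0.22 or later (where this PR's functionality is released) and the `Statistics` standard library; the data and the column names `s`, `m`, `lo`, `hi` are hypothetical. With `AsTable`, the column names are taken from the returned `NamedTuple`; with a vector of `Symbol`s, the returned values are expanded into columns with the listed names.

```julia
using DataFrames, Statistics

# hypothetical data with two groups
df = DataFrame(g = [1, 2, 1, 2], x = [1.0, 2.0, 3.0, 4.0])
gdf = groupby(df, :g)

# AsTable destination: the NamedTuple fields become the columns :s and :m
r1 = combine(gdf, :x => (x -> (s = sum(x), m = mean(x))) => AsTable)

# vector-of-Symbols destination: each group returns one (min, max) tuple,
# which is expanded into the columns :lo and :hi
r2 = combine(gdf, :x => (x -> [extrema(x)]) => [:lo, :hi])
```

Both calls produce one row per group, keeping the grouping column `g` first.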