Performance of transform! on SubDataFrame#3070
Conversation
nalimilan
left a comment
There was a problem hiding this comment.
These issues are tricky as usual...
Is there a performance impact for transform when the number of columns is large? In theory, couldn't the check be restricted to columns that have been used as inputs of transformations?
src/subdataframe/subdataframe.jl
Outdated
| function _replace_columns!(sdf::SubDataFrame, newdf::DataFrame) | ||
| colsmatch = _names(sdf) == _names(newdf) | ||
|
|
||
| function _replace_columns!(sdf::SubDataFrame, newdf::DataFrame, wastransform::Bool) |
There was a problem hiding this comment.
Maybe a name like keep_present would be easier to follow (i.e. say what it does rather than when it's set)? Could also make it a keyword argument, as it's hard to guess what it means when seeing just a Bool value.
src/dataframe/dataframe.jl
Outdated
| function _replace_columns!(df::DataFrame, newdf::DataFrame) | ||
| # here we do not support wastransform argument like for SubDataFrame | ||
| # because by default in DataFrame case columns are not copied | ||
| # so we need to pass `:` to select! to handle this case correctly |
There was a problem hiding this comment.
I don't get what select! call this refers to.
There was a problem hiding this comment.
_replace_columns! is used by transform! and select!. For DataFrame objects transform! falls back to select! by passing extra : (as opposed to what we now do in GroupedDataFrame case).
I might drop this comment if you feel it is not useful. This is exactly related to the fact that select for DataFrame uses a less robust de-aliasing approach (checking input-output mapping) than the approach I proposed for GroupedDataFrame with IdDict.
There was a problem hiding this comment.
I have simplified the comment
It is negligible, for 10,000 columns it is around 500μs:
It could, and this is what we do for |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
Thank you! |
Fixes #3069.
What this PR addresses.
and before the PR:
select[!]andtransform[!]onGroupedDataFrameAfter the PR:
and before the PR: