allow transformation destination to be a function #2897
@nalimilan - CI passes
nalimilan
left a comment
Looks good. I just wonder whether we should also pass the function, as it could be useful to generate a column name e.g. in a loop over multiple functions and/or inputs. Though that would make the API less convenient. What kind of use case do we expect?
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Could you please give an example or explain it a bit more as I am not sure what you mean here.
You mean use cases of the change I have implemented? The most common will be |
I mean something like:
julia> combine(df, [:x, :y, :z] .=> [mean median] .=> string)
4×6 DataFrame
 Row │ meanx     meany    meanz     medianx  mediany  medianz
     │ Float64   Float64  Float64   Float64  Float64  Float64
─────┼───────────────────────────────────────────────────────────
   1 │ 0.483742  0.4106   0.287058  0.47259  0.32765  0.247748
Given that it's almost the same as the default names, maybe that's not useful. It could be useful when there are many input variable names but you would like to use only one of them in the output rather than the "etc" default. That's not super likely, but I thought I'd mention it just to be sure.
Yes.
So that's |
Ah - now I understand. I think it would complicate the API too much.
Yes - recently @jtrakk asked for this in #2893, so maybe we can get some more comments on when it would be useful.
nalimilan
left a comment
Looks good, but let's wait a bit as it's always interesting to check concrete use cases before merging.
My case was like this:
transform(df, [:X,:Y,:Z] .=> (v->v.-mean(v)) => (s->"$(s)demeaned"))
The case of "for each variable, for each function"
combine(df, [:x, :y, :z] .=> [mean median] .=> string)
is an interesting one and does suggest taking the function as well, but I agree it would complicate the API. Perhaps there could be a wrapper
combine(df, [:x, :y, :z] .=> [mean median] .=> Rename((s,f)->"$s$f"))
and passing only a function like |
@jtrakk - thank you very much for the valuable feedback.
Thank you! Essentially, as @jtrakk shows, the use case is when the function name does not describe the operation you perform well and you prefer to give an explicit name for the transformation. This is especially useful when transformations are anonymous functions.
Fixes #2876
The crucial API decision is what the function should take. I assume it is passed a string or a vector of strings.
The vector is passed if a multicolumn selector is used (in particular, AsTable is always considered to be a multicolumn selector).
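The rule described above can be illustrated with a minimal sketch. It assumes a DataFrames.jl release that includes this change; the data frame `df`, its columns, and the suffixes `_demeaned`, `_dot`, and `_sum` are made up for illustration and are not part of the PR.

```julia
using DataFrames
using Statistics

# Hypothetical example data, not taken from the PR.
df = DataFrame(x = [1.0, 2.0, 3.0], y = [4.0, 5.0, 6.0])

# Single-column selector: the destination function receives one String ("x").
res1 = transform(df, :x => (v -> v .- mean(v)) => (s -> s * "_demeaned"))

# Multicolumn selector: the destination function receives a vector of
# strings (["x", "y"]), so here the names are joined explicitly.
res2 = combine(df, [:x, :y] => ((a, b) -> sum(a .* b)) => (ns -> join(ns, "_") * "_dot"))

# AsTable is treated as a multicolumn selector, so the destination function
# is again passed a vector of strings.
res3 = transform(df, AsTable([:x, :y]) => (nt -> nt.x .+ nt.y) => (ns -> join(ns, "_") * "_sum"))

println(names(res1))
println(names(res2))
println(names(res3))
```

The destination function only sees the source column names, not the transformation function, which matches the decision in the discussion to keep the API simple.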