add vcat with source; deprecate indicator in joins in favor of source #2649
The CI error is unrelated and is fixed in #2648.
Should this be called |
Co-authored-by: Eric Hanson <[email protected]>
I am OK with |
Consistency with joins sounds like a good reason to use |
    dfs′ = Vector{AbstractDataFrame}(undef, len)
    for (i, (v, df)) in enumerate(zip(vals, dfs))
        dfs′[i] = insertcols!(copy(df, copycols=false), 1, col => Ref(v))
    end
Probably not a big deal, but wouldn't it be more efficient (and not really more complex) to create the column only after concatenating?
This is a design decision related to empty data frames. As you can see in the tests and examples, if you pass DataFrame() to vcat, a single row is currently created with all values missing except in the indicator column. Otherwise we would drop such a data frame.
But maybe dropping it is preferable. What do you think?
I have thought about it and concluded that it is better to drop it. I will change the implementation.
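To make the outcome of this thread concrete, here is a minimal sketch of how `vcat` with `source` is expected to behave once empty data frames are dropped. It assumes the keyword as it shipped in DataFrames.jl 1.0; the column name `:src` and the labels are made up for illustration.

```julia
using DataFrames

df1 = DataFrame(a=1:2)
df2 = DataFrame(a=3:4)

# Passing only a column name: each row records the position of the
# data frame it came from. The trailing empty DataFrame() adds no rows.
by_index = vcat(df1, df2, DataFrame(); cols=:union, source=:src)

# Passing name => identifiers: one identifier per data frame.
by_label = vcat(df1, df2; source=:src => ["first", "second"])
```

Here `by_index` should have four rows, none of them originating from the empty data frame.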
Regarding the API, have you considered something like |
I analyzed it before making the PR. We could allow: but I am hesitant to add it as currently: (and this makes sense and is expected) So adding a kwarg would significantly affect the produced result, which I think is not desirable. |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Not doing this in
Which would all have to be produced, which I thought would be more confusing than clarifying. Similarly, when two data frames are passed to a join we have three values (left, right, and both), so it is not the same. But of course the function of the kwarg is similar (to clearly indicate the source data frame), so I would not be opposed to syncing it between Given these considerations, what do you think? Should we make them consistent?
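For comparison, a small sketch of the join side of this discussion, assuming the `source` keyword as it later shipped in DataFrames.jl 1.0 joins (replacing `indicator`). The data frames and the column name `:source` are illustrative only.

```julia
using DataFrames

left = DataFrame(id=[1, 2], x=["a", "b"])
right = DataFrame(id=[2, 3], y=["c", "d"])

# Each output row is tagged with where its key was found:
# only in the left table, only in the right table, or in both.
joined = outerjoin(left, right; on=:id, source=:source)
```

Unlike `vcat`, where the identifier names exactly one input, a join row can originate from both inputs, which is why three distinct values are needed.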
and I will change the kwarg do |
nalimilan left a comment
Looks good! I kind of wish we would have used source instead of indicator for joins, as that sounds more explicit, but well...
Same with me. Maybe we can:
I have no recollection.
Sorry for pinging then. Do you have an opinion though 😄?
I also don't recall this discussion. I'm also in favour of |
My only comment is that |
OK, I will change it (that is where it goes in joins).
OK, I have traced the original issue: #1412. Thank you all for responding.
        Pair{<:SymbolOrString, <:AbstractVector}}=nothing) =
    reduce(vcat, dfs; cols=cols, source=source)

"""
@nalimilan - I have added a docstring for reduce to make this option more discoverable. Could you please have a look at it before I merge? Thank you!
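As a rough illustration of what the new `reduce` docstring is meant to surface, the sketch below calls `reduce(vcat, ...)` on a vector of data frames with the same `cols` and `source` keywords. It assumes the DataFrames.jl 1.0 behavior; the column name `:batch` and the labels are hypothetical.

```julia
using DataFrames

dfs = [DataFrame(a=1:2), DataFrame(a=3:4, b=["x", "y"])]

# cols=:union keeps every column seen in any input (filling gaps with missing);
# source adds a column naming the data frame each row came from.
combined = reduce(vcat, dfs; cols=:union, source=:batch => ["jan", "feb"])
```

This form is convenient when the number of data frames is not known in advance, since splatting a long vector into `vcat` is avoided.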
Thank you!
Fixes #659
I was prompted by https://discourse.julialang.org/t/would-it-help-to-have-a-tool-that-automatically-determines-which-issues-should-be-closed/56274/16 :).
The PR is relatively simple, so I think we can add it in the 1.0 release.