require AbstractVector from subset selectors#2744
Conversation
|
I classify this PR as a bug, since I understand we agree that the current behavior is unintended and confusing. |
NEWS.md
Outdated
|
|
||
| * patch v1.0.3: make sure `subset` checks if the passed condition function | ||
| returns a vector of values | ||
| ([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX)) |
There was a problem hiding this comment.
I'm not this qualifies for a patch release as this may break existing workflows, right?
| ([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX)) | |
| ([#2744](https://github.com/JuliaData/DataFrames.jl/pull/2744)) |
There was a problem hiding this comment.
The question is how we qualify this change. Since we added this feature in 1.0 release only I would prefer to say this is a bug (i.e. not intended functionality) so we patch it. Otherwise we should deprecate it and it could be changed only in 2.0 release.
There was a problem hiding this comment.
Yeah in theory no breaking changes should be introduced until 2.0. In practice there are lots of grey areas like this one. Just like Julia allows "minor changes" across minor versions, I think we can do this kind of thing given that the risk of breakage is limited. But this PR doesn't fix any obviously incorrect behavior nor allows doing something that currently fails -- it's just an improvement to limit confusion. Better keep disruption in patch releases to the strict minimum IMO.
There was a problem hiding this comment.
So in summary do we go for 1.1.0 or 1.0.3 release with this? I understand you feel 1.1.0 would be better. Right? I am OK with this (I guess this should not cause package update issues).
There was a problem hiding this comment.
Right, I think 1.1.0 would be better. Not sure whether it's simpler to merge the PR and create a release-1.1 branch, or whether we should keep it open for a while.
There was a problem hiding this comment.
I am OK to switch to 1.1.0 but I would make the release as soon as possible. I do not want people using the functionality we want to remove.
There was a problem hiding this comment.
OK. I admit it's a bit weird to tag 1.1 so quickly, but in practice that's not a problem.
There was a problem hiding this comment.
Yes - it is weird, but it is not a problem so I would go for it (and explain it in my blog on Friday 😄).
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
src/abstractdataframe/subset.jl
Outdated
|
|
||
| Note that as opposed to [`filter`](@ref) the `subset` function works on whole | ||
| columns (or all rows in groups for `GroupedDataFrame`). | ||
| columns and always makes a subset of rows (or all rows in groups for |
There was a problem hiding this comment.
What does "makes a subset of rows" mean? Also it's not mentioned for subset!.
There was a problem hiding this comment.
it is intended to mean that selection happens row-wise always (not whole groups) - this was the original reason why this PR is made. In particular this is a crucial difference between subset and filter for GroupedDataFrame (as the latter works per group). Could you please propose a better wording to convey this message?
There was a problem hiding this comment.
OK. Then maybe this?
works on whole columns (and selects rows within groups for `GroupedDataFrame`
rather than whole groups).
And I'd mention that the result should be a vector of Boolean in the first paragraph as that's really an essential information.
src/abstractdataframe/subset.jl
Outdated
| If you want to filter rows of an `AbstractDataFrame` or groups of | ||
| `GroupedDataFrame` by a function that returns `Bool` value use [`filter!`](@ref) | ||
| or [`filter`](@ref) (note that [`filter!`](@ref) is not supported for | ||
| `GroupedDataFrame`). |
There was a problem hiding this comment.
Sync this with subset docstring.
There was a problem hiding this comment.
I will sync once we finalize subset docstring.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
NEWS.md
Outdated
| * patch v1.0.3: make sure `subset` checks if the passed condition function | ||
| returns a vector of values |
There was a problem hiding this comment.
Update this if bumping to 1.1.
There was a problem hiding this comment.
updated and added more explanations. Fortunately the change we make will now throw an error so it will be easy to catch.
NEWS.md
Outdated
|
|
||
| * patch v1.0.3: make sure `subset` checks if the passed condition function | ||
| returns a vector of values | ||
| ([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX)) |
There was a problem hiding this comment.
OK. I admit it's a bit weird to tag 1.1 so quickly, but in practice that's not a problem.
|
Thank you! I will make a 1.1. release now |
Fixes #2740
It was easier than I thought in the end. I have not implemented the
ByGroupidea (as one can usefilterfor this), but we can discuss it here.Some other minor things I have changed: