Add examples for issorted docstrings.#2941
Add examples for issorted docstrings.#2941bkamins merged 17 commits intoJuliaData:mainfrom Chandu-4444:ImproveDocs
issorted docstrings.#2941Conversation
src/abstractdataframe/sort.jl
Outdated
| lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) | ||
|
|
||
| Test whether data frame `df` sorted by column(s) `cols`. | ||
| If no column is specified, it checks if the `df` is sorted lexicographically. |
There was a problem hiding this comment.
This is an important comment. Can you please add it also to sort and sort!. Also can you make it more precise that "no column is specified" means "column selector selects no columns".
@nalimilan - this rule is non-intuitive, but changing it would be breaking. Should we deprecate this behavior for 2.0 release (it would be a separate PR).
There was a problem hiding this comment.
@Chandu-4444 - if what I write is not clear please comment and I will give you examples.
There was a problem hiding this comment.
Can you please add it also to sort and sort!. Also can you make it more precise that "no column is specified" means "column selector selects no columns".
I get this and will do the same.
@nalimilan - this rule is non-intuitive, but changing it would be breaking. Should we deprecate this behavior for 2.0 release (it would be a separate PR).
This, I didn't understand (And I'm really curious). Can you please elaborate?
There was a problem hiding this comment.
Can you please elaborate?
sort and related functions were developed in very early days of DataFrames.jl when there was no precise rules for the whole ecosystem. Therefore some inconsistencies slipped through.
A proper definition of sort should be (I am skipping kwargs):
Base.sort(df::AbstractDataFrame, cols=All())
however it is
Base.sort(df::AbstractDataFrame, cols=[])
and as a consequence if you pass any column selector that selects no columns the data frame is sorted lexicographically.
While technically it should happen if you select All() columns. If you select no columns a proper behavior should be that the data frame is not reordered (it is left as is as you have not requested on what the data frame should be sorted).
We have two options:
- leave things as is forever (probably there is not much harm with that as it is unlikely that someone does sorting and wants the data frame to be left untouched)
- make a deprecation of how selecting no columns is handled and in 2.0 release change it the way I describe above.
There was a problem hiding this comment.
I get it now, thanks for that excellent explanation. Perhaps this column selector can be made a required argument (Like the one Python has in Pandas)?
There was a problem hiding this comment.
Perhaps this column selector can be made a required argument
This would be even more breaking. By proposing the change to:
Base.sort(df::AbstractDataFrame, cols=All())
we are not breaking calls like sort(df) which are quite likely to exist in current user code.
We try to limit breaking changes. Calls like sort(df, []) are quite unlikely, so we could deprecate them and change their behavior in 2.0.
Let us wait for an opinion from @nalimilan.
There was a problem hiding this comment.
That's indeed unfortunate. I'm not sure we should make a breaking change though given that it's not super likely to confuse people (if you call sort with no columns, it's most likely a mistake anyway and you'll get unexpected results), so I'd opt for a deprecation.
There was a problem hiding this comment.
OK - let us then document that passing no columns is an undefined behavior.
Co-authored-by: Bogumił Kamiński <[email protected]>
when column selector is not specified.
src/abstractdataframe/sort.jl
Outdated
|
|
||
| Return a permutation vector of row indices of data frame `df` that puts them in | ||
| sorted order according to column(s) `cols`. | ||
| If `cols` were not specified, it returns row indices that sorts `df` lexicographically. |
There was a problem hiding this comment.
Rather than repeating this here (which would have to be put in all sorting functions), how about putting cols=All() in the signature above? We could also change the actual implementation to make it clearer.
There was a problem hiding this comment.
This is what I want to do. I will make these changes to this PR and then let @Chandu-4444 to sync the docs following your suggestions and writing that passing no columns leads to an undefined row order.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
|
@Chandu-4444 - I have updated the method signatures and added NEWS.md entry following our discussion. Now can you please sync the documentation accordingly? Thank you. I am marking it as 1.3 release PR (we wait for Julia 1.7 anyway and this should be quick to merge) |
Changing this to the line below would be fine I guess (In all the docstrings)?:
|
|
Rather than saying "undefined", wouldn't it be better to say what happens now and that this will change in the future? That's less scary and more useful. |
Yeah! Like this? :
|
|
Well the exact wording will depend on the function (cf. the suggestions I made above). So I'd just add "(this behavior is deprecated and will change in future versions)" to each of them. |
functions's docstrings.
|
I have reviewed the PR pushing appropriate changes (in particular introducing deprecation warning) and making them consistent across functions. Can you please have a look if all is OK? Thank you! |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
|
Thank you! |
I have added a few examples in
issorteddocstrings. And mentioned about lexicographic sorting atissortedandsortperm.