add firstindex check and fix a bug in mapcols!#2594
Conversation
|
@nalimilan - are you OK with these proposed changes. If yes I would finalize the PR |
|
@nalimilan - the issue came up again in #2604. Are you OK to finalize this PR in the form I have proposed? |
|
@nalimilan - this should be good to have a look at (especially corner cases in the tests). |
|
I had to split the tests into a separate file as OffsetArrays.jl does not run on Julia 1.0 (due to its dependency). |
|
CI and coverage passes cleanly now. |
| if len_min != len_max | ||
| throw(DimensionMismatch("lengths of returned vectors must be identical")) | ||
| end | ||
| _columns(df) .= vs |
There was a problem hiding this comment.
What was the problem with this syntax? Compilation overhead?
There was a problem hiding this comment.
yes - I try to remove broadcasting in the internal code to reduce latency a bit (also I thought that the syntax might seem too magic for casual reader).
There was a problem hiding this comment.
The difference is:
current release:
julia> using DataFrames
julia> df = DataFrame(a=1)
1×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
julia> @time mapcols!(length, df)
0.003640 seconds (2.75 k allocations: 146.932 KiB)
1×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
this PR:
julia> using DataFrames
julia> df = DataFrame(a=1)
1×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
julia> @time mapcols!(length, df)
0.000015 seconds (4 allocations: 256 bytes)
1×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
so it is not a big deal (but we have to iterate vs anyway to check firstindex).
There was a problem hiding this comment.
Wow. I wouldn't have expected such a large difference in the number of allocations.
There was a problem hiding this comment.
broadcasting is heavy (when doing the PRs that Tim Holy prompted I will review the whole codebase against using broadcasting). Of course here the difference is one-time and not that big.
|
@bkamins I just learned about the function It is defined here: https://github.com/JuliaLang/julia/blob/ae53238c45a0cd6dafc6e121f4daaa93143bf627/base/abstractarray.jl#L103 |
|
@yurivish - thank you for looking into this. I preferred custom checks for the following reasons (although the function seems to be a part of "official" API even if it is not exported):
|
|
Mmm, is the CI failure related to recent changes in Tables? |
|
Yes, I will rebase and merge after CI passes. |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
eaf2e99 to
5281272
Compare
|
Thank you! |
We currently assume that vectors are 1-based in DataFrames.jl. This PR makes sure we check for that. Not checking this will lead to bugs. Also I found some small bug in
mapcols!in the process related toAbstractRangehandling.I think we should decide on this before 1.0 release (it is better to be strict I think now; we can be more flexible later).
I will add tests, NEWS.md and update documentation with this PR after we agree that we want to add these changes.