Conversation
|
Thank you for working on it. To get a safe eltype of the resulting columns using The issues you raise are exactly the things that I expected/feared (i.e. that it would be hard to find the optimal solution) |
|
Aright, doing a bit more on this.
Results are for This is basically the same as before. A couple of new observations:
With both copycols:
Both give columns of I'm not totally clear on where to add the I had some ideas with Very possible I'm missing something. |
Yes. To avoid promotion you would have to write something like So the conclusion is that In general an input from @nalimilan would be welcome on naming kwargs in this function (he is far superior than me in such things). |
Yes, by a fair amount.
This makes sense to me. Do you have a sense of whether the default for promotion should be |
Not sure - @nalimilan seemed to prefer Having said that I would go for |
|
Open questions
I wrote some docs and tests, but will wait to finalize until the above are answered. Any additional feedback appreciated |
|
I'm getting some weird segfaults trying to test locally that don't appear to be happening on CI. Is it possible that Revise is causing issues? |
It is necessary as the name of
Not sure - it should not be an issue. |
|
Great points, thanks for review! I will try to address early next week 👍 |
|
Note that in theory
I'm not sure how useful the second option could be in practice, but we could consider supporting it too, for example by allowing to pass any function to compute the type to an argument called |
So maybe let us make a |
This is what I thought you would prefer (that is why I have raised it). Let us then switch to |
Agree it's a pity, but makes sense. Main downside from my perspective is discoverability - what do we think about defining something like: |
|
I think it is a good idea. |
src/abstractdataframe/reshape.jl
Outdated
| m = permutedims(Matrix(df_notsrc)) | ||
| df_tmp = rename!(DataFrame(Tables.table(m)), df[!, src_namescol], makeunique=makeunique) | ||
| end | ||
| return hcat!(df_permuted, df_tmp, copycols=false) |
There was a problem hiding this comment.
Sounds OK. Though people may be surprised if they do e.g. permutedims(DataFrame(x=[1, 2], y=[2, 3])) and get an error about src_colnames as they didn't specify it. Could be worth having a specific error message for that method.
On a second thought maybe requiring to pass it (like the proposal Especially that it now occurred to me that in the future we might want |
You mean like Or should the column names also become a new column? |
I am not sure - this is exactly what I was thinking of and both options seemed plausible. The point is that we might change our minds what is good in the future so it is better to be conservative now (have a look at #2464 where we started discussing things that worked for years and no one complained about them). |
TBH I don't 100% follow the relevance of that PR. Are you just saying that in the PR you're working on "fixing" things that have been around for years that people didn't complain about? Regardless, I definitely agree that being conservative is the right approach. But I'm not sure what the conservative version is in this case. Should we include a Just so I'm clear, the new behavior being proposed is:
|
No, I am saying that it is hard to get consensus what should be the behaviour of some functions. |
What I am considering if we should allow |
|
Ahh, got it. That makes sense. Ok, will take another look tomorrow |
bkamins
left a comment
There was a problem hiding this comment.
Looks good except for three places in the documentation where I left comments. Thank you!
|
@nalimilan - I am OK to merge this. |
nalimilan
left a comment
There was a problem hiding this comment.
Thanks, looks good apart from a few small suggestions.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
I will merge the PR once the CI passes. Thank you! |
|
Woo! Thank you both for thoughtful and detailed review! Really appreciate it :-) |
|
Thank you! |
Figured I'd get a jumpstart on Hacktoberfest, and this is something I've wanted in DataFrames for a longtime. This is an attempt to address #2420
todo
So far
I implemented 3 versions of
transpose:transpose1copies @tbeason 's stack/unstack method from the issue abovetranspose2creates aMatrixfrom everything except the indexing column, and doespermutedimson ittranspose3builds column-by-column by iterating rowsBenchmarking
First impressions
transpose1(stack/unstack) is slower on smallDataFrames but scales well.transpose2is the fastest, but might be harder to fix type issues (see below)transpose3is pretty fast for smallDataFrames but doesn't scale well at allOther thoughts
All 3 versions end up with
Anytyped columns, regardless of what's in the resulting column (unless all the columns are the same type). This seems like it might be more easily fixed with 1 and 3. The behavior is also not quite what I was expecting if all of the columns are numerical:Still lots to do here, but I thought it would be worth it to get the ball rolling and see if people have any thoughts. I will probably get back to this in earnest next week.