Skip to content

Correctly handle Tables.AbstractRow in operation specficiation#3348

Merged
bkamins merged 13 commits intomainfrom
bk/AbstractRow
Jul 2, 2023
Merged

Correctly handle Tables.AbstractRow in operation specficiation#3348
bkamins merged 13 commits intomainfrom
bk/AbstractRow

Conversation

@bkamins
Copy link
Copy Markdown
Member

@bkamins bkamins commented Jun 25, 2023

Fixes #3335

After this PR Tables.AbstractRow is treated in the same way as DataFrames.DataFrameRow in all combine/select/transform operations.

@bkamins bkamins requested a review from nalimilan June 25, 2023 14:06
@bkamins bkamins added this to the 1.6 milestone Jun 25, 2023
@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Jun 25, 2023

This is mildly breaking, but I assume it is OK to add it in 1.6 release. The point is that I assume that when someone uses Tables.AbstractRow then expansion is expected.

_insert_row_multicolumn(newdf, df, allow_resizing_newdf, colnames, res)
end

function _add_multicol_res(res::Tables.AbstractRow, newdf::DataFrame, df::AbstractDataFrame,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're at it, maybe we should make DataFrameRow <: AbstractRow? That would avoid duplicating a few methods and simplifying type unions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it.

First (for future readers) DataFrameRow fully supports Tables.AbstractRow interface, so this is just an issue of code design.

Pros of doing subtyping:

  • Less code.
  • It is more clear which functionalities are on more "abstract level".

Cons of doing subtyping:

  • Most of methods for DataFrameRow and Tables.AbstractRow are the same. However, not all of them. Some methods are different, because DataFrameRow has a richer functionality than Tables.AbstractRow. The challenge is that keeping DataFrameRow and Tables.AbstractRow separate makes it easier (at least for me) in the future to find all places in the source code where DataFrameRow is used. I know this is not a super strong reason but with the size of the code that we have I often end up doing updates of code by running "find in all files" of a certain code pattern (as otherwise it is easy to forget about some place when some functionality is used).

(for a reference: I started implementing this change and noticed that it would affect more code than only this PR and after this change)

So - we could add it, but it also has some practical downsides. What do you think, given this, we should do?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. Given that Tables.jl uses duck typing anyway it's not super important to have DataFrameRow <: AbstractRow, and indeed nobody has requested it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - duck typing is my main reason for sticking with what we have.

end
end

@testset "Tables.AbstractRow interface" begin
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should also be tested in other tests where we cover DataFrameRow?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added more such tests (where I managed to track them down).

@nalimilan
Copy link
Copy Markdown
Member

The 32-bit failure seems unrelated but real?

@bkamins bkamins merged commit 6338825 into main Jul 2, 2023
@bkamins bkamins deleted the bk/AbstractRow branch July 2, 2023 20:19
@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Jul 2, 2023

Thank you (I will fix the 32-bit error in a separate PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accepting array element in rows specificed by named tuples, in combine

2 participants