Skip to content

Respect copycols when building DataFrame from Tables.CopiedColumns#2656

Merged
quinnj merged 2 commits intomainfrom
jq/copiedcolumns
Mar 12, 2021
Merged

Respect copycols when building DataFrame from Tables.CopiedColumns#2656
quinnj merged 2 commits intomainfrom
jq/copiedcolumns

Conversation

@quinnj
Copy link
Copy Markdown
Member

@quinnj quinnj commented Mar 12, 2021

Fixes apache/arrow-julia#146. The issue here
was we were just ignoring copycols entirely when building DataFrame
from Tables.CopiedColumns. Which is mostly fine because
CopiedColumns implies DataFrames can assume ownership of the columns.
The wrinkle comes in when you consider cases like Arrow.Table, which
wraps columns in Tables.CopiedColumns, not necessarily because copies
have been made, but because the data is immutable, so it should be safe
for other tables to assume ownership. The problem is when users want to
mutate those columns, they need to make copies of them, so naturally
users try DataFrame(::Arrow.Table; copycols=true), but then that
doesn't actually make copies.

The proposal here just intercepts the top-level
DataFrame(::CopiedColumns) constructor and passes copycols=false as
the default, so if users pass copycols=true, it will still be
respected.

Fixes apache/arrow-julia#146. The issue here
was we were just ignoring `copycols` entirely when building DataFrame
from `Tables.CopiedColumns`. Which is mostly fine because
`CopiedColumns` implies DataFrames can assume ownership of the columns.
The wrinkle comes in when you consider cases like `Arrow.Table`, which
wraps columns in `Tables.CopiedColumns`, not necessarily because copies
have been made, but because the data is immutable, so it should be safe
for other tables to assume ownership. The problem is when users want to
_mutate_ those columns, they need to make copies of them, so naturally
users try `DataFrame(::Arrow.Table; copycols=true)`, but then that
doesn't actually make copies.

The proposal here just intercepts the top-level
`DataFrame(::CopiedColumns)` constructor and passes `copycols=false` as
the default, so if users pass `copycols=true`, it will still be
respected.
@bkamins bkamins added the ecosystem Issues in DataFrames.jl ecosystem label Mar 12, 2021
@bkamins bkamins added this to the 1.0 milestone Mar 12, 2021
@quinnj quinnj merged commit 140a3e0 into main Mar 12, 2021
@quinnj quinnj deleted the jq/copiedcolumns branch March 12, 2021 17:35
bkamins added a commit that referenced this pull request Mar 31, 2021
…2656)

* Respect copycols when building DataFrame from Tables.CopiedColumns

Fixes apache/arrow-julia#146. The issue here
was we were just ignoring `copycols` entirely when building DataFrame
from `Tables.CopiedColumns`. Which is mostly fine because
`CopiedColumns` implies DataFrames can assume ownership of the columns.
The wrinkle comes in when you consider cases like `Arrow.Table`, which
wraps columns in `Tables.CopiedColumns`, not necessarily because copies
have been made, but because the data is immutable, so it should be safe
for other tables to assume ownership. The problem is when users want to
_mutate_ those columns, they need to make copies of them, so naturally
users try `DataFrame(::Arrow.Table; copycols=true)`, but then that
doesn't actually make copies.

The proposal here just intercepts the top-level
`DataFrame(::CopiedColumns)` constructor and passes `copycols=false` as
the default, so if users pass `copycols=true`, it will still be
respected.

* Update src/other/tables.jl

Co-authored-by: Bogumił Kamiński <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ecosystem Issues in DataFrames.jl ecosystem

Projects

None yet

Development

Successfully merging this pull request may close these issues.

copying Arrow.Table does not always copy columns

2 participants