Skip to content

add ungroup keyword argument to filter#3021

Merged
bkamins merged 7 commits intomainfrom
bk/filter
Mar 30, 2022
Merged

add ungroup keyword argument to filter#3021
bkamins merged 7 commits intomainfrom
bk/filter

Conversation

@bkamins
Copy link
Copy Markdown
Member

@bkamins bkamins commented Mar 12, 2022

fixes #2954

@bkamins bkamins added this to the 1.4 milestone Mar 12, 2022
@bkamins bkamins requested a review from nalimilan March 12, 2022 19:15
@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Mar 16, 2022

@nalimilan - same here. CI passes cleanly now

Comment on lines +1139 to +1140
@inline function Base.filter(f, gdf::GroupedDataFrame; ungroup::Bool=false)
res = _filter_helper(gdf, f)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to drop other methods and move all the handling to _filter_helper(gdf, f), so that a single common filter method is enough. Or is there anything preventing this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this, but what would be the benefit of such a change? (we could also do the same change for filter for AbstractDataFrame). There would no benefit for dispatch as in Base Julia filter already implements 9 methods for filter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of doing what you want is that the stack trace when the error occurs would be one level deeper. For example if the user used a wrong call signature and got MethodError it would be method error in _filter_helper not in filter. Example:

julia> f(x) = g(x)
f (generic function with 1 method)

julia> g(x::Int) = 1
g (generic function with 1 method)

julia> g(x::String) = 2
g (generic function with 2 methods)

julia> f(true)
ERROR: MethodError: no method matching g(::Bool)
Closest candidates are:
  g(::Int64) at REPL[5]:1
  g(::String) at REPL[6]:1
Stacktrace:
 [1] f(x::Bool)
   @ Main .\REPL[4]:1
 [2] top-level scope
   @ REPL[7]:1

is hard to understand by regular users, while the following:

julia> h(x::Int) = 1
h (generic function with 1 method)

julia> h(x::String) = 2
h (generic function with 2 methods)

julia> h(true)
ERROR: MethodError: no method matching h(::Bool)
Closest candidates are:
  h(::Int64) at REPL[8]:1
  h(::String) at REPL[9]:1
Stacktrace:
 [1] top-level scope
   @ REPL[10]:1

is clearer as in the suggestion they can see the function they tried to call.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I just thought it would be slightly simpler and would avoid inlining too much code in the caller function. But indeed giving an error immediately matters.

In that case maybe revert the change on this line, as I don't see why only this method would call filter_helper but not others.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - removed.

@nalimilan
Copy link
Copy Markdown
Member

Do you think it's possible to test that inference works using @inferred?

@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Mar 26, 2022

Things are strange here. I have checked that _filter_helper_gdf passes @inferred, but:

julia> df = DataFrame(id=1:3);

julia> gdf = groupby(DataFrame(id=1:3), :id);

julia> @inferred filter(x -> true, gdf)
GroupedDataFrame with 3 groups based on key: id
First Group (1 row): id = 1
 Row │ id    
     │ Int64 
─────┼───────
   1 │     1
⋮
Last Group (1 row): id = 3
 Row │ id    
     │ Int64 
─────┼───────
   1 │     3

julia> @inferred filter(1 => x -> true, gdf)
ERROR: return type GroupedDataFrame{DataFrame} does not match inferred return type Any

and I do not understand the reason.

@nalimilan
Copy link
Copy Markdown
Member

I also got weird results with the levels PR. Does that also happen if you wrap the call in a function?

@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Mar 28, 2022

This is the same unfortunately:

julia> f(gdf) = filter(1 => x -> true, gdf)
f (generic function with 1 method)

julia> df = DataFrame(id=1:3);

julia> gdf = groupby(DataFrame(id=1:3), :id);

julia> @inferred f(gdf)
ERROR: return type GroupedDataFrame{DataFrame} does not match inferred return type Any

@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Mar 30, 2022

I managed to fix inference issues.
Type assertions like in https://github.com/JuliaData/DataFrames.jl/pull/3021/files#diff-59220dc318de6e96863d157770ce8450bd086b40b25bc3a9d4ca9e0099547d5fR1146 should always hold.

Hopefully if they do not hold users will report.

@bkamins bkamins merged commit b1fdb76 into main Mar 30, 2022
@bkamins bkamins deleted the bk/filter branch March 30, 2022 19:59
@bkamins
Copy link
Copy Markdown
Member Author

bkamins commented Mar 30, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Could it be useful to add the ungroup keyword to the filter function?

2 participants