Support threading in joined column creation#2664
Conversation
src/join/composer.jl
Outdated
| dfr_noon = joiner.dfr[right_ixs, Not(joiner.right_on)] | ||
|
|
||
| if VERSION >= v"1.4" && Threads.nthreads() > 1 && length(left_ixs) >= 1_000_000 | ||
| dfl_task = Threads.@spawn joiner.dfl[left_ixs, :] |
There was a problem hiding this comment.
since getindex already uses threading this is only to ensure we are fast in case left and right table have very different processing times.
|
|
||
| @assert col_idx == ncol(joiner.dfl_on) + 1 | ||
|
|
||
| for col in eachcol(dfl_noon) |
There was a problem hiding this comment.
I did not add threading for on-columns creation as in practice I expect that on-columns are minority of created columns and I avoid adding too much boilerplate code this way.
|
Example run. Setup: Execution on Execution on this PR: (using 8 threads in both cases) |
nalimilan
left a comment
There was a problem hiding this comment.
Cool! I just wish we could find a way to avoid putting if checks everywhere in the long term.
src/join/composer.jl
Outdated
| end | ||
| @assert col_idx == ncol(joiner.dfl) + 1 | ||
| for col in eachcol(dfr_noon) | ||
| let cols_i = col_idx |
There was a problem hiding this comment.
Why do you use let here but not above?
There was a problem hiding this comment.
You are right - let is not needed, as just writing cols_i = col_idx will ensure we get a new binding for cols_i. I will change this.
src/join/composer.jl
Outdated
| function _noon_compose_helper(cols, similar_col, cols_i, col, target_nrow, | ||
| side_ixs, offset, sideonly_ixs, tocopy) |
There was a problem hiding this comment.
Can you add types to the signature? Maybe also add a ! at the end of the function name.
When we stop supporting Julia 1.0 it will be possible to significantly simplify the codes. |
|
Thank you! |
A minor PR to speed up column creation in joins using threading.
Its major benefit is for left, right and outerjoins having many extra columns (and producing over 1_000_000 rows)