fix: correctly calculate join output schema nullability #2803

Merged
xudong963 merged 1 commit into apache:master from alamb:alamb/fix_join_nullability on Jun 28, 2022

@alamb (Contributor) commented on Jun 27, 2022

Which issue does this PR close?

Part of #2778

Rationale for this change

apache/arrow-rs#1888 added validation to arrow's RecordBatch that returns an error when a column the schema declares non-nullable actually contains nulls.
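A minimal Rust sketch of the kind of check described above, assuming it rejects a column that is declared non-nullable but actually contains nulls. `Field`, `validate_batch`, and modeling a column as `Vec<Option<i64>>` are hypothetical simplifications, not arrow-rs types or the real `RecordBatch` API.

```rust
// Hypothetical, simplified stand-in for an arrow schema field.
// This is NOT the arrow-rs `Field` type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Rejects a batch when a column declared non-nullable contains nulls.
/// Columns are modeled as `Vec<Option<i64>>`, where `None` is a null.
fn validate_batch(fields: &[Field], columns: &[Vec<Option<i64>>]) -> Result<(), String> {
    if fields.len() != columns.len() {
        return Err(format!(
            "schema has {} fields but batch has {} columns",
            fields.len(),
            columns.len()
        ));
    }
    for (field, column) in fields.iter().zip(columns) {
        let null_count = column.iter().filter(|v| v.is_none()).count();
        // In this sketch, declaring a column nullable when it has no nulls
        // is accepted; only the reverse mismatch is an error.
        if !field.nullable && null_count > 0 {
            return Err(format!(
                "column {} is declared non-nullable but contains {} null(s)",
                field.name, null_count
            ));
        }
    }
    Ok(())
}

fn main() {
    // A one-row LEFT JOIN result (1, NULL) whose schema wrongly
    // declares both columns non-nullable.
    let schema = vec![
        Field { name: "a".to_string(), nullable: false },
        Field { name: "b".to_string(), nullable: false },
    ];
    let columns = vec![vec![Some(1)], vec![None]];
    // prints: Err("column b is declared non-nullable but contains 1 null(s)")
    println!("{:?}", validate_batch(&schema, &columns));
}
```

This is why a join whose output schema under-reports nullability starts failing once such validation exists: the data is fine, but the declared schema is wrong.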

This caught that the output schema calculation for joins is incorrect: LEFT, RIGHT, and FULL joins can introduce NULLs even when the input columns are declared non-nullable.

For example, given the following non-null input:

a
-
1

b
-
2

This query:

SELECT * FROM a LEFT JOIN b

Produces a NULL in column b (column a stays non-nullable, since it is non-nullable in the input), so b must be marked nullable in the output schema:

a  b
-  ----
1  NULL
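The rule can be sketched in Rust: the side of an outer join that may have no matching row is padded with NULLs, so its fields must become nullable, while the other side keeps its input nullability. `Field`, `JoinType`, `with_nullability`, and `join_output_fields` are hypothetical simplified stand-ins, not DataFusion's types or the code changed in this PR.

```rust
// Hypothetical, simplified stand-ins; not DataFusion's actual types or API.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Copies `fields`, forcing `nullable = true` when `force` is set.
fn with_nullability(fields: &[Field], force: bool) -> Vec<Field> {
    fields
        .iter()
        .map(|f| Field {
            name: f.name.clone(),
            nullable: f.nullable || force,
        })
        .collect()
}

/// Output fields are left fields followed by right fields; the side that
/// can be NULL-padded by the join is marked nullable.
fn join_output_fields(left: &[Field], right: &[Field], join_type: JoinType) -> Vec<Field> {
    // Unmatched right rows are emitted with NULLs for the left columns.
    let left_padded = matches!(join_type, JoinType::Right | JoinType::Full);
    // Unmatched left rows are emitted with NULLs for the right columns.
    let right_padded = matches!(join_type, JoinType::Left | JoinType::Full);
    let mut out = with_nullability(left, left_padded);
    out.extend(with_nullability(right, right_padded));
    out
}

fn main() {
    // Both inputs are non-nullable, as in the example above.
    let left = vec![Field { name: "a".to_string(), nullable: false }];
    let right = vec![Field { name: "b".to_string(), nullable: false }];
    for &jt in &[JoinType::Inner, JoinType::Left, JoinType::Right, JoinType::Full] {
        let out = join_output_fields(&left, &right, jt);
        let flags: Vec<bool> = out.iter().map(|f| f.nullable).collect();
        println!("{:?}: {:?}", jt, flags);
    }
}
```

For the example, `main` prints `Left: [false, true]`: column a stays non-nullable and column b becomes nullable. The sketch covers only the INNER/LEFT/RIGHT/FULL cases discussed here; other join types are out of its scope.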

What changes are included in this PR?

  1. Account for NULLs introduced by outer joins when calculating the output schema
  2. Tests
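One property such tests can check is that any column which actually contains NULLs after a join is declared nullable. The sketch below does this with a naive nested-loop equi-join over single `i64` keys; `naive_join`, `expected_nullability`, and the equality join condition are hypothetical and are not the tests added in this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// (left, right) nullability the output schema should declare,
/// assuming both inputs are non-nullable.
fn expected_nullability(jt: JoinType) -> (bool, bool) {
    match jt {
        JoinType::Inner => (false, false),
        JoinType::Left => (false, true),
        JoinType::Right => (true, false),
        JoinType::Full => (true, true),
    }
}

/// Naive nested-loop join on the hypothetical condition `left == right`.
/// Unmatched rows are NULL-padded according to the join type.
fn naive_join(left: &[i64], right: &[i64], jt: JoinType) -> Vec<(Option<i64>, Option<i64>)> {
    let mut out = Vec::new();
    let mut right_matched = vec![false; right.len()];
    for &l in left {
        let mut matched = false;
        for (i, &r) in right.iter().enumerate() {
            if l == r {
                out.push((Some(l), Some(r)));
                matched = true;
                right_matched[i] = true;
            }
        }
        // LEFT/FULL keep unmatched left rows, padding the right side.
        if !matched && matches!(jt, JoinType::Left | JoinType::Full) {
            out.push((Some(l), None));
        }
    }
    // RIGHT/FULL keep unmatched right rows, padding the left side.
    if matches!(jt, JoinType::Right | JoinType::Full) {
        for (i, &r) in right.iter().enumerate() {
            if !right_matched[i] {
                out.push((None, Some(r)));
            }
        }
    }
    out
}

fn main() {
    // Example inputs: a = [1], b = [2]; no key matches.
    let (a, b) = (vec![1], vec![2]);
    for &jt in &[JoinType::Inner, JoinType::Left, JoinType::Right, JoinType::Full] {
        let rows = naive_join(&a, &b, jt);
        let left_has_null = rows.iter().any(|(l, _)| l.is_none());
        let right_has_null = rows.iter().any(|(_, r)| r.is_none());
        let (left_nullable, right_nullable) = expected_nullability(jt);
        // A column that actually contains NULLs must be declared nullable.
        assert!(!left_has_null || left_nullable);
        assert!(!right_has_null || right_nullable);
        println!("{:?}: rows = {:?}", jt, rows);
    }
}
```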

Are there any user-facing changes?

I don't think so (other than more accurate nullability in output schemas).

@github-actions bot added the core label (Core DataFusion crate) on Jun 27, 2022
@alamb force-pushed the alamb/fix_join_nullability branch from 5f1aab4 to 89fa13a on June 27, 2022 23:46
@alamb mentioned this pull request on Jun 27, 2022
@codecov-commenter commented:

Codecov Report

Merging #2803 (89fa13a) into master (533e2b4) will increase coverage by 0.09%.
The diff coverage is 88.46%.

@@            Coverage Diff             @@
##           master    #2803      +/-   ##
==========================================
+ Coverage   85.11%   85.20%   +0.09%     
==========================================
  Files         273      274       +1     
  Lines       48240    48590     +350     
==========================================
+ Hits        41060    41402     +342     
- Misses       7180     7188       +8     
Impacted Files                                      Coverage Δ
datafusion/core/src/physical_plan/join_utils.rs     93.61% <88.46%> (-3.20%) ⬇️
datafusion/expr/src/logical_plan/plan.rs            74.00% <0.00%> (-0.40%) ⬇️
datafusion/optimizer/src/reduce_outer_join.rs       99.39% <0.00%> (ø)
datafusion/core/src/execution/context.rs            78.56% <0.00%> (+0.02%) ⬆️
datafusion/core/tests/sql/joins.rs                  99.31% <0.00%> (+0.20%) ⬆️
datafusion/core/src/physical_plan/metrics/value.rs  87.43% <0.00%> (+0.50%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@xudong963 (Member) left a comment:

Thank you @alamb

@xudong963 merged commit b73a39a into apache:master on Jun 28, 2022
@alamb deleted the alamb/fix_join_nullability branch on June 28, 2022 18:47