
pkg/planner: avoid wrong outer join simplification with nested IN #67764

Closed
hawkingrei wants to merge 6 commits into pingcap:master from hawkingrei:issue-67373-fix-20260414

hawkingrei (Member) commented Apr 14, 2026

What problem does this PR solve?

Issue Number: close #67373

Problem Summary:

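The planner can incorrectly simplify a RIGHT OUTER JOIN to an INNER JOIN because its null-rejection check is too aggressive.
This rewrite is only safe when the filter above the join is null-rejecting, that is, when it evaluates to FALSE or NULL for
every row whose null-supplying side is padded with NULLs. For filters that contain a nested IN, the check can report null
rejection when it does not hold, so the rewrite drops rows from queries that repeat a derived table under UNION ALL, while
the equivalent CTE form returns the correct result.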
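A minimal Go sketch of why a nested IN can defeat a null-rejection check. It models SQL three-valued logic with MySQL-style IN semantics (which TiDB follows) and uses a hypothetical predicate, `1 IN (col IN (1, 2), 1)`, that is not taken from issue #67373. On a row padded by the outer join, `col` is NULL: a plain `col IN (1, 2)` then yields NULL and filters the row out, but the nested form still yields TRUE and keeps the row. Treating the nested form as null-rejecting would therefore turn the outer join into an inner join and lose that row. `Tri`, `Val`, `inList`, `toVal` and `rejectsNull` are illustrative names, not TiDB APIs.

```go
package main

import "fmt"

// Tri is SQL three-valued logic: FALSE, TRUE, or UNKNOWN (NULL).
type Tri int

const (
	False Tri = iota
	True
	Unknown
)

// Val is a nullable integer; Null == true means SQL NULL.
type Val struct {
	N    int64
	Null bool
}

// inList evaluates `x IN (list...)` with MySQL-style NULL semantics:
// a NULL left operand yields NULL; any equal element yields TRUE;
// otherwise any NULL element yields NULL; otherwise FALSE.
func inList(x Val, list ...Val) Tri {
	if x.Null {
		return Unknown
	}
	sawNull := false
	for _, v := range list {
		if v.Null {
			sawNull = true
			continue
		}
		if v.N == x.N {
			return True
		}
	}
	if sawNull {
		return Unknown
	}
	return False
}

// toVal turns a boolean result into a value, as MySQL does when an IN result
// is used as an operand of another IN (TRUE=1, FALSE=0, NULL=NULL).
func toVal(t Tri) Val {
	switch t {
	case True:
		return Val{N: 1}
	case False:
		return Val{N: 0}
	default:
		return Val{Null: true}
	}
}

// rejectsNull reports whether a filter result, computed with the
// null-supplying side's columns set to NULL, removes the row.
// Only then is converting the outer join to an inner join safe.
func rejectsNull(t Tri) bool { return t != True }

func main() {
	null := Val{Null: true} // the null-supplying column after outer-join padding

	// Plain IN: `col IN (1, 2)` is NULL when col is NULL, so it is null-rejecting.
	plain := inList(null, Val{N: 1}, Val{N: 2})

	// Nested IN: `1 IN (col IN (1, 2), 1)`. The inner IN is NULL, but the
	// literal 1 still matches, so the predicate is TRUE: NOT null-rejecting.
	nested := inList(Val{N: 1}, toVal(inList(null, Val{N: 1}, Val{N: 2})), Val{N: 1})

	fmt.Println(rejectsNull(plain), rejectsNull(nested)) // true false
}
```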

What changed and how does it work?

  • Make the local outer-join null-reject check in logical_join.go conservative for predicates
    containing nested IN.
  • Add a regression case that keeps the original issue shape and verifies the derived-table query
    matches the CTE result.
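The conservative rule in the first bullet can be sketched as a small Go model. It assumes the filter above the join has already been split into AND-ed conjuncts, and `JoinType`, `Cond`, its two flags and `simplifyOuterJoin` are hypothetical stand-ins rather than TiDB's planner types. A single null-rejecting conjunct is enough to justify the inner-join rewrite, because FALSE or NULL AND-ed with anything stays FALSE or NULL; a conjunct that contains a nested IN is simply not trusted, so it can no longer trigger the rewrite on its own.

```go
package main

import "fmt"

// JoinType is a simplified join kind for this sketch.
type JoinType string

const (
	InnerJoin      JoinType = "inner"
	RightOuterJoin JoinType = "right outer"
)

// Cond is one AND-ed conjunct of the filter above the join. Both flags are
// hypothetical stand-ins for what a planner would compute.
type Cond struct {
	Text        string
	RejectsNull bool // verdict of the regular null-rejection check
	HasNestedIn bool // the conjunct contains an IN below its root
}

// simplifyOuterJoin returns InnerJoin only if some trusted conjunct is
// null-rejecting. Conjuncts with a nested IN are skipped, i.e. treated
// conservatively as not null-rejecting.
func simplifyOuterJoin(jt JoinType, conds []Cond) JoinType {
	for _, c := range conds {
		if c.HasNestedIn {
			continue
		}
		if c.RejectsNull {
			return InnerJoin
		}
	}
	return jt
}

func main() {
	// RejectsNull: true models an over-eager check that wrongly reports
	// null rejection for the nested-IN conjunct.
	nestedOnly := []Cond{{Text: "1 IN (t1.a IN (1, 2), 1)", RejectsNull: true, HasNestedIn: true}}
	withPlain := append(nestedOnly, Cond{Text: "t1.b > 0", RejectsNull: true})

	fmt.Println(simplifyOuterJoin(RightOuterJoin, nestedOnly)) // right outer
	fmt.Println(simplifyOuterJoin(RightOuterJoin, withPlain))  // inner
}
```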

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fixed an issue where repeated derived tables over a RIGHT OUTER JOIN in UNION ALL could lose rows because the join was incorrectly simplified to an inner join.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed planner handling of nested IN expressions so null-rejection logic is correct.
    • Kept the outer-to-inner join conversion for queries whose nested IN is “safe”, with correct EXPLAIN output and empty results where appropriate.
    • Ensured queries using CTEs and equivalent derived-table + UNION ALL yield consistent results.
  • Tests

    • Added regression tests covering the nested-IN behaviors and CTE vs. UNION ALL equivalence.

ti-chi-bot added the do-not-merge/needs-triage-completed, release-note, sig/planner, and size/L labels on Apr 14, 2026
coderabbitai bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 26757bea-76b9-413c-83d7-c449ff43a363

📥 Commits

Reviewing files that changed from the base of the PR and between ac8f4c6 and 477e749.

📒 Files selected for processing (2)
  • pkg/planner/core/operator/logicalop/logical_join.go
  • pkg/planner/util/null_misc.go

📝 Walkthrough

Added two regression tests for UNION ALL vs CTE behavior, updated null-rejection logic to detect/handle nested IN expressions, adjusted test BUILD deps, and added unit tests and helpers validating nested-IN null-rejection behavior.

Changes

  • Regression tests (pkg/planner/core/issuetest/planner_issue_test.go): Added two regression test blocks: repeated-derived-union-all-keeps-all-rows (compares CTE vs repeated derived-table UNION ALL) and safe-nested-in-still-allows-outer-to-inner (plan assertion + empty-result check).
  • Planner null-rejection logic (pkg/planner/core/operator/logicalop/logical_join.go): Added containsNestedInDescendant and adjusted isNullRejected so that a predicate with a nested IN descendant is not treated as null-rejecting.
  • Null-rejection helpers & tests (pkg/planner/util/null_misc.go, pkg/planner/util/column_test.go): Replaced a PlanContext type, added a hasUnsafeNestedIn check so that a predicate is not treated as null-rejected when a nested IN is unsafe, and added a null-rejected-nested-in unit test with test helpers.
  • Build/test deps (pkg/importsdk/BUILD.bazel, pkg/planner/util/BUILD.bazel): Test-only BUILD updates: added //pkg/parser/ast to the importsdk_test deps and //pkg/util/mock to the util_test deps.

Sequence Diagram(s)

(Skipped — changes are localized to planner expression handling and tests; no multi-component sequential flow requiring visualization.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • qw4990
  • guo-shaoge
  • AilinKid

Poem

🐰 I sniffed the plan where INs hid inside,

Peeked through derived tables where rows did hide,
I hopped in tests and BUILDs with a cheer,
Brought back the rows that had disappeared,
A tiny carrot dance — bug fixed, hop wide!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 40.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The pull request title clearly summarizes the main issue being fixed: avoiding incorrect simplification of outer joins when nested IN predicates are present.
  • Description check: ✅ Passed. The pull request description adequately covers the problem, solution approach, test coverage, and release notes as required by the template.
  • Linked Issues check: ✅ Passed. The changes directly address issue #67373 by making the null-reject check conservative for nested IN predicates and adding regression tests verifying the fix.
  • Out of Scope Changes check: ✅ Passed. All changes align with the linked issue objectives: null-rejection logic refinements, helper function additions for nested IN detection, and regression test coverage.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed


coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/planner/core/operator/logicalop/logical_join.go (1)

362-377: Clarify that this helper detects descendant IN, not root IN.

The current behavior is subtle (Line 362 returns false for a top-level IN expression). A short comment here will prevent accidental misuse.

Suggested clarification
+// hasNestedIn returns true when any descendant scalar-function node is `IN`.
+// Note: a top-level `IN` root intentionally returns false.
 func hasNestedIn(expr expression.Expression) bool {
 	sf, ok := expr.(*expression.ScalarFunction)
 	if !ok {
 		return false
 	}
As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear."
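To illustrate the root-versus-descendant distinction the reviewer wants documented, here is a hypothetical Go model. The toy `Node` type stands in for `expression.Expression` and the `*expression.ScalarFunction` type assertion, and the string "in" stands in for `ast.In`; it mirrors the behavior described in the review, not the PR's actual code. A root-level IN is not reported, while an IN at any depth below the root is.

```go
package main

import "fmt"

// Node is a toy expression node: FuncName is empty for columns and constants.
type Node struct {
	FuncName string
	Args     []*Node
}

// hasNestedIn reports whether any function node strictly below the root is IN.
// A top-level IN root intentionally returns false: only IN nodes that feed
// into another expression are reported.
func hasNestedIn(n *Node) bool {
	for _, child := range n.Args {
		if child.FuncName == "in" || hasNestedIn(child) {
			return true
		}
	}
	return false
}

func main() {
	col, one := &Node{}, &Node{}
	// col IN (1): IN at the root only.
	rootIn := &Node{FuncName: "in", Args: []*Node{col, one}}
	// 1 IN (col IN (1), 1): IN directly below the root.
	nested := &Node{FuncName: "in", Args: []*Node{one, rootIn, one}}
	// NOT ((col IN (1)) = 1): IN two levels below the root.
	deep := &Node{FuncName: "not", Args: []*Node{
		{FuncName: "eq", Args: []*Node{rootIn, one}},
	}}

	fmt.Println(hasNestedIn(rootIn), hasNestedIn(nested), hasNestedIn(deep)) // false true true
}
```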
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/operator/logicalop/logical_join.go` around lines 362 - 377,
hasNestedIn currently returns false for a top-level IN and only detects
descendant/child IN expressions, which is subtle and can be misused; update the
code by adding a concise comment above the hasNestedIn function clarifying that
the helper detects nested/descendant IN expressions (not root-level IN) and
documenting the invariant/intent for callers (reference function name
hasNestedIn and the check for expr.(*expression.ScalarFunction) and
child.FuncName.L == ast.In) so future readers understand its exact behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 66bdfbc4-eabd-4969-b748-e0d2bf8a7064

📥 Commits

Reviewing files that changed from the base of the PR and between 5733f22 and b052e15.

📒 Files selected for processing (10)
  • .github/workflows/update-bazel-files.yml
  • pkg/planner/core/casetest/schema/cannot_find_column_test.go
  • pkg/planner/core/casetest/schema/testdata/cannot_find_column_suite_in.json
  • pkg/planner/core/casetest/schema/testdata/cannot_find_column_suite_out.json
  • pkg/planner/core/casetest/schema/testdata/cannot_find_column_suite_xut.json
  • pkg/planner/core/issuetest/planner_issue_test.go
  • pkg/planner/core/operator/logicalop/logical_join.go
  • pkg/planner/core/operator/logicalop/logical_top_n.go
  • pkg/planner/core/operator/logicalop/logicalop_test/BUILD.bazel
  • pkg/planner/core/operator/logicalop/logicalop_test/logical_operator_test.go

hawkingrei force-pushed the issue-67373-fix-20260414 branch from b052e15 to 9dcac1b on April 14, 2026 10:08
ti-chi-bot added the size/M label and removed the size/L label on Apr 14, 2026
ti-chi-bot bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign winoros for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.1480%. Comparing base (5733f22) to head (477e749).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67764        +/-   ##
================================================
- Coverage   77.6020%   77.1480%   -0.4540%     
================================================
  Files          1981       1964        -17     
  Lines        548804     548846        +42     
================================================
- Hits         425883     423424      -2459     
- Misses       122111     125420      +3309     
+ Partials        810          2       -808     
Flag coverage (Δ):
  • integration: 40.8892% <48.5714%> (+6.5495%) ⬆️
  • unit: 76.3372% <100.0000%> (+0.0078%) ⬆️

Flags with carried forward coverage won't be shown.

Component coverage (Δ):
  • dumpling: 61.5065% <ø> (ø)
  • parser: ∅ <ø> (∅)
  • br: 49.8452% <ø> (-10.5620%) ⬇️

hawkingrei (Member, Author) commented:

/retest

ti-chi-bot added the size/L label and removed the size/M label on Apr 15, 2026
coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/planner/util/null_misc.go (1)

128-146: Add a brief intent comment for hasUnsafeNestedIn logic.

This safety rule is subtle; a short comment describing why these nested IN shapes are treated as unsafe will reduce future regressions.

Proposed patch
-func hasUnsafeNestedIn(ctx base.PlanContext, schema *expression.Schema, expr expression.Expression, skipPlanCacheCheck bool) bool {
+// hasUnsafeNestedIn detects nested IN patterns that cannot be proven null-rejected
+// safely via current IN-list decomposition, so callers should conservatively treat
+// the whole predicate as not null-rejected.
+func hasUnsafeNestedIn(ctx base.PlanContext, schema *expression.Schema, expr expression.Expression, skipPlanCacheCheck bool) bool {
As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs."
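The intent the reviewer asks for can be illustrated with a hypothetical Go sketch. Here `ProvenSafe` stands in for whatever `isNullRejectedInList` decides for a given IN node (its exact rules are not shown on this page), and the real function's `ctx`, `schema` and `skipPlanCacheCheck` parameters are omitted. Only nested INs that fail the check make the predicate unsafe, which is how a "safe" nested IN can still permit the outer-to-inner rewrite.

```go
package main

import "fmt"

// Node is a toy expression node; FuncName is empty for columns and constants.
type Node struct {
	FuncName string
	Args     []*Node
	// ProvenSafe is a hypothetical stand-in for the per-IN verdict that
	// isNullRejectedInList would produce; it is only read on IN nodes.
	ProvenSafe bool
}

// hasUnsafeNestedIn reports whether any IN strictly below the root lacks a
// safety proof. Callers would then treat the whole predicate as not
// null-rejecting. Nested INs that are proven safe do not block the
// outer-to-inner rewrite, and the walk still descends into them.
func hasUnsafeNestedIn(n *Node) bool {
	for _, child := range n.Args {
		if child.FuncName == "in" && !child.ProvenSafe {
			return true
		}
		if hasUnsafeNestedIn(child) {
			return true
		}
	}
	return false
}

func main() {
	leaf := &Node{}
	safeIn := &Node{FuncName: "in", Args: []*Node{leaf, leaf}, ProvenSafe: true}
	unsafeIn := &Node{FuncName: "in", Args: []*Node{leaf, leaf}}

	withSafe := &Node{FuncName: "in", Args: []*Node{leaf, safeIn}}
	withUnsafe := &Node{FuncName: "in", Args: []*Node{leaf, unsafeIn}}

	fmt.Println(hasUnsafeNestedIn(withSafe), hasUnsafeNestedIn(withUnsafe)) // false true
}
```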
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/util/null_misc.go` around lines 128 - 146, Add a short intent
comment above the hasUnsafeNestedIn function explaining why specific nested IN
expressions are considered unsafe: note that the function detects nested
ScalarFunction IN nodes (child.FuncName.L == ast.In) that are not null-rejected
via isNullRejectedInList and therefore can change semantics with NULLs (and
affect plan caching), and that the recursion checks nested scalar functions for
this unsafe shape; reference the function name hasUnsafeNestedIn and the helper
isNullRejectedInList in the comment and briefly state the invariant the function
enforces (i.e., treat non-null-rejected nested INs as unsafe for
planning/plan-cache reuse).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4bed4773-2e00-4138-8e23-a283f4eda6ff

📥 Commits

Reviewing files that changed from the base of the PR and between 87a2c1c and ac8f4c6.

📒 Files selected for processing (5)
  • pkg/planner/core/issuetest/planner_issue_test.go
  • pkg/planner/core/operator/logicalop/logical_join.go
  • pkg/planner/util/BUILD.bazel
  • pkg/planner/util/column_test.go
  • pkg/planner/util/null_misc.go
✅ Files skipped from review due to trivial changes (2)
  • pkg/planner/util/BUILD.bazel
  • pkg/planner/core/operator/logicalop/logical_join.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/planner/core/issuetest/planner_issue_test.go

hawkingrei (Member, Author) commented:

/retest

hawkingrei (Member, Author) commented:

@pantheon-bot review

pantheon-ai bot commented Apr 15, 2026

@hawkingrei I've received your request and will start reviewing the pull request. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

hawkingrei closed this on Apr 15, 2026

Labels

release-note, sig/planner, size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Query returns incorrect results (missing rows) with derived tables in UNION ALL, while CTE works correctly

1 participant