feat: improve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3)#16762

Merged
alamb merged 8 commits into apache:main from haohuaijin:hj/guarantee-optimize on Jul 23, 2025

Contributor

@haohuaijin haohuaijin commented Jul 13, 2025

Which issue does this PR close?

Rationale for this change

Improve LiteralGuarantee to handle predicates such as
(a=1 AND b=1) OR (a=2 AND b=3), or
(a IN ("foo", "bar") AND b = 5) OR (a IN ("bar") AND b=6)

What changes are included in this PR?

Add the logic to extract in_guarantee("a", [1, 2]) and in_guarantee("b", [1, 3]) from (a=1 AND b=1) OR (a=2 AND b=3):

  1. Split each disjunction into its constituent conjunctions and filter for equality operations.
  2. Use the find_common_columns function to identify the columns present in all termsets.
  3. Iterate through the common columns and build a guarantee for each.
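
The three steps can be illustrated with a simplified, self-contained sketch. It is not the code in guarantee.rs: the `Term` enum, the `Disjunction` alias, and the `in_guarantees` function are hypothetical stand-ins that assume integer literals and plain std collections.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a `column op literal` term.
/// This is NOT DataFusion's own representation.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Term {
    /// `col = v` or `col IN (v1, v2, ...)`
    In(&'static str, Vec<i64>),
    /// `col != v` or `col NOT IN (...)`; never contributes to an IN guarantee
    NotIn(&'static str, Vec<i64>),
}

/// A predicate written as an OR of ANDs of terms.
type Disjunction = Vec<Vec<Term>>;

/// For every column that has an equality/IN term in *every* OR branch,
/// return the union of the literals that column may take.
fn in_guarantees(pred: &Disjunction) -> BTreeMap<&'static str, BTreeSet<i64>> {
    // Step 1: per OR branch, keep only equality/IN terms, grouped by column.
    // Repeated terms on one column are unioned, which is loose but still sound.
    let termsets: Vec<BTreeMap<&'static str, BTreeSet<i64>>> = pred
        .iter()
        .map(|conjunction| {
            let mut by_col: BTreeMap<&'static str, BTreeSet<i64>> = BTreeMap::new();
            for term in conjunction {
                if let Term::In(col, values) = term {
                    by_col.entry(*col).or_default().extend(values.iter().copied());
                }
            }
            by_col
        })
        .collect();

    // Step 2: find the columns present in all termsets.
    let Some((first, rest)) = termsets.split_first() else {
        return BTreeMap::new();
    };
    let common: Vec<&'static str> = first
        .keys()
        .copied()
        .filter(|col| rest.iter().all(|ts| ts.contains_key(col)))
        .collect();

    // Step 3: for each common column, union the literals across all branches.
    common
        .into_iter()
        .map(|col| {
            let values: BTreeSet<i64> = termsets
                .iter()
                .flat_map(|ts| ts[&col].iter().copied())
                .collect();
            (col, values)
        })
        .collect()
}
```

Under these assumptions, (a=1 AND b=1) OR (a=2 AND b=3) maps a to {1, 2} and b to {1, 3}. If the first branch had b != 1 instead, b would no longer appear in every termset, so only the guarantee on a would remain.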

Are these changes tested?

Yes, some test cases are added.

Are there any user-facing changes?

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jul 13, 2025
@haohuaijin
Contributor Author

cc @debajyoti-truefoundry @alamb

@alamb alamb changed the title feat: imporve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3) feat: improve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3) Jul 14, 2025
Contributor

@alamb alamb left a comment

Thanks @haohuaijin -- this looks like a great start to me

I think we need a few more tests to show it doesn't incorrectly pick up literal guarantees for NOT IN / != terms, but otherwise I think it is good

Comment thread datafusion/physical-expr/src/utils/guarantee.rs
Comment thread datafusion/physical-expr/src/utils/guarantee.rs
@haohuaijin
Contributor Author

haohuaijin commented Jul 15, 2025

Thanks for your review @alamb, I addressed your comments in 89dc6be

@alamb
Contributor

alamb commented Jul 18, 2025

I am sorry @haohuaijin -- I will review this more carefully soon. I just need to sit down and think through the details to make sure it doesn't have any correctness problems

Contributor

@alamb alamb left a comment

Thank you @haohuaijin -- I reviewed the code and tests carefully and I think this PR looks good to me.

It is a very nice improvement

Comment thread datafusion/physical-expr/src/utils/guarantee.rs
@alamb alamb added the performance Make DataFusion faster label Jul 21, 2025
@haohuaijin
Contributor Author

Thanks for your reviews, @alamb

@alamb alamb merged commit 3c95281 into apache:main Jul 23, 2025
27 checks passed
@haohuaijin haohuaijin deleted the hj/guarantee-optimize branch July 23, 2025 15:22
adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
…=2 AND b=3)` (apache#16762)

* feat: imporve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3)

* support inlist

* fmt and clippy

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Labels

performance Make DataFusion faster physical-expr Changes to the physical-expr crates

Development

Successfully merging this pull request may close these issues.

Bloom filters are unused for certain where clause patterns (improve LiteralGuarantee)

2 participants