Skip to content

Commit 79e9f9d

Browse files
EeshanBembialamb
authored andcommitted
fix: SanityCheckPlan error with window functions and NVL filter (apache#20231)
## Which issue does this PR close? Closes apache#20194 ## Rationale for this change A query with `ROW_NUMBER() OVER (... ORDER BY CASE WHEN col='0' THEN 1 ELSE 0 END)` combined with a filter `nvl(t2.value_2_3,'0')='0'` fails with a `SanityCheckPlan` error. This worked in 50.3.0 but broke in 52.1.0. ## What changes are included in this PR? **Root cause**: `collect_columns_from_predicate_inner` was extracting equality pairs where neither side was a `Column` (e.g. `nvl(col, '0') = '0'`), creating equivalence classes between complex expressions and literals. `normalize_expr`'s deep traversal would then replace the literal `'0'` inside unrelated sort/window CASE WHEN expressions with the complex NVL expression, corrupting the sort ordering and causing a mismatch between `SortExec`'s reported output ordering and `BoundedWindowAggExec`'s expected ordering. **Fix** (two changes in `filter.rs`): 1. **`collect_columns_from_predicate_inner`**: Only extract equality pairs where at least one side is a `Column` reference. This matches the function's documented intent ("Column-Pairs") and prevents complex-expression-to-literal equivalence classes from being created. 2. **`extend_constants`**: Recognize `Literal` expressions as inherently constant (previously only checked `is_expr_constant` on the input's equivalence properties, which doesn't know about literals). This ensures constant propagation still works for `complex_expr = literal` predicates — e.g. `nvl(col, '0')` is properly marked as constant after the filter. ## How was this tested? - Unit test `test_collect_columns_skips_non_column_pairs` verifying the filtering logic - Sqllogictest reproducing the exact query from the issue - Full test suites: equivalence tests (51 passed), physical-plan tests (1255 passed), physical-optimizer tests (20 passed) - Manual verification with datafusion-cli running the reproduction query ## Test plan - [x] Unit test for `collect_columns_from_predicate_inner` column filtering - [x] Sqllogictest regression test for apache#20194 - [x] Existing test suites pass - [x] Manual reproduction query succeeds --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent 9dd1def commit 79e9f9d

1 file changed

Lines changed: 42 additions & 0 deletions

File tree

datafusion/physical-plan/src/filter.rs

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1978,6 +1978,48 @@ mod tests {
19781978
Ok(())
19791979
}
19801980

1981+
/// Regression test for https://github.com/apache/datafusion/issues/20194
1982+
///
1983+
/// `collect_columns_from_predicate_inner` should only extract equality
1984+
/// pairs where at least one side is a Column. Pairs like
1985+
/// `complex_expr = literal` must not create equivalence classes because
1986+
/// `normalize_expr`'s deep traversal would replace the literal inside
1987+
/// unrelated expressions (e.g. sort keys) with the complex expression.
1988+
#[test]
1989+
fn test_collect_columns_skips_non_column_pairs() -> Result<()> {
1990+
let schema = test::aggr_test_schema();
1991+
1992+
// Simulate: nvl(c2, 0) = 0 → (c2 IS DISTINCT FROM 0) = 0
1993+
// Neither side is a Column, so this should NOT be extracted.
1994+
let complex_expr: Arc<dyn PhysicalExpr> = binary(
1995+
col("c2", &schema)?,
1996+
Operator::IsDistinctFrom,
1997+
lit(0u32),
1998+
&schema,
1999+
)?;
2000+
let predicate: Arc<dyn PhysicalExpr> =
2001+
binary(complex_expr, Operator::Eq, lit(0u32), &schema)?;
2002+
2003+
let (equal_pairs, _) = collect_columns_from_predicate_inner(&predicate);
2004+
assert_eq!(
2005+
0,
2006+
equal_pairs.len(),
2007+
"Should not extract equality pairs where neither side is a Column"
2008+
);
2009+
2010+
// But col = literal should still be extracted
2011+
let predicate: Arc<dyn PhysicalExpr> =
2012+
binary(col("c2", &schema)?, Operator::Eq, lit(0u32), &schema)?;
2013+
let (equal_pairs, _) = collect_columns_from_predicate_inner(&predicate);
2014+
assert_eq!(
2015+
1,
2016+
equal_pairs.len(),
2017+
"Should extract equality pairs where one side is a Column"
2018+
);
2019+
2020+
Ok(())
2021+
}
2022+
19812023
/// Columns with Absent min/max statistics should remain Absent after
19822024
/// FilterExec.
19832025
#[tokio::test]

0 commit comments

Comments
 (0)