Skip to content

Commit cd7d208

Browse files
krinartalamb
authored andcommitted
Restore Sort unparser guard for correct ORDER BY placement (apache#20658)
## Which issue does this PR close? - closes apache#20905 - closes apache#20923 Restore the `already_projected()` guard in the Sort case of `select_to_sql_recursively`. Without it, queries with ORDER BY expressions generate invalid SQL when the Sort node follows an outer Projection. ## Problem ## Two regressions in the DuckDB federation unparser after upgrading to DataFusion 52: 1. clickbench q25: Queries like `SELECT "SearchPhrase" ... ORDER BY to_timestamp("EventTime") LIMIT 10` produce ORDER BY outside the subquery, referencing table names ("hits") that are out of scope. 2. tpcds q36: ORDER BY with `GROUPING()` expressions loses the `derived_sort` subquery alias, placing sort references outside their valid scope. ## Root Cause ## DF52's optimizer merges `Limit` into `Sort` as a fetch parameter, changing the plan shape from `Projection → Limit → Sort → ...` to `Projection → Sort(fetch) → ....` The Sort case previously had a guard that wrapped the sort into a `derived_sort` subquery when `already_projected()` was true. This guard was removed in DF52, causing ORDER BY and LIMIT to be placed on the outer query where inner table references are invalid. ## Fix ## Re-add the guard at the top of the Sort match arm in select_to_sql_recursively: ```rust if select.already_projected() { return self.derive_with_dialect_alias("derived_sort", plan, relation, false, vec![]); } ```
1 parent 1563ef8 commit cd7d208

3 files changed

Lines changed: 56 additions & 1 deletion

File tree

datafusion/sql/src/unparser/plan.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -499,6 +499,17 @@ impl Unparser<'_> {
499499
)
500500
}
501501
LogicalPlan::Sort(sort) => {
502+
// Sort can be top-level plan for derived table
503+
if select.already_projected() {
504+
return self.derive_with_dialect_alias(
505+
"derived_sort",
506+
plan,
507+
relation,
508+
false,
509+
vec![],
510+
);
511+
}
512+
502513
let Some(query_ref) = query else {
503514
return internal_err!(
504515
"Sort operator only valid in a statement context."

datafusion/sql/src/unparser/rewrite.rs

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -223,7 +223,15 @@ pub(super) fn rewrite_plan_for_sort_on_non_projected_fields(
223223

224224
let mut collects = p.expr.clone();
225225
for sort in &sort.expr {
226-
collects.push(sort.expr.clone());
226+
// Strip aliases from sort expressions so the comparison matches
227+
// the inner Projection's raw expressions. The optimizer may add
228+
// sort expressions to the inner Projection without aliases, while
229+
// the Sort node's expressions carry aliases from the original plan.
230+
let mut expr = sort.expr.clone();
231+
while let Expr::Alias(alias) = expr {
232+
expr = *alias.expr;
233+
}
234+
collects.push(expr);
227235
}
228236

229237
// Compare outer collects Expr::to_string with inner collected transformed values

datafusion/sql/tests/cases/plan_to_sql.rs

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1740,6 +1740,42 @@ fn test_sort_with_push_down_fetch() -> Result<()> {
17401740
Ok(())
17411741
}
17421742

1743+
#[test]
1744+
fn test_sort_with_scalar_fn_and_push_down_fetch() -> Result<()> {
1745+
let schema = Schema::new(vec![
1746+
Field::new("search_phrase", DataType::Utf8, false),
1747+
Field::new("event_time", DataType::Utf8, false),
1748+
]);
1749+
1750+
let substr_udf = unicode::substr();
1751+
1752+
// Build a plan that mimics the DF52 optimizer output:
1753+
// Projection(search_phrase) → Sort(substr(event_time), fetch=10)
1754+
// → Projection(search_phrase, event_time) → Filter → TableScan
1755+
// This triggers a subquery because the outer projection differs from the inner one.
1756+
// The ORDER BY scalar function must not reference the inner table qualifier.
1757+
let plan = table_scan(Some("t1"), &schema, None)?
1758+
.filter(col("search_phrase").not_eq(lit("")))?
1759+
.project(vec![col("search_phrase"), col("event_time")])?
1760+
.sort_with_limit(
1761+
vec![
1762+
substr_udf
1763+
.call(vec![col("event_time"), lit(1), lit(5)])
1764+
.sort(true, true),
1765+
],
1766+
Some(10),
1767+
)?
1768+
.project(vec![col("search_phrase")])?
1769+
.build()?;
1770+
1771+
let sql = plan_to_sql(&plan)?;
1772+
assert_snapshot!(
1773+
sql,
1774+
@"SELECT t1.search_phrase FROM (SELECT t1.search_phrase, t1.event_time FROM t1 WHERE (t1.search_phrase <> '') ORDER BY substr(t1.event_time, 1, 5) ASC NULLS FIRST LIMIT 10)"
1775+
);
1776+
Ok(())
1777+
}
1778+
17431779
#[test]
17441780
fn test_join_with_table_scan_filters() -> Result<()> {
17451781
let schema_left = Schema::new(vec![

0 commit comments

Comments
 (0)