Skip to content

minor: inconsistent group by position planning#10679

Merged
alamb merged 3 commits intoapache:mainfrom
korowa:resolve-gby-col-idx
May 28, 2024
Merged

minor: inconsistent group by position planning#10679
alamb merged 3 commits intoapache:mainfrom
korowa:resolve-gby-col-idx

Conversation

@korowa
Copy link
Copy Markdown
Contributor

@korowa korowa commented May 26, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

While #10591 found that planning for INT literals in GROUP BY clause is inconsistent. For example:

EXPLAIN SELECT "URL", COUNT(1) from hits GROUP BY 1, 3, 4, 5

produces following logical plan (this is logical plan before any optimizations):

Explain
  Projection: hits.URL, COUNT(Int64(1))
    Aggregate: groupBy=[[hits.URL, Int64(3), Int64(4), Int64(5)]], aggr=[[COUNT(Int64(1))]]
      TableScan: hits

1 resolved as column from select clause, and 3, 4, 5 remain as constants. Don't think this behavior is intended + there are couple of tests in sql_integration.rs (modified below), for similar cases, expect queries to fail.

The suggestion is to fail while planning phase for such cases, and return error, specifying first index which can not be resolved as select column.

What changes are included in this PR?

Resolution for positions (integer literals in GROUP BY clause) now returns error on first unresolved position.

SIgnature for resolve_positions_to_exprs changed to avoid unnecessary cloning. resolve_aliases_to_exprs signature has similar changes, for the same reasons (it's been changed just for consistency with resolve_positions_... function).

Are these changes tested?

Added sqllogictests + some test coverage for resolve_positions_to_exprs

Are there any user-facing changes?

Users might experience query planning failures for GROUP BY %idx% where idx is out of bounds of SELECT expressions list (which seems to be OK from SQL semantics point of view -- PG/MySQL handle this case in the same way).

@github-actions github-actions bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels May 26, 2024
@korowa korowa changed the title fix: inconsistent group by position planning minor: inconsistent group by position planning May 26, 2024
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @korowa -- this looks like a nice usability improvement to me.

I had a suggestion for improved message but I don't think it is needed

cc @jonahgao who I think is an expert in this area

Comment thread datafusion/sql/src/utils.rs Outdated
Copy link
Copy Markdown
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

I have verified that the behavior of this ordinal reference aligns with PostgreSQL.
The work to eliminate cloning is also great.

korowa and others added 2 commits May 27, 2024 20:21
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb alamb merged commit 3dc1773 into apache:main May 28, 2024
@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 28, 2024

Thanks again @korowa and @jonahgao

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* fix: inconsistent group by position planning

* Update datafusion/sql/src/utils.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* improved error message

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants