Skip to content

fix dml logical plan output schema#10394

Merged
alamb merged 3 commits intoapache:mainfrom
leoyvens:dml-output-schema
May 7, 2024
Merged

fix dml logical plan output schema#10394
alamb merged 3 commits intoapache:mainfrom
leoyvens:dml-output-schema

Conversation

@leoyvens
Copy link
Copy Markdown
Contributor

@leoyvens leoyvens commented May 6, 2024

Previously, LogicalPlan::schema would return the input schema for DML plans, rather than the expected output schema. It is typical for the output to be the count of rows affected by the DML statement, so the code assumes that.

See fn dml_output_schema for a test.

Which issue does this PR close?

Closes #10393.

Rationale for this change

Current behaviour is wrong.

Are these changes tested?

Yes there is a test fn dml_output_schema.

Are there any user-facing changes?

The bug being fixed is visible to users of the DataFrame API.

Previously, `LogicalPlan::schema` would return the
input schema for Dml plans, rather than the expected
output schema. This is an unusal case since Dmls are
typically not run for their output, but it is typical
for the output to be the `count` of rows affected by
the DML statement.

See `fn dml_output_schema` for a test.
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate labels May 6, 2024
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leoyvens for your contribution, please add more details to the issue description

ctx.sql_with_options(sql, options).await.unwrap();
}

#[tokio::test]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its better to have this test in one of .slt files which runs end to end tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this might not be caught in a .slt test as this issue is not visible in the pretty-printed logical plan, and doesn't affect the physical plan or execution at all, afaict.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I see some slt tests failing on CI, I should look into that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously, this was in fact tested in many sqllogictests. Because the tests do in fact cover checking that the physical output schema matches the logical output schema, which is pretty cool.

Let me know if you want to keep this Rust test as well, or if you rather that I remove it. It does test more behaviour than what the slts can test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the rust level test is a good idea as it makes explicit the expected output schema

Comment thread datafusion/expr/src/logical_plan/dml.rs
@comphead
Copy link
Copy Markdown
Contributor

comphead commented May 6, 2024

I checked current behavior

> CREATE TABLE test (x int)
;
0 row(s) fetched. 
Elapsed 0.017 seconds.

> INSERT INTO test VALUES (1);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> INSERT INTO test VALUES (3);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.005 seconds.

I'm not even sure why we output anything after DDL/DML... I'd probably prefer no output like in duckdb

D create table x (id int);
D insert into x values(1);

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 6, 2024
@leoyvens
Copy link
Copy Markdown
Contributor Author

leoyvens commented May 6, 2024

@comphead Thank you for your review, I have addressed the outstanding comments. I have no opinion on desired behaviour, this PR is just trying to make things consistent.


# insert NULL values for the missing column (name)
query IT
query I
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it fail with old value?

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @leoyvens and @comphead -- this is a very nice contribution 🏆

ctx.sql_with_options(sql, options).await.unwrap();
}

#[tokio::test]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the rust level test is a good idea as it makes explicit the expected output schema

CREATE TABLE strings(g INTEGER, x VARCHAR, y VARCHAR)

query ITT
query I
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 161d0f2 into apache:main May 7, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* fix dml logical plan output schema

Previously, `LogicalPlan::schema` would return the
input schema for Dml plans, rather than the expected
output schema. This is an unusal case since Dmls are
typically not run for their output, but it is typical
for the output to be the `count` of rows affected by
the DML statement.

See `fn dml_output_schema` for a test.

* document DmlStatement::new

* Fix expected logical schema of 'insert into' in sqllogictests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

For DML plans, LogicalPlan::schema returns the input schema instead of output schema

3 participants