Update to arrow 33.0.0 #5241

Merged
tustvold merged 10 commits into apache:main from tustvold:update-arrow-33 on Feb 20, 2023

@tustvold
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Changes to the physical-expr crates) and sql (SQL Planner) labels on Feb 10, 2023
];
assert_batches_eq!(expected, &result);
let err = pretty_format_batches(&result).err().unwrap().to_string();
assert_eq!(err, "Parser error: Invalid timezone \"+08:00:00\": Expected format [+-]XX:XX, [+-]XX, or [+-]XXXX");
Contributor Author

@tustvold, Feb 10, 2023

Invalid timezones now result in an error when the formatter is created, rather than the invalid timezone being rendered into every formatted string. I'm not sure how people feel about this; I think it is better to fail loudly, but I welcome thoughts.
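A minimal sketch of the "validate once, fail at construction" behaviour described here follows. It assumes only what the test's error message states: a fixed offset must match `[+-]XX:XX`, `[+-]XX` or `[+-]XXXX`. `parse_fixed_offset` and `TimestampFormatter` are hypothetical names, not the arrow-rs API, and the sketch ignores named timezones, which the real formatter may also accept.

```rust
/// Hypothetical helper (not the arrow-rs API): parse a fixed UTC offset written
/// as [+-]XX:XX, [+-]XX or [+-]XXXX and return it in seconds east of UTC.
fn parse_fixed_offset(tz: &str) -> Result<i32, String> {
    // Mirrors the error text asserted in the test above.
    let err = || {
        format!("Invalid timezone \"{tz}\": Expected format [+-]XX:XX, [+-]XX, or [+-]XXXX")
    };
    let sign = match tz.as_bytes().first().copied() {
        Some(b'+') => 1,
        Some(b'-') => -1,
        _ => return Err(err()),
    };
    let rest = &tz[1..];
    // Restricting to ASCII keeps the byte-index slicing below panic-free.
    if !rest.is_ascii() {
        return Err(err());
    }
    let (hh, mm) = match rest.len() {
        2 => (rest, "00"),                                           // [+-]XX
        4 => (&rest[..2], &rest[2..]),                               // [+-]XXXX
        5 if rest.as_bytes()[2] == b':' => (&rest[..2], &rest[3..]), // [+-]XX:XX
        _ => return Err(err()),                                      // e.g. "+08:00:00"
    };
    if !hh.bytes().chain(mm.bytes()).all(|b| b.is_ascii_digit()) {
        return Err(err());
    }
    let hours: i32 = hh.parse().map_err(|_| err())?;
    let minutes: i32 = mm.parse().map_err(|_| err())?;
    if minutes >= 60 {
        return Err(err());
    }
    Ok(sign * (hours * 3600 + minutes * 60))
}

/// Hypothetical formatter: the timezone is validated once, when the formatter is
/// built, so a bad timezone surfaces as an error instead of leaking into output.
struct TimestampFormatter {
    offset_secs: i32,
}

impl TimestampFormatter {
    fn try_new(tz: &str) -> Result<Self, String> {
        Ok(Self { offset_secs: parse_fixed_offset(tz)? })
    }

    fn offset_secs(&self) -> i32 {
        self.offset_secs
    }
}
```

With this shape, `TimestampFormatter::try_new("+08:00:00")` returns the error before any value is formatted, which is the "fail loud" trade-off discussed in this thread.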

Contributor

I think it is ok -- cc @comphead and @waitingkuo

Contributor

Sorry for the late reply. This looks great to me. I originally followed how pyarrow works when implementing it, but raising a parser error makes more sense to me.

Contributor

@alamb left a comment

Thanks @tustvold

@tustvold
Contributor Author

tustvold commented Feb 10, 2023

   process didn't exit successfully: `/Users/runner/work/arrow-datafusion/arrow-datafusion/target/debug/deps/datafusion-ba45d330e390f3a9` (signal: 4, SIGILL: illegal instruction)

Eep, will investigate; shenanigans may be afoot. It might be a double panic.
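The double-panic guess can be illustrated in isolation. In Rust, panicking while a thread is already unwinding from an earlier panic aborts the process instead of reporting the first panic, and on some platforms and toolchain versions that abort is raised as an illegal instruction, which would be consistent with the `SIGILL` above. The sketch uses a hypothetical `CheckOnDrop` type and does not reproduce the actual issue (tracked as apache/arrow-rs#3691); it only shows the common guard of skipping panicking checks in `Drop` when `std::thread::panicking()` is already true.

```rust
use std::thread;

/// Hypothetical guard that checks an invariant when it is dropped.
struct CheckOnDrop {
    finished: bool,
}

impl CheckOnDrop {
    fn new() -> Self {
        Self { finished: false }
    }

    fn finish(&mut self) {
        self.finished = true;
    }
}

impl Drop for CheckOnDrop {
    fn drop(&mut self) {
        // If this assert fired while the thread was already unwinding from another
        // panic, the runtime would abort the whole process (a double panic).
        // Skipping the check in that case lets the original panic propagate.
        if !thread::panicking() {
            assert!(self.finished, "CheckOnDrop dropped without calling finish()");
        }
    }
}
```

Without the `thread::panicking()` check, the first panic below would be masked by an abort rather than being caught and reported.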

@tustvold
Contributor Author

Shenanigans did abound - apache/arrow-rs#3691, fix in apache/arrow-rs#3692

@alamb
Contributor

alamb commented Feb 10, 2023

> Shenanigans did abound - apache/arrow-rs#3691, fix in apache/arrow-rs#3692

Testing for the win!

@tustvold tustvold marked this pull request as ready for review February 14, 2023 13:39
Contributor

@alamb left a comment

I am ok with these changes, but I worry about the possible effects on our downstream users (e.g. @maxburke was just discussing the upgrade pain earlier today on ASF Slack).

What would you think about sending a note to the mailing lists / Slack channels giving a heads-up and asking for feedback? I can do the communicating if you like.

"+---+-----+-----------------+",
"| a | b | COUNT(1)[count] |",
"+---+-----+-----------------+",
"| | 1.0 | 2 |",
Contributor

These changes are going to create some serious downstream churn, I suspect. I wonder if we should wait until apache/arrow-rs#3717 is released (so downstream crates can choose to keep the old behavior) 🤔

Contributor

@alamb is the issue related to column name [count]?

Contributor Author

Coming back to this: even with apache/arrow-rs#3717, floats will be formatted with a decimal point, and there isn't currently an option exposed to configure this...
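The `1` versus `1.0` difference visible in the expected tables comes down to two conventions for printing a whole-valued float. Rust's standard library happens to expose both, which makes for a compact illustration; this is an analogy only, not how arrow-rs formats values internally, and `display_float` / `debug_float` are hypothetical helpers.

```rust
/// Hypothetical helper: `Display` (`{}`) prints a whole-valued f64
/// without a fractional part.
fn display_float(v: f64) -> String {
    format!("{v}")
}

/// Hypothetical helper: `Debug` (`{:?}`) keeps the trailing `.0`,
/// so the value still reads as a float.
fn debug_float(v: f64) -> String {
    format!("{v:?}")
}
```

For example, `display_float(1.0)` yields `"1"` while `debug_float(1.0)` yields `"1.0"`; non-integral values such as `2.5` print identically under both conventions, which is why only whole-valued cells changed in the test expectations.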

@alamb changed the title from "Update arrow 33" to "Update to arrow 33.0.0" on Feb 14, 2023
@tustvold
Contributor Author

> I can do the communicating if you like

Yes please. My 2 cents is that optimising for stability is premature at this point, but I welcome other input.

"+--------------------+----------+",
"| 0.9294097332465232 | 1 |",
"| 0.3114712539863804 | 1 |",
"| 0.9294097332465232 | 1.0 |",
Contributor

Contributor Author

What are you confused about? Despite its name, this column is a floating point type and so is formatted with a decimal point?
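The point about the column name can be made concrete: formatting is chosen per data type, so a float column renders `1.0` whatever its name suggests. The `Cell` enum, `render` and `render_column` below are hypothetical, simplified stand-ins for Arrow's typed arrays, and the column name `count` in the usage is an invented example.

```rust
/// Hypothetical, simplified stand-in for a typed column value.
enum Cell {
    Int64(i64),
    Float64(f64),
}

/// Chooses the textual form from the data type alone.
fn render(cell: &Cell) -> String {
    match cell {
        Cell::Int64(v) => v.to_string(),
        // Debug formatting keeps the `.0` on whole-valued floats such as 1.0.
        Cell::Float64(v) => format!("{v:?}"),
    }
}

/// Renders a column; `_name` is accepted only to show it does not affect the output.
fn render_column(_name: &str, cells: &[Cell]) -> Vec<String> {
    cells.iter().map(render).collect()
}
```

So `render_column("count", &[Cell::Float64(1.0)])` produces `["1.0"]`, while an `Int64` column holding `1` produces `["1"]`.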

Contributor

> Despite its name

I certainly found that confusing 😆

Contributor

> What are you confused about? Despite its name, this column is a floating point type and so is formatted with a decimal point?

I missed the data type from the name of this column

@alamb
Contributor

alamb commented Feb 15, 2023

I posted a note to the mailing list: https://lists.apache.org/thread/9y6bhdj2vgo9ll7bj72mf5gpwxkqy2b1

@alamb
Contributor

alamb commented Feb 18, 2023

Unless there are any comments, I think we should merge this next week.

@tustvold merged commit e3679e2 into apache:main on Feb 20, 2023
@ursabot

ursabot commented Feb 20, 2023

Benchmark runs are scheduled for baseline = c9d4eac and contender = e3679e2. e3679e2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

jiangzhx pushed a commit to jiangzhx/arrow-datafusion that referenced this pull request Feb 24, 2023
* Update arrow 33

* Fix test

* Fix avro

* Update pin

* Format

* Further fixes

* Remove pin

6 participants