Describe the bug
Consider a snippet like this:
```rust
// Explicitly request a single output file at "dir/data" (no file extension)
df.write_parquet(
    "dir/data",
    DataFrameWriteOptions::new().with_single_file_output(true),
    None,
)
.await?;
```
Before v43, this wrote a single file named `data`. In v43, it instead creates `data` as a directory containing one or more randomly named files.
This seems to be related to #13079 (cc @dhegberg), which added an extension-based heuristic.
I see this as a regression: single file output is requested explicitly, so I don't want a heuristic to be applied.
We use Parquet files with a content-addressable file system, and our files don't have extensions.
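To show why extension-less paths are affected, here is a minimal sketch of an extension-based heuristic. It assumes the heuristic treats a path with a file extension as a single file and anything else as a directory. `looks_like_single_file` is a hypothetical helper for illustration, not DataFusion's actual implementation from #13079.

```rust
use std::path::Path;

/// Hypothetical stand-in for an extension-based heuristic (NOT DataFusion's
/// actual code): a path with a file extension is treated as a single file,
/// anything else as a directory.
fn looks_like_single_file(path: &str) -> bool {
    Path::new(path).extension().is_some()
}

fn main() {
    // A content-addressable path has no extension, so this heuristic
    // classifies it as a directory even when a single file was requested.
    for path in ["dir/data", "dir/data.parquet"] {
        println!("{path}: single file = {}", looks_like_single_file(path));
    }
}
```

Under this assumption, `dir/data` is classified as a directory and `dir/data.parquet` as a single file, which matches the behavior described above.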
To Reproduce
See above
Expected behavior
Given the new extension-based heuristic, I would suggest the following behavior:
- `with_single_file_output` is not called (`single_file_output == None`): apply the heuristic
- `with_single_file_output(true)`: produce a single file at the exact path specified
- `with_single_file_output(false)`: create a directory at the specified path if it doesn't exist, and write one or more files into it
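The proposed rules amount to a tri-state resolution in which an explicit setting always wins over the heuristic. A minimal sketch follows; `OutputMode`, `resolve_output_mode`, and the `path_has_extension` input are hypothetical names for illustration, not existing DataFusion APIs.

```rust
/// Where the writer should put its output (hypothetical type).
#[derive(Debug, PartialEq)]
enum OutputMode {
    /// Write exactly one file at the given path.
    SingleFile,
    /// Treat the path as a directory and write one or more files into it.
    Directory,
}

/// Sketch of the proposed rules: an explicit `with_single_file_output`
/// setting always wins; the extension heuristic applies only when unset.
/// `path_has_extension` is a hypothetical input standing in for the heuristic.
fn resolve_output_mode(single_file_output: Option<bool>, path_has_extension: bool) -> OutputMode {
    match single_file_output {
        Some(true) => OutputMode::SingleFile,
        Some(false) => OutputMode::Directory,
        None if path_has_extension => OutputMode::SingleFile,
        None => OutputMode::Directory,
    }
}

fn main() {
    // Explicit single-file request at an extension-less path is honored.
    assert_eq!(resolve_output_mode(Some(true), false), OutputMode::SingleFile);
    // Explicit directory request wins even if the path has an extension.
    assert_eq!(resolve_output_mode(Some(false), true), OutputMode::Directory);
    // Unset: fall back to the heuristic.
    assert_eq!(resolve_output_mode(None, true), OutputMode::SingleFile);
    assert_eq!(resolve_output_mode(None, false), OutputMode::Directory);
}
```

Modeling the option as `Option<bool>` keeps the existing builder call meaningful while letting callers who never set it keep the extension-based default.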
Additional context