Is your feature request related to a problem or challenge?
CSV writers usually support configuring the quote style (or quote mode), typically with the following options:
- Always
- Necessary
- Never
- NonNumeric

Sometimes the quote style needs to be controlled explicitly. For now, the only way to change it is to re-read the result file(s) and rewrite their content with the desired quote style.
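To make the four options concrete, here is a minimal, std-only sketch of how each style could decide whether a field gets quoted. The `QuoteStyle` enum below only mirrors the option names listed above, and `format_field` / `format_record` are hypothetical helpers written for illustration. The rules are simplified assumptions, not the actual behavior of the `csv` crate or of `arrow-csv`: `Necessary` only looks for the delimiter, the quote character and line breaks, and `NonNumeric` treats anything that parses as `f64` as numeric.

```rust
/// Hypothetical enum mirroring the four quote styles listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QuoteStyle {
    /// Quote every field.
    Always,
    /// Quote only fields that would otherwise be ambiguous.
    Necessary,
    /// Never quote, even if the output becomes ambiguous.
    Never,
    /// Quote every field that does not look like a number.
    NonNumeric,
}

/// Formats a single field according to `style` (simplified rules, see lead-in).
pub fn format_field(field: &str, style: QuoteStyle, delimiter: char, quote: char) -> String {
    let needs_quotes = match style {
        QuoteStyle::Always => true,
        QuoteStyle::Never => false,
        // Assumption: a field is "ambiguous" if it contains the delimiter,
        // the quote character, or a line break.
        QuoteStyle::Necessary => field
            .chars()
            .any(|c| c == delimiter || c == quote || c == '\n' || c == '\r'),
        // Assumption: "numeric" means the field parses as an f64.
        QuoteStyle::NonNumeric => field.parse::<f64>().is_err(),
    };

    if !needs_quotes {
        return field.to_string();
    }

    // Wrap the field in quotes and escape embedded quotes by doubling them.
    let mut out = String::with_capacity(field.len() + 2);
    out.push(quote);
    for c in field.chars() {
        if c == quote {
            out.push(quote);
        }
        out.push(c);
    }
    out.push(quote);
    out
}

/// Formats a whole record using `,` as delimiter and `"` as quote character.
pub fn format_record(fields: &[&str], style: QuoteStyle) -> String {
    fields
        .iter()
        .map(|f| format_field(f, style, ',', '"'))
        .collect::<Vec<_>>()
        .join(",")
}
```

For the record `["a", "1.5", "x,y"]`, this sketch quotes all three fields under `Always`, only `x,y` under `Necessary`, nothing under `Never` (so the output gains a fourth column when read back), and `a` and `x,y` under `NonNumeric`.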
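Such options exist in many libraries, for example the `csv` crate (`QuoteStyle`) and Python's `csv` module (constants such as `QUOTE_ALL`); other libraries expose the same setting as `QuoteMode`.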
Describe the solution you'd like
Expose a way to pass the `QuoteStyle` enum along with the other properties such as `quote` and `delimiter` (as part of `CsvOptions`). Keep in mind, however, that this setting only makes sense for writers, not readers.
This shouldn't be hard to support, because `datafusion` relies on `arrow-csv`, which uses the `csv` crate under the hood.
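- Update `arrow-csv` to accept a quote-style parameter (sub-issue for `arrow-rs`?):
  - `WriterBuilder`: https://github.com/apache/arrow-rs/blob/4b5d9bfc958c06fb1ff71d90ba58497e965eff40/arrow-csv/src/writer.rs#L191-L214
  - `csv::Writer`: https://github.com/apache/arrow-rs/blob/4b5d9bfc958c06fb1ff71d90ba58497e965eff40/arrow-csv/src/writer.rs#L402-L408
- Update `datafusion`:
  - Add the parameter to `CsvOptions` (datafusion/datafusion/common/src/config.rs, lines 1554 to 1570 in ea92ae7):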

    pub struct CsvOptions {
        /// Specifies whether there is a CSV header (i.e. the first line
        /// consists of is column names). The value `None` indicates that
        /// the configuration should be consulted.
        pub has_header: Option<bool>, default = None
        pub delimiter: u8, default = b','
        pub quote: u8, default = b'"'
        pub escape: Option<u8>, default = None
        pub compression: CompressionTypeVariant, default = CompressionTypeVariant::UNCOMPRESSED
        pub schema_infer_max_rec: usize, default = 100
        pub date_format: Option<String>, default = None
        pub datetime_format: Option<String>, default = None
        pub timestamp_format: Option<String>, default = None
        pub timestamp_tz_format: Option<String>, default = None
        pub time_format: Option<String>, default = None
        pub null_value: Option<String>, default = None
    }

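  - Pass the parameter on to `arrow-csv` (datafusion/datafusion/common/src/file_options/csv_writer.rs, lines 48 to 75 in ea92ae7):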

    impl TryFrom<&CsvOptions> for CsvWriterOptions {
        type Error = DataFusionError;

        fn try_from(value: &CsvOptions) -> Result<Self> {
            let mut builder = WriterBuilder::default()
                .with_header(value.has_header.unwrap_or(false))
                .with_delimiter(value.delimiter);

            if let Some(v) = &value.date_format {
                builder = builder.with_date_format(v.into())
            }
            if let Some(v) = &value.datetime_format {
                builder = builder.with_datetime_format(v.into())
            }
            if let Some(v) = &value.timestamp_format {
                builder = builder.with_timestamp_format(v.into())
            }
            if let Some(v) = &value.time_format {
                builder = builder.with_time_format(v.into())
            }
            if let Some(v) = &value.null_value {
                builder = builder.with_null(v.into())
            }
            Ok(CsvWriterOptions {
                writer_options: builder,
                compression: value.compression,
            })
        }
    }

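The plumbing described above could look roughly like the following sketch. Everything in it is a stand-in so that the example compiles on its own: `CsvOptions` and `WriterBuilder` here are trimmed-down local types, not the real DataFusion or `arrow-csv` ones, and the `quote_style` field, the `with_quote_style` method and the `writer_builder_from` helper are hypothetical names for the proposed addition. The only point is to show an optional setting on the options struct being forwarded to the writer builder in the same way as the existing optional fields.

```rust
/// Stand-in for the proposed quote-style enum (hypothetical).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum QuoteStyle {
    Always,
    /// Assumed default: quote only when required.
    #[default]
    Necessary,
    Never,
    NonNumeric,
}

/// Trimmed-down stand-in for DataFusion's `CsvOptions`.
#[derive(Clone, Debug)]
pub struct CsvOptions {
    pub has_header: Option<bool>,
    pub delimiter: u8,
    pub quote: u8,
    /// Hypothetical new field; `None` keeps the writer's default style.
    pub quote_style: Option<QuoteStyle>,
}

/// Trimmed-down stand-in for the `arrow-csv` writer builder.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct WriterBuilder {
    pub header: bool,
    pub delimiter: u8,
    pub quote_style: QuoteStyle,
}

impl WriterBuilder {
    pub fn with_header(mut self, header: bool) -> Self {
        self.header = header;
        self
    }

    pub fn with_delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = delimiter;
        self
    }

    /// Hypothetical setter for the proposed quote-style parameter.
    pub fn with_quote_style(mut self, quote_style: QuoteStyle) -> Self {
        self.quote_style = quote_style;
        self
    }
}

/// Builds writer settings from the options, forwarding `quote_style`
/// only when the user has set it explicitly.
pub fn writer_builder_from(options: &CsvOptions) -> WriterBuilder {
    let mut builder = WriterBuilder::default()
        .with_header(options.has_header.unwrap_or(false))
        .with_delimiter(options.delimiter);

    if let Some(style) = options.quote_style {
        builder = builder.with_quote_style(style);
    }
    builder
}
```

Keeping the new field as an `Option` means existing configurations that never mention a quote style would continue to get whatever default the writer already uses. Because the setting lives only on the writer path, readers would not need to change.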
Describe alternatives you've considered
No response
Additional context
No response