Add quote-style parameter for CSV options #10669

@DDtKey

Is your feature request related to a problem or challenge?

CSV writers usually support configuring the quote style/mode with the following options:

  • Always
  • Necessary
  • Never
  • NonNumeric
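
To make the four modes concrete, here is a minimal, std-only sketch of how each one could treat a field. The `QuoteStyle` enum, `format_field` and `format_record` below are illustrative stand-ins written for this issue, not the `csv` crate's implementation, and the rules are simplified (for example, "numeric" here just means the field parses as an `f64`).

```rust
/// Illustrative stand-in for a CSV writer's quote-style setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QuoteStyle {
    /// Quote every field.
    Always,
    /// Quote only fields that would otherwise be ambiguous.
    Necessary,
    /// Quote every field that does not parse as a number.
    NonNumeric,
    /// Never quote, even if the output becomes ambiguous.
    Never,
}

/// True when the field contains the delimiter, the quote character,
/// or a line break (simplified rule, assumed for this sketch).
fn needs_quotes(field: &str, delimiter: char, quote: char) -> bool {
    field
        .chars()
        .any(|c| c == delimiter || c == quote || c == '\n' || c == '\r')
}

/// Wrap a field in quotes, doubling any embedded quote characters.
fn wrap(field: &str, quote: char) -> String {
    let doubled = field.replace(quote, &format!("{quote}{quote}"));
    format!("{quote}{doubled}{quote}")
}

/// Render a single field according to the chosen quote style.
pub fn format_field(field: &str, style: QuoteStyle, delimiter: char, quote: char) -> String {
    let quoted = match style {
        QuoteStyle::Always => true,
        QuoteStyle::Necessary => needs_quotes(field, delimiter, quote),
        QuoteStyle::NonNumeric => field.parse::<f64>().is_err(),
        QuoteStyle::Never => false,
    };
    if quoted {
        wrap(field, quote)
    } else {
        field.to_string()
    }
}

/// Render a record with `,` as the delimiter and `"` as the quote character.
pub fn format_record(fields: &[&str], style: QuoteStyle) -> String {
    fields
        .iter()
        .map(|f| format_field(f, style, ',', '"'))
        .collect::<Vec<_>>()
        .join(",")
}
```

For the record `["a", "1.5", "x,y"]`, this sketch yields `"a","1.5","x,y"` for Always, `a,1.5,"x,y"` for Necessary, `"a",1.5,"x,y"` for NonNumeric, and `a,1.5,x,y` for Never. The last line shows why Never can produce output that no longer round-trips.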

Sometimes this needs to be controlled, and for now the only way to change it is to re-read the result file(s) and rewrite their content with the desired quote style.

Such options can be found in many libraries.

Describe the solution you'd like

Expose a way to pass the QuoteStyle enum alongside the other properties such as quote and delimiter (as part of CsvOptions). Keep in mind, however, that this setting only makes sense for writers, not readers.

This shouldn't be hard to support, because datafusion relies on arrow-csv, which uses the csv crate under the hood.

  • requires updating arrow-csv to accept a quote-style param (sub-issue for arrow-rs?)
  • update datafusion
    • add parameter to CsvOptions:
      pub struct CsvOptions {
          /// Specifies whether there is a CSV header (i.e. the first line
          /// consists of column names). The value `None` indicates that
          /// the configuration should be consulted.
          pub has_header: Option<bool>, default = None
          pub delimiter: u8, default = b','
          pub quote: u8, default = b'"'
          pub escape: Option<u8>, default = None
          pub compression: CompressionTypeVariant, default = CompressionTypeVariant::UNCOMPRESSED
          pub schema_infer_max_rec: usize, default = 100
          pub date_format: Option<String>, default = None
          pub datetime_format: Option<String>, default = None
          pub timestamp_format: Option<String>, default = None
          pub timestamp_tz_format: Option<String>, default = None
          pub time_format: Option<String>, default = None
          pub null_value: Option<String>, default = None
      }
    • pass to arrow-csv:
      impl TryFrom<&CsvOptions> for CsvWriterOptions {
          type Error = DataFusionError;

          fn try_from(value: &CsvOptions) -> Result<Self> {
              let mut builder = WriterBuilder::default()
                  .with_header(value.has_header.unwrap_or(false))
                  .with_delimiter(value.delimiter);
              if let Some(v) = &value.date_format {
                  builder = builder.with_date_format(v.into())
              }
              if let Some(v) = &value.datetime_format {
                  builder = builder.with_datetime_format(v.into())
              }
              if let Some(v) = &value.timestamp_format {
                  builder = builder.with_timestamp_format(v.into())
              }
              if let Some(v) = &value.time_format {
                  builder = builder.with_time_format(v.into())
              }
              if let Some(v) = &value.null_value {
                  builder = builder.with_null(v.into())
              }
              Ok(CsvWriterOptions {
                  writer_options: builder,
                  compression: value.compression,
              })
          }
      }
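
The two steps above could be wired together roughly as follows. This is a self-contained sketch using reduced stand-in types, not the real DataFusion or arrow-csv definitions: the `quote_style` field, the `with_quote_style` setter and the local `QuoteStyle` enum are all hypothetical names for what this issue proposes, and the default of `Necessary` is an assumption.

```rust
/// Stand-in for the quote-style enum that would be exposed (hypothetical).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum QuoteStyle {
    Always,
    /// Assumed default for this sketch: quote only when needed.
    #[default]
    Necessary,
    NonNumeric,
    Never,
}

/// Reduced stand-in for `CsvOptions`; only the fields relevant here.
#[derive(Clone, Debug)]
pub struct CsvOptions {
    pub has_header: Option<bool>,
    pub delimiter: u8,
    pub quote: u8,
    /// Proposed field (hypothetical); only meaningful for writers.
    pub quote_style: QuoteStyle,
}

impl Default for CsvOptions {
    fn default() -> Self {
        Self {
            has_header: None,
            delimiter: b',',
            quote: b'"',
            quote_style: QuoteStyle::default(),
        }
    }
}

/// Reduced stand-in for the writer builder on the arrow-csv side.
#[derive(Clone, Debug, PartialEq)]
pub struct WriterBuilder {
    header: bool,
    delimiter: u8,
    quote_style: QuoteStyle,
}

impl WriterBuilder {
    pub fn new() -> Self {
        Self {
            header: true,
            delimiter: b',',
            quote_style: QuoteStyle::default(),
        }
    }

    pub fn with_header(mut self, header: bool) -> Self {
        self.header = header;
        self
    }

    pub fn with_delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = delimiter;
        self
    }

    /// Hypothetical setter that the arrow-csv sub-issue would need to add.
    pub fn with_quote_style(mut self, quote_style: QuoteStyle) -> Self {
        self.quote_style = quote_style;
        self
    }
}

/// Writer-side conversion: the only place where `quote_style` is consumed.
impl From<&CsvOptions> for WriterBuilder {
    fn from(value: &CsvOptions) -> Self {
        WriterBuilder::new()
            .with_header(value.has_header.unwrap_or(false))
            .with_delimiter(value.delimiter)
            .with_quote_style(value.quote_style)
    }
}
```

Because the new field is read only in the writer conversion, reader code paths would stay untouched, which matches the note that the setting only makes sense for writers.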

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement