| title | TiDB Lightning CSV Support | |
|---|---|---|
| summary | Learn how to import CSV files via TiDB Lightning. | |
| category | reference | |
| aliases |
|
TiDB Lightning supports reading CSV (comma-separated values) data source, as well as other delimited format such as TSV (tab-separated values).
A CSV file representing a whole table must be named as db_name.table_name.csv. This will be
restored as a table table_name inside the database db_name.
If a table spans multiple CSV files, they should be named like db_name.table_name.003.csv.
The file extension must be *.csv, even if the content is not separated by commas.
CSV files are schema-less. To import them into TiDB, a table schema must be provided. This could be done either by:
- Providing a file named
db_name.table_name-schema.sqlcontaining theCREATE TABLEDDL statement - Creating the empty tables directly in TiDB in the first place, and then setting
[mydumper] no-schema = trueintidb-lightning.toml.
The CSV format can be configured in tidb-lightning.toml under the [mydumper.csv] section.
Most settings have a corresponding option in the MySQL LOAD DATA statement.
[mydumper.csv]
# Separator between fields, should be an ASCII character.
separator = ','
# Quoting delimiter, can either be an ASCII character or empty string.
delimiter = '"'
# Whether the CSV files contain a header.
# If `header` is true, the first line will be skipped.
header = true
# Whether the CSV contains any NULL value.
# If `not-null` is true, all columns from CSV cannot be NULL.
not-null = false
# When `not-null` is false (i.e. CSV can contain NULL),
# fields equal to this value will be treated as NULL.
null = '\N'
# Whether to interpret backslash escapes inside fields.
backslash-escape = true
# If a line ends with a separator, remove it.
trim-last-separator = false-
Defines the field separator.
-
Must be a single ASCII character.
-
Common values:
','for CSV"\t"for TSV
-
Corresponds to the
FIELDS TERMINATED BYoption in the LOAD DATA statement.
-
Defines the delimiter used for quoting.
-
If
delimiteris empty, all fields are unquoted. -
Common values:
'"'quote fields with double-quote, same as RFC 4180''disable quoting
-
Corresponds to the
FIELDS ENCLOSED BYoption in theLOAD DATAstatement.
- Whether all CSV files contain a header row.
- If
headeris true, the first row will be used as the column names. Ifheaderis false, the first row is not special and treated as an ordinary data row.
-
The
not-nullsetting controls whether all fields are non-nullable. -
If
not-nullis false, the string specified bynullwill be transformed to the SQL NULL instead of a concrete value. -
Quoting will not affect whether a field is null.
For example, with the CSV file:
A,B,C \N,"\N",
In the default settings (
not-null = false; null = '\N'), the columnsAandBare both converted to NULL after importing to TiDB. The columnCis simply the empty string''but not NULL.
-
Whether to interpret backslash escapes inside fields.
-
If
backslash-escapeis true, the following sequences are recognized and transformed:Sequence Converted to \0Null character (U+0000) \bBackspace (U+0008) \nLine feed (U+000A) \rCarriage return (U+000D) \tTab (U+0009) \ZWindows EOF (U+001A) In all other cases (e.g.
\") the backslash is simply stripped, leaving the next character (") in the field. -
Quoting will not affect whether backslash escapes are interpreted.
-
Corresponds to the
FIELDS ESCAPED BY '\'option in theLOAD DATAstatement.
-
Treats the field
separatoras a terminator, and removes all trailing separators.For example, with the CSV file:
A,,B,,
-
When
trim-last-separator = false, this is interpreted as a row of 5 fields('A', '', 'B', '', ''). -
When
trim-last-separator = true, this is interpreted as a row of 3 fields('A', '', 'B').
TiDB Lightning does not support every option supported by the LOAD DATA statement. Some examples:
- The line terminator must only be CR (
\r), LF (\n) or CRLF (\r\n), i.e.LINES TERMINATED BYis not customizable. - There cannot be line prefixes (
LINES STARTING BY). - The header cannot be simply skipped (
IGNORE n LINES), it must be valid column names if present. - Delimiters and separators can only be a single ASCII character.
The default setting is already tuned for CSV following RFC 4180.
[mydumper.csv]
separator = ','
delimiter = '"'
header = true
not-null = false
null = '\N'
backslash-escape = true
trim-last-separator = falseExample content:
ID,Region,Count
1,"East",32
2,"South",\N
3,"West",10
4,"North",39
[mydumper.csv]
separator = "\t"
delimiter = ''
header = true
not-null = false
null = 'NULL'
backslash-escape = false
trim-last-separator = falseExample content:
ID Region Count
1 East 32
2 South NULL
3 West 10
4 North 39
[mydumper.csv]
separator = '|'
delimiter = ''
header = false
not-null = true
backslash-escape = false
trim-last-separator = trueExample content:
1|East|32|
2|South|0|
3|West|10|
4|North|39|