Memory allocation error due to Longtext column #43

@langesven


Hey, I'm not entirely sure this issue will make sense, but I'm an absolute Parquet noob and found your tool as a way of dumping data from a MariaDB database to Parquet so I can provide it to other folks.

I'm encountering a memory allocation error:

# odbc2parquet -vvv query --connection-string "Driver={MySQL ODBC 8.0 Unicode Driver};Server=${MARIADB_HOST};Database=${DATABASE};Uid=${MARIADB_USER};Pwd=${MARIADB_PASS};" --batch-size 100000 --batches-per-file 100 "/tmp/${TABLE}.par" "SELECT * FROM ${TABLE}"
2021-03-19T12:45:21+00:00 - DEBUG - ODBC Environment created.
2021-03-19T12:45:21+00:00 - INFO - Batch size set to 100000
2021-03-19T12:45:21+00:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [114, 101, 115, 101, 108, 108, 101, 114, 95, 105, 100], data_type: Integer, nullability: Nullable }
2021-03-19T12:45:21+00:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: I32 }
[...]
2021-03-19T12:45:21+00:00 - DEBUG - ODBC column description for column 182: ColumnDescription { name: [101, 120, 116, 114, 97, 95, 97, 116, 116, 114, 105, 98, 117, 116, 101, 115], data_type: Other { data_type: SqlDataType(-10), column_size: 65535, decimal_digits: 0 }, nullability: Nullable }
2021-03-19T12:45:21+00:00 - DEBUG - ODBC buffer description for column 182: BufferDescription { nullable: true, kind: Text { max_str_len: 21845 } }
memory allocation of 143165576600000 bytes failed
Aborted (core dumped)

which I'm fairly certain is connected to this:

2021-03-19T12:45:21+00:00 - DEBUG - ODBC column description for column 166: ColumnDescription { name: [112, 114, 105, 118, 97, 99, 121, 95, 112, 111, 108, 105, 99, 121], data_type: Other { data_type: SqlDataType(-10), column_size: 4294967295, decimal_digits: 0 }, nullability: Nullable }
2021-03-19T12:45:21+00:00 - DEBUG - ODBC buffer description for column 166: BufferDescription { nullable: true, kind: Text { max_str_len: 1431655765 } }

which in MySQL is this column:

| privacy_policy                        | longtext                                                           | YES  |     | NULL                |                |

This looks too closely connected to be random: the failed allocation of 143165576600000 bytes is exactly (1431655765 + 1) × 100000, i.e. this column's max_str_len plus one byte, multiplied by the batch size.
I have no influence over the source data, so I will not be able to convince anyone to change the type of this field from LONGTEXT to something more reasonable. The largest entry in this column is 366211 characters, so there's definitely no data in there that would require a memory allocation of 143 TB.
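The arithmetic behind the allocation can be checked in a few lines of Rust. This is a minimal sketch, assuming (as the numbers suggest) that a text buffer reserves max_str_len + 1 bytes for every row of a batch; `text_slot_bytes` and `text_column_bytes` are hypothetical helpers for illustration, not odbc2parquet's API.

```rust
/// Bytes reserved per row for one text column. The extra byte is assumed to be
/// the terminating zero that ODBC character buffers need.
fn text_slot_bytes(max_str_len: u64) -> u64 {
    max_str_len + 1
}

/// Bytes for a whole text column buffer: one slot per row in the batch.
fn text_column_bytes(max_str_len: u64, batch_size: u64) -> u64 {
    text_slot_bytes(max_str_len) * batch_size
}

fn main() {
    let batch_size: u64 = 100_000;

    // Column 166 (privacy_policy, LONGTEXT), max_str_len taken from the debug log.
    let reported = text_column_bytes(1_431_655_765, batch_size);
    // Same batch if the buffer were sized for the longest value actually stored,
    // assuming one byte per character (an assumption; non-ASCII text needs more).
    let longest_value = text_column_bytes(366_211, batch_size);

    println!("sized from max_str_len: {} bytes", reported);
    println!("sized from longest value: {} bytes", longest_value);
}
```

The first figure equals the 143165576600000 bytes from the error message. The second comes to about 36.6 GB, so under the one-byte-per-character assumption a batch of 100000 rows would be large for this column even with a buffer sized to the real data.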

I'm not entirely sure why this happens, though, hence this issue.
The maximum length of a LONGTEXT value is 4,294,967,295 bytes (about 4.3 GB). None of the entries come even close to that, but no one is going to change the column type. So how does this turn into an allocation of roughly 100000 times the column's max_str_len?
I'm guessing the allocation happens somewhere around https://github.com/pacman82/odbc2parquet/blob/master/src/query.rs#L417-L428, given that this column's data type is reported as Other (SqlDataType(-10), i.e. SQL_WLONGVARCHAR). The entire loop does run through, though, as you can see above: the table has 182 columns, and there are column and buffer descriptions for every one of them. The memory allocation error happens after that.
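The buffer descriptions in the log also show how max_str_len relates to the column_size reported by the driver: for both column 166 and column 182 it is exactly one third. The sketch below only checks that ratio against the logged numbers; it makes no claim about where in odbc2parquet or the MySQL ODBC driver the division happens, and `is_one_third` is a hypothetical name.

```rust
/// (column number, column_size reported by the driver, max_str_len chosen for
/// the text buffer), copied from the debug log in this issue.
const OBSERVED: [(u32, u64, u64); 2] = [
    (166, 4_294_967_295, 1_431_655_765), // privacy_policy, LONGTEXT
    (182, 65_535, 21_845),               // extra_attributes
];

/// True if max_str_len is exactly one third of column_size.
fn is_one_third(column_size: u64, max_str_len: u64) -> bool {
    column_size % 3 == 0 && column_size / 3 == max_str_len
}

fn main() {
    for &(column, column_size, max_str_len) in OBSERVED.iter() {
        println!(
            "column {}: column_size = {}, max_str_len = {}, one third: {}",
            column,
            column_size,
            max_str_len,
            is_one_third(column_size, max_str_len)
        );
    }
}
```

Applied to the LONGTEXT column_size of 4294967295, that ratio still leaves a per-row buffer of about 1.43 GB, which is what the batch size then multiplies.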

Do you have any ideas of what could be done about this? It would be really nice to dump this data into Parquet, but with it crashing like this I'm entirely at a loss :)
I'm running odbc2parquet 0.5.9 installed via cargo on debian buster.
If I can provide any more data that could help here I'm completely up for that!
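Lowering --batch-size shrinks the allocation proportionally, but with a per-row buffer this large it cannot shrink far enough. The hypothetical helper `max_batch_size` below is a sketch that computes how many rows fit into a given memory budget, assuming each text column reserves max_str_len + 1 bytes per row and ignoring every non-text column.

```rust
/// Hypothetical helper: largest number of rows per batch whose text buffers fit
/// into `budget_bytes`. Assumes every text column reserves max_str_len + 1 bytes
/// per row and ignores all non-text columns.
fn max_batch_size(max_str_lens: &[u64], budget_bytes: u64) -> u64 {
    // Bytes a single row occupies across all text buffers.
    let bytes_per_row: u64 = max_str_lens.iter().map(|len| len + 1).sum();
    if bytes_per_row == 0 {
        // No text columns: text buffers put no limit on the batch size.
        return u64::MAX;
    }
    budget_bytes / bytes_per_row
}

fn main() {
    // 16 GiB, an arbitrary example budget.
    let budget: u64 = 16 * 1024 * 1024 * 1024;
    // max_str_len of columns 166 and 182 from the debug log; other columns ignored.
    let text_columns: [u64; 2] = [1_431_655_765, 21_845];
    println!(
        "rows per batch within 16 GiB: {}",
        max_batch_size(&text_columns, budget)
    );
}
```

With only the two text columns quoted above and a 16 GiB budget, this yields 11 rows per batch. Under these assumptions, the size reserved for the LONGTEXT column, rather than the batch size, is what makes the export infeasible.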
