fix: respect offset/length when converting ArrayData to StructArray#7366
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes incorrect behavior in converting ArrayData to StructArray by respecting the offset and length from parent ArrayData when slicing child arrays. Key changes include:
- Updating test cases to validate error scenarios and correct slicing in StructArray.
- Altering the From implementation in StructArray to adjust child array offsets and lengths.
- Adding tests to cover both correct conversion and expected panics when child arrays are undersized.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| arrow-ipc/src/reader.rs | Updated tests to use the new structural field handling for invalid arrays. |
| arrow-array/src/array/struct_array.rs | Modified array conversion to adjust child offsets and lengths, plus added tests. |
alamb
left a comment
There was a problem hiding this comment.
Thank you @westonpace -- I have one question about the assert / safety check. Otherwise this seems like a nice fix to me
FYI @kylebarron and @tustvold
There was a problem hiding this comment.
I did verify that these tests fail without the code change in this PR
thread 'array::struct_array::tests::test_struct_array_from_data_with_offset_and_length' panicked at arrow-array/src/array/struct_array.rs:551:9:
assertion `left == right` failed
left: StructArray
-- validity:
There was a problem hiding this comment.
doesn't this require that child_offset + parent_offset + parent_len is >= child_len?
The check above only verifies that parent_len + parent_offset is greater than child_len
There was a problem hiding this comment.
We do not need to consider child_offset. The child_len is "number of items in the underlying array, after skipping any child_offset."
There was a problem hiding this comment.
Rather than this somewhat opaque logic, which I am also not sure is correct w.r.t nulls, I wonder if we could just do something along these lines
let child = make_array(cd.clone());
if (parent_offset != 0 || parent_len != cd .len()) {
return child.slice(parent_offset, parent_len);
}
return child
There was a problem hiding this comment.
Done. That's much nicer, thanks.
There was a problem hiding this comment.
I think this need to also slice the child's null buffer, although I left a comment above on how to simplify this logic - ArrayData is very fiddly to use correctly, the less code using it the better
|
It seems we've been blessed with a new version of Rust 😆 I'll work on the clippy errors in a new PR. |
…reate invalid arrays by creating the invalid arrays with a different path.
febe908 to
6737ebe
Compare
alamb
left a comment
There was a problem hiding this comment.
Thanks @westonpace -- this is looking very good. I have only one more question
Good catch. I've added a test case to ensure it is covered. |
alamb
left a comment
There was a problem hiding this comment.
Thank you @westonpace -- a very nice piece of work
Which issue does this PR close?
Rationale for this change
See issue. When converting arrays from arrow-cpp then sliced struct arrays may have an offset/length. By ignoring these the resulting
StructArrayis incorrect.What changes are included in this PR?
Changes the behavior of
StructArray::from::<ArrayData>Are there any user-facing changes?
No. It's hard to imagine any user would be relying on the past behavior as it was invalid. There was one unit test that had to change because it was relying on this behavior to construct invalid arrays (to ensure the IPC could handle these invalid arrays).