Commit 6b44167

junyan-ling and Junyan Ling authored
feat(parquet/file): pre-allocate BinaryBuilder data buffer using column chunk metadata to eliminate resize overhead (#689)
### Rationale for this change

This PR is to address issue #688.

`byteArrayRecordReader` builds binary/string Arrow arrays using `array.BinaryBuilder`, but the builder's data buffer starts empty and grows via repeated doublings as values are appended. For large binary columns this causes O(log n) realloc+copy cycles per row group, wasting both time and memory.

This PR threads column chunk size metadata (`TotalUncompressedSize`, `NumRows`) from `columnIterator.NextChunk()` down to `leafReader`, and uses it to pre-allocate the builder's data buffer at the start of each `LoadBatch` call via `BinaryBuilder.ReserveData`.

### What changes are included in this PR?

- **`parquet/file/record_reader.go`**: adds `ReserveData(int64)` to `BinaryRecordReader` interface and implements it on `byteArrayRecordReader`; adds a no-op implementation on `flbaRecordReader`.
- **`parquet/pqarrow/file_reader.go`**: `columnIterator.NextChunk()` now returns `(PageReader, uncompressedBytes, numRows, error)`.
- **`parquet/pqarrow/column_readers.go`**: `leafReader` stores current row group metadata; `LoadBatch` calls `reserveBinaryData(nrecords)` after each reset; `nextRowGroup` takes a `remainingRows` parameter to extend the reservation when crossing row group boundaries mid-batch.
- **`parquet/pqarrow/properties.go`**: adds `PreAllocBinaryData bool` to `ArrowReadProperties` (default: `false`).

Opt in via:

```go
props := pqarrow.ArrowReadProperties{
	PreAllocBinaryData: true,
}
reader, err := pqarrow.NewFileReader(pf, props, mem)
```

### Are these changes tested?

Yes. `parquet/pqarrow/binary_prealloc_test.go` covers:

- Default flag value is false (no behaviour change for existing callers)
- Correctness of output for binary, string, nullable, int32, FLBA, and dict-encoded columns
- All batch size configurations: unbounded, one batch per row group, multiple batches per row group, and batches that span row group boundaries

Benchmark in `parquet/pqarrow/reader_writer_test.go` (`BenchmarkPreAllocBinaryData`) compares prealloc=false vs prealloc=true on a two-column schema (slim string id + fat binary blob, 5 KB–50 KB values, Zstd, 2 row groups × 484 rows):

Environment: Apple M1 Max · count=3 · medians reported

| Sub-benchmark | ns/op (false) | ns/op (true) | Δ ns/op | B/op (false) | B/op (true) | Δ B/op | allocs/op (false) | allocs/op (true) | Δ allocs |
|---|---|---|---|---|---|---|---|---|---|
| batchAll | 9,117,272 | 7,993,732 | -12.3% | 144,021,824 | 115,098,562 | -20.1% | 511 | 494 | -3.3% |
| batchPerRG | 9,190,661 | 8,083,567 | -12.0% | 144,024,680 | 115,096,686 | -20.1% | 513 | 493 | -3.9% |
| batchQuarterRG | 9,116,379 | 7,896,174 | -13.4% | 144,023,299 | 115,097,206 | -20.1% | 512 | 493 | -3.7% |

Note: production workloads with larger values (~250 KB/row) will see larger improvements - more reallocation doublings are eliminated at greater value sizes. This benchmark uses 5–50 KB values to keep runtime practical.

### Are there any user-facing changes?

Yes, opt-in. A new field `PreAllocBinaryData bool` is added to `ArrowReadProperties`. It defaults to `false`, so all existing code is unaffected without any changes. Users with large binary or string columns can enable it to reduce memory allocations and improve read throughput.

---------

Co-authored-by: Junyan Ling <jling22@apple.com>
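To make the rationale concrete, here is a minimal standard-library sketch of why reserving up front helps. It does not use Arrow at all: `appendAll` and `growStats` are hypothetical names, and the doubling policy is a simplified stand-in for how a builder's data buffer might grow, not the actual `array.BinaryBuilder` implementation. The row count and value size are illustrative only.

```go
package main

import "fmt"

// growStats records how a byte buffer grew while values were appended.
type growStats struct {
	reallocs    int // times a new backing array had to be allocated
	bytesCopied int // bytes moved from old backing arrays into new ones
}

// appendAll appends n values of valueSize bytes to a buffer that starts with
// initialCap bytes of capacity and doubles whenever it runs out of room.
// The doubling policy is an assumption made for illustration.
func appendAll(initialCap, n, valueSize int) growStats {
	var st growStats
	buf := make([]byte, 0, initialCap)
	value := make([]byte, valueSize)
	for i := 0; i < n; i++ {
		if need := len(buf) + len(value); need > cap(buf) {
			newCap := cap(buf) * 2
			if newCap == 0 {
				newCap = 64 // arbitrary starting size for an empty buffer
			}
			for newCap < need {
				newCap *= 2
			}
			grown := make([]byte, len(buf), newCap)
			copy(grown, buf)
			st.reallocs++
			st.bytesCopied += len(buf)
			buf = grown
		}
		buf = append(buf, value...)
	}
	return st
}

func main() {
	const rows, valueSize = 484, 8 * 1024 // illustrative sizes only

	grown := appendAll(0, rows, valueSize)
	reserved := appendAll(rows*valueSize, rows, valueSize)

	fmt.Printf("no reservation: %d reallocs, %d bytes copied\n", grown.reallocs, grown.bytesCopied)
	fmt.Printf("reserved:       %d reallocs, %d bytes copied\n", reserved.reallocs, reserved.bytesCopied)
}
```

When the total payload size is known before the first append, as it is from column chunk metadata, the reserved run never reallocates, while the unreserved run pays for every doubling plus the copy that goes with it.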
1 parent 2895752 commit 6b44167

6 files changed

Lines changed: 610 additions & 7 deletions

parquet/file/record_reader.go

Lines changed: 22 additions & 0 deletions
@@ -91,6 +91,9 @@ type BinaryRecordReader interface {
 	RecordReader
 	GetBuilderChunks() []arrow.Array
 	ReadDictionary() bool
+	// ReserveData pre-allocates nbytes in the underlying data buffer to reduce
+	// reallocations when the total data size is known in advance.
+	ReserveData(int64)
 }

 // recordReaderImpl is the internal interface implemented for different types
@@ -117,6 +120,7 @@ type binaryRecordReaderImpl interface {
 	recordReaderImpl
 	GetBuilderChunks() []arrow.Array
 	ReadDictionary() bool
+	ReserveData(int64)
 }

 // primitiveRecordReader is a record reader for primitive types, ie: not byte array or fixed len byte array
@@ -343,6 +347,10 @@ func (b *binaryRecordReader) GetBuilderChunks() []arrow.Array {
 	return b.recordReaderImpl.(binaryRecordReaderImpl).GetBuilderChunks()
 }

+func (b *binaryRecordReader) ReserveData(nbytes int64) {
+	b.recordReaderImpl.(binaryRecordReaderImpl).ReserveData(nbytes)
+}
+
 func newRecordReader(descr *schema.Column, info LevelInfo, mem memory.Allocator, bufferPool *sync.Pool) RecordReader {
 	if mem == nil {
 		mem = memory.DefaultAllocator
@@ -758,6 +766,8 @@ func (fr *flbaRecordReader) GetBuilderChunks() []arrow.Array {

 func (fr *flbaRecordReader) ReadDictionary() bool { return false }

+func (fr *flbaRecordReader) ReserveData(int64) {}
+
 func newFLBARecordReader(descr *schema.Column, info LevelInfo, mem memory.Allocator, bufferPool *sync.Pool) RecordReader {
 	if mem == nil {
 		mem = memory.DefaultAllocator
@@ -817,6 +827,18 @@ func (br *byteArrayRecordReader) ReserveValues(extra int64, hasNullable bool) er
 	return br.primitiveRecordReader.ReserveValues(extra, hasNullable)
 }

+// ReserveData pre-allocates nbytes in the builder's data buffer.
+// This reduces reallocations when the total binary payload size is known in advance,
+// e.g. from TotalUncompressedSize in the column chunk metadata.
+func (br *byteArrayRecordReader) ReserveData(nbytes int64) {
+	if nbytes <= 0 {
+		return
+	}
+	if binaryBldr, ok := br.bldr.(*array.BinaryBuilder); ok {
+		binaryBldr.ReserveData(int(nbytes))
+	}
+}
+
 func (br *byteArrayRecordReader) Retain() {
 	br.bldr.Retain()
 	br.primitiveRecordReader.Retain()
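The `ReserveData` methods in this file follow a common Go pattern: treat reservation as an optional capability, ignore non-positive sizes, and reserve only when the held builder turns out to be the concrete type that owns a variable-length data buffer. The sketch below reproduces that shape with standard-library types only. `builder`, `binaryBuilder`, `fixedBuilder`, and `reader` are hypothetical stand-ins invented for illustration, not the Arrow or Parquet types.

```go
package main

import "fmt"

// builder stands in for the general builder interface a record reader holds.
type builder interface {
	Len() int
}

// binaryBuilder is a hypothetical variable-length builder with a data buffer.
type binaryBuilder struct{ data []byte }

func (b *binaryBuilder) Len() int { return len(b.data) }

// ReserveData makes room for at least n more bytes without a later realloc.
func (b *binaryBuilder) ReserveData(n int) {
	if cap(b.data)-len(b.data) >= n {
		return
	}
	grown := make([]byte, len(b.data), len(b.data)+n)
	copy(grown, b.data)
	b.data = grown
}

// fixedBuilder is a hypothetical builder with nothing to reserve.
type fixedBuilder struct{ n int }

func (f *fixedBuilder) Len() int { return f.n }

// reader mirrors the guard-then-type-assert shape of the diff above.
type reader struct{ bldr builder }

func (r *reader) ReserveData(nbytes int64) {
	if nbytes <= 0 {
		return // unknown or empty size: leave the buffer alone
	}
	if bb, ok := r.bldr.(*binaryBuilder); ok {
		bb.ReserveData(int(nbytes))
	}
	// Any other builder type is skipped, so the call is always safe.
}

func main() {
	bb := &binaryBuilder{}
	(&reader{bldr: bb}).ReserveData(1 << 20)
	fmt.Println("binary builder capacity at least 1 MiB:", cap(bb.data) >= 1<<20)

	(&reader{bldr: &fixedBuilder{}}).ReserveData(1 << 20) // silently does nothing
}
```

Because the type assertion uses the two-value form, callers can invoke `ReserveData` unconditionally; readers whose builder has no such buffer simply skip the reservation instead of panicking.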
