
feat(parquet/file): pre-allocate BinaryBuilder data buffer using column chunk metadata to eliminate resize overhead#689

Merged
zeroshade merged 6 commits into apache:main from junyan-ling:parquet-binary-prealloc
Mar 5, 2026

Conversation

@junyan-ling
Contributor

@junyan-ling junyan-ling commented Mar 5, 2026

Rationale for this change

This PR addresses issue #688.

byteArrayRecordReader builds binary/string Arrow arrays using array.BinaryBuilder, but the builder's data buffer starts empty and grows via repeated doublings as values are appended. For large binary columns this causes O(log n) realloc+copy cycles per row group, wasting both time and memory.

This PR threads column chunk size metadata (TotalUncompressedSize, NumRows) from columnIterator.NextChunk() down to leafReader, and uses it to pre-allocate the builder's data buffer at the start of each LoadBatch call via BinaryBuilder.ReserveData.
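To make the growth cost concrete, here is a small stand-alone Go sketch (not the actual builder code) that counts bytes moved when appending fixed-size values into a capacity-doubling buffer versus a pre-allocated one:

```go
package main

import "fmt"

// bytesMoved simulates appending numValues values of valSize bytes each into a
// growable buffer. It returns the bytes copied value-by-value (unavoidable)
// and the extra bytes re-copied by capacity-doubling reallocations.
func bytesMoved(numValues, valSize int, prealloc bool) (perValue, regrow int) {
	capacity, length := 0, 0
	if prealloc {
		capacity = numValues * valSize // what a metadata-based reservation buys us
	}
	for i := 0; i < numValues; i++ {
		if length+valSize > capacity {
			if capacity == 0 {
				capacity = valSize
			}
			for length+valSize > capacity {
				capacity *= 2
			}
			regrow += length // a realloc copies all existing bytes
		}
		perValue += valSize // every value is copied in exactly once
		length += valSize
	}
	return perValue, regrow
}

func main() {
	pv, rg := bytesMoved(1000, 1024, false)
	fmt.Printf("grow-as-you-go: %d B appended, %d B re-copied by doublings\n", pv, rg)
	pv, rg = bytesMoved(1000, 1024, true)
	fmt.Printf("pre-allocated:  %d B appended, %d B re-copied by doublings\n", pv, rg)
}
```

With 1,000 values of 1 KiB each, the doubling path re-copies roughly as many bytes again as were appended, while the pre-allocated path re-copies none.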

What changes are included in this PR?

  • parquet/file/record_reader.go: adds ReserveData(int64) to BinaryRecordReader interface and implements it on
    byteArrayRecordReader; adds a no-op implementation on flbaRecordReader.
  • parquet/pqarrow/file_reader.go: columnIterator.NextChunk() now returns (PageReader, uncompressedBytes, numRows, error).
  • parquet/pqarrow/column_readers.go: leafReader stores current row group metadata; LoadBatch calls
    reserveBinaryData(nrecords) after each reset; nextRowGroup takes a remainingRows parameter to extend the reservation when crossing
    row group boundaries mid-batch.
  • parquet/pqarrow/properties.go: adds PreAllocBinaryData bool to ArrowReadProperties (default: false).

Opt in via:

// pf is an open parquet file.Reader; mem is a memory.Allocator
props := pqarrow.ArrowReadProperties{
    PreAllocBinaryData: true,
}
reader, err := pqarrow.NewFileReader(pf, props, mem)

Are these changes tested?

Yes. parquet/pqarrow/binary_prealloc_test.go covers:

  • Default flag value is false (no behaviour change for existing callers)
  • Correctness of output for binary, string, nullable, int32, FLBA, and dict-encoded columns
  • All batch size configurations: unbounded, one batch per row group, multiple batches per row group, and batches that span row group
    boundaries

Benchmark in parquet/pqarrow/reader_writer_test.go (BenchmarkPreAllocBinaryData) compares prealloc=false vs prealloc=true on a two-column
schema (slim string id + fat binary blob, 5 KB–50 KB values, Zstd, 2 row groups × 484 rows):

Environment: Apple M1 Max · count=3 · medians reported

┌────────────────┬─────────────┬─────────────┬────────┬─────────────┬─────────────┬────────┬────────────────┬───────────────┬─────────┐
│ Sub-benchmark  │   ns/op     │   ns/op     │   Δ    │    B/op     │ B/op (true) │ Δ B/op │   allocs/op    │  allocs/op    │   Δ     │
│                │   (false)   │   (true)    │ ns/op  │   (false)   │             │        │    (false)     │    (true)     │ allocs  │
├────────────────┼─────────────┼─────────────┼────────┼─────────────┼─────────────┼────────┼────────────────┼───────────────┼─────────┤
│ batchAll       │   9,117,272 │   7,993,732 │ -12.3% │ 144,021,824 │ 115,098,562 │ -20.1% │            511 │           494 │   -3.3% │
├────────────────┼─────────────┼─────────────┼────────┼─────────────┼─────────────┼────────┼────────────────┼───────────────┼─────────┤
│ batchPerRG     │   9,190,661 │   8,083,567 │ -12.0% │ 144,024,680 │ 115,096,686 │ -20.1% │            513 │           493 │   -3.9% │
├────────────────┼─────────────┼─────────────┼────────┼─────────────┼─────────────┼────────┼────────────────┼───────────────┼─────────┤
│ batchQuarterRG │   9,116,379 │   7,896,174 │ -13.4% │ 144,023,299 │ 115,097,206 │ -20.1% │            512 │           493 │   -3.7% │
└────────────────┴─────────────┴─────────────┴────────┴─────────────┴─────────────┴────────┴────────────────┴───────────────┴─────────┘

Note: production workloads with larger values (~250 KB/row) will see larger improvements, since more reallocation doublings are eliminated at greater value sizes. This benchmark uses 5–50 KB values to keep runtime practical.

Are there any user-facing changes?

Yes, opt-in. A new field PreAllocBinaryData bool is added to ArrowReadProperties. It defaults to false, so all existing code is unaffected. Users with large binary or string columns can enable it to reduce memory allocations and improve read throughput.

@junyan-ling junyan-ling requested a review from zeroshade as a code owner March 5, 2026 21:34
Member

@zeroshade zeroshade left a comment


Thanks for this! Just one question really below. Though you do have some linting issues to fix 😄


func (fr *flbaRecordReader) ReadDictionary() bool { return false }

func (fr *flbaRecordReader) ReserveData(int64) {}
Member


Can't we also figure out how much to reserve here too?

Contributor Author


FixedSizeBinaryBuilder doesn't have a variable-length data buffer: since all values are the same width, the buffer size is fully determined by the slot count, which ReserveValues already handles via bldr.Reserve(n). ReserveData is only meaningful for byteArrayRecordReader, where value lengths vary.
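A toy illustration of the distinction (the function names here are hypothetical, not arrow-go APIs):

```go
package main

import "fmt"

// fixedDataSize: for fixed-size binary, slots x width fully determines the
// data buffer, so reserving slots (as ReserveValues does) already sizes it.
func fixedDataSize(slots, width int) int { return slots * width }

// variableDataSize: for variable-length binary, only the sum of the value
// lengths (known in aggregate from chunk metadata such as
// TotalUncompressedSize) determines the data buffer, hence the separate
// ReserveData hook for that case.
func variableDataSize(lengths []int) int {
	total := 0
	for _, n := range lengths {
		total += n
	}
	return total
}

func main() {
	fmt.Println(fixedDataSize(484, 16))            // 484 slots of a 16-byte FLBA
	fmt.Println(variableDataSize([]int{5, 50, 8})) // three variable-width values
}
```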

Member


Gotcha! Sounds good

@junyan-ling junyan-ling changed the title from "Parquet binary prealloc" to "pre-allocate BinaryBuilder data buffer using column chunk metadata to eliminate resize overhead" Mar 5, 2026
@zeroshade zeroshade changed the title from "pre-allocate BinaryBuilder data buffer using column chunk metadata to eliminate resize overhead" to "feat(parquet/file): pre-allocate BinaryBuilder data buffer using column chunk metadata to eliminate resize overhead" Mar 5, 2026
@zeroshade zeroshade merged commit 6b44167 into apache:main Mar 5, 2026
40 of 41 checks passed
@junyan-ling
Contributor Author

Hi @zeroshade, thanks a lot for merging the PR! I'm curious about the release schedule: what does the cadence look like, and what's the best way for us to pick up this commit in our arrow-go before the next release?

@zeroshade
Member

We just did a release last week, so ideally we'd wait a couple weeks at least before doing a new release. Depending on your own requirements, you could update your own go.mod to point at the commit hash directly (or use a replace directive) if you're okay with that. Is it urgent for you that there be a release with this change soon?

@junyan-ling
Contributor Author

junyan-ling commented Mar 10, 2026

> We just did a release last week, so ideally we'd wait a couple weeks at least before doing a new release. Depending on your own requirements, you could update your own go.mod to point at the commit hash directly (or use a replace directive) if you're okay with that. Is it urgent for you that there be a release with this change soon?

Thanks @zeroshade! Sounds good, I can build an internal version cherry-picking this commit. Thanks.

That said, I need to point out that after cherry-picking this change, the CPU profiles from our e2e benchmark jobs on k8s did not show a dramatic improvement in CPU time. After further investigation, here are the reasons:

  1. Pre-allocation only ensures the destination buffer is large enough, and it does not eliminate the per-value copy. When we read a parquet page, the decompressed bytes live in the page buffer. Each value then gets copied individually from the page buffer into the BinaryBuilder's data buffer via bufferBuilder.Append → copy() → memmove. This copy happens for every single value regardless of how large the destination buffer is. Pre-allocation prevents the destination from needing to resize mid-batch, but the fundamental copy from source to destination is unavoidable. This accounts for most of the memmove cost in the profiles.
  2. Pre-allocation itself has a CPU cost: zeroing. Both Go's runtime and Arrow's memory allocator zero-fill newly allocated memory. When ReserveData pre-allocates the data buffer, every byte is zeroed via memset before any data is written. This zeroing is wasted work: we are about to overwrite every byte with actual values.

@zeroshade
Copy link
Copy Markdown
Member

You're absolutely correct there. This really gives the best benefit to very specific use cases that were causing multiple reallocate-and-copy cycles, but it doesn't do anything to help the case you're mentioning.

I have a series of other performance-focused optimizations in progress, but if there's a particular bottleneck you're hitting somewhere, please let me know and I can try to focus some time on it.

