perf(parquet/compress): set zstd pool encoder concurrency to 1 #717
Merged
zeroshade merged 1 commit into apache:main on Mar 17, 2026
Conversation
The zstdEncoderPool is used exclusively by EncodeAll(), which is a
single-shot synchronous call that uses exactly one inner block encoder.
However, zstd.NewWriter defaults its concurrency to runtime.GOMAXPROCS,
pre-allocating that many inner block encoders, each with its own ~1 MiB
history buffer (ensureHist). On a 10-core machine, each pooled Encoder
therefore allocates 10 inner encoders when EncodeAll only ever uses 1.
With WithEncoderConcurrency(1), each pooled encoder creates a single
inner encoder, matching actual usage. The streaming Write/Close path
is unaffected, since it does not use the pool.
Benchmark results (Apple M4 Pro, arm64, 256 KiB semi-random data):
BenchmarkZstdPooledEncodeAll/Default-14        11000 B/op   5250 MB/s
BenchmarkZstdPooledEncodeAll/Concurrency1-14     810 B/op   5500 MB/s
That is ~14x less memory per operation and ~5% higher throughput from
reduced GC pressure.
In a parquet write workload (1 GiB Arrow data, ZSTD level 3), this
reduced ensureHist allocations from 22 GiB to 7 GiB and madvise
kernel CPU from 4.6s to 2.3s (10% wall-time improvement).
Member:
Should this be a configurable setting that we just default to 1?

Contributor (Author):
Thanks for the prompt review!

Member:
I think this is fine for now, we can look into making it more controllable in a follow-up. Thanks!
Rationale for this change
High memory churn during parquet encoding
What changes are included in this PR?
A change to the zstd pooled encoder concurrency, plus a benchmark to reproduce the results.
Are these changes tested?
Yes
Are there any user-facing changes?
No