Adaptively select XZ recompress dictionary size of up to 128 MiB by kotauskas · Pull Request #97 · rust-lang/promote-release

kotauskas · 2025-09-21T08:41:06Z

This increases peak RSS for users of Rustup by 64 MiB in exchange for non-negligible improvements in compression ratio for the larger tarballs:

# component    bytes_un    bytes_cur   bytes_128m   ratio
rust-docs   :  669916672   21485344    20294200    -5.543984%
rustc       :  386717696   82519204    76896156    -6.81423 %
llvm-tools  :  194253312   39117832    36593820    -6.45233 %
rust-std    :  163678208   29115852    28910652    -0.70477 %
cargo       :   42116608   10679724    10679732    +0.000075%
rust-src    :   40181760    3473408     3473416    +0.00023 %
clippy      :   21029376    4544900     4544908    +0.00018 %
rustfmt     :    9690624    2255472     2255480    +0.00035 %

All tests were done on tarballs from https://static.rust-lang.org/dist/2025-09-18/{component}-1.90.0-x86_64-unknown-linux-gnu.tar.xz. The size of the compressed tarballs directly downloaded from static.rust-lang.org is shown in the bytes_cur column.

bytes_128m is the size of the output of xz -T1 --lzma2=preset=9e,depth=1000,dict=128M, which is the same configuration that promote-release uses with the change in this pull request.
The version used is XZ Utils 5.8.1 from the Arch Linux repositories. The cargo, rust-src, clippy and rustfmt components (all smaller than 128 MiB) appear to have regressed by exactly 8 bytes. This is likely caused by a mismatch between the compressor version information written by xz and that written by promote-release, so the 8-byte increase will probably not show up in practice.
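
For readers who want to check the ratio column, it appears to be the relative change from bytes_cur to bytes_128m, in percent. A minimal sketch of that arithmetic follows; the function name `relative_change_percent` is made up for illustration and is not part of promote-release.

```rust
/// Relative size change, in percent, when going from `old` to `new` bytes.
/// Negative values mean the new output is smaller.
/// (Hypothetical helper, not taken from the pull request.)
fn relative_change_percent(old: u64, new: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}
```

Called with the rust-docs row, `relative_change_percent(21_485_344, 20_294_200)` gives roughly -5.544, and the cargo row (`10_679_724` to `10_679_732`) gives a tiny positive value that corresponds to the 8-byte difference discussed above.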

I have confirmed via GNU Time (/bin/time -v) that decompressor memory usage increases by no more than 64 MiB.

Additionally, the recompressor now takes note of the size of the uncompressed file to avoid excessive dictionary sizes for components that are too small to benefit from the new 128 MiB maximum. This reduces memory usage, during both compression and decompression. As per XZ documentation, a dictionary size of the form 2^n or 2^n + 2^(n-1) is selected. For files smaller than 128 MiB, the smallest possible one that meets or exceeds the size of the file (thus maximizing compression ratio) is chosen; beyond that, it is capped at 128 MiB.
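
The selection rule described above could be sketched roughly as follows. This is not the code from src/recompress.rs: the names `choose_dict_size`, `MAX_DICT_SIZE` and `MIN_DICT_SIZE` are illustrative, and the 4 KiB lower bound is an assumption (it is the smallest dictionary LZMA2 accepts), not something stated in this pull request.

```rust
const MIB: u64 = 1024 * 1024;
/// Cap described in the pull request: 128 MiB.
const MAX_DICT_SIZE: u64 = 128 * MIB;
/// Assumed lower bound of 4 KiB; not taken from the pull request.
const MIN_DICT_SIZE: u64 = 4096;

/// Returns the smallest size of the form 2^n or 2^n + 2^(n-1) that is at
/// least `uncompressed_len`, never below MIN_DICT_SIZE and never above
/// MAX_DICT_SIZE.
fn choose_dict_size(uncompressed_len: u64) -> u64 {
    if uncompressed_len >= MAX_DICT_SIZE {
        return MAX_DICT_SIZE;
    }
    // Walk the candidates in increasing order: 2^n, then 2^n + 2^(n-1).
    let mut pow2 = MIN_DICT_SIZE;
    loop {
        if pow2 >= uncompressed_len {
            return pow2;
        }
        let one_and_a_half = pow2 + pow2 / 2;
        if one_and_a_half >= uncompressed_len {
            return one_and_a_half;
        }
        pow2 *= 2;
    }
}
```

Under these assumptions, the 9,690,624-byte rustfmt tarball maps to 12 MiB, the 42,116,608-byte cargo tarball to 48 MiB, and anything at or above the cap (such as rust-std) to 128 MiB.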

src/recompress.rs

The recompressor will do GZip compression first, take note of the decompressed size measured in the process, and use it to select an optimal dictionary size for the XZ recompression, if it is to happen at all. If XZ recompression is needed but GZip compression isn't, it will perform a decompression solely to measure the decompressed size.

kotauskas · 2025-09-21T21:06:49Z

I've refactored the code a bit to implement the following algorithm:

1. If GZip compression is needed, stream-decompress the XZ file to produce a .tar.gz tarball next to it, measuring how long it took and how big the uncompressed tarball is.
2. If XZ recompression is requested:
   - If GZip compression wasn't performed, skim the .tar.xz to measure the decompressed size. (Not sure if the xz2 crate exposes a way of getting the uncompressed size XZ chucks at the end of the file.)
   - Stream-decompress the XZ file to produce a .tar.xz tarball with better compression.
3. Print the most recent measurement of the time taken by decompression together with the per-compressor duration stats and the total time taken.
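
The "skim to measure" step can be illustrated with a small standard-library sketch. It is generic over any reader; in the real recompressor the reader would presumably be an XZ decoder such as `xz2::read::XzDecoder` wrapping the input file, but that wiring is an assumption here and the function name `measure_decompressed_size` is hypothetical.

```rust
use std::io::{self, Read};

/// Reads `decoder` to the end, discards the bytes, and returns how many
/// bytes it produced. With a decompressing reader, this is the size of the
/// uncompressed stream.
fn measure_decompressed_size<R: Read>(mut decoder: R) -> io::Result<u64> {
    io::copy(&mut decoder, &mut io::sink())
}
```

Because nothing is written anywhere, the cost is one extra decompression pass, which matches the trade-off described in the list above.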

Rustfmt is prone to picking up configs from other places (ones in conflict with the default style that CI enforces). This will keep it from doing that.

Those who use a shared directory, such as myself, would previously see `docker compose build` fail to add the actual build of `promote-release` to the container, as the `target` directory at the repo level simply wouldn't be there. No longer so with this change.

Forgot to `.rewind()`, as one does. Added a handful of Anyhow `.with_context()` calls to help prevent future goose chases.
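
For context, this is the kind of pitfall the missing `.rewind()` produces: once a file has been read to the end for measurement, a second pass over the same handle sees no data unless the position is reset. A minimal sketch, with a hypothetical helper name and without the Anyhow context calls:

```rust
use std::io::{self, Read, Seek};

/// Counts the bytes in `input`, then rewinds it so that a later pass
/// starts at byte 0 again. (Illustrative helper, not from the PR.)
fn measure_then_rewind<F: Read + Seek>(input: &mut F) -> io::Result<u64> {
    let len = io::copy(input, &mut io::sink())?;
    // Without this call, the next reader would start at end-of-file.
    input.rewind()?;
    Ok(len)
}
```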

kotauskas · 2025-09-22T20:09:37Z

Hey, look, it's actually shaved about 4 minutes off the test release timings despite decompression having to happen twice instead of once.

Increase XZ recompress dictsize to 128 MiB

9e6469c

Mark-Simulacrum reviewed Sep 21, 2025

kotauskas force-pushed the patch-1 branch 2 times, most recently from f4bad2b to f1180bd September 21, 2025 19:55

Avoid excessive dictionary size

9d00754

kotauskas force-pushed the patch-1 branch from f1180bd to 9d00754 September 21, 2025 19:57

kotauskas added 2 commits September 21, 2025 23:53

Close the recompression input early

747e4c0

kotauskas changed the title ~~Increase XZ recompress dictionary size to 128 MiB~~ Adaptively select XZ recompress dictionary size of up to 128 MiB Sep 21, 2025

Fix Rustfmt discrepancies

ace027b

kotauskas force-pushed the patch-1 branch from 17d3792 to ace027b September 21, 2025 21:13

kotauskas added 3 commits September 22, 2025 22:36

Add empty Rustfmt config file

3c0d089

Fix adaptive XZ dictionary size for real this time

92c06e1

kotauskas requested a review from Mark-Simulacrum September 23, 2025 16:23

Mark-Simulacrum reviewed Sep 26, 2025

kotauskas added 2 commits September 27, 2025 11:16

Improve readability in accordance with code review

3adffd9

Clarify the code some more

d0ecb1f

kotauskas requested a review from Mark-Simulacrum September 27, 2025 18:04

Mark-Simulacrum reviewed Sep 28, 2025

Fix comment

a857b11

Mark-Simulacrum enabled auto-merge (squash) September 28, 2025 15:18

Mark-Simulacrum merged commit cad8128 into rust-lang:master Sep 28, 2025
6 checks passed

Mark-Simulacrum merged 11 commits into rust-lang:master from kotauskas:patch-1