
Adaptively select XZ recompress dictionary size of up to 128 MiB #97

Merged
Mark-Simulacrum merged 11 commits into rust-lang:master from kotauskas:patch-1 on Sep 28, 2025

@kotauskas
Contributor

@kotauskas kotauskas commented Sep 21, 2025

This increases peak RSS for users of Rustup by up to 64 MiB in exchange for non-negligible improvements in compression ratio for the larger tarballs:

# component     bytes_un  bytes_cur  bytes_128m  size_change
rust-docs  :   669916672   21485344    20294200   -5.543984%
rustc      :   386717696   82519204    76896156   -6.81423%
llvm-tools :   194253312   39117832    36593820   -6.45233%
rust-std   :   163678208   29115852    28910652   -0.70477%
cargo      :    42116608   10679724    10679732   +0.000075%
rust-src   :    40181760    3473408     3473416   +0.00023%
clippy     :    21029376    4544900     4544908   +0.00018%
rustfmt    :     9690624    2255472     2255480   +0.00035%

All tests were done on tarballs from https://static.rust-lang.org/dist/2025-09-18/{component}-1.90.0-x86_64-unknown-linux-gnu.tar.xz. The size of the compressed tarballs directly downloaded from static.rust-lang.org is shown in the bytes_cur column.

bytes_128m is the size of the output of xz -T1 --lzma=preset=9e,depth=1000,dict=128M, which is the same configuration that promote-release uses with the change this pull request makes.
The version used is XZ Utils 5.8.1 from the Arch Linux repositories. The cargo, rust-src, clippy and rustfmt components (all smaller than 128 MiB) appear to have regressed by exactly 8 bytes. This is likely due to a mismatch between the compressor version information written by xz and that written by promote-release, so the 8-byte increase will probably not show up in practice.
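
For anyone wanting to recheck the last column of the table, here is a minimal sketch of the arithmetic, assuming the column is the relative change from bytes_cur to bytes_128m in percent. The function name is made up for illustration and is not part of the repository.

```rust
/// Relative size change in percent when going from `bytes_cur` to `bytes_new`.
/// A negative result means the new output is smaller.
/// (Hypothetical helper, not part of the repository.)
fn size_change_percent(bytes_cur: u64, bytes_new: u64) -> f64 {
    (bytes_new as f64 - bytes_cur as f64) / bytes_cur as f64 * 100.0
}
```

Calling it with the rust-docs row, `size_change_percent(21485344, 20294200)`, gives roughly -5.544, matching the table.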

I have confirmed via GNU Time (/bin/time -v) that decompressor memory usage increases by no more than 64 MiB.

Additionally, the recompressor now takes the size of the uncompressed file into account to avoid excessive dictionary sizes for components that are too small to benefit from the new 128 MiB maximum. This reduces memory usage during both compression and decompression. As per the XZ documentation, a dictionary size of the form 2^n or 2^n + 2^(n-1) is selected. For files smaller than 128 MiB, the smallest such size that meets or exceeds the size of the file (thus maximizing compression ratio) is chosen; for larger files, it is capped at 128 MiB.
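
The selection rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this pull request: the function and constant names are hypothetical, and the 4 KiB lower bound is assumed from the minimum LZMA2 dictionary size that the xz documentation gives.

```rust
const MIB: u64 = 1024 * 1024;
/// Upper bound on the dictionary size, as described in the text above.
const MAX_DICT: u64 = 128 * MIB;
/// Assumed lower bound (4 KiB, the minimum LZMA2 dictionary size).
const MIN_DICT: u64 = 4096;

/// Pick the smallest size of the form 2^n or 2^n + 2^(n-1) that is at least
/// `uncompressed_len`, capped at 128 MiB. Hypothetical helper for illustration.
fn pick_dict_size(uncompressed_len: u64) -> u32 {
    if uncompressed_len >= MAX_DICT {
        return MAX_DICT as u32;
    }
    let mut pow = MIN_DICT; // walks through 2^12, 2^13, ...
    loop {
        if pow >= uncompressed_len {
            return pow as u32; // 2^n is already large enough
        }
        let mid = pow + pow / 2; // 2^n + 2^(n-1)
        if mid >= uncompressed_len {
            return mid as u32;
        }
        pow *= 2; // terminates: pow reaches 128 MiB, which exceeds the input
    }
}
```

With the uncompressed sizes from the table, this sketch would pick 48 MiB for cargo and rust-src, 24 MiB for clippy, 12 MiB for rustfmt, and 128 MiB for everything larger. The `u32` return type is chosen so the result could be handed to an encoder option that takes a 32-bit dictionary size.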

@kotauskas kotauskas force-pushed the patch-1 branch 2 times, most recently from f4bad2b to f1180bd on September 21, 2025 19:55
The recompressor will do GZip compression first, take note of the
decompressed size measured in the process, and use it to select an
optimal dictionary size for the XZ recompression, if it is to happen at
all. If XZ recompression is needed but GZip compression isn't, it will
perform a decompression solely to measure the decompressed size.
@kotauskas
Contributor Author

I've refactored the code a bit to implement the following algorithm:

  • If GZip compression is needed, stream-decompress the XZ file to produce a .tar.gz tarball next to it, measuring how long it took and how big the uncompressed tarball is
  • If XZ recompression is requested:
    • If GZip compression wasn't performed, skim the .tar.xz to measure the decompressed size. (Not sure if the xz2 crate exposes a way of getting the uncompressed size XZ chucks at the end of the file.)
    • Stream-decompress the XZ file to produce a .tar.xz tarball with better compression.
  • Print the most recent measurement of the time taken by decompression together with the per-compressor duration stats and the total time taken.
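
The two ways of measuring the decompressed size in the steps above could look roughly like this. It is a sketch using only `std::io`, with a generic reader standing in for the XZ decoder and a generic writer standing in for the GZip or XZ encoder; the function names are hypothetical, and the actual code in this pull request may be structured differently.

```rust
use std::io::{self, Read, Write};

/// "Skim" a decompressing reader: discard the output and return how many
/// bytes it produced. In practice `decoder` is assumed to wrap the .tar.xz
/// file in a streaming XZ decoder.
fn measure_len<R: Read>(mut decoder: R) -> io::Result<u64> {
    io::copy(&mut decoder, &mut io::sink())
}

/// Stream the decompressed data into an encoder, returning the number of
/// uncompressed bytes that passed through, so no separate skim is needed.
fn transcode<R: Read, W: Write>(mut decoder: R, mut encoder: W) -> io::Result<u64> {
    let uncompressed_len = io::copy(&mut decoder, &mut encoder)?;
    encoder.flush()?;
    Ok(uncompressed_len)
}
```

When the GZip step runs, the length returned by something like `transcode` can be reused for the dictionary size; `measure_len` would only be needed when GZip compression is skipped.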

@kotauskas kotauskas changed the title from "Increase XZ recompress dictionary size to 128 MiB" to "Adaptively select XZ recompress dictionary size of up to 128 MiB" Sep 21, 2025
Rustfmt is prone to picking up configs from other places (ones in
conflict with the default style that CI enforces). This will keep it
from doing that.

Those who use a shared directory, such as myself, would previously see
`docker compose build` fail to add the actual build of `promote-release`
to the container, as the `target` directory at the repo level simply
wouldn't be there. No longer so with this change.

Forgot to `.rewind()`, as one does. Added a handful of Anyhow
`.with_context()` calls to help prevent future goose chases.
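
For context on the `.rewind()` fix: once a file handle has been read to the end to measure its size, a second read from the same handle yields nothing unless the position is reset. A minimal sketch of the pattern, with a hypothetical function name and using only the standard library:

```rust
use std::io::{self, Read, Seek};

/// Read a seekable source twice: once to measure its length, then again
/// from the start to get the contents. Without the `rewind()` call the
/// second read would start at end-of-file and return an empty buffer.
fn measure_then_read<F: Read + Seek>(file: &mut F) -> io::Result<(u64, Vec<u8>)> {
    let len = io::copy(file, &mut io::sink())?; // first pass, leaves position at EOF
    file.rewind()?; // seek back to offset 0
    let mut contents = Vec::new();
    file.read_to_end(&mut contents)?; // second pass
    Ok((len, contents))
}
```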
@kotauskas
Contributor Author

Hey, look, it's actually shaved about 4 minutes off the test release timings despite decompression having to happen twice instead of once.

@Mark-Simulacrum Mark-Simulacrum enabled auto-merge (squash) September 28, 2025 15:18
@Mark-Simulacrum Mark-Simulacrum merged commit cad8128 into rust-lang:master Sep 28, 2025
6 checks passed