Adaptively select XZ recompress dictionary size of up to 128 MiB #97
Merged
Mark-Simulacrum merged 11 commits into rust-lang:master on Sep 28, 2025
I've refactored the code a bit to implement the following algorithm:

The recompressor will do gzip recompression first, note the decompressed size measured in the process, and use it to select an optimal dictionary size for the XZ recompression, if XZ recompression is to happen at all. If XZ recompression is needed but gzip recompression isn't, it will perform a decompression solely to measure the decompressed size.
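Here is a minimal std-only sketch of that control flow. It assumes the tarball is already available as a decompressed `Read` stream and that the gzip output is any `Write`. In practice that writer would be a gzip encoder such as `flate2`'s `GzEncoder`, which is not shown. `Needs` and `first_pass` are hypothetical names, not the actual `promote-release` code. The sketch relies on `std::io::copy` returning the number of bytes it copied, so measuring the size costs nothing extra when gzip output is being produced anyway.

```rust
use std::io::{self, Read, Write};

/// Hypothetical description of which outputs a tarball still needs.
struct Needs {
    gzip: bool,
    xz: bool,
}

/// Hypothetical first pass: stream the decompressed tarball into the gzip
/// encoder (if gzip output is needed) and return the number of uncompressed
/// bytes seen, which the XZ pass can then use to pick its dictionary size.
fn first_pass<R: Read, W: Write>(
    needs: &Needs,
    decompressed: &mut R,
    gzip_encoder: &mut W,
) -> io::Result<Option<u64>> {
    if needs.gzip {
        // The gzip pass reads everything anyway; io::copy reports the byte count.
        Ok(Some(io::copy(decompressed, gzip_encoder)?))
    } else if needs.xz {
        // No gzip output wanted: decompress only to measure the size.
        Ok(Some(io::copy(decompressed, &mut io::sink())?))
    } else {
        // Nothing to recompress, so the size is never needed.
        Ok(None)
    }
}
```

The XZ-only branch is the "decompression solely to measure the size" case. Its output is discarded into `io::sink()`.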
Rustfmt is prone to picking up configs from other places (ones in conflict with the default style that CI enforces). This will keep it from doing that.
Those of us who use a shared target directory, myself included, previously saw `docker compose build` fail to add the actual `promote-release` build to the container, because the repository-level `target` directory simply didn't exist. This change fixes that.
Forgot to `.rewind()`, as one does. Added a handful of Anyhow `.with_context()` calls to help prevent future wild-goose chases.
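The missing `.rewind()` is an easy bug to reproduce. After a measuring pass consumes a stream, the next pass starts at end-of-file unless the stream is rewound first. The following is a minimal std-only sketch. `measure_then_rewind` is a hypothetical name, and the error context is attached through `io::Error` rather than Anyhow's `.with_context()` only to avoid external dependencies. In the real recompressor, the measuring reader would presumably be a decoder wrapping the file, with the file itself being rewound.

```rust
use std::io::{self, Read, Seek};

/// Hypothetical helper: count the bytes `input` yields, then seek back to the
/// start so the next pass (e.g. the XZ recompression) sees the whole stream.
fn measure_then_rewind<R: Read + Seek>(input: &mut R) -> io::Result<u64> {
    let size = io::copy(input, &mut io::sink())?;
    // Without this, the next reader would start at EOF and read zero bytes.
    input.rewind().map_err(|e| {
        // Attach context, in the spirit of Anyhow's `.with_context()`.
        io::Error::new(e.kind(), format!("failed to rewind after measuring size: {e}"))
    })?;
    Ok(size)
}
```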
Hey, look: it has actually shaved about 4 minutes off the test release timings, despite decompression now having to happen twice instead of once.
This increases peak RSS for users of Rustup by 64 MiB in exchange for non-negligible improvements in compression ratio for the larger tarballs:
All tests were done on tarballs from `https://static.rust-lang.org/dist/2025-09-18/{component}-1.90.0-x86_64-unknown-linux-gnu.tar.xz`. The size of the compressed tarballs downloaded directly from static.rust-lang.org is shown in the `bytes_cur` column. `bytes_128m` is the size of the output of `xz -T1 --lzma=preset=9e,depth=1000,dict=128M`, which is the same configuration that `prepare-release` uses with the change this pull request makes. The version used is XZ Utils 5.8.1 from the Arch Linux repositories.

The `cargo`, `rust-src`, `clippy` and `rustfmt` components (all smaller than 128 MiB) appear to have regressed by exactly 8 bytes. This is likely due to a mismatch between the compressor version information written by `xz` and that written by `prepare-release`, so the 8-byte increase will probably not show up in practice.

I have confirmed via GNU Time (`/bin/time -v`) that decompressor memory usage increases by no more than 64 MiB.

Additionally, the recompressor now takes note of the size of the uncompressed file to avoid excessive dictionary sizes for components too small to benefit from the new 128 MiB maximum. This reduces memory usage during both compression and decompression. As per the XZ documentation, a dictionary size of the form `2^n` or `2^n + 2^(n-1)` is selected. For files smaller than 128 MiB, the smallest such size that meets or exceeds the size of the file (thus maximizing compression ratio) is chosen; beyond that, it is capped at 128 MiB.
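Here is a minimal Rust sketch of the size rule described above. `select_dict_size` is a hypothetical name rather than the actual `prepare-release` function. The sketch also assumes a 4 KiB floor, which is liblzma's minimum LZMA2 dictionary size (`LZMA_DICT_SIZE_MIN`). It walks the candidates 4 KiB, 6 KiB, 8 KiB, 12 KiB, and so on, returning the first one at least as large as the uncompressed size and capping the result at 128 MiB.

```rust
/// Smallest dictionary liblzma accepts (LZMA_DICT_SIZE_MIN, 4 KiB).
const MIN_DICT_SIZE: u64 = 4 * 1024;
/// Upper bound chosen by this PR (128 MiB).
const MAX_DICT_SIZE: u64 = 128 * 1024 * 1024;

/// Hypothetical helper: pick the smallest dictionary size of the form
/// 2^n or 2^n + 2^(n-1) that is >= `uncompressed_size`, clamped to
/// [MIN_DICT_SIZE, MAX_DICT_SIZE].
fn select_dict_size(uncompressed_size: u64) -> u32 {
    if uncompressed_size >= MAX_DICT_SIZE {
        return MAX_DICT_SIZE as u32;
    }
    let mut pow2 = MIN_DICT_SIZE; // 2^n, starting at n = 12
    loop {
        // Candidate 2^n.
        if pow2 >= uncompressed_size {
            return pow2 as u32;
        }
        // Candidate 2^n + 2^(n-1), i.e. 1.5 * 2^n.
        let one_and_a_half = pow2 + pow2 / 2;
        if one_and_a_half >= uncompressed_size {
            return one_and_a_half as u32;
        }
        pow2 *= 2;
    }
}
```

For example, a 65 MiB tarball gets a 96 MiB dictionary rather than 128 MiB, and only inputs larger than 96 MiB get the full 128 MiB. The loop always terminates because any input below the cap is covered once `pow2` reaches 128 MiB.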