improve runtime performance and binary size of RFC 2822 printer #460
Merged
BurntSushi merged 5 commits into master on Dec 26, 2025
We're going to be doing some surgery on the RFC 2822 printer. Mostly motivated by decreasing binary size. But we should be able to optimize runtime as well. The printer is actually already pretty slow compared to `time`:

```
$ critcmp base -g '(.*)/(?:humantime|jiff|chrono|time)$'
group            base//chrono                 base//jiff                   base//time
-----            ------------                 ----------                   ----------
parse/rfc2822    3.08  60.1±0.78ns ? ?/sec    1.00  19.5±0.29ns ? ?/sec    3.73  72.6±0.83ns ? ?/sec
print/rfc2822    2.94  65.4±0.57ns ? ?/sec    2.34  52.1±0.42ns ? ?/sec    1.00  22.2±0.22ns ? ?/sec
```

So hopefully we can fix that as well.
We'll use this to overhaul the RFC 2822 printer in a subsequent commit.
The printers can use this method to write directly into the `Vec<u8>`'s spare capacity, instead of needing to write to our new uninitialized buffer and then copy the data via the `jiff::fmt::Write` interface. Using the uninitialized buffer without this is still a dramatic improvement. But this helps a bit more with minimal impact on code size.
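To illustrate the idea, here is a minimal sketch of appending bytes straight into a `Vec<u8>`'s spare capacity using only standard library calls. The helper name `append_via_spare_capacity` is hypothetical and is not the method this commit adds to Jiff.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper (not part of Jiff): appends `bytes` to `dst` by
/// writing into the vector's spare capacity and then bumping its length,
/// rather than staging the data in a temporary buffer first.
fn append_via_spare_capacity(dst: &mut Vec<u8>, bytes: &[u8]) {
    // Ensure there is room for everything we are about to write.
    dst.reserve(bytes.len());
    let spare: &mut [MaybeUninit<u8>] = dst.spare_capacity_mut();
    for (slot, &b) in spare.iter_mut().zip(bytes) {
        slot.write(b);
    }
    // SAFETY: `reserve` guarantees at least `bytes.len()` spare slots, and
    // the loop above initialized exactly the first `bytes.len()` of them.
    unsafe { dst.set_len(dst.len() + bytes.len()) };
}
```

The only `unsafe` step is the final `set_len`, which is sound because every byte it exposes was written just before.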
…ction This uses the new code added in the previous two commits. There should be no behavior changes here. We're just changing the implementation to write directly into uninitialized data. Whether it's a fixed size buffer on the stack or directly into the spare capacity of a `String` or a `Vec<u8>`. Note that this also adds a new error case to the RFC 2822 printer: when rounding the offset would result in an out-of-bounds offset, we now return an error. Previously, we would print an offset that Jiff would then later fail to parse.
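The new error case can be sketched roughly as follows. RFC 2822 offsets are written as `+hhmm`, so any seconds component has to be rounded away. This sketch assumes offsets are bounded by ±25:59:59 and that rounding goes to the nearest minute with ties away from zero; the function name, the bound, and the `String` error type are illustrative assumptions, not Jiff's actual implementation.

```rust
/// Assumed bound on an offset's magnitude, in seconds (25:59:59).
const MAX_OFFSET_SECONDS: u32 = 25 * 3600 + 59 * 60 + 59;

/// Hypothetical sketch: formats an offset given in seconds as `+hhmm`,
/// returning an error if rounding to the minute leaves the assumed range.
fn format_rfc2822_offset(offset_seconds: i32) -> Result<String, String> {
    let sign = if offset_seconds < 0 { '-' } else { '+' };
    let magnitude = offset_seconds.unsigned_abs();
    // Round the magnitude to the nearest minute (ties away from zero).
    let minutes = (magnitude + 30) / 60;
    if minutes * 60 > MAX_OFFSET_SECONDS {
        return Err(format!(
            "offset of {} seconds rounds to an out-of-range value",
            offset_seconds
        ));
    }
    Ok(format!("{}{:02}{:02}", sign, minutes / 60, minutes % 60))
}
```

Under these assumptions, an offset of 25:59:30 would round up to 26:00, so the sketch reports an error instead of emitting a value that could not be parsed back.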
Kind of interesting that this doesn't appear on newer versions of Rust. I think it's technically correct?
This takes some ideas from @hanna-kruppe in #373 to decrease
code bloat in Jiff's various printers. In this PR, we just take
the first bite: we improve the RFC 2822 printer by using a new
abstraction for writing to uninitialized memory. The design of
this abstraction takes a lot of inspiration from the unstable
`std::io::BorrowedBuf` from the standard library.

For binary size, I used this program as my benchmark:
I then defined a `release-lto` Cargo profile that sets `lto = "fat"`
and used `cargo llvm-lines --profile release-lto` to measure the number
of LLVM lines emitted. For this particular program, this change
reduces the number of LLVM lines by about 2,000.
For runtime performance, this PR introduces some new RFC 2822 printing
benchmarks. We compare against the `chrono` and `time` crates, which
also provide RFC 2822 printers.

We started off better than `chrono`, but quite a bit worse than `time`.
But with this PR, we're not only faster than `time`, but our "create
a new `String` allocation" API is as fast (or a hair faster) than
`time`'s "write into caller provided `&mut String`" API. Which is...
somewhat surprising.
There are a few reasons, from my perspective, for the improvement here.

- … to be completely monomorphic. We're no longer generic over a
  `jiff::fmt::Write` implementation internally, so there's no reason to
  generate multiple copies.
- … with no code needing to handle expansion generates much tighter code
  than what we had. Moreover, we specialize some forms of integer
  printing which I think also helps.
- … uninitialized buffer on the stack and then copy that data to the
  provided `jiff::fmt::Write` implementation once printing is done.
  But even with this second write, the code is so much tighter with the
  uninitialized buffer and the sizes so small, that this is still a net
  win.
- … `alloc` feature is enabled, we will try to get the
  `jiff::fmt::Write` implementation as a `&mut Vec<u8>`. Then we can
  expand its capacity as needed and write directly into its spare
  capacity instead of writing to a stack buffer and then copying it
  to the `jiff::fmt::Write` implementation generically.
- … `write_int_pad4` that specializes integer formatting for values in
  the range `0..=9999`. This lets us do formatting with less work than
  would be needed to support a non-padded generic implementation for
  any integer.
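As a rough illustration of why a range-restricted formatter can be cheap, here is a minimal sketch of zero-padded 4-digit formatting. The name `write_int_pad4` comes from the description above, but the signature and body shown here are assumptions and may not match Jiff's actual routine.

```rust
/// Sketch only: formats `n` (which must be in `0..=9999`) as exactly four
/// zero-padded ASCII digits. Signature and body are assumptions.
fn write_int_pad4(n: u16) -> [u8; 4] {
    assert!(n <= 9999, "value out of range for 4-digit padding");
    // The output width is fixed, so no digit counting, no loop over an
    // unknown number of digits, and no separate padding step is needed.
    [
        b'0' + (n / 1000) as u8,
        b'0' + (n / 100 % 10) as u8,
        b'0' + (n / 10 % 10) as u8,
        b'0' + (n % 10) as u8,
    ]
}
```

Because both the width and the value range are known up front, each digit is a fixed division and remainder, which is the kind of specialization the last item describes.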
Another thing that maybe helps is that there are far fewer error branches
in the core printing code.
The main downside here is that we need to futz with uninitialized
memory which increases the risk of undefined behavior. I've added Miri
tests for the new `jiff::fmt::buffer` module to CI to help mitigate
this risk. Another mitigation is that the abstraction exposes an entirely
safe API.
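To make the "safe API over uninitialized memory" idea concrete, here is a minimal sketch of a fixed-capacity stack buffer. The type `StackBuf` and its methods are hypothetical and are not the contents of `jiff::fmt::buffer`; the sketch only shows how the `unsafe` can be confined behind an invariant that callers cannot break.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity buffer (not Jiff's actual type).
/// Invariant: `data[..len]` is always initialized.
struct StackBuf<const N: usize> {
    data: [MaybeUninit<u8>; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    /// Creates an empty buffer without zeroing its storage.
    fn new() -> Self {
        StackBuf { data: [MaybeUninit::<u8>::uninit(); N], len: 0 }
    }

    /// Appends `bytes`, panicking if the fixed capacity would be exceeded.
    fn push_bytes(&mut self, bytes: &[u8]) {
        let end = self.len + bytes.len();
        assert!(end <= N, "buffer capacity exceeded");
        for (slot, &b) in self.data[self.len..end].iter_mut().zip(bytes) {
            slot.write(b);
        }
        // Only advance `len` after the new bytes have been written.
        self.len = end;
    }

    /// Returns the initialized prefix as a byte slice.
    fn as_bytes(&self) -> &[u8] {
        let init = &self.data[..self.len];
        // SAFETY: `push_bytes` initialized `data[..len]`, and
        // `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe { std::slice::from_raw_parts(init.as_ptr().cast::<u8>(), init.len()) }
    }
}
```

All public methods are safe to call, and the single `unsafe` block relies only on the `len` invariant, which is the kind of property a tool like Miri can check when the tests exercise it.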
Given the scale of the improvements here, I plan to continue using this
same technique in Jiff's other printers.