
Faster implementation of InputBuffer #214

Merged
daniel-abramov merged 3 commits into master from faster-input-buffer
Jul 8, 2021

Conversation

@daniel-abramov
Member

@daniel-abramov daniel-abramov commented Jul 5, 2021

Fixes #210.

If you don't know the context, please check the discussion and decision in the linked issue, as well as the latest PR in input_buffer.

The new simplified implementation only makes a significant difference on the x86_64 architecture. On Armv8.4-A A64 (Apple M1) both versions perform really well, and the new version is even marginally (though not significantly) slower there. On Intel (x86_64), however, the difference is substantial.

===== M1 Mac (Armv8.4-A A64) =====

buffers/InputBuffer     time:   [543.66 us 549.32 us 555.54 us]
                        thrpt:  [7.0315 GiB/s 7.1110 GiB/s 7.1851 GiB/s]

buffers/ReadBuffer      time:   [552.16 us 565.83 us 581.26 us]
                        thrpt:  [6.7203 GiB/s 6.9036 GiB/s 7.0745 GiB/s]


===== Intel (x86_64) =====

buffers/InputBuffer     time:   [4.7392 ms 4.7601 ms 4.7817 ms]
                        thrpt:  [836.52 MiB/s 840.31 MiB/s 844.03 MiB/s]

buffers/ReadBuffer      time:   [726.86 us 730.72 us 734.81 us]
                        thrpt:  [5.3160 GiB/s 5.3457 GiB/s 5.3741 GiB/s]

So migrating to the newer ReadBuffer seems to make sense, as the "average performance gain" is quite significant (and I believe most users are still on x86_64).

⚠️ Notes:

  1. The PR also removes a couple of error variants that could have been triggered by tungstenite-rs. These were removed because they could only occur if the incoming message did not fit into the hardcoded u32::max_value() limit. We currently have max_frame_size and max_message_size, but we don't use them to cap the size of the input/read buffer; instead we compare the values after each read. In theory this is not a problem when using tungstenite-rs with non-blocking sockets (as in tokio-tungstenite), though it could pose an issue for those who use blocking sockets. In other words, the previous u32::max_value() check did not really add any "additional protection", so I removed those errors altogether to avoid complicating the ReadBuffer implementation.
  2. There is no automation to run criterion benchmarks in our CI. That's fine, as criterion is not meant to be used that way and may produce many false positives and false negatives in a CI environment. To automate benchmarking in CI we would have to use a tool like iai, which was written for exactly that purpose (it works differently than criterion), but I think that's better done as a separate PR.
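For readers unfamiliar with the design being discussed, here is a minimal sketch of the chunked-read idea behind a read buffer. All names, the chunk size, and the cursor layout are hypothetical illustrations, not the actual tungstenite-rs `ReadBuffer` code:

```rust
// Hypothetical sketch of a chunked read buffer, NOT the actual
// tungstenite-rs implementation: the stream is read in fixed-size
// chunks and appended to a growable Vec wrapped in a Cursor.
use std::io::{Cursor, Read, Result};

/// Buffer that pulls data from a stream in `CHUNK`-sized reads and
/// tracks the consumer's position with a cursor.
struct ReadBuffer<const CHUNK: usize> {
    data: Cursor<Vec<u8>>,
}

impl<const CHUNK: usize> ReadBuffer<CHUNK> {
    fn new() -> Self {
        Self { data: Cursor::new(Vec::new()) }
    }

    /// Read one chunk from the stream into the buffer, returning the
    /// number of bytes read (0 on EOF).
    fn read_from<S: Read>(&mut self, stream: &mut S) -> Result<usize> {
        let mut chunk = [0u8; CHUNK];
        let n = stream.read(&mut chunk)?;
        self.data.get_mut().extend_from_slice(&chunk[..n]);
        Ok(n)
    }
}

fn main() -> Result<()> {
    // A byte slice acts as the "stream" for this sketch.
    let mut src: &[u8] = b"hello websocket";
    let mut buf: ReadBuffer<4> = ReadBuffer::new();
    while buf.read_from(&mut src)? > 0 {}
    assert_eq!(buf.data.get_ref().as_slice(), b"hello websocket");
    println!("buffered {} bytes", buf.data.get_ref().len());
    Ok(())
}
```

The key property the benchmarks above exercise is that appending to a `Vec` amortizes reallocation, rather than shuffling bytes around on every read.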

P.S.: Thanks @qiujiangkun for bringing up this topic and for the performance testing of tungstenite-rs 👍

@agalakhov
Member

Is there any size limit now? Is it possible to DoS it via memory exhaustion?

@daniel-abramov
Member Author

Is there any size limit now? Is it possible to DoS it via memory exhaustion?

Do you mean the points I mentioned in (1)? If so, I think there is no problem here: the previous limit on the InputBuffer was a hardcoded usize::max_value(), which I believe could not even be reached under normal circumstances, because max_frame_size and max_message_size were (and still are) checked during message assembly. Since we read in chunks, the only way to trigger the error from the InputBuffer was for the user to set either max_frame_size or max_message_size in the WebSocketConfig to Some(usize::MAX). The other case where the limit could be hit is the initial handshake.

I don't think the new implementation is worse than the previous one in those particular cases: even if we get to usize::MAX, the underlying Vec is at its maximum size (len is a usize), and pushing to such a Vec panics. But the old implementation did not handle this any better, since it tried to return an error by summing and comparing usize values without taking overflow into account; adding a reserve to a usize already at its maximum would panic in the old implementation as well.

In other words, the new drop-in replacement is probably not perfect (there is still a way to hit a panic), but it is at least no worse than the previous implementation. Please correct me if I'm wrong or missed something.
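To illustrate the overflow point above, here is a standalone sketch (not tungstenite-rs code) of how unchecked usize addition differs from explicit checked arithmetic:

```rust
fn main() {
    // Naive length bookkeeping: computing `usize::MAX + reserve`
    // directly would overflow, which panics in debug builds and wraps
    // silently in release builds (unless overflow checks are enabled).
    let current_len = usize::MAX;
    let reserve = 1usize;

    // `checked_add` surfaces the overflow as `None` instead of
    // panicking or wrapping, so it can be turned into an error.
    assert_eq!(current_len.checked_add(reserve), None);

    // `saturating_add` is another explicit option: it clamps at MAX.
    assert_eq!(current_len.saturating_add(reserve), usize::MAX);

    println!("overflow handled explicitly");
}
```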

However, while replying to your comment and checking the unit tests (trying to extend them), I noticed another thing: the new implementation did not clean up the data already consumed by the cursor (which is normally no longer needed). So I pushed a new commit that implements behavior similar to the previous remove_garbage stage in the InputBuffer and extended the unit test to account for that.

With that step implemented, the new ReadBuffer is still faster:

buffers/InputBuffer     time:   [3.1572 ms 3.1728 ms 3.1904 ms]                                 
                        thrpt:  [1.2244 GiB/s 1.2312 GiB/s 1.2373 GiB/s]

buffers/ReadBuffer      time:   [1.2585 ms 1.2659 ms 1.2737 ms]                                
                        thrpt:  [3.0668 GiB/s 3.0857 GiB/s 3.1038 GiB/s]
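As a hedged sketch of what such a remove_garbage-style cleanup stage might look like (names are hypothetical, not the actual commit):

```rust
// Hypothetical sketch of a `remove_garbage`-style cleanup: drop the
// bytes the cursor has already consumed so the buffer does not grow
// without bound. Names are illustrative, not the actual tungstenite-rs
// code from the commit discussed above.
fn remove_consumed(buf: &mut Vec<u8>, cursor_pos: &mut usize) {
    // Drain everything before the cursor, then reset the cursor to the
    // new start of the buffer.
    buf.drain(..*cursor_pos);
    *cursor_pos = 0;
}

fn main() {
    let mut buf = b"already-read|pending".to_vec();
    let mut pos = "already-read|".len();

    remove_consumed(&mut buf, &mut pos);

    assert_eq!(buf, b"pending");
    assert_eq!(pos, 0);
    println!("{} bytes retained", buf.len());
}
```

`Vec::drain` shifts the remaining bytes to the front in one pass, which is presumably why the extra cleanup step still leaves the ReadBuffer ahead of the old InputBuffer in the numbers above.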

@daniel-abramov daniel-abramov merged commit c742916 into master Jul 8, 2021
@daniel-abramov daniel-abramov deleted the faster-input-buffer branch March 22, 2023 23:13
takumi-earth pushed a commit to earthlings-dev/tungstenite-rs that referenced this pull request Jan 27, 2026

Development

Successfully merging this pull request may close these issues.

Bad performance with input_buffer > 3.0
