Fix large message read performance by enforcing max read_buffer_size read chunks #496

Merged
daniel-abramov merged 3 commits into snapview:master from alexheretic:fix-large-message-read-perf
May 23, 2025

Conversation

@alexheretic (Contributor) commented May 16, 2025

A non-breaking change that uses read_buffer_size as a maximum read chunk size, so a large message is read in chunks of at most 128 KiB by default. This significantly improves the performance of large messages.

See #493 (comment)

Resolves #493
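The chunking idea can be sketched with a plain `Read` loop (a minimal illustration of the approach, not the actual patch; `read_in_chunks` and its signature are hypothetical):

```rust
use std::io::{Cursor, Read};

/// Read `len` payload bytes from `reader`, but never ask the underlying
/// `read` for more than `max_chunk` bytes at a time. This mirrors the idea
/// of capping each read at `read_buffer_size` (128 KiB by default) instead
/// of growing one huge read buffer for the whole message.
fn read_in_chunks<R: Read>(
    reader: &mut R,
    len: usize,
    max_chunk: usize,
) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len.min(max_chunk));
    let mut chunk = vec![0u8; max_chunk];
    while buf.len() < len {
        // Only request what is still missing, capped at max_chunk.
        let want = (len - buf.len()).min(max_chunk);
        let n = reader.read(&mut chunk[..want])?;
        if n == 0 {
            return Err(std::io::Error::new(
                std::io::ErrorKind::UnexpectedEof,
                "eof within message",
            ));
        }
        buf.extend_from_slice(&chunk[..n]);
    }
    Ok(buf)
}

fn main() {
    // 1 MiB "message" read through a 128 KiB cap: 8 read calls at most.
    let data = vec![7u8; 1024 * 1024];
    let mut cursor = Cursor::new(data.clone());
    let out = read_in_chunks(&mut cursor, data.len(), 128 * 1024).unwrap();
    assert_eq!(out, data);
    println!("read {} bytes in <=128KiB chunks", out.len());
}
```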

Benchmarks

Using the benches provided in #493, this fix addresses the regression and in fact yields better performance than 0.24 at all message sizes.

# 0.24
>>> 1.0 MB took ~1.993457ms with speed ~501.6 MB/s
>>> 3.0 MB took ~4.074073ms with speed ~736.4 MB/s
>>> 6.0 MB took ~5.40309ms with speed ~1.1 GB/s
>>> 10.0 MB took ~7.491196ms with speed ~1.3 GB/s
>>> 20.0 MB took ~16.038946ms with speed ~1.2 GB/s
>>> 30.0 MB took ~24.521725ms with speed ~1.2 GB/s
>>> 40.0 MB took ~37.509191ms with speed ~1.0 GB/s
>>> 50.0 MB took ~45.244393ms with speed ~1.1 GB/s
>>> 60.0 MB took ~53.531487ms with speed ~1.1 GB/s
>>> 100.0 MB took ~102.38638ms with speed ~976.7 MB/s
>>> 1000.0 MB took ~3.816925448s with speed ~262.0 MB/s

# 0.26.2
>>> 1.0 MB took ~1.402537ms with speed ~713.0 MB/s
>>> 3.0 MB took ~2.322722ms with speed ~1.3 GB/s
>>> 6.0 MB took ~3.837006ms with speed ~1.5 GB/s
>>> 10.0 MB took ~3.960994ms with speed ~2.5 GB/s
>>> 20.0 MB took ~31.777352ms with speed ~629.4 MB/s
>>> 30.0 MB took ~72.20983ms with speed ~415.5 MB/s
>>> 40.0 MB took ~95.672327ms with speed ~418.1 MB/s
>>> 50.0 MB took ~177.217286ms with speed ~282.1 MB/s
>>> 60.0 MB took ~261.481951ms with speed ~229.5 MB/s
>>> 100.0 MB took ~647.759606ms with speed ~154.4 MB/s
>>> 1000.0 MB took ~45.621549647s with speed ~21.9 MB/s

# PR
>>> 1.0 MB took ~550.883µs with speed ~1.8 GB/s
>>> 3.0 MB took ~1.555067ms with speed ~1.9 GB/s
>>> 6.0 MB took ~1.648007ms with speed ~3.6 GB/s
>>> 10.0 MB took ~2.242798ms with speed ~4.4 GB/s
>>> 20.0 MB took ~5.894016ms with speed ~3.3 GB/s
>>> 30.0 MB took ~11.865491ms with speed ~2.5 GB/s
>>> 40.0 MB took ~12.758398ms with speed ~3.1 GB/s
>>> 50.0 MB took ~21.80129ms with speed ~2.2 GB/s
>>> 60.0 MB took ~25.910093ms with speed ~2.3 GB/s
>>> 100.0 MB took ~44.506567ms with speed ~2.2 GB/s
>>> 1000.0 MB took ~635.858695ms with speed ~1.5 GB/s

I think we should add some high-quality benches to this repo so we can track general performance better with a real Read impl. The current read benches are quite specific to batched small writes/reads. If I get time I'd like to add some as a follow-up.
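As a sketch of the shape such a measurement could take (a hypothetical std-only micro-bench, not one of this repo's benches; a real one would use criterion and a real socket):

```rust
use std::io::{Cursor, Read};
use std::time::Instant;

// Time reading an in-memory "message" through a plain `Read` impl in
// fixed-size chunks, and report throughput in MiB/s.
fn bench_read(payload_len: usize, chunk_size: usize) -> f64 {
    let data = vec![0u8; payload_len];
    let mut reader = Cursor::new(data);
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0usize;
    let start = Instant::now();
    loop {
        let n = reader.read(&mut buf).unwrap();
        if n == 0 {
            break; // EOF: the whole payload has been consumed
        }
        total += n;
    }
    let secs = start.elapsed().as_secs_f64();
    assert_eq!(total, payload_len);
    (total as f64 / (1024.0 * 1024.0)) / secs
}

fn main() {
    let mbps = bench_read(10 * 1024 * 1024, 128 * 1024);
    println!("10 MiB read at ~{:.0} MiB/s", mbps);
}
```

A Cursor over a Vec is of course far cheaper than a socket, so absolute numbers are meaningless; the point is the loop shape a real-Read bench would time.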

@alexheretic force-pushed the fix-large-message-read-perf branch from b9f2acc to 58f6d5a on May 16, 2025 16:36
Review comment threads: src/protocol/frame/frame.rs (outdated), src/protocol/frame/mod.rs
Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>
@alexheretic (Contributor, Author) commented May 18, 2025

With the e2e benchmarks proposed in #497, the improvement is also visible there:

send+recv/512 B         time:   [13.378 µs 13.446 µs 13.464 µs]
                        thrpt:  [72.534 MiB/s 72.626 MiB/s 72.998 MiB/s]
                 change:
                        time:   [+0.4288% +0.9211% +1.4151%] (p = 0.14 > 0.05)
                        thrpt:  [−1.3954% −0.9127% −0.4269%]
                        No change in performance detected.

send+recv/4 KiB         time:   [15.385 µs 15.409 µs 15.504 µs]
                        thrpt:  [503.89 MiB/s 507.02 MiB/s 507.81 MiB/s]
                 change:
                        time:   [−0.8401% +0.6998% +2.2758%] (p = 0.73 > 0.05)
                        thrpt:  [−2.2252% −0.6949% +0.8473%]
                        No change in performance detected.

send+recv/32 KiB        time:   [28.154 µs 28.175 µs 28.257 µs]
                        thrpt:  [2.1600 GiB/s 2.1663 GiB/s 2.1679 GiB/s]
                 change:
                        time:   [−9.1622% −8.8444% −8.5256%] (p = 0.07 > 0.05)
                        thrpt:  [+9.3202% +9.7026% +10.086%]
                        No change in performance detected.

send+recv/256 KiB       time:   [114.37 µs 115.23 µs 118.65 µs]
                        thrpt:  [4.1155 GiB/s 4.2376 GiB/s 4.2692 GiB/s]
                 change:
                        time:   [−11.231% −9.0861% −6.9179%] (p = 0.05 < 0.05)
                        thrpt:  [+7.4321% +9.9942% +12.652%]
                        Performance has improved.

send+recv/2 MiB         time:   [944.88 µs 947.06 µs 955.79 µs]
                        thrpt:  [4.0869 GiB/s 4.1246 GiB/s 4.1341 GiB/s]
                 change:
                        time:   [−13.055% −11.016% −8.9031%] (p = 0.05 > 0.05)
                        thrpt:  [+9.7732% +12.379% +15.015%]
                        No change in performance detected.

send+recv/16 MiB        time:   [12.886 ms 12.943 ms 13.172 ms]
                        thrpt:  [2.3724 GiB/s 2.4144 GiB/s 2.4251 GiB/s]
                 change:
                        time:   [−55.081% −54.294% −53.496%] (p = 0.07 > 0.05)
                        thrpt:  [+115.04% +118.79% +122.62%]
                        No change in performance detected.

send+recv/128 MiB       time:   [166.54 ms 172.08 ms 173.47 ms]
                        thrpt:  [1.4412 GiB/s 1.4528 GiB/s 1.5012 GiB/s]
                 change:
                        time:   [−83.027% −82.534% −82.033%] (p = 0.07 > 0.05)
                        thrpt:  [+456.57% +472.53% +489.16%]
                        No change in performance detected.

send+recv/1 GiB         time:   [1.1908 s 1.1914 s 1.1937 s]
                        thrpt:  [1.6755 GiB/s 1.6787 GiB/s 1.6795 GiB/s]
                 change:
                        time:   [−96.623% −96.619% −96.615%] (p = 0.00 < 0.05)
                        thrpt:  [+2854.1% +2857.7% +2861.3%]
                        Performance has improved.

You have to love that criterion performance improvement detection logic 😅

time: [−83.027% −82.534% −82.033%] (p = 0.07 > 0.05)
thrpt: [+456.57% +472.53% +489.16%]
No change in performance detected.

@XavDesbordes

Basically, it should dramatically close the performance gap with fastwebsockets, which is good

@alexheretic (Contributor, Author) commented May 18, 2025

critcmp using #497 e2e benches

group                0.24                                   master                                 #496
-----                ----                                   ------                                 ----
send+recv/1 GiB      1.45  1757.1±36.73ms  1165.6 MB/sec    29.06     35.2±0.00s    58.1 MB/sec    1.00   1212.3±2.19ms  1689.4 MB/sec
send+recv/128 MiB    1.29    231.0±0.01ms  1108.1 MB/sec    5.66  1012.8±65.35ms   252.8 MB/sec    1.00    179.0±0.85ms  1429.9 MB/sec
send+recv/16 MiB     1.76     22.4±0.48ms  1428.0 MB/sec    2.39     30.4±0.63ms  1052.6 MB/sec    1.00     12.7±0.32ms     2.5 GB/sec
send+recv/2 MiB      2.35      2.2±0.06ms  1819.0 MB/sec    1.01   948.1±23.03µs     4.1 GB/sec    1.00   935.2±21.91µs     4.2 GB/sec
send+recv/256 KiB    1.46    168.7±1.08µs     2.9 GB/sec    1.09    125.7±2.52µs     3.9 GB/sec    1.00    115.2±2.30µs     4.2 GB/sec
send+recv/32 KiB     1.24     36.4±0.01µs  1716.2 MB/sec    1.01     29.8±0.61µs     2.1 GB/sec    1.00     29.3±0.50µs     2.1 GB/sec
send+recv/4 KiB      1.14     17.6±0.33µs   445.1 MB/sec    1.05     16.2±0.25µs   483.4 MB/sec    1.00     15.4±0.02µs   506.5 MB/sec
send+recv/512 B      1.02     13.6±0.26µs    72.0 MB/sec    1.02     13.6±0.51µs    72.0 MB/sec    1.00     13.3±0.24µs    73.4 MB/sec

@XavDesbordes commented May 18, 2025

That's crazy! well done Alex



Development

Successfully merging this pull request may close these issues.

Performance regression (compared to 0.24.0) when reading large (100 MB+) messages in 0.25.0+

3 participants