This is more of an FYI, but:
We were benchmarking fastrand against SmallRng, and getting some inconsistent results.
Surprisingly, the difference is much larger on M1 than on x86.
It turns out that on my machine (i7-10750H), the loop inside fill gets auto-vectorized, and the vectorized code is actually around 2x slower than the non-vectorized version, which I got by adding std::hint::black_box(()); to the inner loop of the fill function.
More benchmarks would be needed to see whether preventing the vectorization is a win in general, but it may be something to consider.
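
For anyone who wants to try this, here is a rough sketch of the experiment. ToyRng and both fill variants are hypothetical stand-ins written for illustration (a wyrand-style generator, not the actual fastrand source), and whether the compiler really vectorizes the first loop depends on the target and compiler version, so check the generated assembly before trusting any numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in generator (wyrand-style); not the real crate code.
struct ToyRng(u64);

impl ToyRng {
    fn next_u64(&mut self) -> u64 {
        let s = self.0.wrapping_add(0xA076_1D64_78BD_642F);
        self.0 = s;
        // Both factors fit in 64 bits, so the 128-bit product cannot overflow.
        let t = u128::from(s) * u128::from(s ^ 0xE703_7ED1_A0B4_28DB);
        (t as u64) ^ ((t >> 64) as u64)
    }

    /// Plain fill: the compiler is free to auto-vectorize this loop.
    fn fill(&mut self, buf: &mut [u8]) {
        let mut chunks = buf.chunks_exact_mut(8);
        for chunk in &mut chunks {
            chunk.copy_from_slice(&self.next_u64().to_le_bytes());
        }
        let rem = chunks.into_remainder();
        if !rem.is_empty() {
            let bytes = self.next_u64().to_le_bytes();
            rem.copy_from_slice(&bytes[..rem.len()]);
        }
    }

    /// Same loop with `black_box(())` inside, the trick described above
    /// for discouraging auto-vectorization. The output is identical.
    fn fill_no_vec(&mut self, buf: &mut [u8]) {
        let mut chunks = buf.chunks_exact_mut(8);
        for chunk in &mut chunks {
            black_box(());
            chunk.copy_from_slice(&self.next_u64().to_le_bytes());
        }
        let rem = chunks.into_remainder();
        if !rem.is_empty() {
            let bytes = self.next_u64().to_le_bytes();
            rem.copy_from_slice(&bytes[..rem.len()]);
        }
    }
}

/// Crude timing helper: runs `f` on `buf` `iters` times and returns the total.
fn time_fill(mut f: impl FnMut(&mut [u8]), buf: &mut [u8], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f(buf);
        black_box(&mut *buf); // keep the writes from being optimized away
    }
    start.elapsed()
}
```

To compare the two, build with --release and call something like time_fill(|b| rng.fill(b), &mut buf, 10_000) against the same call using fill_no_vec. A proper benchmark harness would give more reliable numbers than this helper.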