Bufferless True Peak-analysis #36
Some long-running quickcheck runs showed that the difference between f64 and f32 calculations can be as high as 0.00000386. Increase the allowed error margin to avoid spurious failures.
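To illustrate the kind of check this commit relaxes, here is a minimal sketch of a tolerance comparison between an f32 and an f64 code path. The names `TOLERANCE`, `approx_eq`, `energy_f32` and `energy_f64` are hypothetical and not taken from the crate, and the tolerance value is only an assumption chosen to sit just above the reported 0.00000386.

```rust
/// Hypothetical margin, assumed for illustration only; slightly above the
/// largest f32/f64 difference reported in the commit message (0.00000386).
const TOLERANCE: f64 = 0.000004;

/// Hypothetical helper: true if `a` and `b` differ by at most `tolerance`.
fn approx_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

/// Sum of squares accumulated in f32 (stand-in for an f32 code path).
fn energy_f32(samples: &[f32]) -> f32 {
    samples.iter().map(|s| s * s).sum()
}

/// The same sum accumulated in f64 (stand-in for the reference path).
fn energy_f64(samples: &[f32]) -> f64 {
    samples.iter().map(|&s| f64::from(s) * f64::from(s)).sum()
}
```

A property test would then assert `approx_eq(f64::from(energy_f32(&samples)), energy_f64(&samples), TOLERANCE)` instead of exact equality.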
fa7cc07 to 134751c
Allows rapidly iterating the sample-buffers, one dasp::Frame at a time
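As a rough sketch of what iterating one frame at a time can look like, the function below walks an interleaved sample buffer in fixed-size frames. It uses plain arrays rather than `dasp::Frame` so that it stays self-contained, and the signature of `foreach_frame` here is an assumption, not the crate's actual API.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for a `foreach_frame` helper: calls `f` once per
/// frame of `N` interleaved samples. `N` must be non-zero. Trailing samples
/// that do not fill a whole frame are ignored.
fn foreach_frame<const N: usize, F: FnMut(&[f32; N])>(interleaved: &[f32], mut f: F) {
    for chunk in interleaved.chunks_exact(N) {
        // `chunks_exact` only yields slices of length N, so this cannot fail.
        let frame: &[f32; N] = chunk.try_into().expect("chunk has exactly N samples");
        f(frame);
    }
}
```

Because the frame length is a compile-time constant, the per-frame closure works on a fixed-size array instead of a slice of unknown length.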
739007a to 21a1ed6
Thanks, this seems great. I'll take a proper look this weekend :)
3a3fc20 to 7ed48ec
Is it ready for review now? I saw you fixed up/improved various things in the meantime :)
Fair point. :) I'll look into it.
On Sat, 7 Nov 2020 at 15:56, Sebastian Dröge <notifications@github.com> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/interp.rs
<#36 (comment)>:
> -        let imp: Box<dyn Interpolator> = match (taps, factor, channels) {
> -            (49, 2, 1) => Box::new(specialized::Interp2F::<[f32; 1]>::new()),
> -            (49, 2, 2) => Box::new(specialized::Interp2F::<[f32; 2]>::new()),
> -            (49, 2, 4) => Box::new(specialized::Interp2F::<[f32; 4]>::new()),
> -            (49, 2, 6) => Box::new(specialized::Interp2F::<[f32; 6]>::new()),
> -            (49, 2, 8) => Box::new(specialized::Interp2F::<[f32; 8]>::new()),
> -            (49, 4, 1) => Box::new(specialized::Interp4F::<[f32; 1]>::new()),
> -            (49, 4, 2) => Box::new(specialized::Interp4F::<[f32; 2]>::new()),
> -            (49, 4, 4) => Box::new(specialized::Interp4F::<[f32; 4]>::new()),
> -            (49, 4, 6) => Box::new(specialized::Interp4F::<[f32; 6]>::new()),
> -            (49, 4, 8) => Box::new(specialized::Interp4F::<[f32; 8]>::new()),
> -            (taps, factor, channels) => Box::new(generic::Interp::new(taps, factor, channels)),
> -        };
> -        Self(imp)
> +    pub fn new(_taps: usize, _factor: usize, _channels: u32) -> Self {
> +        unimplemented!()
This suggests that this commit should be squashed with another one :) This
alone doesn't seem runnable as-is.
I'd say it's ready for review. I'm still looking for ways to improve performance further, but this is good to merge as-is (I'll look into the squashing-topic though)
sdroege left a comment
Generally looks good to me, thanks a lot :)
Can you add some details to the last commit with the rolling buffer about what kind of optimizations this allows, i.e. what you saw happening in practice here? I assume it simply allows auto-vectorization to kick in at all for this code or is there more to it?
7ed48ec to cbd916f
You're welcome. Thanks for all the other code I did not have to write :) I think all the feedback is addressed now. Please have a look again.
- Split interp::Frame into utils::FrameAcc based on dasp::Frame and utils::Samples::foreach_frame
- Push incoming frames directly onto the interpolator, one at a time, and check sample-max on the resulting frames immediately. This removes the need for input and output buffering.
- Clean up the unused parts of interp.rs
Save samples with shadow-buffering to enable a continuous fixed-length view into the buffer. For any offset, there will be a correct continuous view of the entire circular buffer. This turns the inner loop of filter application from N*4 + M*4 into a predictable 12*4 operation. This avoids some branching and gives the LLVM optimizer better information to work with (for example, allowing 512-bit operations).
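The shadow-buffering idea can be sketched roughly as follows: every sample is written twice, `N` slots apart, so a contiguous slice of the last `N` samples exists for any write offset. The type `ShadowBuffer` and its methods are hypothetical names for illustration, and this single-channel version is an assumption-laden sketch rather than the code in this PR.

```rust
/// Hypothetical ring buffer of `N` samples (`N` must be non-zero) that stores
/// each sample at `pos` and at `pos + N`, so the reader never has to handle
/// wrap-around.
struct ShadowBuffer<const N: usize> {
    data: Vec<f32>, // length 2 * N
    pos: usize,     // next write position, always in 0..N
}

impl<const N: usize> ShadowBuffer<N> {
    fn new() -> Self {
        Self { data: vec![0.0; 2 * N], pos: 0 }
    }

    /// Writes the sample into both halves and advances the write position.
    fn push(&mut self, sample: f32) {
        self.data[self.pos] = sample;
        self.data[self.pos + N] = sample;
        self.pos = (self.pos + 1) % N;
    }

    /// The last `N` samples, oldest first, as one contiguous slice.
    fn window(&self) -> &[f32] {
        &self.data[self.pos..self.pos + N]
    }

    /// Dot product of the window with a fixed-length coefficient array:
    /// a single loop of known length instead of two wrap-around segments.
    fn apply(&self, coeffs: &[f32; N]) -> f32 {
        self.window().iter().zip(coeffs).map(|(s, c)| s * c).sum()
    }
}
```

The cost is one extra store per sample and twice the memory; in exchange the filter loop has a fixed trip count and no branch on the wrap point.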
cbd916f to 7812ea0
You forgot to update
This is the second of the two TruePeak analysis optimizations. The key optimization here is avoiding extra memory copying by not keeping the input and output of the upsampling. Every new input frame is fed immediately to the interpolator, generating 2 or 4 new frames, which are immediately checked for a new max before being discarded.
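The bufferless flow described above could look roughly like this for a single channel. `LinearInterp2` is a hypothetical stand-in that upsamples by 2 with linear interpolation; the real interpolator is an FIR filter, and unlike this stand-in it can produce inter-sample values above the input peak. `StreamingPeak` is likewise an assumed name.

```rust
/// Hypothetical stand-in for the interpolator: 2x upsampling by linear
/// interpolation. Used here only to show the data flow.
struct LinearInterp2 {
    prev: f32,
}

impl LinearInterp2 {
    fn new() -> Self {
        Self { prev: 0.0 }
    }

    /// Consumes one input sample and returns the two upsampled outputs.
    fn interpolate(&mut self, sample: f32) -> [f32; 2] {
        let mid = 0.5 * (self.prev + sample);
        self.prev = sample;
        [mid, sample]
    }
}

/// Tracks the running peak without storing input or upsampled output.
struct StreamingPeak {
    interp: LinearInterp2,
    max: f32,
}

impl StreamingPeak {
    fn new() -> Self {
        Self { interp: LinearInterp2::new(), max: 0.0 }
    }

    /// Feeds one sample, checks the upsampled outputs, then discards them.
    fn push(&mut self, sample: f32) {
        let outputs = self.interp.interpolate(sample);
        for out in outputs.iter() {
            self.max = self.max.max(out.abs());
        }
    }

    fn peak(&self) -> f32 {
        self.max
    }
}
```

The only state kept between calls is the interpolator's history and the running maximum, so memory use is independent of how many samples are processed.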
The net gain according to my benchmark:
As a nice bonus, it also cleans up a lot of code from the previous optimization step, resulting in a significant net reduction of code.