Skip to content

Speed up DangerousGetRowSpan(y)#1901

Merged
antonfirsov merged 13 commits into
masterfrom
af/faster-getrowspan
Dec 20, 2021
Merged

Speed up DangerousGetRowSpan(y)#1901
antonfirsov merged 13 commits into
masterfrom
af/faster-getrowspan

Conversation

@antonfirsov

Copy link
Copy Markdown
Member

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Fixes #1884 by caching span data for known IMemoryOwner implementations backing MemoryGroup.

Additionally, it moves SwapOrCopyContents responsibility entirely to Buffer2D. Originally I did this because I wanted multiple MemoryGroup.Owned subclasses to deal with different GetRowSpanCoreUnsafe implementations, but in the end I decided to switch on an enum for simplicity & better perf. Regardless, it's cleaner to swap the memory groups, backing Buffer2D instead of swapping their contents.

Benchmarks - DangerousGetRowSpan

Before

|              Method | SizeMegaBytes |     Mean |    Error |   StdDev |
|-------------------- |-------------- |---------:|---------:|---------:|
| DangerousGetRowSpan |           0.5 | 75.03 ns | 1.511 ns | 2.352 ns |
| DangerousGetRowSpan |             2 | 72.42 ns | 1.456 ns | 1.618 ns |
| DangerousGetRowSpan |            10 | 73.64 ns | 1.370 ns | 2.640 ns |

After

|              Method | SizeMegaBytes |      Mean |     Error |    StdDev |
|-------------------- |-------------- |----------:|----------:|----------:|
| DangerousGetRowSpan |           0.5 |  8.017 ns | 0.1790 ns | 0.2131 ns |
| DangerousGetRowSpan |             2 |  6.445 ns | 0.1479 ns | 0.1644 ns |
| DangerousGetRowSpan |            10 | 46.089 ns | 0.9179 ns | 0.9015 ns |

Benchmarks - DecodeJpeg

Before

|                              Method |      Mean |     Error |    StdDev |
|------------------------------------ |----------:|----------:|----------:|
|        'Baseline 4:4:4 Interleaved' | 15.814 ms | 0.2987 ms | 0.2794 ms |
|        'Baseline 4:2:0 Interleaved' | 10.336 ms | 0.2018 ms | 0.2243 ms |
|        'Baseline 4:0:0 (grayscale)' |  2.044 ms | 0.0246 ms | 0.0230 ms |
| 'Progressive 4:2:0 Non-Interleaved' | 13.868 ms | 0.2624 ms | 0.2455 ms |

After

|                              Method |      Mean |     Error |    StdDev |
|------------------------------------ |----------:|----------:|----------:|
|        'Baseline 4:4:4 Interleaved' | 12.682 ms | 0.2425 ms | 0.2695 ms |
|        'Baseline 4:2:0 Interleaved' |  9.241 ms | 0.1263 ms | 0.0986 ms |
|        'Baseline 4:0:0 (grayscale)' |  1.677 ms | 0.0136 ms | 0.0107 ms |
| 'Progressive 4:2:0 Non-Interleaved' | 13.639 ms | 0.2601 ms | 0.3289 ms |

/cc @br3aker

Comment on lines +973 to 974
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsOutOfRange(int value, int min, int max)

@antonfirsov antonfirsov Dec 20, 2021

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover from experiments on a similar helper method that deals with (uint)x >= (uint)this.Height, which made things slower actually, and got removed. I'm afraid the helper method results in suboptimal codegen compared to inline condition (moves around a bool on the stack?).

/cc @gfoidl

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends if the helper is fed with constant arguments (like in the other PR with constant values for min and max) or if these values are non-constant, like here with this.Height.

moves around a bool on the stack?

IMO this would be missed optimization from the JIT. A short method (which should be inlined by itself) + decorated with AggressiveInlining (so inlined for pretty sure), so the codegen should be equal.
But: sometimes -- so far I couldn't figure out when exactly -- the JIT won't optimize as expected when this is concerned, as there's some special checks for this == null which get sometimes removed, sometimes not. I expect this what is hit here.

@codecov

codecov Bot commented Dec 20, 2021

Copy link
Copy Markdown

Codecov Report

Merging #1901 (0d3bd57) into master (39f37d0) will increase coverage by 0%.
The diff coverage is 96%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1901   +/-   ##
======================================
  Coverage      87%     87%           
======================================
  Files         960     961    +1     
  Lines       50979   50991   +12     
  Branches     6312    6319    +7     
======================================
+ Hits        44736   44748   +12     
  Misses       5202    5202           
  Partials     1041    1041           
Flag Coverage Δ
unittests 87% <96%> (+<1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Common/Helpers/Numerics.cs 97% <ø> (ø)
...Allocators/Internals/UniformUnmanagedMemoryPool.cs 82% <ø> (ø)
...y/Allocators/Internals/SharedArrayPoolBuffer{T}.cs 86% <87%> (+<1%) ⬆️
...harp/Memory/DiscontiguousBuffers/MemoryGroup{T}.cs 83% <91%> (+<1%) ⬆️
...eg/Components/Decoder/SpectralConverter{TPixel}.cs 100% <100%> (ø)
src/ImageSharp/ImageFrame{TPixel}.cs 91% <100%> (ø)
src/ImageSharp/Image{TPixel}.cs 95% <100%> (ø)
.../Memory/Allocators/Internals/UnmanagedBuffer{T}.cs 88% <100%> (ø)
src/ImageSharp/Memory/Buffer2D{T}.cs 100% <100%> (ø)
...mory/DiscontiguousBuffers/MemoryGroupExtensions.cs 90% <100%> (+1%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39f37d0...0d3bd57. Read the comment docs.

@br3aker

br3aker commented Dec 20, 2021

Copy link
Copy Markdown
Contributor

Very nice, extra thanks for the benchmarks!

@JimBobSquarePants JimBobSquarePants left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Well done! 👍

Comment on lines +292 to +293
int bufferIdx = (int)(start / this.BufferLength);
int bufferStart = (int)(start % this.BufferLength);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use Math.DivRem to do this in one operation?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codegen seems to be worse for Int64.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you look at the x86 disassembly, then it's expected -- sharplab sometime bites me too.
Looking at x64 it's better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-buffer case went from 45ns to 38ns! (0d3bd57)

Comment on lines +317 to +318
bufferIdx = (int)(start / this.BufferLength);
bufferStart = (int)(start % this.BufferLength);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@antonfirsov antonfirsov merged commit 04f64b6 into master Dec 20, 2021
@antonfirsov antonfirsov deleted the af/faster-getrowspan branch December 20, 2021 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Substantial Jpeg performance degradation on master branch as of 09.12.2021

4 participants