[Performance] Improving speed by splitting kernels by shibatch · Pull Request #97 · shibatch/sleef

shibatch · 2017-10-20T09:46:09Z

With this patch, kernels can be split into odd and even parts to make computation faster.

If the processor is powerful enough, out-of-order execution can run these two parts in parallel.
However, splitting the original kernel into two incurs some overhead.
The degree of speed-up depends on the microarchitecture, and the split kernels may be slower on some microarchitectures.
This feature is enabled by the SPLIT_KERNEL macro, and it is currently enabled only for AVX2 and AVX512F.
This change affects the computation speed of many functions. For example,
the ratio of the execution time of the previous version to that of the new version of asind4_u35 is 2.08.
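
To illustrate the idea, here is a minimal scalar sketch, assuming that "odd and even parts" refers to the odd- and even-indexed coefficients of a kernel's polynomial approximation. It is not the code from this patch: the function names, the degree-7 polynomial, and the `DEMO_SPLIT_KERNEL` macro are hypothetical, and the actual kernels operate on vector types. The plain Horner form is one serial chain of multiply-adds, while the split form evaluates p(x) = E(x²) + x·O(x²) as two independent chains that an out-of-order core can overlap. The extra work of computing x² and recombining the halves corresponds to the splitting overhead mentioned above.

```c
/* Hypothetical degree-7 polynomial: p(x) = c[0] + c[1]*x + ... + c[7]*x^7. */

/* Unsplit kernel: plain Horner scheme.
 * Each step depends on the previous one, so the 7 multiply-adds
 * form a single serial dependency chain. */
static double poly_horner(const double c[8], double x) {
    double u = c[7];
    u = u * x + c[6];
    u = u * x + c[5];
    u = u * x + c[4];
    u = u * x + c[3];
    u = u * x + c[2];
    u = u * x + c[1];
    u = u * x + c[0];
    return u;
}

/* Split kernel: even- and odd-indexed coefficients are evaluated
 * as two independent Horner chains in x^2, then recombined.
 * The two chains have no data dependency on each other. */
static double poly_split(const double c[8], double x) {
    double x2 = x * x;              /* overhead: one extra multiply */
    double even = c[6];
    double odd  = c[7];
    even = even * x2 + c[4];   odd = odd * x2 + c[5];
    even = even * x2 + c[2];   odd = odd * x2 + c[3];
    even = even * x2 + c[0];   odd = odd * x2 + c[1];
    return odd * x + even;          /* overhead: one recombination step */
}

/* Compile-time selection, mirroring the idea of a macro switch.
 * DEMO_SPLIT_KERNEL is a made-up name for this sketch. */
static double poly_eval(const double c[8], double x) {
#ifdef DEMO_SPLIT_KERNEL
    return poly_split(c, x);
#else
    return poly_horner(c, x);
#endif
}
```

Both forms compute the same polynomial up to rounding differences. Whether the split form is actually faster depends on how many independent multiply-add operations the target core can keep in flight, so it has to be measured per microarchitecture.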


fpetrogalli

Please cherry-pick this change into cmake-transition.

Please squash the commits into a single commit.

shibatch added 3 commits October 20, 2017 17:07

Fixed denormal handling in xtan.

fc0516c

Fixed problems which caused error to exceed the designed limit in rare cases.

f52f024

fpetrogalli approved these changes Oct 31, 2017

shibatch merged commit 18bd3ab into master Nov 1, 2017

shibatch deleted the Improving_speed_by_splitting_kernels branch December 4, 2017 02:06
