Commit 3f80098

[Performance] Improving speed by splitting kernels (#97)
With this patch, kernels can be split into odd and even parts to make computation faster. If the processor is powerful enough, these two parts can be executed in parallel by out-of-order execution. However, splitting the original kernel into two adds some overhead. The degree of speed-up depends on the microarchitecture, and the split kernels may be slower on some microarchitectures. This feature is enabled by the SPLIT_KERNEL macro, which is currently defined only for AVX2 and AVX512F. This change affects the computation speed of many functions; for example, the ratio of execution time between the previous version and the new version of asind4_u35 is 2.08.
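As a rough illustration of the odd/even split described above, the sketch below evaluates a polynomial two ways: with a single Horner chain, and with separate chains for the even-indexed and odd-indexed coefficients in x*x that are combined at the end. This is a scalar sketch of the general technique, not the code from this patch; the function names `poly_horner` and `poly_split` are hypothetical, and a real kernel would presumably be fully unrolled and use vector FMA operations.

```c
#include <stddef.h>

/* Plain Horner evaluation: one chain of dependent multiply-adds.
   c[0] is the constant term, c[n-1] the highest-degree coefficient.
   (Hypothetical helper, for illustration only.) */
static double poly_horner(const double *c, size_t n, double x) {
    double r = 0.0;
    for (size_t i = n; i-- > 0; )
        r = r * x + c[i];
    return r;
}

/* Split evaluation: p(x) = E(x^2) + x * O(x^2), where E collects the
   even-indexed coefficients and O the odd-indexed ones. The two
   accumulators do not depend on each other, so an out-of-order core
   can overlap their multiply-adds. The extra x*x and the final
   combine are the overhead of splitting.
   (Hypothetical helper, for illustration only.) */
static double poly_split(const double *c, size_t n, double x) {
    double x2 = x * x;
    double even = 0.0, odd = 0.0;
    for (size_t i = n; i-- > 0; ) {
        if (i % 2 == 0)
            even = even * x2 + c[i];
        else
            odd = odd * x2 + c[i];
    }
    return even + x * odd;
}
```

Each chain in the split version is about half as long as the original, which is where a speed-up could come from when the two chains overlap. Rounding may differ slightly between the two forms, since the operations are performed in a different order.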
1 parent da68d0d commit 3f80098

5 files changed

+292
-14
lines changed

src/arch/helperavx2.h

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 #define ENABLE_FMA_SP
 
 #define FULL_FP_ROUNDING
+#define SPLIT_KERNEL
 
 #if defined(_MSC_VER)
 #include <intrin.h>

src/arch/helperavx2_128.h

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 #define ENABLE_FMA_SP
 
 #define FULL_FP_ROUNDING
+#define SPLIT_KERNEL
 
 #if defined(_MSC_VER)
 #include <intrin.h>

src/arch/helperavx512f.h

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 #define ENABLE_FMA_SP
 
 #define FULL_FP_ROUNDING
+#define SPLIT_KERNEL
 
 #if defined(_MSC_VER)
 #include <intrin.h>
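
The hunks above only define SPLIT_KERNEL in the architecture helper headers. The sketch below shows how such a macro could select between a split and an unsplit code path at compile time. It is an assumption about the general pattern, not code from this commit: `cubic_kernel` and its coefficients are hypothetical, and the `#define` at the top stands in for including one of the helper headers.

```c
/* Stand-in for an arch helper header such as helperavx2.h,
   which defines SPLIT_KERNEL after this patch. */
#define SPLIT_KERNEL

/* Hypothetical kernel: evaluates c0 + c1*x + c2*x^2 + c3*x^3. */
static double cubic_kernel(double x) {
    const double c0 = 1.0, c1 = 2.0, c2 = 3.0, c3 = 4.0;
#ifdef SPLIT_KERNEL
    /* Two independent chains in x*x, combined at the end. */
    double x2 = x * x;
    double even = c2 * x2 + c0;
    double odd = c3 * x2 + c1;
    return odd * x + even;
#else
    /* Single dependent Horner chain. */
    return ((c3 * x + c2) * x + c1) * x + c0;
#endif
}
```

Headers that do not define the macro would fall through to the unsplit path, which matches the commit message's note that splitting may be slower on some microarchitectures.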
