I made a pull request that implements masked functions by combining the current unmasked functions with a selection (blending) function.
#139
However, there is a performance concern with this implementation: the ALUs for the unused elements all remain active, which could increase the computer's power consumption and heat output. It would be better to implement the masked functions so that they use native masked intrinsic functions.
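To make the concern concrete, here is a minimal portable sketch of the blending approach. It does not use real SIMD intrinsics; the types `vdouble_s` and `vopmask_s`, the lane count, the stand-in function `unmasked_square`, and the pass-through argument are all hypothetical and are not taken from the PR.

```c
#include <stddef.h>

#define VLEN 4 /* hypothetical lane count */

/* Hypothetical stand-ins for vdouble and vopmask. */
typedef struct { double v[VLEN]; } vdouble_s;
typedef struct { int m[VLEN]; } vopmask_s; /* nonzero = lane is active */

/* Stand-in for an existing unmasked function: it works on every lane. */
static vdouble_s unmasked_square(vdouble_s x) {
  vdouble_s r;
  for (size_t i = 0; i < VLEN; i++) r.v[i] = x.v[i] * x.v[i];
  return r;
}

/* Selection (blending): take a where the mask is set, b elsewhere. */
static vdouble_s select_lanes(vopmask_s m, vdouble_s a, vdouble_s b) {
  vdouble_s r;
  for (size_t i = 0; i < VLEN; i++) r.v[i] = m.m[i] ? a.v[i] : b.v[i];
  return r;
}

/* Masked function built by blending. Note that unmasked_square still
 * computes the inactive lanes; their results are simply thrown away.
 * What inactive lanes should hold is an assumption here (pass-through). */
static vdouble_s masked_square_by_blend(vdouble_s x, vopmask_s m,
                                        vdouble_s passthru) {
  return select_lanes(m, unmasked_square(x), passthru);
}
```

The wasted work is visible in `masked_square_by_blend`: the cost is the same no matter how many lanes the mask disables.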
My plan is to approve the above PR and, after version 3.2 is released, to start implementing masked functions with native masked intrinsic functions in the following way.
All existing FP functions in the helper files will be converted to masked functions. For example,
vdouble vadd_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  return vaddq_f64(x, y);
}
for an unmasked intrinsic function (the mask argument m is simply ignored), and
vdouble vadd_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  return svadd_f64_x(m, x, y);
}
for a masked intrinsic function.
Then, the implementation of each math function would look like the following.
static const inline vdouble xsin(vdouble arg, vopmask mask) { ... }

EXPORT const vdouble Sleef_sindX_u35YYY(vdouble arg) {
  return xsin(arg, SLEEF_OPMASK_ALLONE);
}

EXPORT const vdouble Sleef_mask_sindX_u35YYY(vdouble arg, vopmask mask) {
  return xsin(arg, mask);
}
We assume that the compiler optimizes the mask argument away when it is not used.
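The overall shape of this plan can be tried out with a small portable sketch. It emulates vectors with plain arrays instead of real intrinsics, and every name in it (`toy_vdouble`, `toy_vopmask`, `TOY_OPMASK_ALLONE`, `toy_kernel`, and so on) is hypothetical. Inactive lanes are set to zero here purely so that the result is deterministic; a native masked intrinsic may leave them unspecified.

```c
#include <stddef.h>

#define TOY_VLEN 4 /* hypothetical lane count */

/* Hypothetical stand-ins for vdouble and vopmask. */
typedef struct { double v[TOY_VLEN]; } toy_vdouble;
typedef struct { int m[TOY_VLEN]; } toy_vopmask; /* nonzero = active */

/* Plays the role of an all-ones mask constant. */
static const toy_vopmask TOY_OPMASK_ALLONE = {{1, 1, 1, 1}};

/* Helper with a mask parameter: only active lanes are computed.
 * Assumption: inactive lanes are zeroed in this emulation. */
static toy_vdouble toy_add(toy_vdouble x, toy_vdouble y, toy_vopmask m) {
  toy_vdouble r;
  for (size_t i = 0; i < TOY_VLEN; i++)
    r.v[i] = m.m[i] ? x.v[i] + y.v[i] : 0.0;
  return r;
}

static toy_vdouble toy_mul(toy_vdouble x, toy_vdouble y, toy_vopmask m) {
  toy_vdouble r;
  for (size_t i = 0; i < TOY_VLEN; i++)
    r.v[i] = m.m[i] ? x.v[i] * y.v[i] : 0.0;
  return r;
}

/* One shared kernel, f(x) = x*x + x, with the mask threaded through
 * every helper call. */
static inline toy_vdouble toy_kernel(toy_vdouble x, toy_vopmask mask) {
  return toy_add(toy_mul(x, x, mask), x, mask);
}

/* Unmasked entry point: passes the all-ones mask. */
toy_vdouble toy_unmasked(toy_vdouble x) {
  return toy_kernel(x, TOY_OPMASK_ALLONE);
}

/* Masked entry point: forwards the caller's mask. */
toy_vdouble toy_masked(toy_vdouble x, toy_vopmask mask) {
  return toy_kernel(x, mask);
}
```

Both entry points share a single kernel, so each math function is written once; whether the constant mask is actually folded away in the unmasked path depends on the compiler and the target.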