Implementation of masked functions using native masked intrinsic functions #142

@shibatch

I made a pull request that implements masked functions by combining the existing unmasked functions with a selection (blending) function.

#139
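As a rough illustration of the blend-based approach, here is a minimal scalar emulation in C. It is not the code from the PR. The names `VLEN`, `toy_unmasked`, `vsel` and `toy_masked_by_blend` are hypothetical, the struct types only stand in for real SIMD vector and mask types, and passing the input through on inactive lanes is an assumption made for this sketch.

```c
#define VLEN 4  /* hypothetical vector length for this scalar emulation */

typedef struct { double e[VLEN]; } vdouble;  /* stand-in for a SIMD double vector */
typedef struct { int e[VLEN]; } vopmask;     /* stand-in for a lane mask, nonzero = active */

/* Toy unmasked function (squares each lane). Every lane is computed,
   whether or not the caller needs it. */
static vdouble toy_unmasked(vdouble x) {
  vdouble r;
  for (int i = 0; i < VLEN; i++) r.e[i] = x.e[i] * x.e[i];
  return r;
}

/* Selection (blend): take a on active lanes and b on inactive lanes. */
static vdouble vsel(vopmask m, vdouble a, vdouble b) {
  vdouble r;
  for (int i = 0; i < VLEN; i++) r.e[i] = m.e[i] ? a.e[i] : b.e[i];
  return r;
}

/* Masked function built from the two: compute all lanes, then blend.
   Inactive lanes return the input unchanged (an assumption of this sketch). */
static vdouble toy_masked_by_blend(vdouble x, vopmask m) {
  return vsel(m, toy_unmasked(x), x);
}
```

With `x = {1, 2, 3, 4}` and a mask that activates lanes 0 and 2, the squares of all four lanes are computed, but only lanes 0 and 2 of the result are replaced.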

However, this implementation raises a performance concern: the ALUs for the unused elements all remain active, which could increase the computer's power consumption and heat generation. It would be better to implement the masked functions so that they use native masked intrinsic functions.

My plan is to approve the above PR and, after the release of version 3.2, to start implementing masked functions with native masked intrinsic functions in the following way.

All existing FP functions in the helper files will be converted to masked functions that take a mask argument. For example,

vdouble vadd_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  return vaddq_f64(x, y);
}

for an unmasked intrinsic function, and

vdouble vadd_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  return svadd_f64_x(m, x, y);
}

for a masked intrinsic function.

Then, the implementation of each math function would be like the following.

static const inline vdouble xsin(vdouble arg, vopmask mask) { ... }

EXPORT const vdouble Sleef_sindX_u35YYY(vdouble arg) {
  return xsin(arg, SLEEF_OPMASK_ALLONE);
}

EXPORT const vdouble Sleef_mask_sindX_u35YYY(vdouble arg, vopmask mask) {
  return xsin(arg, mask);
}

The mask argument is expected to be optimized away by the compiler when it is not used.
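The structure above can be sketched end to end in portable C, with a scalar emulation in place of real intrinsics. Everything here is hypothetical and for illustration only: `OPMASK_ALLONE` plays the role of `SLEEF_OPMASK_ALLONE`, `vmul_vd_vd_vd_vo` follows the helper naming shown earlier, and `xsquare` is a toy kernel standing in for `xsin`. The helper leaves inactive lanes equal to its first operand, which is an assumption of this sketch; the SVE `_x` intrinsics leave inactive lanes unspecified.

```c
#define VLEN 4  /* hypothetical vector length for this scalar emulation */

typedef struct { double e[VLEN]; } vdouble;  /* stand-in for a SIMD double vector */
typedef struct { int e[VLEN]; } vopmask;     /* stand-in for a lane mask, nonzero = active */

/* Hypothetical all-active mask, playing the role of SLEEF_OPMASK_ALLONE. */
static const vopmask OPMASK_ALLONE = {{1, 1, 1, 1}};

/* Masked helper: only active lanes are multiplied. Inactive lanes keep x
   (an assumption of this sketch, not the behavior of any real intrinsic). */
static inline vdouble vmul_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  vdouble r = x;
  for (int i = 0; i < VLEN; i++) {
    if (m.e[i]) r.e[i] = x.e[i] * y.e[i];
  }
  return r;
}

/* Shared kernel: written once and threads the mask into every helper call. */
static inline vdouble xsquare(vdouble arg, vopmask mask) {
  return vmul_vd_vd_vd_vo(arg, arg, mask);
}

/* Unmasked entry point: passes the all-active mask. */
vdouble toy_square(vdouble arg) {
  return xsquare(arg, OPMASK_ALLONE);
}

/* Masked entry point: forwards the caller's mask. */
vdouble toy_mask_square(vdouble arg, vopmask mask) {
  return xsquare(arg, mask);
}
```

Because the kernel is `static inline` and the unmasked entry point passes a compile-time constant mask, a compiler has the opportunity to fold the mask checks away in that path, which is the effect the design relies on.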
