
Implement wrapper type for warp masks #2617

Open
sbaldu wants to merge 2 commits into alpaka-group:develop from sbaldu:feature/warp-mask-wrapper

Conversation

@sbaldu
Contributor

@sbaldu sbaldu commented Mar 13, 2026

This PR addresses issue #2615 from @fwyzard by implementing a wrapper type alpaka::warp::Mask around the type returned by activeMask and ballot, providing the same interface across all backends.

# else
-> std::uint64_t
# endif
std::int32_t predicate) -> MaskType<WarpUniformCudaHipBuiltIn>::type
Contributor

can this be

Suggested change
std::int32_t predicate) -> MaskType<WarpUniformCudaHipBuiltIn>::type
std::int32_t predicate) -> Mask<WarpUniformCudaHipBuiltIn>

?
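A minimal sketch of what the suggested spelling implies: an alias template over the trait, so call sites can drop the `::type`. The names mirror the PR, but this is not alpaka's actual code; the tag struct and the 64-bit choice (taken from the `#else` branch quoted above) are stand-ins.

```cpp
#include <cstdint>
#include <type_traits>

struct WarpUniformCudaHipBuiltIn{}; // stand-in for the real warp implementation tag

template<typename TWarp>
struct MaskType; // primary template, specialized per backend

template<>
struct MaskType<WarpUniformCudaHipBuiltIn>
{
    using type = std::uint64_t; // 64-bit branch, as in the #else fragment above
};

// The alias the review comment asks for: Mask<TWarp> instead of
// typename MaskType<TWarp>::type at every call site.
template<typename TWarp>
using Mask = typename MaskType<TWarp>::type;

static_assert(std::is_same_v<Mask<WarpUniformCudaHipBuiltIn>, std::uint64_t>);
```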


namespace trait
{
template<>
Contributor

I am mildly concerned that this may introduce an ODR violation.
Do you think the warp type could be templated on TApi, and that could be used to distinguish the warp size?
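A hypothetical sketch of that idea (the tag names are invented, not alpaka's): keying the trait on the API tag gives CUDA and HIP distinct specializations, so the two backends never supply conflicting definitions of the same specialization (the ODR worry), and the tag also encodes the warp size.

```cpp
#include <cstdint>
#include <type_traits>

// Invented API tags standing in for alpaka's TApi types.
namespace api
{
    struct Cuda{}; // 32-thread warps
    struct Hip{};  // 64-thread waves
}

template<typename TApi>
struct MaskType;

// One specialization per API: distinct types, no ODR conflict.
template<>
struct MaskType<api::Cuda> { using type = std::uint32_t; };

template<>
struct MaskType<api::Hip> { using type = std::uint64_t; };

static_assert(std::is_same_v<MaskType<api::Cuda>::type, std::uint32_t>);
static_assert(std::is_same_v<MaskType<api::Hip>::type, std::uint64_t>);
```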

@fwyzard
Contributor

fwyzard commented Mar 16, 2026

Can you make the mask type a nested type of the warp itself ?

Being able to write Warp::mask_t is simpler than alpaka::warp::Mask<Warp>.
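A sketch of the nested-type suggestion, with invented names: each warp implementation carries its own mask_t, so generic code spells the mask as typename TWarp::mask_t.

```cpp
#include <cstdint>

// Stand-in warp implementation exposing the nested mask type.
struct WarpUniformCudaHipBuiltIn
{
    using mask_t = std::uint64_t; // wide enough for a 64-lane wave
};

// Generic helper (hypothetical): a mask with every lane bit set,
// written against the nested type rather than an external trait.
template<typename TWarp>
auto allLanes() -> typename TWarp::mask_t
{
    return ~typename TWarp::mask_t{0};
}
```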

@fwyzard
Contributor

fwyzard commented Mar 16, 2026

Also, could you update the tests and example to make use of this type ?

Comment thread include/alpaka/warp/WarpGenericSycl.hpp Outdated
template<typename TDim>
struct MaskType<WarpGenericSycl<TDim>>
{
using type = std::uint32_t;
Member

The mask type depends on the architecture you are targeting. If you target AMD it is 64 bit, because the warp/wave size is 64. The reason it is 64 is that the oneAPI plugin uses ROCm as the connector to AMD hardware, and in ROCm the mask is defined as a 64-bit data type, IMO even if the warp size is 32.

Member

In alpaka3 we depend on the warp size, but in alpaka mainline the warp size is not known at compile time.
Maybe we can specialize the type on the accelerator type.

Contributor

Yes, oneAPI uses uint64_t as the underlying type for the mask:

struct sub_group_mask {
  friend class sycl::detail::Builder;
  using BitsType = uint64_t;
...

As you suggest we could use different mask types for different oneAPI back-ends:

  • 32 bit for NVIDIA GPUs
  • 64 bit for AMD GPUs
  • 32 bit for Intel GPUs (?)
  • 64 bit for CPUs (though last time I checked a subgroup size of 64 was broken, but it's been a while)
  • no idea for Intel/Altera FPGAs, probably 32 bit

or just 64 bits for all oneAPI back-ends if it's simpler and does not have any significant impact on the performance.
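The per-back-end option in that list could look roughly like this; the device tags are invented for illustration, and the widths simply follow the bullet points above rather than any verified oneAPI behavior.

```cpp
#include <cstdint>
#include <type_traits>

// Invented tags standing in for the oneAPI device families above.
namespace oneapi_device
{
    struct IntelGpu{};
    struct NvidiaGpu{};
    struct AmdGpu{};
    struct Cpu{};
}

// Default: 32-bit mask (NVIDIA and Intel GPUs in the list above).
template<typename TDevice>
struct SyclMaskType { using type = std::uint32_t; };

// 64-bit for AMD wave64 and for CPUs with sub-group sizes up to 64.
template<>
struct SyclMaskType<oneapi_device::AmdGpu> { using type = std::uint64_t; };
template<>
struct SyclMaskType<oneapi_device::Cpu> { using type = std::uint64_t; };

static_assert(std::is_same_v<SyclMaskType<oneapi_device::NvidiaGpu>::type, std::uint32_t>);
static_assert(std::is_same_v<SyclMaskType<oneapi_device::AmdGpu>::type, std::uint64_t>);
```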

Member

IMO the version that uses different sizes per device type makes sense. Always using 64 bit means one register of overhead on systems with a warp size of 32.

@fwyzard fwyzard self-requested a review April 14, 2026 08:31
@psychocoderHPC psychocoderHPC dismissed their stale review April 14, 2026 10:35

I am sick and will not block the PR

@SimeonEhrig
Member

@sbaldu There is a typename missing in the code: https://gitlab.com/hzdr/crp/alpaka/-/jobs/13914947238
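The CI error points at the dependent-name rule; a minimal illustration of the pattern (not the actual failing alpaka line, and these names are invented): inside a template, MaskType<TWarp>::type depends on TWarp, so the compiler requires the typename keyword to parse it as a type.

```cpp
#include <cstdint>

template<typename TWarp>
struct MaskType { using type = std::uint32_t; };

// typename is required in both spots below, because MaskType<TWarp>::type
// is a dependent name; omitting it is exactly the kind of error GCC/Clang
// report in CI jobs like the one linked above.
template<typename TWarp>
auto emptyMask() -> typename MaskType<TWarp>::type
{
    return typename MaskType<TWarp>::type{0};
}
```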


4 participants