Implement wrapper type for warp masks #2617
sbaldu wants to merge 2 commits into alpaka-group:develop from …
Conversation
```cpp
# else
        -> std::uint64_t
# endif
        std::int32_t predicate) -> MaskType<WarpUniformCudaHipBuiltIn>::type
```
can this be

```diff
-        std::int32_t predicate) -> MaskType<WarpUniformCudaHipBuiltIn>::type
+        std::int32_t predicate) -> Mask<WarpUniformCudaHipBuiltIn>
```

?
```cpp
namespace trait
{
    template<>
```
I am mildly concerned that this may introduce an ODR violation.
Do you think the warp type could be templated on TApi, and that could be used to distinguish the warp size?
Can you make the mask type a nested type of the warp itself? Being able to write …
Also, could you update the tests and example to make use of this type?
```cpp
template<typename TDim>
struct MaskType<WarpGenericSycl<TDim>>
{
    using type = std::uint32_t;
```
The mask type depends on the architecture you are targeting. If you target AMD it is 64-bit, because the warp/wave size is 64. The reason it is 64 is that the oneAPI plugin uses ROCm as the connector to AMD hardware, and in ROCm the mask is defined as a 64-bit data type, IMO even if the warp size is 32.
In alpaka3 we depend on the warp size, but in alpaka mainline the warp size is not known at compile time.
Maybe we can specialize the type on the accelerator type.
Yes, oneAPI uses `uint64_t` as the underlying type for the mask:

```cpp
struct sub_group_mask {
  friend class sycl::detail::Builder;
  using BitsType = uint64_t;
  ...
```

As you suggest, we could use different mask types for different oneAPI back-ends:
- 32 bit for NVIDIA GPUs
- 64 bit for AMD GPUs
- 32 bit for Intel GPUs (?)
- 64 bit for CPUs (though last time I checked a subgroup size of 64 was broken, but it's been a while)
- no idea for Intel/Altera FPGAs, probably 32 bit
or just 64 bits for all oneAPI back-ends if it's simpler and does not have any significant impact on the performance.
IMO the version which uses different sizes per device type makes sense. Always using 64-bit means one register of overhead on systems with a warp size of 32.
I am sick and will not block the PR.
@sbaldu There is a …
This PR addresses issue #2615 from @fwyzard, implementing a wrapper type `alpaka::warp::Mask` around the type returned by `activeMask` and `ballot`, providing the same interface for all the back-ends.