Tighten DPP eligibility to require exact thread-count match and use arch DB for wave size
- Change the canUseDPP condition from
blockSize >= clusterSize * nonReductionDimSizeProduct to
blockSize == clusterSize * nonReductionDimSizeProduct. This prevents
potential out-of-bounds LDS writes by extra threads when blockSize
exceeds the exact thread count needed for the DPP layout.
- Replace hard-coded chipset major version heuristic in
SubgroupReduceToDPP with rock::lookupArchInfo(chip).waveSize
for more robust subgroup size derivation.
- Update lowering_blockwise_broadcast_reduce test to use dimensions
where blockSize == clusterSize * nrDimProd (8 == 2 * 4).
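
The tightened eligibility check can be illustrated with a minimal sketch.
This is not the code from the change itself: the function names
canUseDPPSketch and canUseDPPLooseSketch are hypothetical stand-ins, and
only the comparison described above is modeled. The real condition may
include further checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tightened check (assumed shape, not the
// actual rocMLIR code): DPP is eligible only when the block has exactly
// as many threads as the DPP layout needs.
constexpr bool canUseDPPSketch(int64_t blockSize, int64_t clusterSize,
                               int64_t nonReductionDimSizeProduct) {
  return blockSize == clusterSize * nonReductionDimSizeProduct;
}

// Hypothetical stand-in for the previous, looser check, which also
// accepted blocks with more threads than the layout needs.
constexpr bool canUseDPPLooseSketch(int64_t blockSize, int64_t clusterSize,
                                    int64_t nonReductionDimSizeProduct) {
  return blockSize >= clusterSize * nonReductionDimSizeProduct;
}

// Dimensions from the updated test: 8 == 2 * 4, so DPP stays eligible.
static_assert(canUseDPPSketch(8, 2, 4), "exact match is eligible");

// A block of 16 threads with the same layout has 8 extra threads.
// The loose check accepted it; the strict check rejects it.
static_assert(canUseDPPLooseSketch(16, 2, 4), "loose check accepted extras");
static_assert(!canUseDPPSketch(16, 2, 4), "strict check rejects extras");

// Too few threads is rejected by both forms.
static_assert(!canUseDPPSketch(4, 2, 4), "too few threads is rejected");
```

With blockSize = 16, clusterSize = 2, and a non-reduction product of 4,
the layout needs only 8 threads, which is the case the strict comparison
now excludes.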