[WIP] Enable assymetric heads for D_qk and D_v for micro kernel sdpa #5301

Open
h-sadia wants to merge 5 commits into main from hsadia/assym_heads_sdpa

h-sadia commented Jun 11, 2026

Contributor

Description

The fused micro kernel SDPA implementation currently requires the Q/K and V tensors to have the same head size. This PR enables different head sizes (D_qk != D_v), as requested in https://jira.devtools.intel.com/browse/MFDNN-14385
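
For reviewers, here is a minimal reference sketch of what asymmetric head sizes mean for shapes. It is plain Python for a single head, not the micro kernel code from this PR, and the function name `sdpa` and the toy sizes are assumptions made for illustration. The Q·K^T contraction runs over D_qk only, so the output head size follows V:

```python
import math


def sdpa(q, k, v, scale=None):
    """Reference scaled dot-product attention for one head (nested lists).

    q: seq_q x d_qk, k: seq_k x d_qk, v: seq_k x d_v.
    Returns seq_q x d_v, so d_v may differ from d_qk.
    """
    d_qk = len(q[0])
    if any(len(row) != d_qk for row in k):
        raise ValueError("Q and K must share the head size d_qk")
    if len(k) != len(v):
        raise ValueError("K and V must share the sequence length")
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(d_qk) if scale is None else scale

    out = []
    for q_row in q:
        # Scores over all keys: the contraction is over d_qk only.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax over the key dimension.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        denom = sum(exps)
        probs = [e / denom for e in exps]
        # Weighted sum of V rows: the result has d_v columns.
        out.append([sum(p * v_row[j] for p, v_row in zip(probs, v)) for j in range(d_v)])
    return out


# Toy sizes (hypothetical): d_qk = 2, d_v = 4.
seq, d_qk, d_v = 3, 2, 4
q = [[0.1 * (i + j) for j in range(d_qk)] for i in range(seq)]
k = [[0.2 * (i - j) for j in range(d_qk)] for i in range(seq)]
v = [[float(i * d_v + j) for j in range(d_v)] for i in range(seq)]
out = sdpa(q, k, v)
print(len(out), len(out[0]))  # 3 4
```

Nothing in the math ties D_v to D_qk; the restriction is specific to how the fused kernel is currently written.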

N.B.: Testing will continue with the existing Graph API tests and will be extended starting ww25.2.

Fixes # (github issue)

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

h-sadia requested review from a team as code owners June 11, 2026 21:04
github-actions bot added the labels platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel) and component:tests (Codeowner: @oneapi-src/onednn-arch) Jun 11, 2026
h-sadia force-pushed the hsadia/assym_heads_sdpa branch from 4a844b2 to 0fa2486 June 11, 2026 21:39
h-sadia force-pushed the hsadia/assym_heads_sdpa branch from f3a67be to 943a4cc June 11, 2026 21:43
h-sadia changed the title from "[WIP] Enable assymetric heads for D_qk and D_v for micro kernel sdpa" to "Enable assymetric heads for D_qk and D_v for micro kernel sdpa" Jun 11, 2026
h-sadia force-pushed the hsadia/assym_heads_sdpa branch from 943a4cc to f3a67be June 11, 2026 21:47
h-sadia changed the title from "Enable assymetric heads for D_qk and D_v for micro kernel sdpa" to "[WIP]Enable assymetric heads for D_qk and D_v for micro kernel sdpa" Jun 11, 2026
# f16 inputs + f32 intermediates + f16 outputs
--reset --op-kind=1:Multiply,1:Divide --case=complex_fusion/mha/sdpa-plain-simplified-f16-f32.json
# Asymmetric heads: Q/K head_size=64, V head_size=128
--reset --op-kind=1:Multiply,1:Divide --case=complex_fusion/mha/sdpa-plain-asymm-heads-f16-f32.json

Contributor

We already have test cases to cover d_qk != d_v.

# d_qk != d_v
--reset --in-shapes=8:1x16x384x32,8:1x16x384x64,8:1x16x384x128 --case=complex_fusion/mha/sdpa-plain-simplified-f32.json
--reset --in-shapes=3:1x16x384x32,3:1x16x384x64,3:1x16x384x128 --case=complex_fusion/mha/sdpa-plain-simplified-f16-f32.json
--reset --in-shapes=3:1x16x384x32,3:1x16x384x64,3:1x16x384x128 --case=complex_fusion/mha/sdpa-plain-implicit-causal-mask-fp32-bs1.json
--reset --in-shapes=24:1x16x384x32,24:1x16x384x64,24:1x16x384x128 --case=complex_fusion/mha/sdpa-plain-bottom-right-implicit-causal-mask-f16-f32.json
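
To make the sweep in those lines explicit, here is a small sketch that expands an `--in-shapes` value into per-case overrides. The helper `parse_in_shapes` is hypothetical and not part of benchdnn. It assumes that commas separate alternative settings (one test case each) and that `+` joins several `ID:SHAPE` overrides within one setting:

```python
def parse_in_shapes(arg):
    """Expand an --in-shapes value into a list of {tensor_id: shape} dicts.

    Assumed syntax (not taken from the benchdnn sources):
      ID:SHAPE[+ID:SHAPE...][,ID:SHAPE[+ID:SHAPE...]...]
    where each comma-separated entry is one test case.
    """
    prefix = "--in-shapes="
    value = arg[len(prefix):] if arg.startswith(prefix) else arg
    cases = []
    for alternative in value.split(","):
        overrides = {}
        for item in alternative.split("+"):
            tensor_id, shape = item.split(":", 1)
            overrides[int(tensor_id)] = tuple(int(d) for d in shape.split("x"))
        cases.append(overrides)
    return cases


cases = parse_in_shapes("--in-shapes=3:1x16x384x32,3:1x16x384x64,3:1x16x384x128")
for case in cases:
    print(case)
```

Under that reading, each existing line runs three cases in which only the last dimension (the head size) of the overridden tensor changes: 32, 64, and 128.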

Labels

component:tests Codeowner: @oneapi-src/onednn-arch platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel

2 participants