[AIROCMLIR-707] Add split_kv memory checks for attention #2343
Open
bogdan-petkovic wants to merge 20 commits into ROCm:develop from
Conversation
Pull request overview
Adds an early-validation guard for attention splitKV configurations to prevent late OOM/timeout failures by estimating extra temporary storage and rejecting oversized cases up front, with a user override via environment variable.
Changes:
- Add overflow-safe splitKV extra-memory estimation in rock.attention verification and emit a clear validation error when exceeding a limit.
- Add a HIP-based device global-memory query helper to derive a dynamic default limit per target architecture (with clamping and fallback).
- Minor: adjust the bf16 attention sweep default RMS_threshold; treat 0-byte ROCm allocations as a no-op.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| mlir/utils/performance/parameterSweeps.py | Changes default bf16 attention sweep RMS threshold. |
| mlir/lib/Dialect/Rock/IR/RockDialect.cpp | Implements splitKV extra-memory estimation + verifier limit/diagnostic. |
| mlir/lib/Dialect/Rock/IR/AmdArchDb.cpp | Adds HIP query to retrieve total GPU global memory for an arch. |
| mlir/include/mlir/Dialect/Rock/IR/AmdArchDb.h | Exposes the new device global-memory query API. |
| external/llvm-project/mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp | Makes 0-byte mgpuMemAlloc return nullptr early. |
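The 0-byte allocation change can be illustrated with a minimal sketch. This is not the actual RocmRuntimeWrappers.cpp implementation: the real wrapper forwards nonzero sizes to hipMalloc, and the function name and placeholder body below are assumptions made for illustration only.

```cpp
#include <cstddef>

// Hedged sketch: treat a 0-byte allocation as a no-op that returns nullptr
// before any driver call is made. The nonzero path is a placeholder; the
// real wrapper would call hipMalloc(&ptr, sizeBytes) here.
extern "C" void *mgpuMemAllocSketch(size_t sizeBytes) {
  if (sizeBytes == 0)
    return nullptr; // 0-byte allocation: early no-op, no driver round-trip
  return ::operator new(sizeBytes); // placeholder for the hipMalloc path
}
```

Returning nullptr early sidesteps allocator edge cases for empty buffers and avoids a needless runtime call.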
Made-with: Cursor
Signed-off-by: bogdan-petkovic <bogdan.petkovic@htecgroup.com>
- Replace isa<> + cast<> pairs with dyn_cast<> in verifyCommonAttnGemmParameters
- Replace std::unordered_map<std::string> cache with llvm::StringMap
- Extract safeGlobalMemBytes helper to remove duplicated overflow-guard logic in lookupDeviceGlobalMemorySizeBytes

Made-with: Cursor
- Use dyn_cast<AttentionOp>(op.getOperation()) instead of dyn_cast<AttentionOp>(op) since op is a RockGemmGemmWrapperInterface value, not a raw Operation*
- Move safeGlobalMemBytes inside the #ifndef _WIN32 guard as a lambda to avoid referencing hipDeviceProp_t where HIP headers are not included

Made-with: Cursor
Motivation
Some attention configurations on gfx1201 with large split_kv values were consuming too much memory and failing late (OOM/timeout behavior).
This PR makes those cases fail early with a clear validation error, so bad configurations are rejected before expensive lowering and runtime execution.
Technical Details
Added support to read total GPU global memory from HIP for the target architecture.
In attention verification, added a split_kv extra-memory estimator for output and LSE buffers, using overflow-safe arithmetic.
The verifier now compares required extra bytes against a limit and rejects oversized configurations with a clear error message that prints both required and allowed bytes.
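The overflow-safe estimation step can be sketched as follows. This is not the code in RockDialect.cpp: the helper names, the exact buffer-layout formula, and the use of GCC/Clang checked-arithmetic builtins are all assumptions; only the general shape (checked multiplies for the partial-output and LSE buffers, with overflow reported instead of wrapping) follows the description above.

```cpp
#include <cstdint>
#include <optional>

// Hedged sketch: checked multiply that reports overflow via std::nullopt
// instead of silently wrapping (uses GCC/Clang builtins).
static std::optional<uint64_t> mulChecked(uint64_t a, uint64_t b) {
  uint64_t out;
  if (__builtin_mul_overflow(a, b, &out))
    return std::nullopt;
  return out;
}

// Assumed formula for illustration: extra storage ~ partial-output buffers
// plus per-split LSE values,
//   splitKV * numRows * headDim * elemBytes + splitKV * numRows * lseBytes.
static std::optional<uint64_t>
estimateSplitKVExtraBytes(uint64_t splitKV, uint64_t numRows, uint64_t headDim,
                          uint64_t elemBytes, uint64_t lseBytes) {
  auto rows = mulChecked(splitKV, numRows);
  if (!rows)
    return std::nullopt;
  auto outElems = mulChecked(*rows, headDim);
  if (!outElems)
    return std::nullopt;
  auto outBytes = mulChecked(*outElems, elemBytes);
  auto lseTotal = mulChecked(*rows, lseBytes);
  if (!outBytes || !lseTotal)
    return std::nullopt;
  uint64_t total;
  if (__builtin_add_overflow(*outBytes, *lseTotal, &total))
    return std::nullopt; // sum itself overflowed
  return total;
}
```

A verifier using such a helper can treat an overflowed estimate the same as an over-limit one and emit the diagnostic.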
The limit is chosen in this order:
1. The ROCMLIR_ATTENTION_SPLITKV_MAX_EXTRA_BYTES environment variable override.
2. A device-based limit computed from GPU memory (device memory divided by 8, with safety clamping).
3. A fallback default when device memory cannot be queried.
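The selection order above can be sketched as a small C++ function. The environment-variable name and the divide-by-8 heuristic come from this PR, but the clamp bounds, the fallback constant, and the function name are illustrative assumptions, not the values in AmdArchDb.cpp.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <optional>

// Hedged sketch of the three-step limit selection. deviceMemBytes is the
// result of the HIP global-memory query (empty when the query failed).
static uint64_t
pickSplitKVExtraBytesLimit(std::optional<uint64_t> deviceMemBytes) {
  // 1) Explicit user override wins.
  if (const char *env =
          std::getenv("ROCMLIR_ATTENTION_SPLITKV_MAX_EXTRA_BYTES"))
    return std::strtoull(env, nullptr, 10);
  // 2) Device-derived default: one eighth of global memory, safety-clamped
  //    (the clamp bounds here are assumed, not the PR's actual values).
  if (deviceMemBytes) {
    uint64_t limit = *deviceMemBytes / 8;
    return std::clamp(limit, uint64_t(1) << 28, uint64_t(1) << 36);
  }
  // 3) Fallback when the device query fails (constant is an assumption).
  return uint64_t(1) << 31;
}
```

Keeping the override check first means a user can always unblock a configuration the heuristic rejects.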
Test Plan
1. Build MLIRRockOps and rocmlir-gen.
2. Run a large split_kv attention case and verify early rejection with an explicit memory-limit message.
3. Re-run the same case with the ROCMLIR_ATTENTION_SPLITKV_MAX_EXTRA_BYTES override and verify generation succeeds.
4. Replay the 4 reported failing attention configurations through the same parameter sweep pipeline (gen -> driver -> runner).
Test Result
Build status: PASS.
Replay of the 4 reported configurations:
- 3 configurations are now INVALID at the rocmlir-gen stage with explicit splitKV memory-limit errors.
- 1 configuration passes end-to-end.
Submission Checklist