[AIROCMLIR-707] Add split_kv memory checks for attention #2343

Open
bogdan-petkovic wants to merge 20 commits into ROCm:develop from bogdan-petkovic:bogdan-petkovic/attn-splitkv-limit

Conversation

@bogdan-petkovic
Contributor

Motivation

Some attention configurations on gfx1201 with large split_kv values were consuming too much memory and failing late (OOM/timeout behavior).
This PR makes those cases fail early with a clear validation error, so bad configurations are rejected before expensive lowering and runtime execution.

Technical Details

Added support to read total GPU global memory from HIP for the target architecture.
In attention verification, added a split_kv extra-memory estimator for output and LSE buffers, using overflow-safe arithmetic.
The verifier now compares required extra bytes against a limit and rejects oversized configurations with a clear error message that prints both required and allowed bytes.
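The overflow-safe estimate can be sketched with checked arithmetic. This is a minimal sketch, not the PR's actual code: the buffer shapes (per-split partial outputs plus a log-sum-exp buffer) and every helper name here are assumptions for illustration.

```cpp
#include <cstdint>
#include <optional>

// Multiply/add two uint64_t values, returning std::nullopt on overflow.
static std::optional<uint64_t> checkedMul(uint64_t a, uint64_t b) {
  uint64_t r;
  if (__builtin_mul_overflow(a, b, &r))
    return std::nullopt;
  return r;
}

static std::optional<uint64_t> checkedAdd(uint64_t a, uint64_t b) {
  uint64_t r;
  if (__builtin_add_overflow(a, b, &r))
    return std::nullopt;
  return r;
}

// Hypothetical split_kv extra-bytes estimate: partial output buffer of
// numSplits x batch x seqLenQ x headDimV elements, plus an LSE buffer of
// numSplits x batch x seqLenQ elements. Any overflow yields std::nullopt,
// which a verifier would treat as "over any limit".
std::optional<uint64_t> splitKvExtraBytes(uint64_t numSplits, uint64_t batch,
                                          uint64_t seqLenQ, uint64_t headDimV,
                                          uint64_t outElemBytes,
                                          uint64_t lseElemBytes) {
  auto rows = checkedMul(numSplits, batch);
  if (!rows) return std::nullopt;
  auto perSplit = checkedMul(*rows, seqLenQ);
  if (!perSplit) return std::nullopt;
  auto outElems = checkedMul(*perSplit, headDimV);
  if (!outElems) return std::nullopt;
  auto outBytes = checkedMul(*outElems, outElemBytes);
  if (!outBytes) return std::nullopt;
  auto lseBytes = checkedMul(*perSplit, lseElemBytes);
  if (!lseBytes) return std::nullopt;
  return checkedAdd(*outBytes, *lseBytes);
}
```

A configuration whose estimate exceeds the limit (or overflows) would be rejected with a diagnostic reporting both the required and allowed byte counts.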

The limit is chosen in this order:

  • ROCMLIR_ATTENTION_SPLITKV_MAX_EXTRA_BYTES environment override.

  • A device-based limit computed from GPU memory (device memory divided by 8, with safety clamping).

  • A fallback default when device memory cannot be queried.
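The selection order above can be sketched as follows. The constants and clamp bounds are illustrative assumptions, not the PR's actual values; only the environment variable name and the divide-by-8 rule come from the description.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Assumed constants for illustration only.
constexpr uint64_t kFallbackLimitBytes = 1ull << 30; // assumed 1 GiB default
constexpr uint64_t kMinLimitBytes = 256ull << 20;    // assumed clamp floor
constexpr uint64_t kMaxLimitBytes = 16ull << 30;     // assumed clamp ceiling

// Pick the split_kv extra-memory limit in priority order:
// env override, then device memory / 8 (clamped), then a fallback.
uint64_t pickSplitKvLimit(std::optional<uint64_t> deviceGlobalMemBytes) {
  // 1. Environment override wins unconditionally when set and valid.
  if (const char *env =
          std::getenv("ROCMLIR_ATTENTION_SPLITKV_MAX_EXTRA_BYTES")) {
    char *end = nullptr;
    uint64_t v = std::strtoull(env, &end, 10);
    if (end && *end == '\0' && v > 0)
      return v;
  }
  // 2. Device-derived limit: one eighth of global memory, safety-clamped.
  if (deviceGlobalMemBytes) {
    uint64_t derived = *deviceGlobalMemBytes / 8;
    if (derived < kMinLimitBytes) derived = kMinLimitBytes;
    if (derived > kMaxLimitBytes) derived = kMaxLimitBytes;
    return derived;
  }
  // 3. Fallback when device memory cannot be queried.
  return kFallbackLimitBytes;
}
```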

Test Plan

  • Build MLIRRockOps and rocmlir-gen.

  • Run a large split_kv attention case and verify early rejection with an explicit memory-limit message.

  • Re-run the same case with the ROCMLIR_ATTENTION_SPLITKV_MAX_EXTRA_BYTES override and verify generation succeeds.

  • Replay the 4 reported failing attention configurations through the same parameter sweep pipeline (gen -> driver -> runner).

Test Result

Build status: PASS.

Replay of the 4 reported configurations:

  • 3 configurations are now INVALID at rocmlir-gen stage with explicit splitKV memory-limit errors.

  • 1 configuration passes end-to-end.

Submission Checklist


Copilot AI left a comment


Pull request overview

Adds an early-validation guard for attention splitKV configurations to prevent late OOM/timeout failures by estimating extra temporary storage and rejecting oversized cases up front, with a user override via environment variable.

Changes:

  • Add overflow-safe splitKV extra-memory estimation in rock.attention verification and emit a clear validation error when exceeding a limit.
  • Add HIP-based device global-memory query helper to derive a dynamic default limit per target architecture (with clamping and fallback).
  • Minor: adjust the default bf16 attention sweep `-RMS_threshold`; treat 0-byte ROCm allocations as a no-op.
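The device-memory query can be sketched with the public HIP runtime API. The helper name is hypothetical; the PR's actual implementation additionally caches per-arch results and guards the HIP path on Windows. The `__has_include` guard below lets the sketch compile even where HIP headers are absent.

```cpp
#include <cstdint>
#include <optional>

#if __has_include(<hip/hip_runtime.h>)
#include <hip/hip_runtime.h>
#endif

// Query total GPU global memory in bytes via hipGetDeviceProperties,
// returning std::nullopt when HIP is unavailable or the query fails
// (in which case the caller falls back to a default limit).
std::optional<uint64_t> queryDeviceGlobalMemBytes(int deviceId = 0) {
#if __has_include(<hip/hip_runtime.h>)
  hipDeviceProp_t props;
  if (hipGetDeviceProperties(&props, deviceId) != hipSuccess)
    return std::nullopt;
  return static_cast<uint64_t>(props.totalGlobalMem);
#else
  (void)deviceId;
  return std::nullopt; // HIP headers not present on this build host.
#endif
}
```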

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • mlir/utils/performance/parameterSweeps.py: Changes the default bf16 attention sweep RMS threshold.

  • mlir/lib/Dialect/Rock/IR/RockDialect.cpp: Implements splitKV extra-memory estimation plus the verifier limit and diagnostic.

  • mlir/lib/Dialect/Rock/IR/AmdArchDb.cpp: Adds a HIP query to retrieve total GPU global memory for an arch.

  • mlir/include/mlir/Dialect/Rock/IR/AmdArchDb.h: Exposes the new device global-memory query API.

  • external/llvm-project/mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp: Makes 0-byte mgpuMemAlloc return nullptr early.
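The 0-byte allocation guard amounts to an early return before touching the allocator. A minimal sketch, with `std::malloc` standing in for the actual `hipMalloc`-backed wrapper so the control flow is testable without a GPU; the function name is illustrative, not the wrapper's real symbol:

```cpp
#include <cstdint>
#include <cstdlib>

// Sketch of a 0-byte allocation guard: return nullptr immediately for
// size 0 instead of asking the allocator, whose behavior for zero-sized
// requests is implementation-defined.
extern "C" void *sketchMemAlloc(uint64_t sizeBytes) {
  if (sizeBytes == 0)
    return nullptr;
  return std::malloc(sizeBytes); // real wrapper would call hipMalloc here
}
```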


@bogdan-petkovic force-pushed the bogdan-petkovic/attn-splitkv-limit branch from b328f8e to adc577d on April 16, 2026 at 10:35
@bogdan-petkovic marked this pull request as ready for review on April 20, 2026 at 10:41
@bogdan-petkovic changed the title from "Add split_kv memory checks for attention" to "[AIROCMLIR-707] Add split_kv memory checks for attention" on Apr 21, 2026
@bogdan-petkovic force-pushed the bogdan-petkovic/attn-splitkv-limit branch from d9d09c8 to 3473ab6 on April 23, 2026 at 08:54
Signed-off-by: bogdan-petkovic <bogdan.petkovic@htecgroup.com>
- Replace isa<> + cast<> pairs with dyn_cast<> in verifyCommonAttnGemmParameters
- Replace std::unordered_map<std::string> cache with llvm::StringMap
- Extract safeGlobalMemBytes helper to remove duplicated overflow-guard logic in lookupDeviceGlobalMemorySizeBytes

Made-with: Cursor
@bogdan-petkovic force-pushed the bogdan-petkovic/attn-splitkv-limit branch from 48c179b to d5ca8d5 on April 27, 2026 at 13:50
bogdan-petkovic and others added 5 commits April 27, 2026 14:45
Signed-off-by: bogdan-petkovic <bogdan.petkovic@htecgroup.com>
- Use dyn_cast<AttentionOp>(op.getOperation()) instead of
  dyn_cast<AttentionOp>(op) since op is a RockGemmGemmWrapperInterface
  value, not a raw Operation*
- Move safeGlobalMemBytes inside the #ifndef _WIN32 guard as a lambda
  to avoid referencing hipDeviceProp_t where HIP headers are not included

Made-with: Cursor
3 participants