Conversation
…tern matching Co-authored-by: gramalingam <10075881+gramalingam@users.noreply.github.com>
Co-authored-by: gramalingam <10075881+gramalingam@users.noreply.github.com>
This change addresses @gramalingam's feedback to return the match object (which includes success/failure status) instead of always returning None when the initial pattern match fails. This provides more consistent API behavior and makes failure information available when applicable. - Changed PatternImpl.match() to return match object on line 161 - Updated RewriteRule.try_rewrite() to use "if not match:" instead of "if match is None:" - Added test case to verify both None and failed MatchResult are handled correctly - Backward compatible: None still returned for GenericPatternMatcher, failed MatchResult for SimplePatternMatcher Co-authored-by: gramalingam <10075881+gramalingam@users.noreply.github.com>
Co-authored-by: gramalingam <10075881+gramalingam@users.noreply.github.com>
…g failure Co-authored-by: gramalingam <10075881+gramalingam@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
❌ 15 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
There was a problem hiding this comment.
Pull Request Overview
This PR enhances the GQA (Group Query Attention) fusion rule to better handle attention masks, particularly for causal masks with optional sliding window support. The changes simplify the fusion logic while making it more robust for various model patterns.
Key changes:
- Refactored causal mask pattern to support sliding window attention and multiple model variations
- Simplified GQA fusion by removing the separate GQACausalMask rule and consolidating logic
- Updated position_ids handling to use a single parameter instead of separate query/key position IDs
Comments suppressed due to low confidence (1)
onnxscript/rewriter/ort_fusions/gqa.py:136
- The sliding window functionality is implemented in the pattern (lines 69-74) but explicitly disabled in the check method. This creates untested code paths that could fail silently. Consider either removing the sliding window implementation or adding test coverage for when it's enabled.
# TODO(rama) Sliding window: not yet supported.
Does it make sense that we put a NOTE on these patterns/optimizations? |
titaiwangms
left a comment
There was a problem hiding this comment.
Is it possible to add a test, or it's already covered?
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Yes, added a comment to explain this |
I have tried it on Qwen2_5 and Mistralai and Phi2LM models. Agree, we should add test-cases. But wanted to merge it since Tomasso has changes that he will want to merge before his internship ends soon. I think some further refinements of the mask-pattern may be necessary anyway (for TinyLLM and/or Phi4). |
Expand the GQA fusion rule to handle attention mask better.