contrib: add mask/input shape consistency checks in MaxpoolWithMask::Compute #28223
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/3ff41f92-9af2-46fc-9d9f-029cd4f52233 Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Pull request overview
This PR replaces a placeholder TODO in the com.microsoft::MaxpoolWithMask CPU kernel with concrete mask-vs-input shape validation to prevent unsafe indexing when mask shapes don’t match the input tensor.
Changes:
- Added runtime guards in MaxpoolWithMask::Compute to require identical rank for X and M, and identical spatial dimensions (dims ≥ 2).
- Added negative tests to ensure shape mismatches fail with clear error messages.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/contrib_ops/cpu/maxpool_with_mask.h | Adds input/mask rank + spatial-dimension consistency checks in Compute. |
| onnxruntime/test/contrib_ops/maxpool_mask_test.cc | Adds two failure-case tests covering rank mismatch and spatial-dimension mismatch. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Multi-reviewer synthesis (readability + code + critical + deep)

Strong cross-reviewer consensus. The new guards do close the OOB they target; deep-reviewer derived the bound rigorously: with

Major

1. Pre-existing mask-channel offset formula is mathematically wrong.

    const int32_t* m_d = M_data + (c * x_step) % total_mask_channels;

Result is bounded by

Hand-traced the existing passing test (

The existing test only passes because the per-channel maxima all happen to live in column 0. Move the channel-1 max to column 3+ and the existing test would fail with the buggy formula. This PR removes the

Ask: either (a) tighten guard #2 to require strict

2. Guard #2 phrasing is awkward and hides the actual invariant.

    const bool input_has_nonzero_channels = x_shape[0] > 0 && x_shape[1] > 0;
    ORT_RETURN_IF_NOT(!input_has_nonzero_channels || (m_shape[0] > 0 && m_shape[1] > 0), ...);

Issues:

Cleaner equivalent (still empty-input-safe because the parallel-for body never runs when

    const int64_t total_channels = x_shape[0] * x_shape[1];
    const int64_t total_mask_channels = m_shape[0] * m_shape[1];
    ORT_RETURN_IF_NOT(total_channels == 0 || total_mask_channels > 0,
                      "Mask must have at least one channel when input is non-empty. ...");

3.

    const size_t spatial_rank = x_shape.NumDimensions() - 2;
    ORT_RETURN_IF_NOT(spatial_rank == pool_attrs_.kernel_shape.size(), ...);
    ORT_RETURN_IF_NOT(spatial_rank >= 1 && spatial_rank <= 3, ...);

Minor

Nits

Praise

Recommendation

Request changes on finding #1 (the offset formula): either tighten the contract (simplest) or fix the math + add a multi-channel test. Without that, this PR fixes the OOB but cements the silently-incorrect indexing. Findings #2 and #3 are cheap follow-ups worth folding in. Minors are nice-to-have.

Reviewed by a cross-family reviewer team (Claude readability + GPT code & critical + Claude-high deep). Bounds derivation and the multi-channel offset-formula trace independently verified.
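To make finding #1 concrete, the sketch below evaluates the quoted offset expression next to a per-channel-aligned alternative. The function names, the example shapes, and the alternative formula are assumptions made for illustration; none of them are taken from the PR.

```cpp
#include <cstdint>

// Offset expression quoted in the review: the product c * x_step is reduced
// modulo the number of mask channels, so the result is always smaller than
// total_mask_channels. Assumes total_mask_channels > 0.
int64_t QuotedMaskOffset(int64_t c, int64_t x_step, int64_t total_mask_channels) {
  return (c * x_step) % total_mask_channels;
}

// Hypothetical alternative (an assumption, not the PR's code): wrap the
// channel index first, then scale by the spatial step so that each channel
// starts on a plane boundary of the mask.
int64_t AlignedMaskOffset(int64_t c, int64_t x_step, int64_t total_mask_channels) {
  return (c % total_mask_channels) * x_step;
}
```

With an assumed 8x8 plane (x_step = 64) and a mask with two channels, the quoted expression gives offset 0 for both c = 0 and c = 1, whereas the aligned variant gives 0 and 64. Whether the aligned form is what the operator is meant to compute is exactly the question the review raises.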
tianleiwu
left a comment
Review Summary
This PR correctly addresses a real safety gap by replacing a long-standing TODO with proper mask-vs-input shape validation in MaxpoolWithMask::Compute. The spatial dimension loop check is well-generalized (works for 1D/2D/3D), the N/C nonzero guard prevents modulo-by-zero crashes, and error messages are clear. Two failure-mode tests cover the key validation paths.
Minor suggestions (non-blocking):
- N/C batch-broadcast documentation: The validation ensures mask N and C are nonzero but does not check whether they match the input's N and C. The downstream indexing uses (c * x_step) % total_mask_channels, which silently "broadcasts" the mask when m_shape[0] * m_shape[1] != x_shape[0] * x_shape[1]. If this modulo-based broadcasting is intentional, a brief comment would prevent future confusion.
- Missing test for N/C nonzero check: The PR adds the input_has_nonzero_channels validation path but has no test exercising the failure case (e.g., mask shape {0, 1, 8, 8} with input shape {1, 1, 8, 8}).
Overall: well-scoped, low-risk change with good test coverage.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/9e152742-abd5-4f85-a3dc-dc7073befa07 Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
tianleiwu
left a comment
Review Summary
Second review on the updated head. Both suggestions from the prior round are addressed:
- N/C broadcasting comment: Added, documenting that the modulo-based mask broadcasting is intentional.
- N/C nonzero test: MaxPoolWithMask_MaskEmptyBatchDim now covers the guard against division-by-zero in total_mask_channels.
The three-tier validation (rank equality → N/C nonzero guard → spatial dimension match) is correctly ordered and all branches have test coverage. Clean and ready to merge.
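The ordering described above can be sketched as a standalone function. This is a minimal illustration working on plain vectors; the function name, the returned messages, and the extra minimum-rank check are assumptions and do not reproduce the kernel's ORT_RETURN_IF_NOT calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns an empty string when the shapes pass every check, otherwise a short
// description of the first failing check. Name and messages are hypothetical.
std::string ValidateMaskShape(const std::vector<int64_t>& x_shape,
                              const std::vector<int64_t>& m_shape) {
  // Tier 1: rank equality, so the per-index comparisons below are safe.
  if (x_shape.size() != m_shape.size()) return "rank mismatch";

  // Extra defensive check (an assumption, not described in the PR): N and C
  // must exist before they are read.
  if (x_shape.size() < 2) return "rank too small";

  // Tier 2: if the input has work to do, the mask needs nonzero N and C,
  // because N * C of the mask is later used as a modulo divisor.
  const bool input_has_nonzero_channels = x_shape[0] > 0 && x_shape[1] > 0;
  if (input_has_nonzero_channels && !(m_shape[0] > 0 && m_shape[1] > 0)) {
    return "mask N/C must be nonzero";
  }

  // Tier 3: every spatial dimension (index 2 and up) must match exactly.
  for (std::size_t i = 2; i < x_shape.size(); ++i) {
    if (x_shape[i] != m_shape[i]) return "spatial dimension mismatch";
  }
  return "";
}
```

Under these assumptions, input {1, 1, 8, 8} with mask {0, 1, 8, 8} fails at tier 2, and input {1, 2, 8, 8} with mask {1, 1, 8, 8} passes, which matches the broadcasting behaviour that the added comment documents.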
@xadupre, please merge/rebase main branch to pass CI.
…copilot/check-consistency-mask-shape
Description
Replaces the long-standing // TODO: fix this checker later comment in MaxpoolWithMask::Compute with real input validation. Without these checks, a mismatched mask silently causes out-of-bounds memory access.

Changes:
- contrib_ops/cpu/maxpool_with_mask.h: Added three ORT_RETURN_IF_NOT guards:
  total_mask_channels)
- test/contrib_ops/maxpool_mask_test.cc: Added three failure-case tests:
  - MaxPoolWithMask_SpatialDimMismatch: mask spatial dims differ from input
  - MaxPoolWithMask_DimCountMismatch: mask rank differs from input rank
  - MaxPoolWithMask_MaskEmptyBatchDim: mask N=0 with non-empty input triggers the nonzero N/C guard

Motivation and Context

The mask tensor is indexed using the input's spatial step size (x_step = height * width, etc.), so a shape mismatch leads to silent out-of-bounds reads. Additionally, total_mask_channels = m_shape[0] * m_shape[1] is used as a modulo divisor in the per-channel offset formula; if either dimension is zero while the input is non-empty, this causes undefined behaviour (division by zero). The original code had a commented-out check with a TODO acknowledging this gap; this PR closes it.
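The out-of-bounds risk can be illustrated with a small model of the indexing. The sketch assumes that, for each input channel, the kernel reads up to x_step consecutive mask elements starting at the offset (c * x_step) % total_mask_channels; the function names and the example shapes are hypothetical.

```cpp
#include <cstdint>

// Largest mask index touched across all input channels, under the assumption
// stated above. Requires total_mask_channels > 0 and x_step > 0.
int64_t MaxMaskIndexRead(int64_t total_channels, int64_t x_step,
                         int64_t total_mask_channels) {
  int64_t max_index = -1;  // stays -1 when there are no input channels
  for (int64_t c = 0; c < total_channels; ++c) {
    const int64_t offset = (c * x_step) % total_mask_channels;
    const int64_t last = offset + x_step - 1;
    if (last > max_index) max_index = last;
  }
  return max_index;
}

// True when every modelled read stays inside a mask that holds
// total_mask_channels * m_step elements (m_step = product of mask spatial dims).
bool MaskReadsInBounds(int64_t total_channels, int64_t x_step,
                       int64_t total_mask_channels, int64_t m_step) {
  return MaxMaskIndexRead(total_channels, x_step, total_mask_channels) <
         total_mask_channels * m_step;
}
```

For an input of shape {1, 1, 8, 8} and a mask of shape {1, 1, 4, 4}, the model reads up to index 63 from a 16-element mask, so the check fails. When the spatial dimensions match (m_step equals x_step = S) and the mask has T >= 1 channels, the largest index is at most T + S - 2, which is below T * S because (T - 1)(S - 1) >= 0.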