Skip to content

ONNX 1.21.0 integration#27601

Merged
titaiwangms merged 24 commits into
mainfrom
onnx-1.21.0-integration
Apr 3, 2026
Merged

ONNX 1.21.0 integration#27601
titaiwangms merged 24 commits into
mainfrom
onnx-1.21.0-integration

Conversation

@titaiwangms

Copy link
Copy Markdown
Contributor

Fix #27586

This pull request updates ONNX Runtime to support ONNX opset 26, including new operator implementations and related infrastructure changes. The most important changes are the upgrade of the ONNX dependency, addition of new opset 26 kernels (such as CumProd and BitCast), and updates to macros and versioning to ensure compatibility. Below are the key changes grouped by theme:

ONNX Dependency Upgrade:

  • Updated ONNX submodule and source references to the latest commit supporting opset 26, and changed versioning in vcpkg.json from 1.20.1 to 1.21.0. (cmake/deps.txt, cmake/external/onnx, cmake/vcpkg-ports/onnx/portfile.cmake, cmake/vcpkg-ports/onnx/vcpkg.json) [1] [2] [3] [4]

Opset 26 Kernel Support:

  • Registered new opset 26 kernels for BitCast and all supported types of CumProd in the CPU execution provider, including their instantiation and build logic. (onnxruntime/core/providers/cpu/cpu_execution_provider.cc, onnxruntime/core/providers/cpu/math/cumprod.cc, onnxruntime/core/providers/cpu/math/cumprod.h) [1] [2] [3] [4]
  • Increased the maximum supported opset version in the optimizer API from 25 to 26. (onnxruntime/core/optimizer/transpose_optimization/optimizer_api.h)

Build and Patch Updates:

  • Added a new ONNX_MINIMAL_BUILD option to ONNX CMake configuration and updated patch files for compatibility with the new ONNX version. (cmake/patches/onnx/onnx.patch, cmake/vcpkg-ports/onnx/binskim.patch) [1] [2] [3]

Macro Improvements:

  • Updated operator schema macros to use [[maybe_unused]] instead of the deprecated ONNX_UNUSED attribute, improving code clarity and modernizing macro usage. (onnxruntime/core/graph/contrib_ops/contrib_defs.h, onnxruntime/core/graph/dml_ops/dml_defs.h) [1] [2]

ONNX Dependency Upgrade

  • Updated ONNX submodule and source references to the latest commit supporting opset 26, and changed versioning in vcpkg.json from 1.20.1 to 1.21.0. [1] [2] [3] [4]

Opset 26 Kernel Support

  • Registered new opset 26 kernels for BitCast and all supported types of CumProd in the CPU execution provider, including their instantiation and build logic. [1] [2] [3] [4]
  • Increased the maximum supported opset version in the optimizer API from 25 to 26.

Build and Patch Updates

  • Added a new ONNX_MINIMAL_BUILD option to ONNX CMake configuration and updated patch files for compatibility with the new ONNX version. [1] [2] [3]

Macro Improvements

  • Updated operator schema macros to use [[maybe_unused]] instead of the deprecated ONNX_UNUSED attribute, improving code clarity and modernizing macro usage. [1] [2]

titaiwangms and others added 3 commits March 9, 2026 21:34
Update ONNX submodule to rel-1.21.0 branch (commit fbbe45b8e2).
Update cmake/deps.txt with new URL and SHA1.
Update vcpkg port (portfile.cmake, vcpkg.json) for 1.21.0.
Regenerate onnx.patch and binskim.patch for 1.21.0 CMakeLists.txt changes.
Update all 7 requirements.txt files to onnx==1.21.0.
Bump kMaxSupportedOpset from 25 to 26 in optimizer_api.h.
Fix ONNX_UNUSED macro removal (replaced with [[maybe_unused]]) in
contrib_defs.h, dml_defs.h, and test_opaque_api.cc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BitCast (opset 26): Zero-copy tensor type reinterpretation for types
with matching bit-widths. Supports all standard numeric types.
Registered in cpu_execution_provider.cc with 17 passing tests.

CumProd (opset 26): Cumulative product along a given axis with
optional exclusive and reverse attributes. Supports float, double,
int32, int64, uint32, uint64. Identity element is 1 (vs 0 for CumSum).
Registered in cpu_execution_provider.cc with 33 passing tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add BitCast and CumProd entries to the CPU provider kernel documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms requested a review from Copilot March 9, 2026 22:06
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Fixed
Comment thread onnxruntime/core/providers/cpu/tensor/bitcast_op.cc Fixed
Comment thread onnxruntime/test/providers/cpu/tensor/bitcast_op_test.cc Fixed

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/graph/contrib_ops/contrib_defs.h Outdated
Comment thread onnxruntime/core/graph/contrib_ops/contrib_defs.h Outdated
Comment thread onnxruntime/core/graph/dml_ops/dml_defs.h Outdated
Comment thread onnxruntime/core/graph/dml_ops/dml_defs.h Outdated
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread onnxruntime/core/providers/cpu/tensor/bitcast_op.cc Outdated
Comment thread onnxruntime/core/providers/cpu/tensor/bitcast_op.cc Outdated
Comment thread onnxruntime/test/opaque_api/test_opaque_api.cc Outdated
Comment thread onnxruntime/test/providers/cpu/tensor/bitcast_op_test.cc Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates ONNX Runtime’s ONNX dependency and kernel surface to support ONNX opset 26 (aligned with ONNX 1.21.0), including new CPU kernels and associated CI/build/doc updates.

Changes:

  • Bumped ONNX Python dependencies and vcpkg/zip-based ONNX sources to 1.21.0 (and updated submodule ref).
  • Added new CPU opset 26 kernels (BitCast, CumProd) plus extensive unit tests.
  • Updated schema-registration macros to use [[maybe_unused]] and refreshed generated operator-kernel documentation.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tools/ci_build/github/windows/python/requirements.txt Bump CI Python ONNX requirement to 1.21.0.
tools/ci_build/github/linux/python/requirements.txt Bump CI Python ONNX requirement to 1.21.0.
tools/ci_build/github/linux/docker/scripts/requirements.txt Bump docker scripting ONNX requirement to 1.21.0.
tools/ci_build/github/linux/docker/scripts/manylinux/requirements.txt Bump manylinux image ONNX requirement to 1.21.0.
tools/ci_build/github/linux/docker/scripts/lort/requirements.txt Bump LoRT docker ONNX requirement to 1.21.0.
tools/ci_build/github/linux/docker/inference/aarch64/python/cpu/scripts/requirements.txt Bump aarch64 inference image ONNX requirement to 1.21.0.
onnxruntime/test/python/requirements.txt Bump test Python ONNX requirement to 1.21.0.
onnxruntime/test/providers/cpu/tensor/bitcast_op_test.cc Adds CPU unit tests for new BitCast kernel.
onnxruntime/test/providers/cpu/math/cumprod_test.cc Adds CPU unit tests for new CumProd kernel.
onnxruntime/test/opaque_api/test_opaque_api.cc Updates schema-registration macro to use [[maybe_unused]].
onnxruntime/core/providers/cpu/tensor/bitcast_op.h Declares new CPU BitCast kernel.
onnxruntime/core/providers/cpu/tensor/bitcast_op.cc Implements and registers opset-26 CPU BitCast.
onnxruntime/core/providers/cpu/math/cumprod.h Declares templated CPU CumProd kernel and axis helper.
onnxruntime/core/providers/cpu/math/cumprod.cc Implements and registers opset-26 CPU CumProd.
onnxruntime/core/providers/cpu/cpu_execution_provider.cc Registers new opset-26 CPU kernels into the EP registry.
onnxruntime/core/optimizer/transpose_optimization/optimizer_api.h Extends optimizer API max supported opset to 26.
onnxruntime/core/graph/dml_ops/dml_defs.h Modernizes schema macro to [[maybe_unused]].
onnxruntime/core/graph/contrib_ops/contrib_defs.h Modernizes schema macro to [[maybe_unused]].
docs/OperatorKernels.md Refreshes generated operator-kernel listing (adds opset-26 ops, alters provider sections).
cmake/vcpkg-ports/onnx/vcpkg.json Bumps vcpkg ONNX port to 1.21.0.
cmake/vcpkg-ports/onnx/portfile.cmake Updates ONNX source ref/SHA to build ONNX 1.21.0 content.
cmake/vcpkg-ports/onnx/binskim.patch Updates ONNX patch hunks for new upstream version (adds ONNX_MINIMAL_BUILD option).
cmake/patches/onnx/onnx.patch Updates ONNX patch hunks for new upstream version (adds ONNX_MINIMAL_BUILD option).
cmake/external/onnx Updates ONNX submodule commit to newer ref.
cmake/deps.txt Updates ONNX zip dependency URL/hash to newer ref.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread onnxruntime/core/providers/cpu/tensor/bitcast_op.cc
Comment thread docs/OperatorKernels.md
Comment thread cmake/vcpkg-ports/onnx/portfile.cmake Outdated
titaiwangms and others added 3 commits March 9, 2026 22:26
Change onnx==1.21.0 to onnx==1.21.0rc1 in all 7 requirements.txt
files since the final 1.21.0 release is not yet published.
Apply lintrunner auto-formatting fixes to whitespace/alignment.

Verified SHA1 (deps.txt) and SHA512 (vcpkg portfile) hashes match
the downloaded archives. No v1.21.0 tag exists yet — commit hash
URL is correct.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- cumprod.cc: Add #include <numeric>, validate axis tensor has exactly
  one element (0-D scalar or 1-D shape [1])
- bitcast_op.cc: Add null check for TensorTypeFromONNXEnum return value
- OperatorKernels.md: Restore DML section that was accidentally removed
  during regeneration, add BitCast and CumProd entries manually

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms reopened this Mar 10, 2026
titaiwangms and others added 2 commits March 11, 2026 17:17
ONNX 1.21.0 (onnx/onnx#7675) added stricter raw_data size validation
in ParseData<T>. The test had shape {4} but only 3 values {2, 64, 32},
which old ONNX silently ignored. Fix shape to {3}.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms requested a review from Copilot March 11, 2026 17:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/tensor/bitcast_op.cc
Comment thread onnxruntime/core/providers/cpu/tensor/bitcast_op.cc
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread tools/ci_build/github/linux/python/requirements.txt Outdated
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc
titaiwangms and others added 5 commits March 12, 2026 19:56
- Update ONNX submodule, deps.txt, vcpkg portfile to rc2 commit a51ac075
- Update onnx==1.21.0rc2 in all 7 requirements.txt files
- Fix cumprod.cc review comments (namespace, ORT_ENFORCE, type, closing brace)
- Add 5 test exclusions: 4 DFT rfft/irfft tests (ORT lacks IRFFT) + 1 BitCast bool test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…broken tests

The JSONC filter only covers Python backend tests. The C++ onnx_test_runner
uses hardcoded arrays in TestCase.cc GetBrokenTests(). Add BitCast bool
and DFT rfft/irfft filters to cover the C++ test runner path.

ORT BitCast kernel doesn't register bool type, and ORT DFT kernel lacks
IRFFT (inverse real FFT) support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Agent-signed-off: Developer (45720d0d) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ONNX 1.21.0rc2 enables _GLIBCXX_ASSERTIONS (onnx/onnx#7601) which
exposes pre-existing undefined behavior in Slice shape inference:
std::clamp(start, 0, dim_value-1) with dim_value=0 violates lo<=hi.
Add early-exit guard for both opset 10 and 11 locations in old.cc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The onnx.patch fix must also be in binskim.patch for Windows CI builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers the third std::clamp UB location in processSliceInputs.
All three sites now patched: old.cc:2646, old.cc:6329, defs.cc:792.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread onnxruntime/core/providers/cpu/math/cumprod.cc Outdated
titaiwangms and others added 2 commits March 16, 2026 19:01
- Update cmake/deps.txt: commit hash and SHA1 for rc3 zip
- Update cmake/external/onnx submodule to rc3 commit (e6c12c5fa)
- Update cmake/vcpkg-ports/onnx/portfile.cmake: REF and SHA512
- Update onnx==1.21.0rc3 in all 7 requirements.txt files
- Verified all vcpkg patches (binskim, fix-cmakelists, fix-dependency-protobuf)
  and cmake/patches/onnx/onnx.patch apply cleanly to rc3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Agent-signed-off: Developer (257e49bb) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Slice shape inference fix for dim_value==0 (tensor/old.cc and
tensor/defs.cc) was cherry-picked into ONNX rc3 natively (commit
33afebf43, PR #7739). The parameter was also renamed from 'input_rank'
to 'input_dim_size_or_value'. Remove these 3 hunks from both
onnx.patch and binskim.patch to prevent build failures.

Retained hunks: CMakeLists ONNX_MINIMAL_BUILD, Utils.cmake protobuf
warnings, GroupNormalization Deprecate removal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Agent-signed-off: Developer (257e49bb) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms reopened this Mar 16, 2026
titaiwangms and others added 4 commits March 16, 2026 21:35
Replace the 4 sequential outer loops (forward/reverse × exclusive/non-exclusive)
with concurrency::ThreadPool::TryBatchParallelFor. Each outer iteration processes
an independent slice, making them safe to parallelize.

Refactored from sequential pointer arithmetic (input_iter++/output_iter++) to
index-based access using base offset = outer * dim * lower_dim_size, which is
required for parallel execution where iterations cannot share mutable iterators.

Agent-signed-off: Developer (257e49bb) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update ONNX dependency from 1.21.0rc3 to 1.21.0rc4 (commit c751ddbce897).
RC4 includes bug fixes (Slice SIGABRT on empty dimensions) and
security hardening (ExternalDataInfo attribute injection).

Changes:
- cmake/deps.txt: Updated archive URL and SHA1 hash
- cmake/external/onnx: Updated submodule to rc4 commit
- cmake/vcpkg-ports/onnx/portfile.cmake: Updated REF and SHA512
- 7 requirements.txt files: onnx==1.21.0rc4

Agent-signed-off: Developer (dc55daf6) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
titaiwangms added a commit that referenced this pull request Jun 16, 2026
### Integrate ONNX 1.22.0rc1 (opset 27)

Resolves #28752.

Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df`
(VERSION_NUMBER `1.22.0rc1`).
ONNX **1.21.0 → 1.22.0rc1**. Max ai.onnx opset **26 → 27**. IR version
**unchanged (13 / `0x0D`)**.

This is the **RC validation phase** of an incremental integration (same
strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub
release is still a **draft** (no git tag yet), so re-pinning to the
released tag is deferred to **Phase 2** (see Follow-ups). Landing the RC
now validates ONNX 1.22 against ORT before ONNX publishes the formal
release.

---

### Update — ONNX 1.22.0 **FINAL** re-pin + rebase onto `upstream/main`
+ closes #28969

ONNX published the formal **`v1.22.0`** GitHub release, so this PR is
re-pinned **rc2 → FINAL** (`onnx/onnx@v1.22.0`) — the Phase-2 step
deferred in the rc1 description below. The branch was also **rebased
onto `upstream/main`** to pick up the intervening optimizer/opset-26
work. The released tag tarball is a different asset hash than the RCs,
so the vcpkg MS-internal asset mirror was re-seeded for the final tag
(otherwise `--use_vcpkg` legs 404).

**Also closes #28969** (WebGPU binary-elementwise broadcast `SIZE_MAX`
underflow). ONNX 1.22's expanded-Attention reference tests exposed a
latent WebGPU bug where a broadcast shape computed `dim - 1` on a
zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included
here and the previously-skipped reference tests are re-enabled.

**Opset-27 `*CurrentOpset` test handling.** ONNX 1.22.0 FINAL ships
`DomainToVersionRange` **map-max 27** while the last *released* opset is
**26**, so **opset 27 stays under development** for the whole 1.22
cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`)
therefore throw *"Opset 27 under development"* at model load on every
`*CurrentOpset` fusion test that builds at the max opset. These tests
now load with per-model `ModelOptions{/*allow_released_opsets_only*/
false, /*strict_shape_type_inference*/ false}`, extending the existing
`38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset`
suite. This is **leg-agnostic** (exercises opset 27 on every CI leg, not
just the relaxed ones) and **preserves opset coverage** (vs.
`GTEST_SKIP`). Each call site is annotated with a one-line WHY +
tracking issue (#28966) so the relaxation can be removed once opset 27
is released.

`Resolves #28752` (unchanged). Closes #28969.


### Update — ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX
`output_shape` spec

Since the original rc1 description below, this PR was re-pinned **rc1 →
rc2** (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick up
the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries
onnx#8051, which tightened `convTransposeShapeInference` to reject an
`output_shape`/`output_padding` whose size does not match the number of
spatial dimensions (per the ONNX spec clarification onnx#5400). **ONNX
Runtime now conforms to that spec** instead of patching ONNX to preserve
a non-standard form.

**⚠️ Breaking change — ConvTranspose `output_shape` now follows the ONNX
spec (spatial dimensions only).** ORT previously also accepted a
non-standard `rank + 2` form that included batch and channel, i.e. `(N,
C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a
ConvTranspose whose input has a **statically-known rank** is rejected at
`Graph::Resolve` with *"Attribute output_shape has incorrect size"*.
**Migration:** specify `output_shape` with spatial dimensions only —
e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred
from the input and weight, so results are identical; the kernel ignores
`N, C`). Models whose ConvTranspose input has a **dynamic/unknown rank
are unaffected** — ONNX skips the size check and ORT computes the same
result (covered by the new
`ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test).

**Patch inventory — supersedes "2 files, 3 hunks" below.**
`cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch`
mirror) carries **only** the `ONNX_MINIMAL_BUILD` option hunk and the
GroupNormalization-18 `.Deprecate()` removal — **no ConvTranspose
hunks**. rc2's strict shape-inference check is kept as-is; ORT's own
test models were conformed to the spec. The upstream archive hash,
`deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are
unchanged.

**Additional rc2 test conform.** rc2 also tightened
`convPoolShapeInference` to reject `Conv` inputs with rank < 3 (*"Input
tensor must have at least 3 dimensions"*). The hand-authored model in
`onnxruntime/test/python/quantization/test_op_split.py` declared a
spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid
NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping
the quantized-Split graph and expected outputs identical. No ORT source
change.

> This note should also seed the GitHub Release notes for the ONNX 1.22
/ opset 27 milestone and the squash-commit message.


---

### What changed (29 files)

**Version plumbing**
- `cmake/deps.txt` — onnx archive URL → rc1 commit zip + SHA1
`421e5a9afb6c41a54696e424e5b9a3796aab6821`.
- `cmake/external/onnx` — submodule → `bc3be77b`.
- `cmake/vcpkg-ports/onnx/portfile.cmake` — `REF` commit form + tar.gz
SHA512 `e0c526f5…3ce467`.
- `cmake/vcpkg-ports/onnx/vcpkg.json` — `version-semver` `1.22.0`,
`port-version` 0.
- `cmake/patches/onnx/onnx.patch` +
`cmake/vcpkg-ports/onnx/binskim.patch` — **byte-identical** rebase onto
1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option
(restructured for 1.22's new `onnx_core` OBJECT-lib /
`add_subdirectory(onnx)` layout) and the GroupNormalization-18
`.Deprecate()` removal; **dropped** the `Utils.cmake` protobuf-warnings
hunk (already merged upstream in 1.22).

**Opset-27 op enablement (Range)**
- `onnxruntime/core/providers/cpu/generator/range.cc` — split into
versioned `[11, 26]` + a new unversioned `27` registration. The opset-27
kernel natively supports the existing common numeric types
(float/double/int16/int32/int64). **fp16 Range is covered** via ONNX's
Range-27 **function body**, which ORT expands into primitive ops at
partition time. **bf16 Range is deferred to that same function
expansion** — there is no native bf16 kernel, and its bf16 reference
node test (`test_range_bfloat16_type_positive_delta`, base +
`_expanded`) is not exercised by the Python/numpy ONNX backend series,
whose harness cannot materialize bf16 (`Numpy_type 256`); a native
fp16/bf16 kernel + `stash_type` handling is a follow-up (efficiency, not
correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc` — versioned
the Range forward-declare + `BuildKernelCreateInfo` entries and added
the opset-27 registration.
- **CUDA Range** — same versioned `[11, 26]` + opset-27 split as CPU
(`onnxruntime/core/providers/cuda/generator/range.cc` +
`cuda_execution_provider.cc`); GPU-verified locally: `onnx_test_runner
-e cuda` 8/8 opset-27 Range node tests pass, native Range-27 placed on
CUDAExecutionProvider (fp16/bf16 via function expansion).

**Optimizer / EP opset ceilings**
- `…/transpose_optimization/optimizer_api.h` — `kMaxSupportedOpset` **26
→ 27**.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h` —
`GetMaxSupportedOpSet()` **25 → 27** (upper guard only; per-op support
checks still gate — these EPs gain no new kernels here).

**Fusion updates**
- `onnxruntime/core/optimizer/gather_fusion.cc` — GatherToSlice Range
version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc` — add `27` to
the two Range path-matchers (`parent_path_3/4`) so embedding fusion
still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc` — new opset-27
GatherToSliceFusion test.

**Requirements (7 bumped)**
- All 7 CI `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on
PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to
this bump; intentionally untouched).

**Generated docs / test data**
- `js/web/docs/webgl-operators.md` — regenerated.
- `docs/OperatorKernels.md` — **surgical** edit: CPU EP **and** CUDA EP
Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc` —
**comment-only**: documents why no opset-27 CPU exclusions are needed
(all opset-27 node tests pass via function expansion).

**Docs**
- `.agents/skills/onnx-opset-bump-checklist/SKILL.md` — new reusable
checklist skill distilled from this integration. Now also documents the
"bump **all** execution providers together" tradition (CPU + CUDA +
JS/DML assessment in one pass) so future opset bumps don't ship a
partial EP set.

---

### Validation (CPU EP + CUDA EP, Linux x64)

- Full build ✅
- `--minimal_build extended` build ✅ (validates the rebased
`ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅ — **1595 passed / 0 failed**
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅ —
**62/62 pass** via ONNX function-body expansion (run with
`ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState,
LinearAttention, and fp16/bf16 Range — despite no native kernels for
them.
- **CUDA EP (H100):** built `--use_cuda` clean in both **Debug** and
**RelWithDebInfo** ✅; `onnx_test_runner -e cuda` on the opset-27 Range
node tests ✅ — **8/8 pass**, with native Range-27 placed on
CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via
function-body expansion.

---

### Standing caveats (please read before reviewing)

1. **CUDA EP now locally verified for Range; other GPU EPs/ops still
CI-only.** The CUDA EP was built and the opset-27 **Range** node tests
run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops
were **not** exercised here. Function-body expansion is EP-agnostic, so
other opset-27 models are expected to run on those EPs too, but broader
GPU coverage remains a CI/follow-up item.
2. **`OperatorKernels.md` updated surgically** (CPU Range row only). A
CPU-only *full* regen would destructively wipe the CUDA/DML/other-EP
sections (the generator only emits rows for the EPs in the built
module). A correct multi-EP regen needs a build per EP and is a
follow-up.
3. **Opset 27 is "under development"** in ONNX's released-versions map.
ORT's load-time validation rejects opset-27 models unless
`ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The
opset-27 **schemas are always compiled in from the submodule**
regardless — this gate only affects model load-time acceptance, not
schema availability.
4. **EP `GetMaxSupportedOpSet` jumped 25 → 27** (skips 26). This is an
*upper* guard only; raising it merely lets opset-26/27 nodes reach the
per-op support checks that still gate correctness. No regression — it
also retroactively un-caps opset-26 for these EPs.
5. **iOS/macOS Xcode framework build is currently broken by an upstream
ONNX CMake regression** (the `onnx_core` OBJECT-library split in
onnx/onnx#7733 reintroduced the Xcode breakage originally fixed by
onnx/onnx#7515 for onnx/onnx#7514). This is **NOT** caused by this opset
bump. Tracked upstream at
[onnx/onnx#8053](onnx/onnx#8053). Non-Xcode
builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are
unaffected. This resolves at the **Phase 2** formal `v1.22.0` re-pin
once ONNX ships the fix.

---

### Follow-ups (explicitly NOT in this PR)

- **GPU/multi-EP coverage:** run opset-27 CUDA/DML node tests;
regenerate `OperatorKernels.md` across all EPs.
- **JS EP Range** `[11, 26]` + `27` split (currently registered
open-ended at `11`; mirror the CPU/CUDA versioned split).
- **DML Range opset-27 assessment** (DML uses its own `REG_INFO`
registration system — assess whether an opset-27 entry is needed).
- **WebGPU EP Range** opset-27 split — `range.cc` registers `Range`
`.SinceVersion(11)` open-ended, so it already claims opset-27 Range;
only the new bf16 type is unsupported and falls back via the `T`
type-constraint (function expansion). Mirror the CPU/CUDA versioned
`[11, 26]` + `27` split.
- **Native kernels:** implement CPU (and EP) `CausalConvWithState` and
`LinearAttention` kernels, and a native fp16/bf16 + `stash_type`
Range-27 kernel (replace today's function-expansion path with efficient
kernels).
- **Phase 2 — formal `v1.22.0` re-pin:** re-pin
`deps.txt`/submodule/portfile/requirements to the released tag once ONNX
publishes it (currently blocked on ONNX tagging the release); upload the
tag tarball to the vcpkg mirror. **This also restores the iOS/macOS
Xcode framework build** once the upstream onnx OBJECT-library Xcode
regression (caveat 5) is resolved and re-pinned.
- **Tooling:** fix the pre-existing crash in
`find_optimizer_opset_version_updates_required.py` (placeholder `ver`
parsed as int) so it can be relied on for future bumps.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate with ONNX 1.21.0 release branch

5 participants