[Paddle TensorRT] Fix int64 shape tensor in stack/arange converters breaking TRT>=10.8 engine build #79320
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-06-16 07:21:05
📋 Review Summary
PR overview: Fixes the engine build failure on TensorRT 10.8+ caused by mixing int64 shape tensors with int32 shape subgraphs in the stack/arange converters.
Scope of changes: python/paddle/tensorrt/impls/creation.py, python/paddle/tensorrt/impls/manipulation.py
Impact tag: [Inference]
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | python/paddle/tensorrt/impls/creation.py:134 | The runtime-tensor integer branch of arange has no corresponding regression test |
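📝 PR Convention Check
Compliant.
Overall Assessment
stack_converter now uses trt_shape(), which matches the semantics of the existing shape helper and brings TRT>=10 shape outputs back to int32; the integer-branch cast in arange_converter likewise aligns with the float branch and with the int32 constraint of the subsequent zero_tensor. The main remaining risk is that the existing arange tests only cover constant inputs; adding a runtime tensor input case is recommended to lock in this fix.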
    quotient_tensor = f_quotient_tensor
    # zero_tensor (above) is int32; on TRT>=10 the integer quotient is int64,
    # which would mismatch in the trt_sub below. Cast it to int32 too.
    quotient_tensor = trt_cast(
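🟡 Suggestion: This cast fixes the non-float branch taken when start/end/step are TensorRT input tensors, but the existing arange converter tests only use constant inputs (the feed_list in test/tensorrt/test_converter_creation.py:109-114 is empty). The constant-folding path does not reliably cover the runtime subgraph in which f_quotient_tensor is int64 on TRT>=10 and is then passed to trt_sub together with zero_tensor, so if this cast is later removed or broken, the existing tests may still pass.
Suggested fix:
Add an arange TensorRT test case that puts start, end, and step into feed_list (for example, three int64 inputs with shape [1]) and sets the corresponding input shape data, so that the integer branch of pd_op.arange enters the converter with runtime tensors. This case should make the engine build fail on TRT>=10.8 if this cast line is removed.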
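To see why constant-input tests cannot guard this cast, here is a pure-Python toy of the review's reasoning. `Val`, `toy_sub`, and `toy_arange_size` are hypothetical and are not Paddle or TensorRT code. The sketch assumes, following the review and the code comment above, that folded constants never reach TensorRT's dtype check, that the integer quotient is int64 on TRT ≥ 10, that the trt_sub subtracts the int32 zero_tensor from the quotient, and that the quotient is the usual arange length ceil((end - start) / step).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Val:
    value: int
    dtype: str     # "int32" or "int64"
    runtime: bool  # True: a graph tensor; False: a constant folded at build time


def toy_sub(a, b):
    """Hypothetical trt_sub stand-in: only graph tensors get a dtype check."""
    if (a.runtime or b.runtime) and a.dtype != b.dtype:
        raise TypeError(f"incompatible types {a.dtype} and {b.dtype}")
    return Val(a.value - b.value, a.dtype, a.runtime or b.runtime)


def toy_arange_size(start, end, step, *, runtime_inputs, cast_quotient,
                    trt_version=(10, 15)):
    """Toy integer branch of an arange converter (assumptions in the lead-in)."""
    q_dtype = "int64" if trt_version >= (10, 0) else "int32"
    quotient = Val(math.ceil((end - start) / step), q_dtype, runtime_inputs)
    if cast_quotient:  # the fix: cast the integer quotient to int32
        quotient = Val(quotient.value, "int32", quotient.runtime)
    zero = Val(0, "int32", runtime=False)  # zero_tensor is int32
    return toy_sub(quotient, zero).value


# Constant inputs pass with or without the cast, so they cannot guard it.
print(toy_arange_size(0, 10, 3, runtime_inputs=False, cast_quotient=False))  # 4
# Runtime inputs pass only with the cast.
print(toy_arange_size(0, 10, 3, runtime_inputs=True, cast_quotient=True))    # 4
try:
    toy_arange_size(0, 10, 3, runtime_inputs=True, cast_quotient=False)
except TypeError as err:
    print("build error:", err)  # build error: incompatible types int64 and int32
```

In this toy, only the runtime-input case distinguishes "cast present" from "cast removed", which is the property the suggested test is meant to have.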
PR Category
Inference
PR Types
Bug fixes
Description
Problem. On TensorRT ≥ 10.8 the `Shape` op returns int64 (it returned int32 up to 10.0.1). The `pd_op.stack` converter builds its output-shape subgraph from a raw `network.add_shape(...)` and then concatenates that int64 tensor with the int32 `add_1D_constant_layer(network, 1)`, which breaks the TensorRT engine build. Paddle already handles this elsewhere via `converter_utils.py::trt_shape()` (which casts the `Shape` result back to int32 on TRT ≥ 10, per its docstring); `stack_converter` simply wasn't routed through it. `arange_converter` has the analogous gap: only its float branch casts the quotient.

Fix (2 files):

- `python/paddle/tensorrt/impls/manipulation.py`, `stack_converter`: replace the raw `network.add_shape(...)` with `trt_shape(...)` so the shape tensor is int32 before the `add_concatenation` with the int32 constant.
- `python/paddle/tensorrt/impls/creation.py`, `arange_converter`: cast the integer-branch quotient to int32 (mirroring the existing float branch).
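The stack failure and the reason it slipped past older CI can be reproduced with a small pure-Python toy model. Everything below (`ToyTensor`, `toy_add_shape`, `toy_trt_shape`, `toy_concat`, `toy_stack_output_shape`) is hypothetical and is not the TensorRT or Paddle API. It assumes, per the description above, that `Shape` yields int64 from TRT 10.8 on, that `trt_shape()` casts to int32 on TRT ≥ 10, and (as a further assumption about the converter's internals) that the stack subgraph inserts a size-1 dim at `axis` by concatenating shape slices with the int32 constant 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTensor:
    values: tuple
    dtype: str  # "int32" or "int64"


def toy_add_shape(dims, trt_version):
    """Stand-in for a raw network.add_shape(...): int64 from TRT 10.8 on."""
    return ToyTensor(tuple(dims), "int64" if trt_version >= (10, 8) else "int32")


def toy_trt_shape(dims, trt_version):
    """Stand-in for trt_shape(): casts the Shape result to int32 on TRT >= 10."""
    shape = toy_add_shape(dims, trt_version)
    if trt_version >= (10, 0):
        shape = ToyTensor(shape.values, "int32")
    return shape


def toy_concat(*tensors):
    """Concatenation stand-in: every input must share one dtype."""
    dtypes = sorted({t.dtype for t in tensors})
    if len(dtypes) != 1:
        raise TypeError("incompatible types " + " and ".join(dtypes))
    return ToyTensor(tuple(v for t in tensors for v in t.values), dtypes[0])


def toy_stack_output_shape(dims, axis, trt_version, use_trt_shape):
    """Insert a size-1 dim at `axis` by concatenating shape slices with a 1."""
    get_shape = toy_trt_shape if use_trt_shape else toy_add_shape
    shape = get_shape(dims, trt_version)
    one = ToyTensor((1,), "int32")  # like add_1D_constant_layer(network, 1)
    head = ToyTensor(shape.values[:axis], shape.dtype)
    tail = ToyTensor(shape.values[axis:], shape.dtype)
    return toy_concat(head, one, tail)


for version in [(10, 7), (10, 8), (10, 15)]:
    for use_helper in (False, True):
        label = "trt_shape()" if use_helper else "raw add_shape"
        try:
            out = toy_stack_output_shape((2, 3), 1, version, use_helper)
            print(version, label, "ok", out.values, out.dtype)
        except TypeError as err:
            print(version, label, "build error:", err)
```

In this model only the raw `add_shape` path fails, and only from (10, 8) on; on (10, 7) both paths succeed, which mirrors why the regression stayed silent on CI running an older TensorRT.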
Verification: the repo's own tests go FAIL → PASS on Turing and Blackwell. Built the released `paddlepaddle-gpu==3.3.0` (cu129) + `tensorrt==10.15.1.29` and ran `test/tensorrt/test_converter_*.py`. The existing `TestStackTRTPattern`/`TestStackCase2TRTPattern` already cover the path, so no new test is needed: they fail on TRT ≥ 10.8 without this change and pass with it (and pass on TRT < 10.8 either way, where `Shape` still returns int32, which is why this regressed silently on CI). In larger graphs the same int64 shape tensor instead trips `Error Code 4: ... incompatible types Int32 and Int64` (observed with `PP-FormulaNet_plus-L`).

Note on `arange`: the existing arange tests use constant inputs and pass before and after, so they don't independently exercise this; the integer-branch cast is included as the obvious consistency fix matching the float branch (the same issue surfaces at the model level, e.g. `PP-DocLayout-L`). Happy to split it into a follow-up if preferred.
Regression sources: `pd_op.stack` (#68839), `pd_op.arange` (#68757). Good cherry-pick candidate for `release/3.3`/`release/3.4`.

Fixes #79319.
Does this change precision?
No. Shape/index tensors are bounded well within int32 range; the change only makes the TensorRT operand dtype consistent so the engine builds, and is numerically equivalent.
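A quick sanity check of the range argument, as a pure-Python sketch: `to_int32` and `int32_cast_is_lossless` are hypothetical helpers, and ctypes' truncating conversion only stands in for an int64 → int32 cast (TensorRT's own overflow behavior is not modeled). Any value inside [-2^31, 2^31 - 1] round-trips unchanged, so for realistic shape and index values the cast changes nothing numerically.

```python
import ctypes


def to_int32(values):
    """Convert each int to a C int32 via ctypes (no overflow check: it wraps)."""
    return [ctypes.c_int32(v).value for v in values]


def int32_cast_is_lossless(values):
    """True when every value survives an int64 -> int32 conversion unchanged."""
    return to_int32(values) == list(values)


stack_shape = [2, 1, 3]  # e.g. an output shape from the stack subgraph
arange_len = [4]         # e.g. the arange length for start=0, end=10, step=3
print(int32_cast_is_lossless(stack_shape), int32_cast_is_lossless(arange_len))  # True True
print(to_int32([2**31 - 1, 2**31]))  # [2147483647, -2147483648]
```

Only a dimension or index of at least 2^31 would be altered, far beyond the shapes these converters handle.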