Describe the Bug
On TensorRT ≥ 10.8, the PIR→TensorRT converter for pd_op.stack builds its output-shape subgraph
from a raw int64 Shape tensor and combines it with int32 shape constants, which breaks the
TensorRT engine build. (TensorRT 10.8 changed the Shape op's output dtype from int32 → int64.) The
same latent inconsistency exists in the integer branch of pd_op.arange. This makes the native
Paddle-TensorRT path (run_mode='trt_fp16') unusable for any model containing stack (e.g.
PP-FormulaNet, PP-OCRv5_server_rec, PP-DocLayout-L).
Minimal reproducer (the repository's own converter tests fail on TensorRT ≥ 10.8):
# env: paddlepaddle-gpu==3.3.0 (cu129) + tensorrt==10.15.1.29, CUDA 12.x, any NVIDIA GPU
python -m pytest -v -s test/tensorrt/test_converter_manipulation.py -k "Stack"
# equivalently, any stack under the native TRT path on TRT>=10.8:
import numpy as np, paddle
from paddlex import create_model, PaddlePredictorOption # or paddle.tensorrt.export.convert
opt = PaddlePredictorOption(); opt.run_mode = "trt_fp16"
# build any model whose graph contains a `stack` (e.g. PP-OCRv5_server_rec) -> engine build fails
Observed behavior: the engine build fails. In the unit test the int64 shape tensor breaks TensorRT's
symbolic shape analysis, so network.add_concatenation(...) returns None and the converter raises:
============================= test session starts ==============================
test_converter_manipulation.py::TestStackTRTPattern::test_trt_result
[TRT] [E] [graphShapeAnalyzer.cpp::checkCalculationStatusSanity::2127] Error Code 2: Internal Error
(Assertion !isInFlight(p.second.symbolicRep) failed. ... graphShapeAnalyzer.cpp:2127)
FAILED
test_converter_manipulation.py::TestStackCase2TRTPattern::test_trt_result
[TRT] [E] [graphShapeAnalyzer.cpp::checkCalculationStatusSanity::2127] Error Code 2: Internal Error
(Assertion !isInFlight(p.second.symbolicRep) failed. ... graphShapeAnalyzer.cpp:2127)
FAILED
=================================== FAILURES ===================================
E AttributeError: 'NoneType' object has no attribute 'axis'
python/paddle/tensorrt/impls/manipulation.py:999: AttributeError
=========== 2 failed, 2 passed, 82 deselected ============
In larger graphs the same int64 shape tensor instead trips the explicit type error (observed with
PP-FormulaNet_plus-L):
[TRT] [E] ITensor::getDimensions: Error Code 4: API Usage Error
(..._pd_op.stack->after_shape_tensor(...): concat input tensors 0 and 1 have
incompatible types Int32 and Int64 In validateTypes at .../concatenationLayer.cpp:137)
Expected behavior: the converter should build the engine (the shape tensor should be int32, as the
rest of the shape subgraph expects).
Additional Supplementary Information
Root cause. Paddle already anticipates the TRT-10 Shape→int64 change in
python/paddle/tensorrt/converter_utils.py::trt_shape(), which casts the Shape result back to int32
on TRT ≥ 10 (docstring: "casting the shape result(int64) from TRT10 back to int32 … Many existing
paddle op kernels only support input shape tensor as int32"). But stack_converter calls raw
network.add_shape(...) instead of trt_shape(...), so the int64 shape tensor flows into
add_concatenation next to an int32 add_1D_constant_layer(network, 1). The arange_converter has the
analogous issue: only its float branch casts the quotient to int32; the integer branch leaves it int64.
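The dtype mismatch described above can be illustrated with a small toy model. This is a sketch only: `ShapeTensor`, `shape_of`, `cast`, and `concat` are hypothetical stand-ins written for this example, not TensorRT or Paddle APIs, and the strict type check merely imitates the "incompatible types Int32 and Int64" validation seen in the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeTensor:
    """Toy stand-in for a 1-D shape tensor: values plus a dtype tag."""
    values: tuple
    dtype: str  # "int32" or "int64"


def shape_of(dims, returns_int64):
    """Imitate the Shape op; the flag selects the newer int64 behavior."""
    return ShapeTensor(tuple(dims), "int64" if returns_int64 else "int32")


def cast(tensor, dtype):
    """Imitate an explicit cast of a shape tensor."""
    return ShapeTensor(tensor.values, dtype)


def concat(*tensors):
    """Imitate a concatenation that rejects mixed input dtypes."""
    dtypes = {t.dtype for t in tensors}
    if len(dtypes) != 1:
        raise TypeError(f"concat inputs have incompatible types: {sorted(dtypes)}")
    values = tuple(v for t in tensors for v in t.values)
    return ShapeTensor(values, tensors[0].dtype)


one = ShapeTensor((1,), "int32")             # plays the role of the int32 constant
raw = shape_of((4, 8), returns_int64=True)   # plays the role of the raw Shape output

# Mixing the raw int64 shape with the int32 constant is rejected.
try:
    concat(one, raw)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Casting the shape back to int32 first makes the concat valid.
fixed = concat(one, cast(raw, "int32"))
```

In this model the uncast path fails and the cast path yields an int32 shape tensor, which is the behavior the rest of the shape subgraph expects.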
Environment. Paddle 3.3.0 (cu129) and current develop; TensorRT ≥ 10.8 (reproduced on 10.15.1.29);
CUDA 12.x. Reproduced identically on Turing (Tesla T4, sm_75) and Blackwell (RTX PRO 6000, sm_120), so it is
architecture-independent (a dtype issue in the Python converters, not a kernel issue). It is dormant on
TensorRT < 10.8 (where Shape still returns int32), which is why the existing tests pass on current CI.
Regression sources: the pd_op.stack converter (#68839) and the pd_op.arange converter (#68757).
Fix. Route the stack shape subgraph through the existing trt_shape() helper, and cast the arange
integer quotient. A PR follows and will reference this issue.
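The idea behind the fix can be sketched as "take the shape, then cast it to int32 if it is not int32 already". The helper below, `shape_as_int32`, is a hypothetical illustration and not Paddle's `trt_shape()` implementation. It assumes the network object exposes `add_shape(...)` and `add_cast(...)` whose layers provide `get_output(0)`, as the TensorRT Python API does in recent versions. The `Fake*` classes are stand-ins so the sketch runs without TensorRT installed.

```python
def shape_as_int32(network, tensor, int32_dtype):
    """Hypothetical helper: return the shape of `tensor` as an int32 shape tensor.

    `int32_dtype` is passed in by the caller (assumed to be `trt.int32` with a
    real network) so that this sketch does not need to import tensorrt.
    """
    shape = network.add_shape(tensor).get_output(0)
    if shape.dtype == int32_dtype:
        return shape  # older behavior: Shape already yields int32
    # newer behavior: Shape yields int64, so cast it back explicitly
    return network.add_cast(shape, int32_dtype).get_output(0)


# Stand-ins for a dry run; these are NOT TensorRT classes.
class FakeTensor:
    def __init__(self, dtype):
        self.dtype = dtype


class FakeLayer:
    def __init__(self, output):
        self._output = output

    def get_output(self, index):
        return self._output


class FakeNetwork:
    def __init__(self, shape_dtype):
        self.shape_dtype = shape_dtype  # dtype that the fake Shape op produces
        self.cast_count = 0             # number of cast layers added

    def add_shape(self, tensor):
        return FakeLayer(FakeTensor(self.shape_dtype))

    def add_cast(self, tensor, to_type):
        self.cast_count += 1
        return FakeLayer(FakeTensor(to_type))


new_trt = FakeNetwork("int64")  # Shape returns int64
old_trt = FakeNetwork("int32")  # Shape returns int32

new_result = shape_as_int32(new_trt, FakeTensor("float32"), "int32")
old_result = shape_as_int32(old_trt, FakeTensor("float32"), "int32")
```

Both fake networks end up with an int32 shape tensor, and a cast layer is added only when the Shape output is not already int32. In the actual converters the existing `trt_shape()` helper should be preferred over any new helper of this kind.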