[PIR-TensorRT] pd_op.stack / pd_op.arange converters fail the TensorRT engine build on TRT >= 10.8 (int64 shape tensor / Int32-Int64 mismatch) #79319

@WheelWell9876

Describe the Bug

On TensorRT ≥ 10.8, the PIR→TensorRT converter for pd_op.stack builds its output-shape subgraph
from a raw int64 Shape tensor and combines it with int32 shape constants, which breaks the
TensorRT engine build. (TensorRT 10.8 changed the Shape op's output dtype from int32 → int64.) The
same latent inconsistency exists in the integer branch of pd_op.arange. This makes the native
Paddle-TensorRT path (run_mode='trt_fp16') unusable for any model containing stack (e.g.
PP-FormulaNet, PP-OCRv5_server_rec, PP-DocLayout-L).

Minimal reproducer — the repository's own converter tests fail on TensorRT ≥ 10.8:

# env: paddlepaddle-gpu==3.3.0 (cu129) + tensorrt==10.15.1.29, CUDA 12.x, any NVIDIA GPU
# (shell) run the repository's stack converter tests:
python -m pytest -v -s test/tensorrt/test_converter_manipulation.py -k "Stack"
# (Python) equivalently, any stack under the native TRT path on TRT>=10.8:
import numpy as np, paddle
from paddlex import create_model, PaddlePredictorOption   # or paddle.tensorrt.export.convert
opt = PaddlePredictorOption(); opt.run_mode = "trt_fp16"
# build any model whose graph contains a `stack` (e.g. PP-OCRv5_server_rec) -> engine build fails

Observed behavior: the engine build fails. In the unit test the int64 shape tensor breaks TensorRT's
symbolic shape analysis, so network.add_concatenation(...) returns None and the converter raises:

============================= test session starts ==============================
test_converter_manipulation.py::TestStackTRTPattern::test_trt_result
[TRT] [E] [graphShapeAnalyzer.cpp::checkCalculationStatusSanity::2127] Error Code 2: Internal Error
        (Assertion !isInFlight(p.second.symbolicRep) failed. ... graphShapeAnalyzer.cpp:2127)
FAILED
test_converter_manipulation.py::TestStackCase2TRTPattern::test_trt_result
[TRT] [E] [graphShapeAnalyzer.cpp::checkCalculationStatusSanity::2127] Error Code 2: Internal Error
        (Assertion !isInFlight(p.second.symbolicRep) failed. ... graphShapeAnalyzer.cpp:2127)
FAILED
=================================== FAILURES ===================================
E       AttributeError: 'NoneType' object has no attribute 'axis'
python/paddle/tensorrt/impls/manipulation.py:999: AttributeError
=========== 2 failed, 2 passed, 82 deselected ============

In larger graphs the same int64 shape tensor instead trips the explicit type error (observed with
PP-FormulaNet_plus-L):

[TRT] [E] ITensor::getDimensions: Error Code 4: API Usage Error
   (..._pd_op.stack->after_shape_tensor(...): concat input tensors 0 and 1 have
    incompatible types Int32 and Int64  In validateTypes at .../concatenationLayer.cpp:137)

Expected behavior: the converter should build the engine (the shape tensor should be int32, as the
rest of the shape subgraph expects).

Additional Supplementary Information

Root cause. Paddle already anticipates the TRT-10 Shape→int64 change in
python/paddle/tensorrt/converter_utils.py::trt_shape(), which casts the Shape result back to int32
on TRT ≥ 10 (docstring: "casting the shape result(int64) from TRT10 back to int32 … Many existing
paddle op kernels only support input shape tensor as int32"). But stack_converter calls raw
network.add_shape(...) instead of trt_shape(...), so the int64 shape tensor flows into
add_concatenation next to an int32 add_1D_constant_layer(network, 1). The arange_converter has the
analogous issue: only its float branch casts the quotient to int32; the integer branch leaves it int64.
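
The mismatch described above can be illustrated with a small toy model. This is only a sketch in plain Python: `ToyTensor`, `toy_shape`, `toy_cast`, `toy_concat` and `stack_output_shape` are hypothetical stand-ins invented for illustration, not TensorRT or Paddle APIs. It mimics a shape subgraph that splices an int32 constant `1` into a shape tensor, and shows that the concat is rejected when the shape tensor is int64 unless it is first cast to int32.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTensor:
    """Hypothetical stand-in for a shape tensor: only dtype and values matter."""
    dtype: str
    values: tuple


def toy_shape(dims, shape_is_int64):
    """Mimic a Shape op whose output dtype depends on the TensorRT version."""
    return ToyTensor("int64" if shape_is_int64 else "int32", tuple(dims))


def toy_cast(tensor, dtype):
    """Return the same values with a different dtype tag."""
    return ToyTensor(dtype, tensor.values)


def toy_concat(tensors):
    """Mimic a concat type check: every input must share one dtype."""
    dtypes = {t.dtype for t in tensors}
    if len(dtypes) != 1:
        raise TypeError(f"concat inputs have incompatible types: {sorted(dtypes)}")
    merged = tuple(v for t in tensors for v in t.values)
    return ToyTensor(tensors[0].dtype, merged)


def stack_output_shape(dims, axis, shape_is_int64, cast_shape_to_int32):
    """Insert a size-1 axis into the input shape, as a stack-style reshape would."""
    shape = toy_shape(dims, shape_is_int64)
    if cast_shape_to_int32:
        shape = toy_cast(shape, "int32")
    before = ToyTensor(shape.dtype, shape.values[:axis])
    after = ToyTensor(shape.dtype, shape.values[axis:])
    one = ToyTensor("int32", (1,))  # the int32 constant from the converter
    return toy_concat([before, one, after])


# The raw int64 shape next to the int32 constant is rejected ...
try:
    stack_output_shape((2, 3), axis=1, shape_is_int64=True, cast_shape_to_int32=False)
except TypeError as err:
    print("build fails:", err)

# ... while casting the shape to int32 first lets the concat through.
print(stack_output_shape((2, 3), axis=1, shape_is_int64=True, cast_shape_to_int32=True))
```

With `shape_is_int64=False` the uncast path also succeeds, which matches the report that the problem only surfaces once Shape starts returning int64.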

Environment. Paddle 3.3.0 (cu129) and current develop; TensorRT ≥ 10.8 (reproduced on 10.15.1.29);
CUDA 12.x. Reproduced identically on Turing (Tesla T4, sm_75) and Blackwell (RTX PRO 6000, sm_120),
so the failure is arch-independent (it is a dtype issue in the Python converters, not a kernel issue).
It is dormant on TensorRT < 10.8 (where Shape still returns int32), which is why the existing tests
pass on current CI.
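
To check whether a given environment falls in the affected range, something like the following could be used. It is a minimal sketch: `parse_version` and `shape_returns_int64` are hypothetical helper names, the 10.8 threshold is taken from this report rather than from TensorRT release notes, and the parser assumes a purely numeric dotted version string. Comparing parsed tuples matters here because a plain string comparison would order "10.15" before "10.8".

```python
def parse_version(version):
    """Turn '10.15.1.29' into (10, 15, 1, 29) so comparison is numeric."""
    return tuple(int(part) for part in version.split("."))


# Assumption: threshold as stated in this report, not verified elsewhere.
INT64_SHAPE_SINCE = (10, 8)


def shape_returns_int64(version, since=INT64_SHAPE_SINCE):
    """Return True if `version` is at or above the reported threshold."""
    return parse_version(version)[: len(since)] >= since


print(shape_returns_int64("10.15.1.29"))  # version used in this report
print(shape_returns_int64("10.7.0.23"))   # below the reported threshold
print("10.15.1.29" >= "10.8")             # string comparison gets this wrong
```

In a real environment the version string would typically come from `tensorrt.__version__`.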

Regression sources: the pd_op.stack converter (#68839) and the pd_op.arange converter (#68757).

Fix. Route the stack shape subgraph through the existing trt_shape() helper, and cast the arange
integer quotient. A PR follows and will reference this issue.
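
The shape of such a fix can be sketched as follows. This is not the actual patch and not the body of Paddle's trt_shape(): `shape_as_int32` is a hypothetical helper, and `StubNetwork` is a stand-in that exists only so the sketch runs without TensorRT installed. The helper assumes `network` behaves like `tensorrt.INetworkDefinition` (with `add_shape` and `add_cast`) and that `int32_dtype` would be `tensorrt.int32` in real use.

```python
class _StubLayer:
    """Stand-in for a TensorRT layer exposing a single output."""

    def __init__(self, output):
        self._output = output

    def get_output(self, index):
        return self._output


class StubNetwork:
    """Stand-in network that records calls; NOT the TensorRT API itself."""

    def __init__(self):
        self.calls = []

    def add_shape(self, tensor):
        self.calls.append("add_shape")
        # Pretend the Shape op yields an int64 shape tensor.
        return _StubLayer(("shape_of", tensor, "int64"))

    def add_cast(self, tensor, to_type):
        self.calls.append("add_cast")
        # Keep the payload, replace only the dtype tag.
        return _StubLayer((tensor[0], tensor[1], to_type))


def shape_as_int32(network, tensor, int32_dtype):
    """Take the shape of `tensor` and cast it to int32 before further use."""
    shape_layer = network.add_shape(tensor)
    cast_layer = network.add_cast(shape_layer.get_output(0), int32_dtype)
    return cast_layer.get_output(0)


net = StubNetwork()
print(shape_as_int32(net, "x", "int32"))
print(net.calls)
```

Centralizing the cast in one helper means every converter that needs a shape tensor gets the same dtype, which is the inconsistency the stack and arange converters currently have.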
