[PIR-TensorRT] pd_op.stack / pd_op.arange converters fail the TensorRT engine build on TRT >= 10.8 (int64 shape tensor / Int32-Int64 mismatch) #79319

### Describe the Bug

On **TensorRT ≥ 10.8**, the PIR→TensorRT converter for `pd_op.stack` builds its output-shape subgraph
from a **raw int64 `Shape` tensor** and combines it with int32 shape constants, which breaks the
TensorRT engine build. (TensorRT 10.8 changed the `Shape` op's output dtype from int32 → int64.) The
same latent inconsistency exists in the integer branch of `pd_op.arange`. This makes the native
Paddle-TensorRT path (`run_mode='trt_fp16'`) unusable for any model containing `stack` (e.g.
`PP-FormulaNet`, `PP-OCRv5_server_rec`, `PP-DocLayout-L`).

**Minimal reproducer.** The repository's *own* converter tests fail on TensorRT ≥ 10.8:

```shell
# env: paddlepaddle-gpu==3.3.0 (cu129) + tensorrt==10.15.1.29, CUDA 12.x, any NVIDIA GPU
python -m pytest -v -s test/tensorrt/test_converter_manipulation.py -k "Stack"
```

```python
# Equivalently, any `stack` under the native TRT path on TRT >= 10.8:
import numpy as np
import paddle
from paddlex import create_model, PaddlePredictorOption  # or paddle.tensorrt.export.convert

opt = PaddlePredictorOption()
opt.run_mode = "trt_fp16"
# Build any model whose graph contains a `stack` (e.g. PP-OCRv5_server_rec) with `opt`:
# the engine build fails.
```

**Observed behavior:** the engine build fails. In the unit test the int64 shape tensor breaks TensorRT's
symbolic shape analysis, so `network.add_concatenation(...)` returns `None` and the converter raises:

```shell
============================= test session starts ==============================
test_converter_manipulation.py::TestStackTRTPattern::test_trt_result
[TRT] [E] [graphShapeAnalyzer.cpp::checkCalculationStatusSanity::2127] Error Code 2: Internal Error
        (Assertion !isInFlight(p.second.symbolicRep) failed. ... graphShapeAnalyzer.cpp:2127)
FAILED
test_converter_manipulation.py::TestStackCase2TRTPattern::test_trt_result
[TRT] [E] [graphShapeAnalyzer.cpp::checkCalculationStatusSanity::2127] Error Code 2: Internal Error
        (Assertion !isInFlight(p.second.symbolicRep) failed. ... graphShapeAnalyzer.cpp:2127)
FAILED
=================================== FAILURES ===================================
E       AttributeError: 'NoneType' object has no attribute 'axis'
python/paddle/tensorrt/impls/manipulation.py:999: AttributeError
=========== 2 failed, 2 passed, 82 deselected ============
```

In larger graphs the **same** int64 shape tensor instead trips an explicit type error in TensorRT's
concatenation layer (observed with `PP-FormulaNet_plus-L`):

```shell
[TRT] [E] ITensor::getDimensions: Error Code 4: API Usage Error
   (..._pd_op.stack->after_shape_tensor(...): concat input tensors 0 and 1 have
    incompatible types Int32 and Int64  In validateTypes at .../concatenationLayer.cpp:137)
```

**Expected behavior:** the converter should build the engine (the shape tensor should be int32, as the
rest of the shape subgraph expects).

### Additional Supplementary Information

**Root cause.** Paddle already anticipates the TRT-10 `Shape`→int64 change in
`python/paddle/tensorrt/converter_utils.py::trt_shape()`, which casts the `Shape` result back to int32
on TRT ≥ 10 (docstring: *"casting the shape result(int64) from TRT10 back to int32 … Many existing
paddle op kernels only support input shape tensor as int32"*). But `stack_converter` calls **raw
`network.add_shape(...)`** instead of `trt_shape(...)`, so the int64 shape tensor flows into
`add_concatenation` next to an int32 `add_1D_constant_layer(network, 1)`. The `arange_converter` has the
analogous issue: only its float branch casts the quotient to int32; the integer branch leaves it int64.
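The mismatch described above can be illustrated with a toy model in plain Python. This is only a sketch: `ToyTensor`, `toy_shape`, `toy_cast` and `toy_concat` are hypothetical stand-ins invented for illustration, not TensorRT or Paddle APIs, and the 10.8 threshold is taken from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTensor:
    name: str
    dtype: str  # "int32" or "int64"


def toy_shape(name, trt_version):
    """Toy Shape op: int64 output from TRT 10.8 on (per this report), int32 before."""
    dtype = "int64" if trt_version >= (10, 8) else "int32"
    return ToyTensor(f"{name}.shape", dtype)


def toy_cast(tensor, dtype):
    """Toy cast: same tensor, new dtype."""
    return ToyTensor(f"{tensor.name}.cast_{dtype}", dtype)


def toy_concat(inputs):
    """Toy concat type check: every input must share one dtype."""
    dtypes = {t.dtype for t in inputs}
    if len(dtypes) != 1:
        raise TypeError(f"concat inputs have incompatible types: {sorted(dtypes)}")
    return ToyTensor("concat", dtypes.pop())


# Stands in for the int32 constant that the converter places next to the shape.
one = ToyTensor("const_1", "int32")

# TRT < 10.8: the raw shape is int32, so the concat type check passes.
old = toy_concat([toy_shape("x", (10, 7)), one])

# TRT >= 10.8: the raw shape is int64, so the concat type check fails.
try:
    toy_concat([toy_shape("x", (10, 15)), one])
    mismatch = False
except TypeError:
    mismatch = True

# Casting the shape back to int32 first restores a consistent subgraph.
fixed = toy_concat([toy_cast(toy_shape("x", (10, 15)), "int32"), one])
```

The model only captures the dtype rule; it says nothing about why the unit test surfaces the problem as a shape-analysis assertion rather than as the type error.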

**Environment.** Paddle 3.3.0 (cu129) and current `develop`; TensorRT ≥ 10.8 (reproduced on 10.15.1.29);
CUDA 12.x. Reproduced **identically on Turing (Tesla T4, sm_75) and Blackwell (RTX PRO 6000, sm_120)**, so
the failure is architecture-independent: it is a dtype issue in the Python converters, not a kernel issue.
It is dormant on TensorRT < 10.8 (where `Shape` still returns int32), which is why the existing tests pass
on current CI.
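To check whether a given environment falls in the affected range, a small version gate is enough. The helper below is a hypothetical sketch (`shape_returns_int64` is not an existing Paddle or TensorRT function); it hard-codes the 10.8 threshold stated in this report and expects a version string such as the one exposed by `tensorrt.__version__`.

```python
def shape_returns_int64(trt_version: str) -> bool:
    """Return True if this TensorRT version is in the affected range (>= 10.8).

    Only the major and minor components are compared; missing components
    are treated as 0.
    """
    parts = [int(p) for p in trt_version.split(".")[:2]]
    parts += [0] * (2 - len(parts))
    return tuple(parts) >= (10, 8)


affected = shape_returns_int64("10.15.1.29")  # version used in this report
dormant = shape_returns_int64("10.7.0")
```

Comparing integer tuples rather than strings matters here: as strings, `"10.15"` sorts before `"10.8"`, which would misclassify the very version this issue was reproduced on.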

**Regression sources:** the `pd_op.stack` converter (#68839) and the `pd_op.arange` converter (#68757).

**Fix.** Route the stack shape subgraph through the existing `trt_shape()` helper, and cast the arange
integer quotient. A PR follows and will reference this issue.
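The shape of the proposed fix can be sketched as a helper that always hands an int32 shape tensor to the rest of the subgraph. This is not Paddle's actual `trt_shape()` implementation, and `int32_shape` is a hypothetical name. It relies only on the TensorRT Python calls `add_shape`, `add_cast`, `get_output` and the tensor `dtype` attribute, and takes the int32 dtype as a parameter (in real code, `trt.int32`) so that the sketch can be exercised against a stand-in network without TensorRT installed.

```python
def int32_shape(network, tensor, int32_dtype):
    """Return the shape of `tensor` as an int32 shape tensor.

    `network` is expected to behave like a TensorRT network definition;
    `int32_dtype` would be `trt.int32` in real code (assumption of this sketch).
    """
    shape = network.add_shape(tensor).get_output(0)
    if shape.dtype == int32_dtype:
        return shape  # older TensorRT: already int32, nothing to do
    return network.add_cast(shape, int32_dtype).get_output(0)


# Stand-in objects so the sketch runs without TensorRT; purely illustrative.
class FakeTensor:
    def __init__(self, dtype):
        self.dtype = dtype


class FakeLayer:
    def __init__(self, output):
        self._output = output

    def get_output(self, index):
        return self._output


class FakeNetwork:
    def __init__(self, shape_dtype):
        self.shape_dtype = shape_dtype  # dtype the fake Shape op produces
        self.calls = []

    def add_shape(self, tensor):
        self.calls.append("add_shape")
        return FakeLayer(FakeTensor(self.shape_dtype))

    def add_cast(self, tensor, to_type):
        self.calls.append("add_cast")
        return FakeLayer(FakeTensor(to_type))


new_trt = FakeNetwork("int64")  # models TRT >= 10.8
out_new = int32_shape(new_trt, object(), "int32")

old_trt = FakeNetwork("int32")  # models TRT < 10.8
out_old = int32_shape(old_trt, object(), "int32")
```

Checking the dtype instead of the TensorRT version keeps the helper correct on both sides of the 10.8 boundary and avoids inserting a redundant cast layer where `Shape` already returns int32.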
