
[ONNX] ExportedProgram with register_buffer + in-place copy fails on torch.onnx.export #178868

@pilmokim

🐛 Describe the bug

torch.onnx.export fails with a ValueError when exporting an ExportedProgram whose module registers a buffer (register_buffer) and copies that buffer into another tensor through an in-place slice assignment.

The root cause is in _handle_call_function_node_with_lowering() in torch/onnx/_internal/exporter/_core.py. When aten.copy.default is translated to ONNX, op.CastLike(src, self) returns the same IR value object as the input (identity passthrough for same-dtype), but line ~141 unconditionally renames it with outputs.name = node.name, destroying the original placeholder name.

Error message

ValueError: Key 'b_prompt_feat' does not match the name of the value 'copy'.
Please use the value.name as the key.

Reproduction

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('prompt_feat', torch.zeros(1, 4, 8))
        self.linear = nn.Linear(8, 8)

    def forward(self, x):
        out = torch.zeros(1, 4, 8)
        out[:, :, :] = self.prompt_feat.to(x.device)
        return self.linear(out + x)

model = Model().eval()
x = torch.randn(1, 4, 8)

ep = torch.export.export(model, (x,))
torch.onnx.export(ep, args=(x,), f="test.onnx")  # ValueError

Root cause analysis

1. Export creates buffer placeholder

torch.export converts self.prompt_feat to a graph placeholder named b_prompt_feat:

graph_signature.inputs_to_buffers: 'b_prompt_feat' → 'prompt_feat'

2. ONNX decomposition creates aten.copy.default

The in-place copy_ is decomposed into:

[b_prompt_feat]  placeholder [1, 4, 8] float32
[slice]          aten.slice(zeros, ...)
[copy]           aten.copy.default(slice, b_prompt_feat)   ← both float32
[slice_scatter]  aten.slice_scatter(zeros, copy, ...)

3. aten_copy ONNX handler returns input by reference

# torchlib: aten_copy
@torch_op("aten::copy", trace_only=True)
def aten_copy(self, src, non_blocking=False):
    return op.CastLike(src, self)

When src and self have the same dtype (both float32), CastLike returns the same IR value object as src — no new node is created.

4. outputs.name = node.name destroys the placeholder name

In _handle_call_function_node_with_lowering():

# line ~139-141
node_name_to_values[node.name] = outputs   # values['copy'] = value_A
outputs.name = node.name                    # value_A.name = 'copy'  ← BUG

Since outputs is the same object as values['b_prompt_feat'], this overwrites the placeholder's name from 'b_prompt_feat' to 'copy'.
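
The aliasing described in steps 3 and 4 can be reproduced without torch or onnx-ir. The sketch below is a toy model only: Value, cast_like, and the values dict are hypothetical stand-ins, not the exporter's real classes. It assumes, as described above, that a same-dtype cast hands back its input object.

```python
class Value:
    """Hypothetical stand-in for an IR value; only name and dtype are modeled."""

    def __init__(self, name, dtype):
        self.name = name
        self.dtype = dtype


def cast_like(src, target):
    """Toy CastLike: assumed to return src itself when the dtypes already match."""
    if src.dtype == target.dtype:
        return src  # no new value is created
    return Value(name="", dtype=target.dtype)


# Values recorded so far, keyed by FX node name.
values = {
    "b_prompt_feat": Value("b_prompt_feat", "float32"),
    "slice": Value("slice", "float32"),
}

# Lowering of the 'copy' node: aten.copy.default(slice, b_prompt_feat)
outputs = cast_like(values["b_prompt_feat"], values["slice"])
values["copy"] = outputs
outputs.name = "copy"  # unconditional rename, as in the reported code path

same_object = values["copy"] is values["b_prompt_feat"]
placeholder_name = values["b_prompt_feat"].name
print(same_object, placeholder_name)
```

In this model the placeholder and the copy output are one object, so the rename is visible through both dictionary entries.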

5. Initializer registration fails

# _exported_program_to_onnx_program, line ~140
model.graph.initializers['b_prompt_feat'] = value
# key='b_prompt_feat' but value.name='copy' → ValueError
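
To illustrate why the stale name surfaces only at this point, here is a minimal sketch of a mapping that insists on key == value.name. NameCheckedDict and Value are hypothetical and modeled only on the error message quoted above; the actual check inside onnx-ir may be implemented differently.

```python
class Value:
    """Hypothetical IR value; only the name is modeled."""

    def __init__(self, name):
        self.name = name


class NameCheckedDict(dict):
    """Hypothetical mapping that only accepts value.name as the key."""

    def __setitem__(self, key, value):
        if key != value.name:
            raise ValueError(
                f"Key '{key}' does not match the name of the value '{value.name}'. "
                "Please use the value.name as the key."
            )
        super().__setitem__(key, value)


initializers = NameCheckedDict()
value = Value("b_prompt_feat")
value.name = "copy"  # simulates the rename from step 4

try:
    initializers["b_prompt_feat"] = value
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The rename itself raises nothing; the mismatch is only detected later, when the buffer is registered under its original placeholder name.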

Suggested fix

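In _handle_call_function_node_with_lowering(), skip the rename when the output value has no producer node, i.e. when it is an existing graph input passed through unchanged rather than the output of a newly created ONNX node: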

# Before (line ~141):
outputs.name = node.name

# After:
if outputs.producer() is not None:
    # Output is a new value produced by a new ONNX node — safe to rename
    outputs.name = node.name
# else: output is an existing value (identity passthrough) — do not rename

Alternatively, ensure aten_copy always creates a new ONNX node (e.g. op.Identity(op.CastLike(src, self))).
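
The effect of the producer() guard can be sketched on a toy model. Everything here is hypothetical: Value mimics only the name attribute and a producer() accessor, and assign_name is an illustrative helper, not a function in the exporter.

```python
class Value:
    """Hypothetical IR value; producer() returns the creating node or None."""

    def __init__(self, name, producer=None):
        self.name = name
        self._producer = producer

    def producer(self):
        return self._producer


def assign_name(outputs, node_name):
    """Rename only values that some node produced; leave other values alone."""
    if outputs.producer() is not None:
        outputs.name = node_name
    return outputs


placeholder = Value("b_prompt_feat")         # graph input: no producer
fresh = Value("", producer="CastLike_node")  # output of a newly created node

assign_name(placeholder, "copy")
assign_name(fresh, "copy")
print(placeholder.name, fresh.name)
```

In this model the guard only protects values without a producer. A passthrough of a value that an earlier node produced would still be renamed, which is one reason the Identity-based alternative may be the more general option.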

Versions

  • torch: 2.9.0 ~ 2.11.0 (all reproduce the same bug)
  • onnx-ir: 0.1.12, 0.2.0 (both reproduce)
  • Python: 3.10

cc @justinchuby @titaiwangms

Labels

actionable, bot-triaged, module: onnx, triaged
