Describe the bug
I get this error:
Unhandled exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:NotImplemented] Failed to find kernel for MemcpyToHost(1) (node Memcpy). Kernel not found
at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
This is a minimal test case (also mentioned here):
import torch.nn as nn

class MinTest(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # Max over dim 1: returns the max values and their indices
        top_probs, top = x.max(1, keepdim=True)
        return top_probs, top
This is the exported model:

I can use this function to do basic cleanup, and the graph stays the same:
def optimize_graph(onnxfile, onnxfile_optimized=None):
    import onnxruntime as rt
    if not onnxfile_optimized:
        # ONNX optimizer is broken, using ORT to optimize
        onnxfile_optimized = onnxfile[:-5] + "_optimized.onnx"
    sess_options = rt.SessionOptions()
    sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_BASIC
    sess_options.optimized_model_filepath = onnxfile_optimized
    # Creating the session writes the optimized model to onnxfile_optimized
    _ = rt.InferenceSession(onnxfile, sess_options, providers=['CPUExecutionProvider'])
    return onnxfile_optimized
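A side note on the default output name: onnxfile[:-5] assumes the input path ends in exactly ".onnx". A minimal sketch of a more forgiving variant is below. The helper name optimized_path and its suffix parameter are hypothetical, not part of onnxruntime, and this does not affect the bug itself.

```python
from pathlib import Path


def optimized_path(onnxfile, suffix="_optimized"):
    """Hypothetical helper: insert `suffix` before the file extension.

    Unlike onnxfile[:-5], this does not assume the path ends in ".onnx".
    """
    p = Path(onnxfile)
    # p.stem is the file name without its last extension, p.suffix is that extension
    return str(p.with_name(p.stem + suffix + p.suffix))
```

For example, optimized_path("model.onnx") gives "model_optimized.onnx", and a directory prefix on the input is kept.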
But if I use the CUDA EP for the cleanup:
def optimize_graph(onnxfile, onnxfile_optimized=None):
    import onnxruntime as rt
    if not onnxfile_optimized:
        # ONNX optimizer is broken, using ORT to optimize
        onnxfile_optimized = onnxfile[:-5] + "_optimized.onnx"
    sess_options = rt.SessionOptions()
    sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_BASIC
    sess_options.optimized_model_filepath = onnxfile_optimized
    # Same as above, except the session is created with the CUDA EP
    _ = rt.InferenceSession(onnxfile, sess_options, providers=['CUDAExecutionProvider'])
    return onnxfile_optimized
The optimized graph then contains this problematic node:
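To confirm this without a graph viewer, the saved model can be scanned for Memcpy nodes. This is a rough sketch, assuming the onnx Python package is installed. The function names are hypothetical, and only top-level graph nodes are checked, not nodes inside subgraphs.

```python
def find_memcpy_nodes(nodes):
    """Return (name, op_type) for every node whose op_type starts with 'Memcpy'.

    `nodes` is any iterable of objects with .name and .op_type attributes,
    for example the NodeProto entries in model.graph.node.
    """
    return [(n.name, n.op_type) for n in nodes if n.op_type.startswith("Memcpy")]


def memcpy_nodes_in_file(onnx_path):
    """Load an ONNX file and list its top-level Memcpy nodes."""
    import onnx  # imported lazily so find_memcpy_nodes works without onnx installed

    model = onnx.load(onnx_path)
    return find_memcpy_nodes(model.graph.node)
```

Calling memcpy_nodes_in_file on the file returned by optimize_graph should give an empty list for the CPU-optimized model and include the MemcpyToHost node from the error message for the CUDA-optimized one.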

System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
- ONNX Runtime installed from (source or binary): 1.10
- ONNX Runtime version: 1.10
- Python version:
- Visual Studio version (if applicable):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
Expected behavior
Optimization should not add that extra, incompatible node.