External tensors fail to load when only a file name is provided #542

@mneilly

When the location key of the external data field contains only a plain file name (no directory component), the tensors are not loaded and onnx2trt fails with the following error:

$ onnx2trt -o dlrm_s_pytorch.trt dlrm_s_pytorch.onnx 
----------------------------------------------------------------
Input filename:   dlrm_s_pytorch.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.8
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
Parsing model
[2020-10-15 05:32:39 WARNING] [TRT]/local/tensorRT/onnx-tensorrt/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2020-10-15 05:32:39   ERROR] [TRT]/local/tensorRT/onnx-tensorrt/onnx2trt_utils.cpp:1312: Failed to open file: 
ERROR: /local/tensorRT/onnx-tensorrt/ModelImporter.cpp:92 In function parseGraph:
[8] Assertion failed: convertOnnxWeights(initializer, &weights, ctx)

The attached dlrm model uses external tensors whose location is a bare file name such as "bot_l.0.weight", and the expectation is that the weights will be loaded from the same directory as the model.
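
To rule out a genuinely missing file before suspecting the parser, that expectation can be checked directly. This is a minimal sketch, assuming the external files are meant to sit in the model's directory; `missing_external_files` is a hypothetical helper and not part of onnx or onnx-tensorrt.

```python
import os


def missing_external_files(model_path, locations):
    """Return the location values that have no file next to the model.

    Hypothetical helper: assumes each location is a file name relative
    to the directory that contains the model.
    """
    model_dir = os.path.dirname(os.path.abspath(model_path))
    return [
        loc for loc in locations
        if not os.path.isfile(os.path.join(model_dir, loc))
    ]
```

An empty result means every listed file is present beside the model, so a "Failed to open file" error would then point at path resolution rather than at the files themselves.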

The following output from the onnx Python package shows the location values for the weights, which contain only the file name and no path:

>>> import onnx
>>> m = onnx.load('dlrm_s_pytorch.onnx')
>>> [(i.name, i.data_type, i.data_location, i.external_data) for i in m.graph.initializer if i.name.endswith('weight')]
[('bot_l.0.weight', 1, 1, [key: "location"
value: "bot_l.0.weight"
]), ('bot_l.2.weight', 1, 1, [key: "location"
value: "bot_l.2.weight"
]), ('emb_l.0.weight', 1, 1, [key: "location"
value: "emb_l.0.weight"
]), ('emb_l.1.weight', 1, 1, [key: "location"
value: "emb_l.1.weight"
]), ('emb_l.2.weight', 1, 1, [key: "location"
value: "emb_l.2.weight"
]), ('emb_l.3.weight', 1, 1, [key: "location"
value: "emb_l.3.weight"
]), ('top_l.0.weight', 1, 1, [key: "location"
value: "top_l.0.weight"
]), ('top_l.2.weight', 1, 1, [key: "location"
value: "top_l.2.weight"
]), ('top_l.4.weight', 1, 0, [])]
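
The one-liner above can be unpacked into a small helper that returns just the name-to-location mapping. This is a minimal sketch: `external_locations` is a hypothetical name, and it relies only on the `name`, `data_location`, and `external_data` fields visible in the output, treating `data_location == 1` as external storage.

```python
EXTERNAL = 1  # data_location value seen above for externally stored tensors


def external_locations(initializers):
    """Map initializer name -> external "location" value.

    Hypothetical helper: expects objects shaped like the initializers
    printed above (name, data_location, external_data key/value entries).
    """
    locations = {}
    for init in initializers:
        if init.data_location != EXTERNAL:
            continue  # tensor is stored inside the model file
        for entry in init.external_data:
            if entry.key == "location":
                locations[init.name] = entry.value
    return locations


# Usage with the model from this issue (requires the onnx package):
#   import onnx
#   m = onnx.load('dlrm_s_pytorch.onnx')
#   external_locations(m.graph.initializer)
```

Unlike the one-liner, this does not filter on names ending in 'weight', so any other externally stored initializers would be listed too.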

Note that this onnx model does appear to have both external_data and raw_data for the weights, but that doesn't appear to affect this issue.

The zip file contains the top and bottom MLP weights but excludes the embedding tables since they would make the zip file too large to upload here.

dlrm-external-tensors.zip

The following change appears to resolve the issue:

$ git diff
diff --git a/onnx2trt_utils.cpp b/onnx2trt_utils.cpp
index ceff92d..0c905a6 100644
--- a/onnx2trt_utils.cpp
+++ b/onnx2trt_utils.cpp
@@ -1306,6 +1306,10 @@ bool parseExternalWeights(IImporterContext* ctx, std::string file, std::string p
     {
         path.replace(slash + 1, path.size() - (slash + 1), file);
     }
+    else
+    {
+        path = file;
+    }
     std::ifstream relPathFile(path, std::ios::binary | std::ios::ate);
     if (!relPathFile)
     {
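
Restated in Python for illustration, the patched branch logic looks roughly like the following. This is a sketch inferred from the diff, assuming `path` initially holds the model path and `file` holds the `location` value, and that the separator searched for is "/"; `resolve_external_path` is a hypothetical name, and the real code is the C++ function `parseExternalWeights`.

```python
def resolve_external_path(model_path, location):
    """Return the path that would be opened for an external tensor.

    Sketch of the patched logic, not the actual onnx-tensorrt code.
    """
    slash = model_path.rfind("/")
    if slash != -1:
        # Model path has a directory part: keep it, swap in the file name.
        return model_path[: slash + 1] + location
    # New else branch: the model path is a bare file name, so use the
    # location as-is (resolved against the current working directory).
    return location
```

With the model passed as `dlrm_s_pytorch.onnx`, as in the command above, the second branch applies and the weight file is opened relative to the current working directory, which is the model's directory in that invocation.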

Labels: bug, triaged
