TorchScript Trace slower with C++ runtime environment. #902

I traced the BERT model from the PyTorch-Transformers library and am getting the following results for 10 iterations.
a) Using the Python runtime to run the forward pass: 979,292 µs

```
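import time

import torch

# Load the traced TorchScript model.
model = torch.jit.load('models_backup/2_2.pt')
# Random token ids in [0, 2000), batch size 1, sequence length 14.
x = torch.randint(2000, (1, 14), dtype=torch.long, device='cpu')
start = time.time()
for i in range(10):
    model(x)
end = time.time()
print((end - start) * 1000000, "µs")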
```
b) Using the C++ runtime to run the forward pass: 3,333,758 µs, which is about 3.4x the Python time

```
torch::Tensor x = torch::randint(index_max, {1, inputsize}, torch::dtype(torch::kInt64).device(torch::kCPU));
input.push_back(x);
#endif  // the matching #if is outside this excerpt
// Run one untimed forward pass and convert its output to a tuple.
auto outputs = module->forward(input).toTuple();
auto start = chrono::steady_clock::now();
for (int16_t i = 0; i < 10; ++i)
{
  outputs = module->forward(input).toTuple();
}
auto end = chrono::steady_clock::now();
cout << "Elapsed time in microseconds : "
     << chrono::duration_cast<chrono::microseconds>(end - start).count()
     << " µs" << endl;
```
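One difference between the two measurements: the C++ snippet in (b) runs one untimed forward pass before starting the clock, while the Python loop in (a) does not. For a like-for-like comparison, the Python side could be timed as in the sketch below. `time_forward` is a hypothetical helper written for this post, not part of PyTorch, and the `warmup` default of 1 is an assumption chosen to mirror the single untimed call in (b).

```python
import time


def time_forward(fn, iterations=10, warmup=1):
    """Return the elapsed time in microseconds for `iterations` calls of `fn`.

    `fn` is any zero-argument callable. `warmup` untimed calls are made
    first so that one-time setup cost is excluded from the measurement.
    """
    # Untimed warm-up calls.
    for _ in range(warmup):
        fn()

    # Timed calls, using a monotonic high-resolution clock.
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    end = time.perf_counter()

    return (end - start) * 1_000_000
```

With the traced model from (a), this would be called as `time_forward(lambda: model(x))`. Whether the warm-up call accounts for any of the gap is untested here.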
@thomwolf, any suggestions on what I am missing?
