
TorchScript trace slower with C++ runtime environment #902

@sukuya

Description


I traced the BERT model from the PyTorchTransformers library and got the following results for 10 iterations.
a) Using the Python runtime to run the forward pass: 979,292 µs

import time

import torch

# Load the traced model and build a dummy batch of token ids.
model = torch.jit.load('models_backup/2_2.pt')
x = torch.randint(2000, (1, 14), dtype=torch.long, device='cpu')

# Time 10 forward passes.
start = time.time()
for i in range(10):
    model(x)
end = time.time()
print((end - start) * 1000000, "µs")

b) Using the C++ runtime to run the forward pass: 3,333,758 µs, which is almost 3x the Python time.

// Build the input (input is a std::vector<torch::jit::IValue>).
torch::Tensor x = torch::randint(index_max, {1, inputsize},
                                 torch::dtype(torch::kInt64).device(torch::kCPU));
input.push_back(x);

// One warm-up forward pass, excluded from the timing.
auto outputs = module->forward(input).toTuple();

// Time 10 forward passes.
auto start = chrono::steady_clock::now();
for (int i = 0; i < 10; ++i)
{
  outputs = module->forward(input).toTuple();
}
auto end = chrono::steady_clock::now();
cout << "Elapsed time in microseconds : "
     << chrono::duration_cast<chrono::microseconds>(end - start).count()
     << " µs" << endl;

@thomwolf any suggestions on what I am missing?
