CUDA NN inference. Example: ResNet18 in source/resnet18_main.cpp.
- Convolution - with/without bias, arbitrary padding, arbitrary stride.
- Linear - with/without bias.
- BatchNorm.
- ReLU.
- MaxPool - arbitrary padding, arbitrary stride.
- AvgPool - arbitrary padding, arbitrary stride.
- Tensor operations:
- common operations (+, -, *, /).
- transpose - arbitrary number of dimensions, arbitrary axes permutation.
- reshape.
- relu, add_relu.
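To make the semantics of the fused op above concrete, here is a minimal NumPy reference sketch (the function names mirror the list above; the actual CUDA kernels are in source/):

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0.0)

def add_relu(x, y):
    # Fused residual add followed by ReLU, as used at the end of
    # ResNet basic blocks (out = relu(out + identity)).
    return np.maximum(x + y, 0.0)

x = np.array([-1.0, 0.5, 2.0])
y = np.array([0.5, -1.0, 1.0])
print(add_relu(x, y))  # [0. 0. 3.]
```

Fusing the add and the ReLU into one kernel avoids writing the intermediate sum back to global memory, which is the usual motivation for providing add_relu alongside plain relu.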
git clone https://github.com/shiyegao/CudaResnet18.git
cd CudaResnet18
This step is already done: the converted weights are in the weights folder. We use utils/check_npy.py to convert an ONNX file into per-parameter .npy files that serve as the model weights.
If you want to use a different ONNX file as input, change the corresponding 'dic' and 'root' in utils/check_npy.py and run
python ./utils/check_npy.py # YOU DO NOT NEED TO RUN THIS CODE!
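The conversion step can be sketched as follows. This is an illustrative outline, not the actual check_npy.py: the dictionary of parameters and the output directory stand in for the script's 'dic' and 'root', and the parameter names follow the conv1.weight.npy / conv1.bias.npy convention visible in the weights folder:

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for the parameter dict extracted from the ONNX file.
weights = {
    "conv1.weight": np.zeros((64, 3, 7, 7), dtype=np.float32),
    "conv1.bias": np.zeros((64,), dtype=np.float32),
}

# Stand-in for the weights/ output folder ('root' in the real script).
root = tempfile.mkdtemp()

# Each parameter is dumped to its own .npy file named after the parameter,
# so the C++ side can load "conv1.weight.npy", "conv1.bias.npy", etc.
for name, arr in weights.items():
    np.save(os.path.join(root, name + ".npy"), arr)

loaded = np.load(os.path.join(root, "conv1.weight.npy"))
print(loaded.shape)  # (64, 3, 7, 7)
```

One file per parameter keeps the C++ loader trivial: it only needs a plain .npy reader and the parameter name, with no ONNX dependency at inference time.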
After installation and preparation, build the code. From the repository root, run
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
After building, run the CUDA ResNet18 code from inside the build directory:
./Release/cuda_proj --weights_dir ../weights/
Four folders contain the main code. The function of each part is described below.
CudaResnet18
|----include
|----*.hpp
|----*.cuh
|----*.h
|----source
|----*.cu
|----*.cc
|----*.cpp
|----utils
|----check_npy.py
|----weights
|----conv1.bias.npy
|----conv1.weight.npy
|----......
|----CMakeLists.txt
|----README.md
|----resnet18.onnx
|----resnet18Input.txt
|----resnet18Output.txt
- include: header files (.hpp, .cuh, .h) for the .cu and .cpp sources.
- source: the .cu, .cc, and .cpp implementation files. More details are in source/README.md.
- utils: check_npy.py, the script that converts the .onnx file into .npy weights.
- weights: the .npy weight files loaded by the model.
- CMakeLists.txt: builds the project before running ResNet18.
- README.md: documentation for reading the code.
- resnet18.onnx: the file that stores the ResNet18 model and its weights.
- resnet18Input.txt: the input file for testing.
- resnet18Output.txt: the reference output file for comparison with our output.
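Comparing our output against resnet18Output.txt can be done with a tolerance check like the sketch below. It assumes, hypothetically, that both files store whitespace-separated floats (StringIO stands in for the actual files):

```python
import io
import numpy as np

# Stand-ins for resnet18Output.txt (reference) and our program's output,
# assuming whitespace-separated float values.
reference = io.StringIO("0.12 0.34 0.56")
ours = io.StringIO("0.12 0.34 0.56001")

ref = np.loadtxt(reference)
out = np.loadtxt(ours)

# Element-wise comparison with an absolute tolerance, since float32 CUDA
# kernels will not match a reference bit-for-bit.
print(np.allclose(ref, out, atol=1e-4))  # True
```

An exact string comparison would fail on harmless last-digit differences, so a tolerance-based check is the standard way to validate GPU inference output.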