This repository provides scripts, utilities, and sample code for profiling GPU workloads on the KAUST Ibex cluster. It is designed to help users benchmark, analyze, and optimize GPU-accelerated machine learning and deep learning applications using tools such as NVIDIA Nsight Systems (nsys) and nvprof.
- `src/`: Source code and scripts for profiling and benchmarking.
  - `*.py`: Python scripts for running and profiling ML workloads (e.g., `train.py`, `rapids_tsne.py`).
  - `*.sh`: SLURM batch scripts and shell utilities for submitting jobs to Ibex (e.g., `rapids_tsne.sh`, `check_for_the_nsys.sh`).
  - `cuda_matmul/`: Example CUDA code and Makefile for matrix multiplication profiling.
- `data/`: Input datasets and logs for profiling runs.
  - Contains sample output logs from profiling tools.
- `test/`: Reserved for test scripts or test data.
- `LICENSE`: License information for the repository.
- Profiling Scripts: Ready-to-use SLURM scripts for running and profiling PyTorch and RAPIDS workloads on NVIDIA GPUs.
- Sample Workloads: Example scripts for deep learning (ResNet50 training) and RAPIDS cuML t-SNE dimensionality reduction.
- Data Access: Scripts read the required datasets from `/ibex/reference/CV/tinyimagenet/train`. Access to this path must be granted by the Ibex administrators.
- NVTX Annotations: Key code sections are annotated for detailed profiling with NVIDIA tools.
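
To illustrate what an NVTX annotation can look like in a PyTorch workload, here is a minimal sketch. It is not taken from the repository's scripts: the `nvtx_range` helper and the `train_step` function are hypothetical names, and the helper falls back to a no-op when PyTorch or a CUDA device is not available. It relies only on `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop`.

```python
from contextlib import contextmanager

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # PyTorch not installed: ranges become no-ops
    torch = None
    _HAS_CUDA = False


@contextmanager
def nvtx_range(name):
    """Mark a named NVTX range around a block of code (no-op without CUDA)."""
    if _HAS_CUDA:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAS_CUDA:
            torch.cuda.nvtx.range_pop()


def train_step(batch):
    # Hypothetical stand-in for a real forward pass and loss computation.
    with nvtx_range("forward"):
        out = [x * 2 for x in batch]
    with nvtx_range("loss"):
        loss = sum(out)
    return loss
```

When the workload runs under Nsight Systems with NVTX tracing enabled, ranges such as `forward` and `loss` should appear as named spans on the timeline, which makes it easier to attribute GPU activity to a specific phase of the code.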
- Clone the repository to your Ibex home or scratch directory.
- Review and edit the SLURM scripts in `src/` to match your environment or profiling needs.
- Submit jobs using `sbatch` (e.g., `sbatch src/rapids_tsne.sh`).
- Analyze the output and profiling results in the generated logs and output files.
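
The steps above can be sketched in Python as a small generator for a profiling job script. This is an illustration under stated assumptions, not a copy of the scripts in `src/`: the function names, the job name, the GPU count, the time limit, and the report name are all placeholders, and module loads are left out because versions are site-specific.

```python
import shlex


def nsys_command(script, report="profile_report", traces=("cuda", "nvtx", "osrt")):
    """Build an `nsys profile` command line for a Python workload."""
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),
        "--output=" + report,
        "python", script,
    ]


def sbatch_script(command, job_name="gpu_profile", gpus=1, time_limit="00:30:00"):
    """Render a minimal SLURM batch script; all resource values are placeholders."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "",
        # Module loads (CUDA, PyTorch, Nsight Systems) would go here.
        shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sbatch_script(nsys_command("src/train.py")))
```

Writing the printed text to a file and passing that file to `sbatch` would submit the job. The SLURM scripts shipped in `src/` remain the reference for the partitions, constraints, and module versions that actually work on Ibex.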
- Access to the KAUST Ibex cluster (or similar SLURM-based GPU cluster)
- NVIDIA GPU (V100 or compatible)
- Modules: CUDA, PyTorch, RAPIDS cuML, Nsight Systems, etc. (see SLURM scripts for specific versions)
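
Before submitting a job, it can help to confirm that the expected tools are on `PATH` after loading modules. The sketch below is one way to do that from Python. It is not the repository's `check_for_the_nsys.sh`, whose contents are not shown here, and the default tool list is an assumption.

```python
import shutil


def find_tools(names=("nsys", "nvidia-smi", "nvcc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=("nsys", "nvidia-smi", "nvcc")):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

An empty result from `missing_tools()` only shows that the executables were found. It does not check their versions or whether a GPU is allocated to the current session.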
- Primary Author: D-Barradas
- Contributors: Please see commit history for additional contributors.
This repository is licensed under the terms of the LICENSE file provided.
- KAUST Ibex support team
- RAPIDS and PyTorch open-source communities
For questions or suggestions, please open an issue or contact the repository author.