GPU Profiling on Ibex

This repository provides scripts, utilities, and sample code for profiling GPU workloads on the KAUST Ibex cluster. It is designed to help users benchmark, analyze, and optimize GPU-accelerated machine learning and deep learning applications using tools such as NVIDIA Nsight Systems (nsys) and nvprof.

Repository Structure

  • src/ — Source code and scripts for profiling and benchmarking.
    • *.py — Python scripts for running and profiling ML workloads (e.g., train.py, rapids_tsne.py).
    • *.sh — SLURM batch scripts and shell utilities for submitting jobs to Ibex (e.g., rapids_tsne.sh, check_for_the_nsys.sh).
    • cuda_matmul/ — Example CUDA code and Makefile for matrix multiplication profiling.
  • data/ — Input datasets and logs for profiling runs.
    • Contains sample output logs from profiling tools.
  • test/ — (Reserved for test scripts or test data.)
  • LICENSE — License information for the repository.

Main Features

  • Profiling Scripts: Ready-to-use SLURM scripts for running and profiling PyTorch and RAPIDS workloads on NVIDIA GPUs.
  • Sample Workloads: Example scripts for deep learning (ResNet50 training) and RAPIDS cuML t-SNE dimensionality reduction.
  • Data Access: Scripts read the training dataset from /ibex/reference/CV/tinyimagenet/train. Access to this path must be granted by the Ibex administrators.
  • NVTX Annotations: Key code sections are annotated for detailed profiling with NVIDIA tools.
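
The NVTX annotations mentioned above can be illustrated with a minimal sketch. This is not the repository's actual code: the helper name `nvtx_range`, the range labels, and the placeholder `train_step` are hypothetical. It assumes PyTorch's `torch.cuda.nvtx.range_push` and `range_pop`, and falls back to a no-op when PyTorch or a CUDA device is not available, so the script still runs outside a GPU job.

```python
from contextlib import contextmanager

try:
    import torch
    # Only emit NVTX ranges when a CUDA device is actually usable.
    _NVTX_ENABLED = torch.cuda.is_available()
except ImportError:
    torch = None
    _NVTX_ENABLED = False


@contextmanager
def nvtx_range(label):
    """Mark a named region (hypothetical helper) for the profiler timeline."""
    if _NVTX_ENABLED:
        torch.cuda.nvtx.range_push(label)
    try:
        yield
    finally:
        if _NVTX_ENABLED:
            torch.cuda.nvtx.range_pop()


def train_step(batch):
    # Placeholder arithmetic standing in for a real forward/backward pass.
    with nvtx_range("forward"):
        total = sum(batch)
    with nvtx_range("backward"):
        grad = [2 * x for x in batch]
    return total, grad


if __name__ == "__main__":
    print(train_step([1, 2, 3]))
```

When a script annotated this way runs under Nsight Systems with NVTX tracing enabled, each labeled range should appear as a named span on the timeline, which makes it easier to attribute GPU kernels to a training phase.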

Getting Started

  1. Clone the repository to your Ibex home or scratch directory.
  2. Review and edit the SLURM scripts in src/ to match your environment or profiling needs.
  3. Submit jobs using sbatch (e.g., sbatch src/rapids_tsne.sh).
  4. Inspect the generated logs and profiling output files to analyze the results.
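
Steps 2 and 3 can be sketched as a small generator for a profiling job script. Everything below is an illustration under stated assumptions, not a copy of the scripts in src/: the function names `build_nsys_command` and `render_batch_script`, the report name, and the resource values (`--gres=gpu:1`, a 30-minute time limit) are hypothetical. It uses only common `nsys profile` options and `#SBATCH` directives; the module versions and partitions in the repository's own SLURM scripts remain the reference.

```python
import shlex


def build_nsys_command(script, report="profile_report",
                       traces=("cuda", "nvtx", "osrt")):
    """Return an argv list that profiles a Python script with Nsight Systems."""
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),   # APIs to trace
        "--output=" + report,            # report file name (assumed)
        "--force-overwrite=true",        # replace an existing report
        "--stats=true",                  # print summary statistics after the run
        "python", script,
    ]


def render_batch_script(command, job_name="gpu_profile", gpus=1,
                        time_limit="00:30:00"):
    """Render a SLURM batch script (assumed resource values) as a string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=%x-%j.out",
        "",
        "# Load the modules your workload needs here (versions are site-specific).",
        "",
        shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batch_script(build_nsys_command("src/train.py")))
```

The printed text can be saved to a file and submitted with sbatch. After the job finishes, the report written by nsys can be opened in the Nsight Systems GUI or summarized on the command line with `nsys stats`.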

Requirements

  • Access to the KAUST Ibex cluster (or similar SLURM-based GPU cluster)
  • NVIDIA GPU (V100 or compatible)
  • Modules: CUDA, PyTorch, RAPIDS cuML, Nsight Systems, etc. (see SLURM scripts for specific versions)
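
Before submitting a job, it can help to confirm that the command-line tools are reachable. The sketch below is a hypothetical helper (`find_missing_tools` is not part of the repository, and it is independent of `check_for_the_nsys.sh`, whose contents are not shown here). It assumes that `sbatch`, `nvidia-smi`, and `nsys` are the relevant executables and simply looks them up on `PATH`.

```python
import shutil


def find_missing_tools(tools=("sbatch", "nvidia-smi", "nsys"),
                       which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

On clusters that manage software through environment modules, a tool is usually visible only after the corresponding module has been loaded, so a missing entry may just mean the module has not been loaded in the current shell. `nvidia-smi` is also typically present only on GPU nodes, so running the check from a login node may report it as missing.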

Authorship

  • Primary Author: D-Barradas
  • Contributors: Please see commit history for additional contributors.

License

This repository is licensed under the terms of the LICENSE file provided.

Acknowledgments

  • KAUST Ibex support team
  • RAPIDS and PyTorch open-source communities

Contact

For questions or suggestions, please open an issue or contact the repository author.