I recently needed to profile sections inside a single CUDA kernel: we wanted to find out how much time each part of the kernel consumes. In bactria terms, that would mean being able to enter and leave phases and sectors inside CUDA device code.
Is such a feature planned, or within the scope of bactria?
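
To illustrate the kind of measurement meant here, below is a minimal sketch of how sections can be timed by hand with the device-side `clock64()` counter. It does not use bactria at all: the section names (`LOAD`, `COMPUTE`), the kernel, and the helper `run_profiled` are hypothetical and exist only for this example. The resulting numbers are rough, since the compiler and the hardware may overlap work across the section boundaries (for instance, the latency of a load may only show up where the value is first used).

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical section IDs for this example; not part of bactria.
enum Section { LOAD = 0, COMPUTE = 1, NUM_SECTIONS = 2 };

// Each block records, via its thread 0, the cycles spent in each section.
__global__ void profiled_kernel(const float* in, float* out, int n,
                                long long* cycles)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;

    const long long t0 = clock64();           // enter "LOAD"
    float v = (i < n) ? in[i] : 0.0f;
    const long long t1 = clock64();           // leave "LOAD", enter "COMPUTE"
    for (int k = 0; k < 64; ++k)
        v = v * 1.0001f + 0.5f;
    if (i < n)
        out[i] = v;
    const long long t2 = clock64();           // leave "COMPUTE"

    if (threadIdx.x == 0) {
        cycles[blockIdx.x * NUM_SECTIONS + LOAD]    = t1 - t0;
        cycles[blockIdx.x * NUM_SECTIONS + COMPUTE] = t2 - t1;
    }
}

struct SectionCycles { long long load; long long compute; };

// Launches the kernel on n elements and sums the per-block cycle counts.
// Error checking is omitted for brevity.
SectionCycles run_profiled(int n)
{
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;

    float* d_in = nullptr;
    float* d_out = nullptr;
    long long* d_cycles = nullptr;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_cycles, blocks * NUM_SECTIONS * sizeof(long long));
    cudaMemset(d_in, 0, n * sizeof(float));

    profiled_kernel<<<blocks, threads>>>(d_in, d_out, n, d_cycles);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<long long> h(blocks * NUM_SECTIONS, 0);
    cudaMemcpy(h.data(), d_cycles, h.size() * sizeof(long long),
               cudaMemcpyDeviceToHost);

    SectionCycles total{0, 0};
    for (int b = 0; b < blocks; ++b) {
        total.load    += h[b * NUM_SECTIONS + LOAD];
        total.compute += h[b * NUM_SECTIONS + COMPUTE];
    }

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_cycles);
    return total;
}
```

Doing this by hand for every kernel is tedious and clutters the device code, which is why phases and sectors that can be entered and left on the device would be attractive.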