[epic] Use custom CUDA stream for the entire codebase. #12122
Open
Description
This will be a long refactoring task. The objective is to use a custom CUDA stream throughout the codebase, which gives us better control over asynchronous memory allocation and makes it possible to select the device through the stream itself.
XGBoost supports selecting a GPU by device ordinal, for example cuda:1. This is a widely used feature, yet it has been a pain point for XGBoost. In CUDA 13, streams are implicitly attached to the device during creation. As a result, if we can use a custom stream everywhere, we can remove the C API guard and avoid initializing the CUDA context when CUDA is not used.
Plan:
- Provide an optional context parameter in all storage classes, including:
  - DeviceUVector
  - HostDeviceVector
  - Tensor
  - TemporaryArray
- Use stream-oriented memory allocation in the booster class.
- Use stream-oriented memory allocation in the DMatrix classes.
- Provide synchronization between the DMatrix and the booster.
- Remove the set-device call when a device ordinal is not provided.
- Remove C API guard. Verify that XGBoost doesn't initialize the CUDA context when CUDA is not used.
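As a rough illustration of the first plan item, the sketch below shows one way an optional context parameter could be threaded through a storage class. Every name in it (StreamHandle, SketchContext, DefaultSketchContext, DeviceBuffer) is hypothetical and is not XGBoost's actual API. StreamHandle stands in for cudaStream_t and host memory stands in for a device allocation, so the example compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cudaStream_t so the sketch builds without CUDA.
struct StreamHandle {
  int device_ordinal;  // device the stream was created on
  int id;              // 0 denotes the default stream in this sketch
};

// Hypothetical context object that carries the stream used for allocation.
struct SketchContext {
  StreamHandle stream;
};

// Fallback used when the caller does not pass a context:
// the default stream on device 0.
inline SketchContext const* DefaultSketchContext() {
  static SketchContext const ctx{StreamHandle{0, 0}};
  return &ctx;
}

// Hypothetical storage class. The context parameter is optional, so existing
// call sites keep compiling while new ones can supply a custom stream.
template <typename T>
class DeviceBuffer {
 public:
  explicit DeviceBuffer(std::size_t n, SketchContext const* ctx = nullptr)
      : ctx_{ctx ? ctx : DefaultSketchContext()}, data_(n) {
    // Real code would perform a stream-ordered allocation here, for example
    // with cudaMallocAsync on the stream held by the context.
  }

  StreamHandle Stream() const { return ctx_->stream; }
  std::size_t Size() const { return data_.size(); }

 private:
  SketchContext const* ctx_;  // not owned; must outlive the buffer
  std::vector<T> data_;       // host memory standing in for device memory
};

// Usage: one buffer on the default stream, one on a custom stream on device 1.
inline void Example() {
  DeviceBuffer<float> on_default(16);
  assert(on_default.Stream().id == 0);

  SketchContext custom{StreamHandle{1, 42}};
  DeviceBuffer<float> on_custom(16, &custom);
  assert(on_custom.Stream().device_ordinal == 1);
  assert(on_custom.Stream().id == 42);
}
```

Defaulting the parameter to a null pointer keeps the migration incremental: each storage class can gain the parameter first, and callers in the booster and DMatrix classes can be switched to an explicit stream later.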
PRs:
Related: