[epic] Use custom CUDA stream for the entire codebase. #12122

@trivialfis

Description

This will be a long refactoring task. The objective is to enable the use of a custom CUDA stream throughout the codebase, both to improve control over asynchronous memory allocation and to enable stream-specific device selection.

We support selecting a device by ordinal, for example cuda:1. This is a widely used feature, but it has been a pain point for XGBoost. In CUDA 13, streams are implicitly attached to the device during creation. As a result, if we can use a custom stream everywhere, we can remove the C API guard and avoid initializing the CUDA context.

Plan:

  • Provide an optional context parameter in all storage classes, including:
    • DeviceUVector
    • HostDeviceVector
    • Tensor
    • TemporaryArray
  • Use stream-oriented memory allocation in the booster class.
  • Use stream-oriented memory allocation in the DMatrix classes.
  • Provide synchronization between the DMatrix and the booster.
  • Remove the set-device call when a device ordinal is not provided.
  • Remove the C API guard. Verify that XGBoost doesn't initialize the CUDA context when CUDA is not used.
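
As a rough illustration of the first plan item, here is a minimal sketch of a storage class that accepts an optional context carrying a stream. Everything in it is hypothetical and not taken from the XGBoost codebase: `Context`, `StreamHandle`, `DefaultContext`, and `DeviceBuffer` are illustrative names, and host `malloc`/`free` stand in for the stream-ordered `cudaMallocAsync`/`cudaFreeAsync` calls so that the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for cudaStream_t so the sketch compiles without the CUDA toolkit.
using StreamHandle = void*;

// Hypothetical context: a device ordinal plus the stream used to order work.
struct Context {
  int ordinal{0};
  StreamHandle stream{nullptr};  // nullptr plays the role of the default stream
};

// Process-wide fallback used when the caller does not pass a context.
inline Context const* DefaultContext() {
  static Context const ctx{};
  return &ctx;
}

// Hypothetical storage class with an optional context parameter.
template <typename T>
class DeviceBuffer {
 public:
  explicit DeviceBuffer(std::size_t n, Context const* ctx = nullptr)
      : ctx_{ctx ? ctx : DefaultContext()}, size_{n} {
    // Assumption: a real implementation would allocate on ctx_->stream,
    // e.g. with cudaMallocAsync. Host malloc is only a placeholder here.
    data_ = static_cast<T*>(std::malloc(n * sizeof(T)));
    assert(data_ != nullptr || n == 0);
  }
  ~DeviceBuffer() {
    // Assumption: a real implementation would free on the same stream,
    // e.g. with cudaFreeAsync.
    std::free(data_);
  }
  DeviceBuffer(DeviceBuffer const&) = delete;
  DeviceBuffer& operator=(DeviceBuffer const&) = delete;

  StreamHandle Stream() const { return ctx_->stream; }
  int Ordinal() const { return ctx_->ordinal; }
  std::size_t Size() const { return size_; }

 private:
  Context const* ctx_;  // not owned; must outlive the buffer
  std::size_t size_;
  T* data_{nullptr};
};
```

With this shape, existing call sites such as `DeviceBuffer<float> buf(n)` keep working against the default context, while refactored code can pass `&ctx` to bind the allocation and its release to one specific stream. Making the parameter optional is what would allow the migration to proceed one class at a time.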

PRs:

Related:
