Optional context for the host device vector. #12121
trivialfis wants to merge 21 commits into dmlc:master
Conversation
Pull request overview
This PR introduces an optional Context parameter to HostDeviceVector and linalg::Tensor APIs to enable future stream-aware CUDA behavior (custom streams / device-ordinal handling) and begins updating call sites to pass context through.
Changes:
- Extend the `HostDeviceVector` and `linalg::Tensor` interfaces to accept an optional `Context const*`.
- Update GPU transfer/sync paths in `HostDeviceVector` to use a stream from `CUDAContext` when provided.
- Adjust predictor and CUDA unit tests to pass `Context` into the updated APIs.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| include/xgboost/host_device_vector.h | Adds optional Context parameters to most HostDeviceVector APIs. |
| src/common/host_device_vector.cu | Wires stream selection for copies/fill and updates method signatures to accept context. |
| src/common/host_device_vector.cc | Updates non-CUDA fallback implementation to match new signatures. |
| include/xgboost/linalg.h | Threads optional context through Tensor/views and MakeTensorView helpers. |
| src/predictor/predictor.cc | Passes predictor context into HostDeviceVector operations for predictions init/fill/copy. |
| tests/cpp/common/test_host_device_vector.cu | Updates tests to construct and pass Context through updated APIs. |
| tests/cpp/common/test_linalg.h | Updates linear algebra dispatch helper to pass Context into views/host views. |
| tests/cpp/common/test_linalg.cu | Updates CUDA linalg tests to pass context into Tensor and View calls. |
Comments suppressed due to low confidence (1)
src/common/host_device_vector.cu:43
`HostDeviceVectorImpl(size_t, T, DeviceOrd, CUDAContext const* ctx)` takes `ctx` but doesn't use it. This can trigger `-Wunused-parameter` warnings (often treated as errors) and is confusing given the new stream-aware API. Consider removing the parameter name (or marking it `[[maybe_unused]]` / casting to void), or actually using it where appropriate.
```cpp
HostDeviceVectorImpl(size_t size, T v, DeviceOrd device, CUDAContext const* ctx)
    : device_(device) {
  if (device.IsCUDA()) {
    gpu_access_ = GPUAccess::kWrite;
    SetDevice();
    data_d_->resize(size, v);
```
Pull request overview
Copilot reviewed 33 out of 33 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (2)
src/common/device_compression.cu:372
- In `CompressSnappy`, `DeviceUVector` now requires a stream for mutations. `p_out->resize(...)` (and the earlier `p_out->clear()` in the same function) should take the CUDA context stream to avoid compilation issues and to ensure allocations happen on the intended stream for cudaMallocAsync-backed allocators.
```cuda
SafeNvComp(nvcompBatchedSnappyCompressGetMaxOutputChunkSize(
    std::min(max_in_nbytes, chunk_size), nvcomp_batched_snappy_opts, &max_out_nbytes));
p_out->resize(max_out_nbytes * n_chunks);
std::vector<void*> h_out_ptrs(n_chunks);
```
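A sketch of the change the comment asks for, assuming `DeviceUVector::resize` takes a stream as its second argument and a `ctx` is in scope in `CompressSnappy` (both assumptions come from the review comment, not verified against the tree):

```cuda
// Sketch only: pass the CUDA context stream to mutating DeviceUVector calls,
// so cudaMallocAsync-backed allocations happen on the intended stream.
p_out->clear(ctx->CUDACtx()->Stream());
SafeNvComp(nvcompBatchedSnappyCompressGetMaxOutputChunkSize(
    std::min(max_in_nbytes, chunk_size), nvcomp_batched_snappy_opts, &max_out_nbytes));
p_out->resize(max_out_nbytes * n_chunks, ctx->CUDACtx()->Stream());
```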
src/data/ellpack_page_source.cu:235
`d_page` is allocated using `ctx.CUDACtx()->Stream()`, but the subsequent `cudaMemcpyAsync` to/from that buffer omits the stream parameter (so it defaults to the legacy stream). This can introduce cross-stream races when a custom stream is used; pass `ctx.CUDACtx()->Stream()` to the memcpy (and keep stream usage consistent for the earlier host-cache memcpy as well).
```cuda
auto d_page = common::MakeFixedVecWithCudaMalloc<common::CompressedByteT>(&ctx, remaining);
if (remaining > 0) {
  dh::safe_cuda(cudaMemcpyAsync(d_page.data(), old_impl->gidx_buffer.data() + n_bytes,
                                remaining, cudaMemcpyDefault));
}
```
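The fix suggested by the comment would pass the allocation's stream as `cudaMemcpyAsync`'s fifth argument, keeping the copy ordered after the allocation on the same stream (a sketch, assuming the surrounding names from the excerpt above):

```cuda
// Sketch only: issue the async copy on the same stream the buffer was
// allocated on, avoiding a race against work on the legacy default stream.
auto d_page = common::MakeFixedVecWithCudaMalloc<common::CompressedByteT>(&ctx, remaining);
if (remaining > 0) {
  dh::safe_cuda(cudaMemcpyAsync(d_page.data(), old_impl->gidx_buffer.data() + n_bytes,
                                remaining, cudaMemcpyDefault,
                                ctx.CUDACtx()->Stream()));
}
```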
Force-pushed from a790207 to dbac855.
This is the first PR to add an optional parameter to the `HostDeviceVector` and the `Tensor` class.

ref: #12122