Use context for CUDA external memory DMatrix. by trivialfis · Pull Request #12137 · dmlc/xgboost

trivialfis · 2026-04-01T18:05:48Z

The change in #12121 is too large to land at once, so this PR starts with a smaller piece.

Use a sync policy to let the caller handle stream sync. This avoids flaky lifetime management in the dtor.
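As a rough illustration of the sync-policy idea, the sketch below lets a template parameter decide whether the copy routine or its caller synchronizes the stream. Everything here is hypothetical and not taken from the PR: `FakeStream` stands in for a CUDA stream, and `SyncPolicy`, `CopyPage`, and `BatchCopies` are made-up names.

```cpp
#include <vector>

// Hypothetical stand-in for a CUDA stream; it only counts synchronizations.
struct FakeStream {
  int n_syncs = 0;
  void Sync() { ++n_syncs; }
};

// Hypothetical policy naming who is responsible for the stream sync.
enum class SyncPolicy { kCallee, kCaller };

// Copies a page on the given stream. Under kCaller the function returns
// without syncing, so no destructor has to guess when the copy is done.
template <SyncPolicy policy>
std::vector<float> CopyPage(FakeStream* stream, std::vector<float> const& src) {
  std::vector<float> dst(src);  // stands in for an asynchronous device copy
  if (policy == SyncPolicy::kCallee) {
    stream->Sync();
  }
  return dst;
}

// The caller issues several copies and synchronizes exactly once at the end.
inline int BatchCopies(FakeStream* stream) {
  std::vector<float> a{1.f, 2.f};
  std::vector<float> b{3.f};
  auto pa = CopyPage<SyncPolicy::kCaller>(stream, a);
  auto pb = CopyPage<SyncPolicy::kCaller>(stream, b);
  stream->Sync();
  return static_cast<int>(pa.size() + pb.size());
}
```

With `kCaller`, the sync point is explicit at the call site instead of being hidden in a destructor, which is the property the description is after.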

Copilot

Pull request overview

This PR threads Context through CUDA external-memory ELLPACK/IterativeDMatrix paths so page creation, cache IO, and raw format read/write can use the caller-provided CUDA context (instead of constructing ad-hoc contexts / default-stream assumptions).

Changes:

- Add Context const* plumbing to IterativeDMatrix::Load, EllpackPage construction, EllpackPageRawFormat, and host-cache write paths.
- Introduce MakePage() hooks in format policies to centralize page allocation (supporting context-aware page types).
- Update CUDA tests to use the new context-aware constructors/signatures.
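A minimal sketch of how a `MakePage()` hook can centralize page allocation, assuming simplified stand-in types. `Context`, `CpuPage`, `GpuPage`, `DefaultFormatPolicy`, `ContextFormatPolicy`, and `Prefetch` are hypothetical names for illustration and do not mirror the actual XGBoost classes.

```cpp
#include <memory>

// Hypothetical stand-in for xgboost::Context.
struct Context {
  int ordinal;
};

// A page type that can be default-constructed.
struct CpuPage {};

// A page type that requires a context at construction.
struct GpuPage {
  explicit GpuPage(Context const* c) : ctx{c} {}
  Context const* ctx;
};

// Policy for pages that need no context.
struct DefaultFormatPolicy {
  std::shared_ptr<CpuPage> MakePage() const { return std::make_shared<CpuPage>(); }
};

// Policy that stores the context and forwards it to every page it creates.
class ContextFormatPolicy {
  Context const* ctx_;

 public:
  explicit ContextFormatPolicy(Context const* ctx) : ctx_{ctx} {}
  std::shared_ptr<GpuPage> MakePage() const { return std::make_shared<GpuPage>(ctx_); }
};

// Generic prefetch code asks the policy for a page instead of calling
// a default constructor, so it works for both page types.
template <typename Policy>
auto Prefetch(Policy const& policy) -> decltype(policy.MakePage()) {
  return policy.MakePage();
}
```

The generic code never names a page constructor, so adding a required constructor argument only touches the policy.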

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/cpp/data/test_iterative_dmatrix.cu | Update `IterativeDMatrix::Load` call to pass `Context`. |
| tests/cpp/data/test_ellpack_page_raw_format.cu | Update ELLPACK raw-format tests for context-aware `SetCuts` and `EllpackPage` construction. |
| src/data/sparse_page_source.h | Add `MakePage()` to format policy and use it during prefetch page allocation. |
| src/data/iterative_dmatrix.h | Change `IterativeDMatrix::Load` signature to accept `Context const*`. |
| src/data/iterative_dmatrix.cu | Pass context into `EllpackPage` construction and ELLPACK raw format read/write + `Load`. |
| src/data/iterative_dmatrix.cc | Update non-CUDA stub `Load` signature to match header. |
| src/data/gradient_index_page_source.h | Add `MakePage()` for gradient index format policy. |
| src/data/extmem_quantile_dmatrix.cc | Initialize and use `fmat_ctx_` consistently for CPU/GPU init and category sync. |
| src/data/ellpack_page.h | Make `EllpackPage` require an explicit `Context const*` for empty-page construction. |
| src/data/ellpack_page.cuh | Make `EllpackPageImpl` require context and store a `Context const*`. |
| src/data/ellpack_page.cu | Wire context into `EllpackPageImpl` construction and use context stream in destructor sync. |
| src/data/ellpack_page.cc | Update non-CUDA dummy `EllpackPage` ctor signature. |
| src/data/ellpack_page_source.h | Store context in `EllpackFormatPolicy`, pass it into `EllpackPageRawFormat`, and create pages with it. |
| src/data/ellpack_page_source.cu | Require context for host-cache write and for constructing pages/impls during fetch/caching. |
| src/data/ellpack_page_raw_format.h | Add `Context const*` member and update ctor signature. |
| src/data/ellpack_page_raw_format.cu | Use provided context/stream for device copies, cache IO, and sync. |

Copilot · 2026-04-01T18:34:57Z

src/data/iterative_dmatrix.cu

  if (!ellpack_) {
-    ellpack_.reset(new EllpackPage());
+    ellpack_.reset(new EllpackPage{&fmat_ctx_});
    if (ctx->IsCUDA()) {
      this->Info().feature_types.SetDevice(ctx->Device());
      *ellpack_->Impl() =


EllpackPageImpl now stores a raw Context const* (used by its destructor for stream sync). In this code path, the else branch below constructs a temporary CUDA context via ctx->MakeCUDA() and passes its address into EllpackPageImpl, which will leave impl->ctx_ dangling once the branch exits and can trigger use-after-free later (e.g. in ~EllpackPageImpl()). Ensure the context pointer stored inside EllpackPageImpl always outlives the page (e.g. persist the CUDA context in a longer-lived owner like fmat_ctx_, or avoid storing a raw Context* in EllpackPageImpl and store a device/stream handle instead).
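To make the lifetime concern concrete, here is a minimal sketch of the "persist the context in a longer-lived owner" option, using simplified stand-in types. `Context`, `Page`, and `Matrix` are hypothetical and only loosely modeled on `fmat_ctx_` and the page classes discussed above; the destructor read of `ctx` stands in for the stream sync.

```cpp
#include <memory>

// Hypothetical stand-in for xgboost::Context.
struct Context {
  int ordinal;
  Context MakeCUDA(int device) const { return Context{device}; }
};

// The page keeps a raw pointer and dereferences it in its destructor,
// so the pointee must outlive the page.
struct Page {
  explicit Page(Context const* c) : ctx{c} {}
  ~Page() { last_synced_ordinal = ctx->ordinal; }
  Context const* ctx;
  static int last_synced_ordinal;
};
int Page::last_synced_ordinal = -1;

class Matrix {
  // Members are destroyed in reverse declaration order: page_ goes first,
  // then fmat_ctx_, so the pointer held by the page stays valid.
  Context fmat_ctx_;
  std::unique_ptr<Page> page_;

 public:
  // The derived CUDA context is copied into the member rather than kept as
  // a temporary whose address would dangle once the constructor returns.
  explicit Matrix(Context const& user_ctx)
      : fmat_ctx_(user_ctx.MakeCUDA(0)), page_{new Page{&fmat_ctx_}} {}

  // Copying or moving would leave the page pointing at another object's context.
  Matrix(Matrix const&) = delete;
  Matrix& operator=(Matrix const&) = delete;

  Page const* GetPage() const { return page_.get(); }
  Context const* Ctx() const { return &fmat_ctx_; }
};
```

The guarantee here rests entirely on member declaration order and on the owner being non-movable, which is the kind of implicit contract a raw pointer forces on the surrounding code.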

src/data/iterative_dmatrix.cu

src/data/ellpack_page.cu

Copilot · 2026-04-01T18:34:58Z

src/data/ellpack_page_raw_format.cu

  *vec = common::MakeFixedVecWithCudaMalloc<T>(n);
  dh::safe_cuda(
-      cudaMemcpyAsync(vec->data(), ptr, n_bytes, cudaMemcpyDefault, curt::DefaultStream()));
+      cudaMemcpyAsync(vec->data(), ptr, n_bytes, cudaMemcpyDefault, ctx->CUDACtx()->Stream()));
  return true;


ReadDeviceVec now takes a Context const* and immediately uses ctx->CUDACtx()->Stream() for cudaMemcpyAsync, but it never validates that ctx is non-null / CUDA, nor does it ensure the current CUDA device matches ctx's ordinal. This can lead to invalid stream/device usage when the caller's current device differs from ctx. Add a CHECK(ctx && ctx->IsCUDA()) and set the device (e.g. curt::SetDevice(ctx->Ordinal())) before allocating/copying.
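The suggested precondition could look roughly like the sketch below. It is written against stand-in types so it compiles without the CUDA toolkit: `Context`, `FakeRuntime`, and `PrepareDevice` are hypothetical, the exception stands in for a `CHECK` failure, and `FakeRuntime::SetDevice` stands in for a call such as `cudaSetDevice`.

```cpp
#include <stdexcept>

// Hypothetical stand-in for xgboost::Context; a negative ordinal means CPU.
struct Context {
  int ordinal;
  bool IsCUDA() const { return ordinal >= 0; }
};

// Hypothetical stand-in for the per-thread "current device" of the CUDA runtime.
struct FakeRuntime {
  int current_device = 0;
  void SetDevice(int device) { current_device = device; }
};

// Validate the context and make its device current before any allocation
// or asynchronous copy is issued on the context's stream.
inline void PrepareDevice(Context const* ctx, FakeRuntime* runtime) {
  if (ctx == nullptr || !ctx->IsCUDA()) {
    throw std::invalid_argument{"A non-null CUDA context is required."};
  }
  if (runtime->current_device != ctx->ordinal) {
    runtime->SetDevice(ctx->ordinal);
  }
}
```

A helper of this shape would be called at the top of the read path, before the allocation and the copy.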

Copilot · 2026-04-01T18:34:59Z

src/data/ellpack_page_raw_format.cu

  if (this->param_.prefetch_copy || !has_hmm_ats_) {
-    RET_IF_NOT(ReadDeviceVec(fi, &impl->gidx_buffer));
+    RET_IF_NOT(ReadDeviceVec(ctx_, fi, &impl->gidx_buffer));
  } else {


EllpackPageRawFormat::Read uses ctx_ (stored in the format object) for device copies and stream sync, but there’s no precondition check that ctx_ is non-null / CUDA or that the active device matches ctx_. Since the constructor now accepts an arbitrary context pointer, add validation (and potentially a device set) at the start of Read/Write to prevent null deref or cross-device stream usage.

RAMitchell

I find it a bit weird passing a raw pointer all the time without any guardrails for its state, but I guess we strongly guarantee that there is only ever one context and it always exists.

trivialfis · 2026-04-07T20:19:07Z

> I find it a bit weird passing a raw pointer all the time without any guardrails for its state

That's indeed a significant issue. I would like to use a shared ptr instead at some point.
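For reference, a shared-pointer variant might look roughly like this. It is only a sketch with hypothetical stand-in types (`Context`, `Page`, `MakePageFromLocalContext`), not a proposal for the actual class layout.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for xgboost::Context.
struct Context {
  int ordinal;
};

// The page shares ownership of the context, so the context cannot be
// destroyed while any page still refers to it.
class Page {
  std::shared_ptr<Context const> ctx_;

 public:
  explicit Page(std::shared_ptr<Context const> ctx) : ctx_{std::move(ctx)} {}
  int Ordinal() const { return ctx_->ordinal; }
};

// Even when the context is created in a local scope, the returned page
// keeps it alive.
inline Page MakePageFromLocalContext(int device) {
  std::shared_ptr<Context const> ctx = std::make_shared<Context>(Context{device});
  return Page(ctx);
}
```

The trade-off is an atomic reference count per owner in exchange for removing the lifetime contract that a raw `Context const*` imposes on every caller.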

trivialfis requested review from RAMitchell and Copilot and removed request for Copilot April 1, 2026 18:26

Copilot started reviewing on behalf of trivialfis April 1, 2026 18:27

Copilot AI reviewed Apr 1, 2026

trivialfis added 6 commits April 2, 2026 05:04

Pass the member context.

e2028d8

Use context in ExtMemQuantileDMatrix.

d9fbab9

lint.

7e8a6cc

cpu build.

115dea7

Fix.

5fb7aeb

Fix

54dd6a7

trivialfis force-pushed the ext-ctx branch from 0614baf to 54dd6a7 Compare April 1, 2026 21:05

trivialfis added 5 commits April 2, 2026 05:14

Fix.

cd03fbc

Use explicit stream sync.

1242879

CPU build.

65e7486

lint.

101add0

lint.

63370db

RAMitchell approved these changes Apr 7, 2026

trivialfis wants to merge 11 commits into dmlc:master from trivialfis:ext-ctx

3 participants
