Extract weighted quantile cut fixes from #12129 (#12146)

Merged
RAMitchell merged 10 commits into dmlc:master from RAMitchell:quantile-cut-query-fixes
Apr 8, 2026
@RAMitchell RAMitchell commented Apr 7, 2026

Summary

This PR extracts a focused subset of the weighted quantile accuracy work from #12129.
It addresses the weighted cut extraction issue tracked in #12139.

It fixes two discrete issues that affect weighted cut quality:

  • the GPU adapter sketch batch path could underestimate rows in a batch by rounding down
  • final histogram cuts on both CPU and GPU were generated by pruning the summary down to the cut count and treating the retained entries as the cut set, instead of querying the working summary at target ranks

What This Changes

  • use DivRoundUp(sketch_batch_num_elements, num_cols) in the GPU adapter sketch path
  • add WQSummary::QueryCutValues(max_bin) to materialize histogram cuts directly from the working summary
  • generate CPU histogram cuts by querying the working summary directly
  • generate GPU histogram cuts by querying the working summary directly
  • tighten the shared weighted rank-error tolerance from 15.0 to 10.0

Why

The weighted summary is built to answer rank queries. Querying the working summary directly gives better final cuts than pruning the summary down to cut-count size and treating the retained entries as the cuts.
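As a rough illustration of what "querying the working summary at target ranks" means, here is a minimal sketch. It is not the PR's `WQSummary::QueryCutValues` implementation: the `Entry` layout and the `QueryRank` and `QueryCuts` helpers are hypothetical, and the summary is simplified to exact cumulative weights rather than the rank bounds a real sketch stores.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified summary entry: a value plus the cumulative weight
// of all entries up to and including it. Not the real WQSummary layout.
struct Entry {
  float value;
  double cum_weight;
};

// Return the value of the first entry whose cumulative weight reaches `rank`.
// `summary` must be non-empty and sorted by value.
inline float QueryRank(std::vector<Entry> const& summary, double rank) {
  auto it = std::lower_bound(
      summary.begin(), summary.end(), rank,
      [](Entry const& e, double r) { return e.cum_weight < r; });
  if (it == summary.end()) {
    --it;  // clamp to the largest value in the summary
  }
  return it->value;
}

// Produce up to max_bin - 1 interior cut values by querying evenly spaced
// target ranks, instead of pruning the summary and reusing its entries.
inline std::vector<float> QueryCuts(std::vector<Entry> const& summary,
                                    std::size_t max_bin) {
  std::vector<float> cuts;
  if (summary.empty() || max_bin < 2) {
    return cuts;
  }
  double const total = summary.back().cum_weight;
  for (std::size_t i = 1; i < max_bin; ++i) {
    double const target =
        total * static_cast<double>(i) / static_cast<double>(max_bin);
    float const v = QueryRank(summary, target);
    if (cuts.empty() || v > cuts.back()) {
      cuts.push_back(v);  // skip duplicates so cuts stay strictly increasing
    }
  }
  return cuts;
}
```

In this sketch each cut lands as close to its target rank as the summary allows. A prune-then-reuse approach would instead inherit whichever entries the pruning step happened to keep.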

On the weighted reproducer used during debugging, normalized cut rank error dropped from about 3.06 to about 0.25, which is roughly 2 * epsilon with the current kFactor = 8.
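One way to read these numbers, stated as an assumption rather than taken from the test sources: if the error is measured in units of the ideal per-bin weight (total weight divided by `max_bin`) and epsilon is `1 / (kFactor * max_bin)`, then a rank error of `2 * epsilon * total_weight` normalizes to `2 / kFactor`, which is 0.25 for `kFactor = 8`. The helper below is a hypothetical sketch of such a measurement; `WeightedValue` and `MaxNormalizedRankError` are illustrative names, not the functions in `test_hist_util.h`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical input record: a feature value and its sample weight.
struct WeightedValue {
  float value;
  double weight;
};

// Largest deviation between each cut's actual weighted rank and its target
// rank, expressed in units of the ideal per-bin weight (total / max_bin).
// Assumes `sorted_data` is sorted by value, `cuts` is ascending, and the
// i-th cut targets rank (i + 1) * total / max_bin.
inline double MaxNormalizedRankError(
    std::vector<WeightedValue> const& sorted_data,
    std::vector<float> const& cuts, std::size_t max_bin) {
  double total = 0.0;
  for (auto const& d : sorted_data) {
    total += d.weight;
  }
  if (total <= 0.0 || max_bin == 0) {
    return 0.0;  // nothing to measure
  }
  double const bin_weight = total / static_cast<double>(max_bin);
  double max_err = 0.0;
  double rank = 0.0;  // weight of all values <= current cut
  std::size_t pos = 0;
  for (std::size_t i = 0; i < cuts.size(); ++i) {
    while (pos < sorted_data.size() && sorted_data[pos].value <= cuts[i]) {
      rank += sorted_data[pos].weight;
      ++pos;
    }
    double const target = bin_weight * static_cast<double>(i + 1);
    max_err = std::max(max_err, std::abs(rank - target) / bin_weight);
  }
  return max_err;
}

// Under the stated assumption, a 2 * epsilon rank error normalizes to this.
constexpr double NormalizedTwoEpsilon(double k_factor) { return 2.0 / k_factor; }
```

Under this reading, a value of 3.06 would mean the worst cut sat about three bin widths from its target rank, while 0.25 is a quarter of a bin width.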

The GPU batch sizing fix also removes an off-by-one-style underestimate in the adapter path: converting a batch's element count into rows by rounding down could report one row too few.
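The difference between rounding down and rounding up can be seen in a small standalone sketch. The `DivRoundUp` below is an illustrative stand-in written for this example, not copied from the XGBoost sources, and `RowsFloor` and `RowsCeil` are hypothetical names.

```cpp
#include <cstddef>

// Illustrative ceiling division for non-negative integers; b must be non-zero.
// Note: a + b - 1 can overflow for values of a close to the type's maximum.
constexpr std::size_t DivRoundUp(std::size_t a, std::size_t b) {
  return (a + b - 1) / b;
}

// Hypothetical: rows covered by an element budget, rounding down.
constexpr std::size_t RowsFloor(std::size_t n_elements, std::size_t n_cols) {
  return n_elements / n_cols;
}

// Hypothetical: rows covered by an element budget, rounding up.
constexpr std::size_t RowsCeil(std::size_t n_elements, std::size_t n_cols) {
  return DivRoundUp(n_elements, n_cols);
}

// With 10 elements and 3 columns, the last row is only partially covered:
// truncation reports 3 rows, rounding up reports 4.
static_assert(RowsFloor(10, 3) == 3, "truncating division drops the partial row");
static_assert(RowsCeil(10, 3) == 4, "ceiling division keeps the partial row");
// When the budget divides evenly, both agree.
static_assert(RowsFloor(9, 3) == RowsCeil(9, 3), "exact multiples are unaffected");
```

The two results differ by at most one row, and only when the element budget is not an exact multiple of the column count.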

Tolerance Note

This PR reduces the shared weighted rank-error tolerance from 15.0 to 10.0.

That value is empirical: it leaves some margin over the focused weighted CPU/GPU coverage on this extracted branch while still being materially tighter than the old bound.

Testing

Ran locally:

  • ./build-cpu/testxgboost --gtest_filter='Quantile.*:HistUtil.*'
  • ./build-cuda-local/testxgboost --gtest_filter='HistUtil.*:GPUQuantile.*'

Copilot AI left a comment

Pull request overview

This PR extracts and applies a subset of the weighted-quantile accuracy work to improve histogram cut quality, focusing on (1) correct GPU adapter sketch batch sizing and (2) generating final CPU/GPU histogram cuts by querying the working weighted summary at target ranks instead of treating a pruned summary as the cut set.

Changes:

  • Fix GPU adapter sketch batch row estimation by using DivRoundUp(sketch_batch_num_elements, num_cols).
  • Add rank-query helpers on weighted summaries and use query-based cut extraction on CPU and GPU.
  • Extend/adjust C++ quantile tests to validate weighted summary query bounds and cut rank error; tighten shared weighted tolerance to 10.0.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/common/hist_util.cuh | Fixes GPU sketch batch row estimation by rounding up when converting element budget to rows. |
| src/common/quantile.h | Adds Query, QueryRanks, and shared QueryCutValues helper for query-based cut extraction. |
| src/common/quantile.cc | Switches CPU histogram cut extraction to query the working summary; removes final prune-to-max_bin+1 step. |
| src/common/quantile.cu | Switches GPU histogram cut extraction to query the working summary (currently via a host-side extraction path). |
| tests/cpp/common/test_hist_util.h | Tightens shared weighted normalized rank-error tolerance from 15.0 to 10.0. |
| tests/cpp/common/test_quantile.cc | Adds CPU-side weighted summary query-bound tests for push/sorted/merged scenarios. |
| tests/cpp/common/test_quantile.cu | Adds GPU-side weighted summary query-bound test and a weighted cut rank-error test. |

@RAMitchell RAMitchell requested review from Copilot and trivialfis April 7, 2026 19:34
@RAMitchell RAMitchell marked this pull request as ready for review April 7, 2026 19:34
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


@RAMitchell
Member Author

This PR improves quantile sketch accuracy overall, but that also shifts some downstream model behavior, which exposed a couple of overly strict tests.

For test_absolute_error, the model output still looks reasonable, but the check that the target responses are sufficiently independent was based on a brittle hard threshold, so I removed that assertion.

For the ranking normalization test, the failure was by a very small margin, and the condition itself is not theoretically justified: there is no general guarantee that the normalized result must be strictly worse than the unnormalized one. I removed that assertion as well.

@RAMitchell RAMitchell merged commit d84637e into dmlc:master Apr 8, 2026
78 checks passed
3 participants