Extract weighted quantile cut fixes from #12129#12146
Extract weighted quantile cut fixes from #12129#12146RAMitchell merged 10 commits intodmlc:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extracts and applies a subset of the weighted-quantile accuracy work to improve histogram cut quality, focusing on (1) correct GPU adapter sketch batch sizing and (2) generating final CPU/GPU histogram cuts by querying the working weighted summary at target ranks instead of treating a pruned summary as the cut set.
Changes:
- Fix GPU adapter sketch batch row estimation by using
DivRoundUp(sketch_batch_num_elements, num_cols). - Add rank-query helpers on weighted summaries and use query-based cut extraction on CPU and GPU.
- Extend/adjust C++ quantile tests to validate weighted summary query bounds and cut rank error; tighten shared weighted tolerance to
10.0.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/common/hist_util.cuh |
Fixes GPU sketch batch row estimation by rounding up when converting element budget to rows. |
src/common/quantile.h |
Adds Query, QueryRanks, and shared QueryCutValues helper for query-based cut extraction. |
src/common/quantile.cc |
Switches CPU histogram cut extraction to query the working summary; removes final prune-to-max_bin+1 step. |
src/common/quantile.cu |
Switches GPU histogram cut extraction to query the working summary (currently via a host-side extraction path). |
tests/cpp/common/test_hist_util.h |
Relaxes shared weighted normalized rank-error tolerance from 15.0 to 10.0. |
tests/cpp/common/test_quantile.cc |
Adds CPU-side weighted summary query-bound tests for push/sorted/merged scenarios. |
tests/cpp/common/test_quantile.cu |
Adds GPU-side weighted summary query-bound test and a weighted cut rank-error test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This PR improves quantile sketch accuracy overall, but that also shifts some downstream model behavior, which exposed a couple of overly strict tests. For For the ranking normalization test, the failure was by a very small margin, and the condition itself is not theoretically justified: there is no general guarantee that the normalized result must be strictly worse than the unnormalized one. I removed that assertion as well. |
Summary
This PR extracts a focused subset of the weighted quantile accuracy work from #12129.
It addresses the weighted cut extraction issue tracked in #12139.
It fixes two discrete issues that affect weighted cut quality:
What This Changes
DivRoundUp(sketch_batch_num_elements, num_cols)in the GPU adapter sketch pathWQSummary::QueryCutValues(max_bin)to materialize histogram cuts directly from the working summary15.0to10.0Why
The weighted summary is built to answer rank queries. Querying the working summary directly gives better final cuts than pruning the summary down to cut-count size and treating the retained entries as the cuts.
On the weighted reproducer used during debugging, normalized cut rank error dropped from about
3.06to about0.25, which is roughly2 * epsilonwith the currentkFactor = 8.The GPU batch sizing fix also removes an off-by-one style underestimate in the adapter path when converting batch element counts into rows.
Tolerance Note
This PR reduces the shared weighted rank-error tolerance from
15.0to10.0.That value is empirical: it leaves some margin over the focused weighted CPU/GPU coverage on this extracted branch while still being materially tighter than the old bound.
Testing
Ran locally:
./build-cpu/testxgboost --gtest_filter='Quantile.*:HistUtil.*'./build-cuda-local/testxgboost --gtest_filter='HistUtil.*:GPUQuantile.*'