
Preserve exact weighted values in sorted sketch #12148

Open
RAMitchell wants to merge 1 commit into dmlc:master from RAMitchell:cpu-sorted-weighted-exact-summary-fix

@RAMitchell
Member

Summary

Fix the CPU sorted weighted sketch path when the sketch budget is already large enough to
retain every unique feature value.

Previously, SetPruneSorted(...) still applied weighted goal selection in this regime, which
could drop exact values unnecessarily. This change emits the exact weighted summary instead.
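
The exact-retention case described above can be sketched as follows. This is only an illustration of the idea, not the code in src/common/quantile.h: the `Entry` struct and the `TryExactSummary` function are hypothetical names, and the field meanings (`rmin`, `rmax`, `wmin`) are assumptions chosen for this example. Given (value, weight) pairs already sorted by value, it merges duplicates into one entry per unique value and reports failure as soon as the number of unique values exceeds `max_size`, at which point a caller would fall back to pruning.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical summary entry used only for this sketch.
struct Entry {
  double rmin;   // total weight of values strictly smaller than `value`
  double rmax;   // total weight of values smaller than or equal to `value`
  double wmin;   // total weight carried by `value` itself
  float value;
};

// Builds an exact weighted summary from (value, weight) pairs sorted by value.
// Returns true and fills `out` with one entry per unique value when all
// unique values fit in `max_size`. Returns false (and clears `out`) otherwise,
// signalling that the caller has to prune instead.
bool TryExactSummary(const std::vector<std::pair<float, double>>& sorted,
                     std::size_t max_size, std::vector<Entry>* out) {
  out->clear();
  double cumulative = 0.0;  // weight of everything seen so far
  for (const auto& item : sorted) {
    const float v = item.first;
    const double w = item.second;
    if (!out->empty() && out->back().value == v) {
      // Same value as the previous entry: accumulate its weight.
      out->back().wmin += w;
      out->back().rmax += w;
    } else {
      if (out->size() == max_size) {
        // One more unique value than the budget allows.
        out->clear();
        return false;
      }
      out->push_back(Entry{cumulative, cumulative + w, w, v});
    }
    cumulative += w;
  }
  return true;
}
```

With the input {(1, 0.5), (1, 1.5), (2, 1.0), (4, 2.0)} and a budget of 3, the sketch keeps all three unique values with their exact rank bounds; with a budget of 2 it returns false and no value is silently dropped by this path.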

Testing

  • cmake -S <worktree> -B <worktree>/build-cpu -DUSE_CUDA=OFF -DGOOGLE_TEST=ON -DUSE_DMLC_GTEST=ON
  • cmake --build <worktree>/build-cpu --target testxgboost -j35
  • <worktree>/build-cpu/testxgboost --gtest_filter='HistUtil.SortedWeightedExactCuts:Quantile.*:HistUtil.*'

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes the CPU sorted weighted-quantile sketch path to preserve all unique feature values when the sketch budget (max_size) is already large enough to retain them, avoiding unnecessary drops caused by weighted goal-selection.

Changes:

  • Add an early-exit path in WQSummary::SetPruneSorted to emit an exact weighted summary when unique_values <= max_size.
  • Add a regression test ensuring sorted-column sketching matches row-based sketching in the exact-retention regime for weighted data.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/common/quantile.h | Emits an exact weighted summary (no goal-selection pruning) when all unique values fit in budget. |
| tests/cpp/common/test_hist_util.cc | Adds HistUtil.SortedWeightedExactCuts to validate sorted vs row sketch cut equivalence when budget retains all unique values. |
