
Move internal shap algorithms into separate namespace. #11985

Merged
RAMitchell merged 21 commits into dmlc:master from RAMitchell:shap-interface on Feb 25, 2026

Conversation

@RAMitchell
Member

No description provided.

```cpp
  {"device", ctx->IsSycl() ? "cpu" : ctx->DeviceName()}};
}

gbm::GBTreeModel LoadGBTreeModel(Learner* learner, Context const* ctx,
```
Member Author


It is quite annoying to get a GBTreeModel out of a booster.

Member


Yeah, I would like to split up the concept of "model" and everything else like "optimizers/tree builders", but it might be too much for XGBoost in its current state.

@RAMitchell
Member Author

I am having two major difficulties with this PR. The first is that the categorical recoding logic is complicated and I don't want to carry it through to the interpretability module. The other is simply getting the tree ensemble out of the learner, which seems somehow impossible in the current setup.

@trivialfis
Member

I can look into simplifying the categorical features after the holiday.

Contributor

Copilot AI left a comment


Pull request overview

This PR refactors SHAP-related implementation details out of the CPU/GPU predictors into a new internal xgboost::interpretability namespace, aiming to reduce duplication and better isolate interpretability code paths.

Changes:

  • Introduces new internal SHAP dispatch API (interpretability::ShapValues, ShapInteractionValues, ApproxFeatureImportance) with CPU and CUDA implementations.
  • Refactors CPU/GPU predictors to delegate SHAP/interaction contribution computation to the new interpretability layer.
  • Extracts shared CPU/GPU data access utilities into dedicated headers and updates SHAP tests to use the new entry points (with reduced runtime).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/cpp/predictor/test_shap.cc | Updates SHAP tests to call the new interpretability SHAP entry points and reduces dataset sizes/iters. |
| src/predictor/interpretability/shap.h | Adds the internal SHAP dispatch header (CPU/CUDA routing). |
| src/predictor/interpretability/shap.cc | Adds CPU implementations for SHAP values, interactions, and approximate importance. |
| src/predictor/interpretability/shap.cu | Adds CUDA implementations for SHAP values/interactions using GPUTreeShap. |
| src/predictor/gpu_predictor.cu | Removes inlined GPU SHAP logic and delegates to the interpretability SHAP functions; reuses the extracted GPU accessors. |
| src/predictor/gpu_data_accessor.cuh | New shared GPU sparse/ellpack access helpers (used by the GPU predictor and SHAP). |
| src/predictor/data_accessor.h | New shared CPU batch-to-FVec access helpers (used by the CPU predictor and CPU SHAP). |
| src/predictor/cpu_predictor.cc | Removes inlined CPU SHAP logic and delegates to the interpretability SHAP functions; uses the extracted accessors. |
| src/gbm/gbtree.h | Formatting-only adjustments. |
| src/gbm/gbtree.cc | Formatting-only adjustments. |
| include/xgboost/predictor.h | Macro formatting-only adjustment. |


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.




```cpp
HostDeviceVector<float> shap_values;
learner->Predict(p_dmat, false, &shap_values, 0, 0, false, false, true, false, false);
interpretability::ShapValues(dmat->Ctx(), p_dmat.get(), &shap_values, *gbtree, 0, {}, 0, 0);
```

Copilot AI Feb 17, 2026


The tree_weights parameter is a pointer type (std::vector<float> const*), but the test is passing {} (an empty brace-initializer). This will not compile correctly. Pass nullptr instead to indicate no tree weights.

Comment on lines +212 to +213
```cpp
interpretability::ShapInteractionValues(dmat->Ctx(), p_dmat.get(), &shap_interactions, *gbtree, 0,
                                        {}, false);
```

Copilot AI Feb 17, 2026


The tree_weights parameter is a pointer type (std::vector<float> const*), but the test is passing {} (an empty brace-initializer). This will not compile correctly. Pass nullptr instead to indicate no tree weights.

@RAMitchell RAMitchell changed the title from "[WIP] Move internal shap algorithms into separate namespace." to "Move internal shap algorithms into separate namespace." on Feb 17, 2026
@RAMitchell RAMitchell marked this pull request as ready for review February 17, 2026 13:44
@trivialfis
Member

trivialfis commented Feb 20, 2026

I haven't looked into the details yet, but it would be great if we could establish some conventions here:

  • Do we need the gpu_ prefix for CUDA files? My preference is that the .cu(h) file extension is sufficient and doesn't need additional annotations.
  • Since you are trying to separate the SHAP module, should it be outside of the predictor directory?

@RAMitchell
Member Author

> Since you are trying to separate the SHAP module, should it be outside of the predictor directory?

That was my first attempt but it depends on all of the data accessors in prediction, in particular recoding.

@trivialfis
Member

trivialfis commented Feb 22, 2026

> The categorical recoding logic is complicated and I don't want to carry this through to the interpretability module.

Would you like to share a bit more on the complication from a caller's perspective for the SHAP module?

> That was my first attempt but it depends on all of the data accessors in prediction, in particular recoding.

I think you have extracted them out?

@trivialfis
Member

Please let me know if you would like me to give the refactoring a try.

@RAMitchell
Member Author

I originally wanted to have a separate interpretability folder, but these new functions remain coupled to prediction: the SHAP values all need to add up to the margin prediction. You could put the data-loading accessors elsewhere (e.g. data/utils), but I think they end up used only by SHAP and prediction. Let me know if you have better ideas.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.



@RAMitchell RAMitchell merged commit 90fa894 into dmlc:master Feb 25, 2026
82 checks passed


3 participants