Add V5 leaf-extraction QuadratureSHAP — faster than TreeSHAP at every depth #1
Open
yupbank wants to merge 1 commit into shapley-value-algorithms from
… depth
Add a third SHAP algorithm option ("v5shap") alongside treeshap and
quadratureshap. V5 uses leaf-extraction with three optimizations that
eliminate the small-tree regression seen in QuadratureSHAP:
1. Leaf-extraction instead of edge-telescoping: defers SHAP contribution
   extraction to leaves using precomputed subtree feature masks, skipping
   redundant extract_term calls at internal nodes.
2. Dynamic Q = max(ceil(depth/2), 2): adapts quadrature points to tree
   depth. Depth-4 trees use Q=2 (2 FMAs/node), matching TreeSHAP's O(D)
   cost. Deep trees ramp up automatically.
3. float32 + precomputed subtree masks: matches TreeSHAP's data type,
   halves bandwidth. Subtree masks computed once per tree, not per sample.
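The depth rule in item 2 can be sketched in a few lines of Python. The function names here are hypothetical and are not taken from the patch. The second half illustrates one plausible reason the rule works, which is an assumption and not something the commit states: the Shapley weight s!(n-s-1)!/n! equals the integral of t^s (1-t)^(n-s-1) over [0, 1], a polynomial of degree n-1, and a Q-point Gauss-Legendre rule is exact up to degree 2Q-1. For n=4 path features, Q=2 already reproduces the weights.

```python
import math

def dynamic_q(depth: int) -> int:
    """Quadrature points for a tree of the given depth: max(ceil(depth/2), 2)."""
    return max(math.ceil(depth / 2), 2)

# Two-point Gauss-Legendre rule, mapped from [-1, 1] to [0, 1].
_GL2_NODES = (0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0))
_GL2_WEIGHTS = (0.5, 0.5)

def shapley_weight_exact(n: int, s: int) -> float:
    """Combinatorial Shapley weight s!(n-s-1)!/n! for coalition size s among n players."""
    return math.factorial(s) * math.factorial(n - s - 1) / math.factorial(n)

def shapley_weight_gl2(n: int, s: int) -> float:
    """Same weight, computed as the integral of t^s (1-t)^(n-s-1) with the 2-point rule.

    Only exact while the integrand degree n-1 is at most 3 (assumption: n <= 4).
    """
    return sum(
        w * t**s * (1.0 - t) ** (n - s - 1)
        for t, w in zip(_GL2_NODES, _GL2_WEIGHTS)
    )

if __name__ == "__main__":
    for depth in (3, 4, 8, 12, 16):
        print(f"depth={depth:2d} -> Q={dynamic_q(depth)}")
    for s in range(4):
        print(s, shapley_weight_exact(4, s), shapley_weight_gl2(4, s))
```

With this rule, the benchmark depths 3 and 4 both map to Q=2, and depth 16 maps to Q=8.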
Benchmark (50 trees, 256 samples, nthread=1, vs Lundberg TreeSHAP):
  depth=3 (15 nodes): 1.37x faster
  depth=4 (29 nodes): 1.32x faster
  depth=8 (182 nodes): 1.24x faster
  depth=12 (238 nodes): 1.27x faster
  depth=16 (238 nodes): 1.31x faster
  breast_cancer (35 nodes): 1.85x faster
Max accuracy diff vs TreeSHAP: ~1e-7 (within 1e-4 test tolerance).
V5 is also 1.2-1.4x faster than QuadratureSHAP across all configs.
Usage: bst.set_param({"shap_algorithm": "v5shap"})
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
This adds a third SHAP algorithm option (`v5shap`) that addresses the small-tree regression found in dmlc/xgboost#12106. V5 (leaf-extraction) is 1.13–1.85x faster than Lundberg's TreeSHAP at every tree depth tested, including the small/sparse trees where QuadratureSHAP (V6-style edge-telescoping) loses.
Motivation
QuadratureSHAP in PR dmlc#12106 showed regressions on small trees:
Root cause: with Q fixed at 8, the per-node quadrature overhead is O(Q) regardless of depth, which exceeds TreeSHAP's O(D²) cost when D is small.
What V5 does differently
1. Leaf-extraction: defers SHAP extraction to leaves using precomputed subtree feature masks (`subtree_feats[node × n_features]`). Tracks "pending" features along root→leaf paths and extracts only when features leave the path. Avoids redundant `extract_term` calls at internal nodes.
2. Dynamic Q = max(⌈depth/2⌉, 2): adapts quadrature points to tree depth. Depth-4 trees use Q=2 (2 FMAs/node), matching TreeSHAP's O(D=4) loop cost. Deep trees ramp to Q=8+.
3. float32 + precomputed masks: all H-buffer work in float32 (matching TreeSHAP's precision). Subtree masks computed once per tree (not per sample), eliminating the main per-call overhead.
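A minimal sketch of the once-per-tree mask precomputation described in the first item, in Python for brevity. The array-based tree layout (`feature`, `left`, `right`, with `-1` marking leaves) and the choice to include a node's own split feature in its mask are assumptions made for illustration. The actual `ComputeSubtreeFeats` in `shap.cc` may differ.

```python
from typing import List

def compute_subtree_feats(
    feature: List[int], left: List[int], right: List[int], n_features: int
) -> List[bool]:
    """Return a flat [node * n_features + f] mask of features split on at or below each node.

    Hypothetical layout: node 0 is the root, and left[i] == -1 marks node i as a leaf.
    """
    n_nodes = len(feature)
    masks = [False] * (n_nodes * n_features)
    # Iterative post-order walk: a node is finalized only after both children.
    stack = [(0, False)]
    while stack:
        node, children_done = stack.pop()
        if left[node] == -1:
            continue  # leaves split on nothing, so their mask stays empty
        if not children_done:
            stack.append((node, True))
            stack.append((left[node], False))
            stack.append((right[node], False))
            continue
        base = node * n_features
        masks[base + feature[node]] = True
        for child in (left[node], right[node]):
            child_base = child * n_features
            for f in range(n_features):
                if masks[child_base + f]:
                    masks[base + f] = True
    return masks

if __name__ == "__main__":
    # Root splits on feature 0; its left child splits on feature 2; the rest are leaves.
    masks = compute_subtree_feats(
        feature=[0, 2, -1, -1, -1],
        left=[1, 3, -1, -1, -1],
        right=[2, 4, -1, -1, -1],
        n_features=3,
    )
    print(masks[0:3])  # mask row for the root
```

Because the masks depend only on tree structure, they can be built once and reused for every sample, which is the per-call saving the third item refers to.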
Benchmark results
50 trees, 256 test samples, `nthread=1`, synthetic 50-feature data + breast_cancer.
V5 is also 1.2–1.4x faster than QuadratureSHAP at every depth.
Accuracy: max diff vs TreeSHAP ~1e-7. Additivity error ~2e-7.
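The additivity figure can be checked with a small helper like the one below. This is a sketch rather than the PR's test code. It assumes the contribution layout that XGBoost's `predict(pred_contribs=True)` returns (one column per feature plus a final bias column) and raw-margin predictions. The helper name is hypothetical.

```python
from typing import Sequence

def max_additivity_error(
    contribs: Sequence[Sequence[float]], preds: Sequence[float]
) -> float:
    """Largest |sum(contribution row) - prediction| over all samples.

    Each row is assumed to hold per-feature SHAP values followed by the bias term.
    """
    return max(abs(sum(row) - p) for row, p in zip(contribs, preds))

if __name__ == "__main__":
    contribs = [[0.5, -0.25, 1.0], [0.1, 0.2, 0.3]]
    preds = [1.25, 0.6]
    print(max_additivity_error(contribs, preds))
```

A result near float32 rounding error, as reported above, indicates the contributions sum back to the model output.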
Changes
- `src/predictor/interpretability/shap.cc`: V5 algorithm (~420 lines), comprising `ComputeSubtreeFeats`, `TreeShapV5`, `V5ShapValues`, and precomputed GL rules for Q=2..16
- `src/predictor/interpretability/shap.h`: `V5ShapValues` declaration
- `src/predictor/cpu_predictor.cc`: `v5shap` selector dispatch
- `tests/cpp/predictor/test_shap.cc`: `V5ShapMatchesTreeShapCPU` test (10 features, 1e-4 tolerance)
Usage
`bst.set_param({"shap_algorithm": "v5shap"})`
Test plan
- `sum(contribs) == prediction` verified
- C++ test (`V5ShapMatchesTreeShapCPU`)
🤖 Generated with Claude Code