Expose objective functions to the Python interface. #12059
trivialfis wants to merge 24 commits into dmlc:master
Conversation
Pull request overview
This PR introduces a Python class-based interface for XGBoost’s built-in objectives (e.g. reg:squarederror), allowing users to pass objective instances (with parameters) into xgboost.train / scikit-learn wrappers, and updates docs/tests and the multi-output reduced-gradient demo accordingly.
Changes:
- Add `_BuiltInObjective` wrappers and concrete objective classes in `xgboost.objective`, plus param serialization via `_stringify`.
- Extend `xgboost.train` and scikit-learn `.fit()` to accept `_BuiltInObjective` instances by translating them into booster params.
- Add/update tests and documentation for the new objective API and updated reduced-gradient guidance.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/python/test_objective.py | Adds a CPU test entrypoint for the new objective test suite. |
| tests/python/test_multi_target.py | Adds a CPU test calling the new built-in-objective split-grad test helper. |
| tests/python-gpu/test_gpu_objective.py | Adds a GPU test entrypoint for the new objective test suite. |
| python-package/xgboost/training.py | Teaches train() to accept _BuiltInObjective and set objective params on the Booster. |
| python-package/xgboost/testing/objective.py | New shared test suite validating objective wrappers across regression/classification/ranking/survival. |
| python-package/xgboost/testing/multi_target.py | Updates custom objective base class usage and adds a test helper for subclassing built-in objectives. |
| python-package/xgboost/sklearn.py | Allows scikit-learn wrappers to accept _BuiltInObjective instances. |
| python-package/xgboost/objective.py | Adds _BuiltInObjective, _stringify, and many built-in objective wrapper classes; removes TreeObjective. |
| python-package/xgboost/core.py | Uses _stringify in set_param and simplifies objective handling around split_grad. |
| doc/tutorials/multioutput.rst | Updates guidance to inherit from Objective (and claims built-in objectives are usable too). |
| doc/python/python_api.rst | Adds xgboost.objective to the Python API reference. |
| demo/guide-python/multioutput_reduced_gradient.py | Updates demo to subclass RegSquaredError instead of reimplementing LS gradients manually. |
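Based on the summary above, a rough stand-in for how an objective instance might carry parameters and flatten them into booster params. The class and helper here are hypothetical mirrors of the PR's `RegSquaredError`, `flat_params`, and `_stringify`; the exact signatures are assumptions.

```python
def stringify(value):
    # Stand-in for the PR's _stringify helper (assumption): booster
    # parameters are passed as strings, with booleans as "1"/"0".
    if isinstance(value, bool):
        return "1" if value else "0"
    return str(value)


class RegSquaredErrorSketch:
    """Hypothetical mirror of the PR's RegSquaredError wrapper."""

    name = "reg:squarederror"

    def __init__(self, **params):
        self._params = params

    def flat_params(self):
        # train() would apply these onto the Booster via set_param; the
        # objective name itself is one of the parameters.
        out = {"objective": self.name}
        out.update({k: stringify(v) for k, v in self._params.items()})
        return out
```

Under this sketch, `RegSquaredErrorSketch(alpha=0.5).flat_params()` yields string-valued params ready for `set_param`.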
python-package/xgboost/training.py
Outdated
    builtin_obj = None
    if isinstance(obj, _BuiltInObjective):
        builtin_obj = obj
        obj = None

    bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
    start_iteration = 0

    if builtin_obj is not None:
        for key, value in builtin_obj.flat_params().items():
            bst.set_param(key, value)
train() now special-cases _BuiltInObjective, but cv() still forwards obj directly into Booster.update(). Passing a built-in objective instance to xgboost.cv(obj=...) will hit _BuiltInObjective.__call__() and raise at runtime. Please add analogous handling in cv() (extract flat_params(), apply to each fold’s booster via set_param, and set obj=None).
RAMitchell
left a comment
Exposing objectives is great - but the split gradient method doesn't make sense to me. It implies that you would want a different method of approximating gradients for each objective type, but it seems the same techniques apply (SVD, random projection).
The alternative is an independent reducer as in SketchBoost, which I considered during the initial implementation. I ultimately chose the objective as the vehicle for this feature because I see the technique as using different loss functions for splits and for leaf values, rather than the same loss with dimension reduction applied to the gradient. This view was briefly mentioned in the document. The split loss may or may not be derived from dimension-reduction techniques. I'm open to change if this view is considered unproductive. The interface is marked as WIP, so we can change it. It's not difficult to compose either; just define a reducer with the …
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
python-package/xgboost/sklearn.py:1783
- In `XGBClassifier.fit`, when `self.objective` is a `_BuiltInObjective`, you pass the instance through to `train()` while also overriding `params` for multiclass (objective/num_class). `train()` will later apply `builtin_obj.flat_params()` onto the Booster, which can override the multiclass-derived `params` (notably `num_class`) with values from the objective instance, potentially producing an inconsistent configuration (or silently training with the wrong `num_class`). Please add validation/synchronization between the inferred `n_classes_` and any `num_class` carried by the objective instance, or avoid passing the instance into `train()` and instead merge its flattened params into `params` before training.
    obj: Optional[Union[PlainObj, _BuiltInObjective]] = None
    if isinstance(self.objective, _BuiltInObjective):
        obj = self.objective
        params["objective"] = self.objective.name
    elif callable(self.objective):
        obj = _objective_decorator(self.objective)
        params["objective"] = "binary:logistic"
    if self.n_classes_ > 2:
        # Switch to using a multiclass objective in the underlying XGB instance
        if params.get("objective", None) != "multi:softmax":
            params["objective"] = "multi:softprob"
        params["num_class"] = self.n_classes_
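The merge-and-validate approach suggested in the review comment above could look like this sketch. The function name and error message are illustrative; only the idea of reconciling `num_class` with the inferred `n_classes_` comes from the comment.

```python
def merge_objective_params(params, flat_params, n_classes):
    """Sketch: merge the objective's flattened params into `params` before
    training, and fail loudly if a num_class carried by the objective
    disagrees with the n_classes_ inferred from the training data."""
    merged = dict(params)
    merged.update(flat_params)
    num_class = merged.get("num_class")
    if num_class is not None and int(num_class) != n_classes:
        raise ValueError(
            f"num_class={num_class} from the objective disagrees with "
            f"n_classes_={n_classes} inferred from the data"
        )
    return merged
```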
Do you have an example or reference for this? If you want, I can help implement a simple random projection; then we have just one parameter.
It's a perspective. You can see the gradient derived from dimension reduction as a different loss gradient:

    class LsObjSum(LsObjMean):
        def __init__(self, device: str) -> None:
            super().__init__(device=device)

        def __call__(
            self, iteration: int, y_pred: np.ndarray, dtrain: xgb.DMatrix
        ) -> Tuple[np.ndarray, np.ndarray]:
            self.dtrain = dtrain
            self.y_pred = y_pred
            return super().__call__(iteration, y_pred, dtrain)

        def split_grad(
            self, iteration: int, grad: np.ndarray, hess: np.ndarray
        ) -> Tuple[np.ndarray, np.ndarray]:
            # Derive the gradient from the mean of the output values instead
            # of using the mean of the gradient
            y = self.dtrain.get_label()
            y_pred = self.y_pred
            y_mean = np.mean(y, axis=1)
            p_mean = np.mean(y_pred, axis=1)
            grad = p_mean - y_mean
            hess = np.ones(grad.shape)
            return grad, hess
I'm not sure I want to make random projection the only choice, even if I drop the …
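For reference, the simple random projection discussed above could be sketched as below. This is a hypothetical reducer, not code from the PR; `k` would be the single user-facing parameter, and keeping the hessian positive by averaging (rather than projecting it through a signed matrix) is my assumption about what a hist-based learner needs.

```python
import numpy as np


def random_projection_split_grad(grad, hess, k, seed=0):
    """Reduce (n, d) multi-output gradients to (n, k) with a Gaussian
    random projection, one possible way to fill in a split-gradient hook."""
    rng = np.random.default_rng(seed)
    d = grad.shape[1]
    proj = rng.normal(size=(d, k)) / np.sqrt(k)
    grad_r = grad @ proj
    # Keep the hessian positive: average over outputs and broadcast rather
    # than projecting it (a signed projection could flip hessian signs).
    hess_r = np.repeat(hess.mean(axis=1, keepdims=True), k, axis=1)
    return grad_r, hess_r
```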
I reverted the changes for the vector-leaf. This PR now only handles exposing the objective functions.
It's just an example that one can see the gradient this way.
It's certainly one of the candidates. But my point is, why put in this restriction?
@RAMitchell I have reverted all changes related to the …
RAMitchell
left a comment
LGTM. I assume documentation comes at a later stage? We will probably have to render all the math formulas in the docs.
    X, y, _ = make_regression(100, 5, use_cupy=device == "cuda")
    dm = DMatrix(X, label=y)

    for obj_inst, obj_name in [
I guess we can't use pytest parameterize with this organisation of tests?
In legate-boost I had parameterisation over device and objectives.
There are a couple of options, like parametrization, pytest subtests, or a simple loop. I don't have a strong preference for this specific test, as it's unlikely we need to isolate the tests to reproduce the issues. If one of them fails, all of the objectives should fail.
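A parametrized version of the discussion above could be sketched like this. The `(obj_name, expected)` pairs and the test name are illustrative stand-ins for the real objective instances in the PR's test helpers; only the stacking of `pytest.mark.parametrize` over device and objective is the point.

```python
import pytest


@pytest.mark.parametrize("device", ["cpu", "cuda"])
@pytest.mark.parametrize(
    "obj_name,expected",
    [
        ("RegSquaredError", "reg:squarederror"),
        ("BinaryLogistic", "binary:logistic"),
    ],
)
def test_objective_name(device, obj_name, expected):
    # A real test body would instantiate the objective on `device` and
    # compare its .name attribute against `expected`.
    assert isinstance(expected, str)
```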
        assert obj_inst.name == obj_name


    def check_equivalence(device: Device) -> None:
Would be nice to see this parameterised over objectives.
    @@ -2280,21 +2278,14 @@ def train_one_iter(grad: NumpyOrCupy, hess: NumpyOrCupy) -> None:
        vgrad: Optional[ArrayLike]
Just curious what the latency difference is between using the Python interface and the internal one. One possible simplification would be to always use the Python path, so there is just one code path.
The internal implementation assumes the split gradient is always available, and only does an extra leaf-value computation at the end of the iteration if there's a separate leaf-value gradient. This is easy to implement since we only need one extra step.
But that's not intuitive to users, since in their mental model the algorithm creates an extra "split gradient". So the public interface assumes a value gradient is available, as in normal gradient boosting; the assumption is switched here.
I misread the original question. There's no difference in latency; it's just setting a parameter. Yes, I think a single code path would be super nice.
I'm wondering if it's possible to reuse the documentation instead of copying and pasting it into the Python files.
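One hypothetical way to reuse documentation rather than copy-pasting it: keep a single shared template and attach it at class-creation time. The template, decorator, and class below are all illustrative assumptions, not the PR's mechanism.

```python
# Shared documentation template, written once.
_OBJECTIVE_DOC = "Built-in objective wrapper for ``{name}``."


def with_shared_doc(cls):
    """Attach the shared doc template as the class docstring."""
    cls.__doc__ = _OBJECTIVE_DOC.format(name=cls.name)
    return cls


@with_shared_doc
class RegSquaredErrorSketch:
    name = "reg:squarederror"
```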

Ref #7693 #9043
- … Learner yet. My plan is to make an independent C API to create objectives with the `split_grad` customization. This will have to wait until the plan is clear.
- Removed `TreeObjective`, since the linear model is now deprecated.
- The `cv` function is not supported yet.
- I named the classes `RegSquaredError` instead of `SquaredError` to maintain consistency with the C++ naming. But I'm open to suggestions.