Expose objective functions to the Python interface.#12059

Open
trivialfis wants to merge 24 commits into dmlc:master from trivialfis:py-obj-1

Conversation

@trivialfis
Member

@trivialfis trivialfis commented Mar 3, 2026

Ref #7693 #9043

  • Expose all objective functions to the Python interface.
  • No change in the Learner yet. My plan is to make an independent C API to create objectives with the split_grad customization. This will have to wait until the plan is clear.
  • Metrics are not exposed. I'm focusing on getting vector-leaf to work with built-in objectives.
  • Remove the TreeObjective, since the linear model is now deprecated.
  • The cv function is not supported yet.
  • Naming of the parameters is described in the referenced issue.

I named the classes RegSquaredError instead of SquaredError to maintain consistency with the C++ naming. But I'm open to suggestions.

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a Python class-based interface for XGBoost’s built-in objectives (e.g. reg:squarederror), allowing users to pass objective instances (with parameters) into xgboost.train / scikit-learn wrappers, and updates docs/tests and the multi-output reduced-gradient demo accordingly.

Changes:

  • Add _BuiltInObjective wrappers and concrete objective classes in xgboost.objective, plus param serialization via _stringify.
  • Extend xgboost.train and scikit-learn .fit() to accept _BuiltInObjective instances by translating them into booster params.
  • Add/update tests and documentation for the new objective API and updated reduced-gradient guidance.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file

  • tests/python/test_objective.py: Adds a CPU test entrypoint for the new objective test suite.
  • tests/python/test_multi_target.py: Adds a CPU test calling the new built-in-objective split-grad test helper.
  • tests/python-gpu/test_gpu_objective.py: Adds a GPU test entrypoint for the new objective test suite.
  • python-package/xgboost/training.py: Teaches train() to accept _BuiltInObjective and set objective params on the Booster.
  • python-package/xgboost/testing/objective.py: New shared test suite validating objective wrappers across regression/classification/ranking/survival.
  • python-package/xgboost/testing/multi_target.py: Updates custom objective base class usage and adds a test helper for subclassing built-in objectives.
  • python-package/xgboost/sklearn.py: Allows scikit-learn wrappers to accept _BuiltInObjective instances.
  • python-package/xgboost/objective.py: Adds _BuiltInObjective, _stringify, and many built-in objective wrapper classes; removes TreeObjective.
  • python-package/xgboost/core.py: Uses _stringify in set_param and simplifies objective handling around split_grad.
  • doc/tutorials/multioutput.rst: Updates guidance to inherit from Objective (and claims built-in objectives are usable too).
  • doc/python/python_api.rst: Adds xgboost.objective to the Python API reference.
  • demo/guide-python/multioutput_reduced_gradient.py: Updates demo to subclass RegSquaredError instead of reimplementing LS gradients manually.


Comment on lines +184 to +195
builtin_obj = None
if isinstance(obj, _BuiltInObjective):
    builtin_obj = obj
    obj = None

bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
start_iteration = 0

if builtin_obj is not None:
    for key, value in builtin_obj.flat_params().items():
        bst.set_param(key, value)


Copilot AI Mar 3, 2026


train() now special-cases _BuiltInObjective, but cv() still forwards obj directly into Booster.update(). Passing a built-in objective instance to xgboost.cv(obj=...) will hit _BuiltInObjective.__call__() and raise at runtime. Please add analogous handling in cv() (extract flat_params(), apply to each fold’s booster via set_param, and set obj=None).

@trivialfis trivialfis changed the title Expose objective functions to the Python interface. [wip] Expose objective functions to the Python interface. Mar 3, 2026
@trivialfis trivialfis marked this pull request as draft March 3, 2026 10:58
Member

@RAMitchell RAMitchell left a comment


Exposing objectives is great - but the split gradient method doesn't make sense to me. It implies that you would want a different method of approximating gradients for each objective type, but it seems the same techniques apply (SVD, random projection).

@trivialfis
Member Author

trivialfis commented Mar 3, 2026

The alternative is an independent reducer, as in SketchBoost, which I did try during the initial implementation.

My final choice was to use the objective as the vehicle to deliver the feature, since I saw the technique as using different loss functions for split and leaf values, instead of using the same loss with dimension reduction applied to the gradient. This view was briefly mentioned in the document. The split loss may or may not be derived from dimension reduction techniques.

I'm open to changing this if the view is considered unproductive. The interface is marked as WIP, so we can still change it.

It's also not difficult to compose: just define a reducer with split_grad as a mixin for your objective class.
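To illustrate the composition idea with a self-contained sketch (the class names and the split_grad signature here are illustrative stand-ins, not the actual XGBoost API):

```python
from typing import Tuple

import numpy as np


class ToySquaredError:
    """Stand-in for a built-in multi-output objective wrapper."""

    def __call__(
        self, iteration: int, y_pred: np.ndarray, y_true: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        # Full (n_samples, n_targets) gradient, used for leaf values.
        grad = y_pred - y_true
        hess = np.ones_like(grad)
        return grad, hess


class RandomProjectionMixin:
    """Reducer mixin: supplies split_grad by projecting the multi-output
    gradient down to k dimensions for split finding."""

    k = 4

    def split_grad(
        self, iteration: int, grad: np.ndarray, hess: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        rng = np.random.default_rng(iteration)
        proj = rng.normal(
            scale=1.0 / np.sqrt(self.k), size=(grad.shape[1], self.k)
        )
        return grad @ proj, np.ones((grad.shape[0], self.k))


class ReducedSquaredError(RandomProjectionMixin, ToySquaredError):
    """Full gradient for leaf values, reduced gradient for splits."""
```

The mixin stays independent of any particular loss, so the same reducer can be composed with any objective class that produces a multi-output gradient.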

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

python-package/xgboost/sklearn.py:1783

  • In XGBClassifier.fit, when self.objective is a _BuiltInObjective, you pass the instance through to train() while also overriding params for multiclass (objective/num_class). train() will later apply builtin_obj.flat_params() onto the Booster, which can override the multiclass-derived params (notably num_class) with values from the objective instance, potentially producing an inconsistent configuration (or silently training with the wrong num_class). Please add validation/synchronization between the inferred n_classes_ and any num_class carried by the objective instance, or avoid passing the instance into train() and instead merge its flattened params into params before training.
            obj: Optional[Union[PlainObj, _BuiltInObjective]] = None
            if isinstance(self.objective, _BuiltInObjective):
                obj = self.objective
                params["objective"] = self.objective.name
            elif callable(self.objective):
                obj = _objective_decorator(self.objective)
                params["objective"] = "binary:logistic"

            if self.n_classes_ > 2:
                # Switch to using a multiclass objective in the underlying XGB instance
                if params.get("objective", None) != "multi:softmax":
                    params["objective"] = "multi:softprob"
                params["num_class"] = self.n_classes_



@RAMitchell
Member

My final choice was to use the objective as the vehicle to deliver the feature, since I saw the technique as using different loss functions for split and leaf values, instead of using the same loss with dimension reduction applied to the gradient. This view was briefly mentioned in the document. The split loss may or may not be derived from dimension reduction techniques.

Do you have an example or reference for this?

If you want, I can help implement a simple random projection; then we have just one parameter.

@trivialfis
Member Author

trivialfis commented Mar 4, 2026

Do you have an example or reference for this?

It's a perspective. You can see the gradient derived from dimension reduction as a different loss gradient: $\partial L_{split}$ versus $\partial L_{leaf}$; there's no need to impose a strict link between them saying that dimension reduction must be employed to obtain one from the other. To give a simple example for this demo:

from typing import Tuple

import numpy as np
import xgboost as xgb


class LsObjSum(LsObjMean):  # LsObjMean is defined in the reduced-gradient demo
    def __init__(self, device: str) -> None:
        super().__init__(device=device)

    def __call__(
        self, iteration: int, y_pred: np.ndarray, dtrain: xgb.DMatrix
    ) -> Tuple[np.ndarray, np.ndarray]:
        # Cache the inputs so that split_grad can use them later.
        self.dtrain = dtrain
        self.y_pred = y_pred
        return super().__call__(iteration, y_pred, dtrain)

    def split_grad(
        self, iteration: int, grad: np.ndarray, hess: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        # Derive the gradient from the mean of the output values instead of
        # using the mean of the gradient.
        y = self.dtrain.get_label()
        y_pred = self.y_pred
        y_mean = np.mean(y, axis=1)
        p_mean = np.mean(y_pred, axis=1)
        grad = p_mean - y_mean
        hess = np.ones(grad.shape)
        return grad, hess

If you want, I can help implement a simple random projection; then we have just one parameter.

I'm not sure I want to make random projection the only choice, even if I drop the split_grad view. That seems like an unnecessary restriction, given how many reduction techniques are out there. I don't think it would simplify or enhance XGBoost.

@trivialfis
Member Author

I reverted the changes for the vector-leaf. This PR now only handles exposing the objective functions.

@RAMitchell
Member

This split gradient above is always an underestimate of split gain due to this inequality
[inequality image omitted]

Random projection is unbiased and will converge in expectation, always, for every loss. Doesn't it just solve the problem?
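The unbiasedness claim can be checked with a small self-contained numpy experiment (not XGBoost code): with projection entries drawn from N(0, 1/k), the squared norm of the projected gradient matches the full-dimensional one in expectation.

```python
import numpy as np

rng = np.random.default_rng(42)

# A multi-output gradient matrix: n samples, d targets.
n, d, k = 256, 32, 8
grad = rng.normal(size=(n, d))


def project(grad: np.ndarray, rng: np.random.Generator, k: int) -> np.ndarray:
    """Gaussian random projection of per-sample gradients from d to k dims."""
    proj = rng.normal(scale=1.0 / np.sqrt(k), size=(grad.shape[1], k))
    return grad @ proj


# Averaging the projected squared norm over many independent projections
# recovers the full-dimensional squared norm.
full = float(np.sum(grad**2))
est = float(np.mean([np.sum(project(grad, rng, k) ** 2) for _ in range(2000)]))
```

Any single projection is noisy (the variance scales like 1/k), which is why the guarantee is stated in expectation.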

@trivialfis
Member Author

This split gradient above is always an underestimate of split gain due to this inequality

It's just an example showing that one can see the gradient this way.

Random projection is unbiased and will converge in expectation, always, for every loss. Doesn't it just solve the problem?

It's certainly one of the candidates. But my point is, why put in this restriction?

@trivialfis trivialfis marked this pull request as ready for review March 4, 2026 17:12
@trivialfis trivialfis changed the title [wip] Expose objective functions to the Python interface. Expose objective functions to the Python interface. Mar 4, 2026
@trivialfis
Member Author

trivialfis commented Mar 4, 2026

@RAMitchell I have reverted all changes related to the split_grad, this PR now only handles exposing the objective. Please help take another look when you are available. Some notes have been added to the PR description.

Member

@RAMitchell RAMitchell left a comment


LGTM. I assume documentation comes at a later stage? We will probably have to render all the math formulas in the docs.

X, y, _ = make_regression(100, 5, use_cupy=device == "cuda")
dm = DMatrix(X, label=y)

for obj_inst, obj_name in [
Member


I guess we can't use pytest parameterize with this organisation of tests?

In legate-boost I had parameterisation over device and objectives.

Member Author


There are a couple of options, like parametrization, pytest subtests, or a simple loop. I don't have a strong preference for this specific test, as it's unlikely we need to isolate the tests to reproduce the issues. If one of them fails, all of the objectives should fail.
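For reference, parametrization over device and objective could look like this (a sketch with hypothetical names; the real suite lives in python-package/xgboost/testing/objective.py):

```python
import pytest

# Hypothetical list standing in for the objective wrappers the suite covers.
OBJECTIVES = ["reg:squarederror", "binary:logistic", "rank:ndcg"]


@pytest.mark.parametrize("device", ["cpu", "cuda"])
@pytest.mark.parametrize("objective", OBJECTIVES)
def test_objective_roundtrip(device: str, objective: str) -> None:
    # Each (device, objective) pair is reported as an independent test case,
    # which makes failures easier to isolate than a plain loop would.
    assert isinstance(objective, str)
```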

assert obj_inst.name == obj_name


def check_equivalence(device: Device) -> None:
Member


Would be nice to see this parameterised over objectives.

Member Author


Done.

@@ -2280,21 +2278,14 @@ def train_one_iter(grad: NumpyOrCupy, hess: NumpyOrCupy) -> None:
vgrad: Optional[ArrayLike]
Member


Just curious what the latency difference is between using the Python interface and the internal one. One possible simplification could be to always use the Python path, so there is just one code path.

Member Author


The internal implementation assumes a split gradient must be available, and only performs an extra leaf-value computation at the end of an iteration if there's an extra leaf-value gradient. This is easy to implement, as we only need one extra step.

But that's not intuitive to users, since in their mental model the algorithm creates an extra "split gradient". So the interface assumes a value gradient is available, as in normal gradient boosting. The assumption is switched here.

Member Author


I misread the original question. There's no difference in latency; it's just setting parameters. Yes, I think a single code path would be super nice.

@trivialfis
Member Author

We will probably have to render all the math formulas in the docs.

I'm wondering whether it's possible to reuse the documentation instead of copying and pasting it into Python files.
