
[RFC] Exposing objectives and metrics as part of the API. #7693

@trivialfis

Description


This issue is for discussing whether we should expose our internal objective functions in the high-level API. From a Python user's perspective, the functionality would resemble what sklearn offers, except that each objective would be a class instead of a function (or a function factory, which might confuse users who are used to passing a simple function):

```python
from xgboost.objective import PseudoHuber
import xgboost

xgboost.XGBRegressor(objective=PseudoHuber(delta=10.0), tree_method="hist")
```

When a hyper-parameter does not need to be set (like `delta=10.0` above), users can keep using the existing string representation.
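To illustrate how a class-based objective could still fit the callable interface that exists today, here is a minimal sketch of a hypothetical `PseudoHuber` class (not an existing XGBoost API). It stores `delta` and implements `__call__(y_true, y_pred) -> (grad, hess)`, the shape of a custom objective callable for the scikit-learn wrapper. It assumes the standard pseudo-Huber loss `delta**2 * (sqrt(1 + (r / delta)**2) - 1)` with residual `r = y_pred - y_true`.

```python
import numpy as np


class PseudoHuber:
    """Hypothetical class-based objective; not part of XGBoost's API.

    Per-sample loss: delta**2 * (sqrt(1 + (r / delta)**2) - 1), r = y_pred - y_true.
    """

    def __init__(self, delta: float = 1.0) -> None:
        if delta <= 0:
            raise ValueError("delta must be positive")
        self.delta = delta

    def __call__(self, y_true: np.ndarray, y_pred: np.ndarray):
        # (y_true, y_pred) -> (grad, hess), like a custom objective callable.
        r = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
        scale = 1.0 + (r / self.delta) ** 2
        grad = r / np.sqrt(scale)  # dL/dy_pred
        hess = scale ** -1.5       # d2L/dy_pred2, always in (0, 1]
        return grad, hess


obj = PseudoHuber(delta=10.0)
grad, hess = obj(np.array([0.0, 1.0]), np.array([0.0, 11.0]))
```

Because the instance is callable with that signature, it could presumably be passed as `objective=` wherever a custom objective function is accepted. However, nothing else about the objective (link function, default metric) is conveyed that way, which is the limitation this RFC is about.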

Motivation

As the number of objectives and metrics grows, identifying them only by string name is becoming cumbersome, since each objective may have its own hyper-parameters. Examples include the fair XGBoost work we have been doing and the pseudo-Huber objective, which we need to improve by introducing a slope parameter. Another example is #5859, which can have multiple distributions to choose from.
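To make the "strings keep working" point concrete, here is a minimal sketch of how a bare string could continue to mean "this objective with default hyper-parameters" while an object carries its own values. `ObjectiveSpec`, `resolve_objective`, and the `delta` parameter name are hypothetical and not XGBoost's configuration code; only the two objective name strings are real XGBoost names.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class ObjectiveSpec:
    """Hypothetical: an objective name plus its own hyper-parameters."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)


# Hypothetical defaults (parameter names are illustrative only).
_DEFAULTS: Dict[str, Dict[str, float]] = {
    "reg:squarederror": {},
    "reg:pseudohubererror": {"delta": 1.0},
}


def resolve_objective(obj: Union[str, ObjectiveSpec]) -> ObjectiveSpec:
    """Accept either the old string form or a parameterized object."""
    if isinstance(obj, str):
        if obj not in _DEFAULTS:
            raise ValueError(f"unknown objective: {obj!r}")
        return ObjectiveSpec(obj, dict(_DEFAULTS[obj]))
    # Reject unknown objectives and parameters the objective does not define.
    unknown = set(obj.params) - set(_DEFAULTS.get(obj.name, {}))
    if obj.name not in _DEFAULTS or unknown:
        raise ValueError(f"invalid objective or parameters: {obj!r}")
    # Object-supplied values override the defaults for that objective only.
    return ObjectiveSpec(obj.name, {**_DEFAULTS[obj.name], **obj.params})
```

The design keeps hyper-parameters scoped to the objective that owns them, rather than adding more global parameters that only one objective reads.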

Beyond exposing the internal objects, a class interface would also help users who want to define their own custom objectives. Currently, we only accept a callable as a custom objective, which is insufficient in many cases because the objective also determines the link function and the number of output targets. With a class interface, users could bundle these pieces together, which would simplify their custom objectives.
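One way to picture such a class interface: besides the gradient and Hessian, the objective also owns the inverse link (what a user sees from prediction) and the number of targets. The `Objective` base class, its method names, and `Logistic` below are hypothetical and written in pure Python for clarity. The logistic gradient `p - y` and Hessian `p * (1 - p)` are the standard ones for log loss on a logit margin.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Tuple


class Objective(ABC):
    """Hypothetical base class; names are illustrative, not an XGBoost API."""

    # Number of output targets (raw margin columns) the model must produce.
    n_targets: int = 1

    @abstractmethod
    def grad_hess(self, y_true: List[float],
                  margin: List[float]) -> Tuple[List[float], List[float]]:
        """Gradient and Hessian of the loss with respect to the raw margin."""

    def inverse_link(self, margin: float) -> float:
        """Map a raw margin to the prediction scale (identity by default)."""
        return margin


class Logistic(Objective):
    """Binary log loss on a logit margin, with a sigmoid inverse link."""

    def inverse_link(self, margin: float) -> float:
        # Numerically stable sigmoid: avoid overflow in exp for large |margin|.
        if margin >= 0:
            return 1.0 / (1.0 + math.exp(-margin))
        z = math.exp(margin)
        return z / (1.0 + z)

    def grad_hess(self, y_true, margin):
        p = [self.inverse_link(m) for m in margin]
        grad = [pi - yi for pi, yi in zip(p, y_true)]
        hess = [pi * (1.0 - pi) for pi in p]
        return grad, hess
```

A plain callable can only return `(grad, hess)`; keeping `inverse_link` and `n_targets` on the same object is what would let the library know how to transform predictions and size the model.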

Other things we might need to explore

We might add more features to objectives in the future; for instance, each objective might be able to modify tree leaves after each iteration. Features like this can help mitigate the effect of objectives with zero Hessian, such as MAE. Can we expose this to custom objectives too? Also, how do we make sure this restructuring plays nicely with third-party tools such as hyper-parameter optimization libraries or autograd? Lastly, because the objectives are currently internal structures, even if we expose them to users we might not want to guarantee their stability for a reasonable period of time.
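For MAE, the loss `|r|` has gradient `sign(r)` and a Hessian of zero wherever it is defined, so a Newton leaf value `-G / H` is undefined. A common remedy in gradient boosting, used in Friedman's least-absolute-deviation boosting, is to fit the tree on the gradient and then reset each leaf to the median residual of its samples. The sketch below shows that as a stand-alone post-iteration hook; `mae_grad_hess`, `refresh_leaves`, and `learning_rate` here are hypothetical names, and how XGBoost would actually wire such a hook is an open question in this RFC.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple


def mae_grad_hess(y_true: Sequence[float],
                  y_pred: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Gradient sign(y_pred - y_true); the Hessian of |r| is 0 where defined."""
    grad = [float((p > y) - (p < y)) for y, p in zip(y_true, y_pred)]
    hess = [0.0 for _ in y_true]  # a Newton step -G/H would divide by zero
    return grad, hess


def refresh_leaves(leaf_index: Sequence[int], y_true: Sequence[float],
                   y_pred: Sequence[float],
                   learning_rate: float = 1.0) -> Dict[int, float]:
    """Hypothetical hook: new leaf value = shrunk median residual in that leaf."""
    residuals: Dict[int, List[float]] = defaultdict(list)
    for leaf, y, p in zip(leaf_index, y_true, y_pred):
        residuals[leaf].append(y - p)
    return {leaf: learning_rate * median(r) for leaf, r in residuals.items()}
```

Since the median minimizes absolute error within each leaf, the refreshed leaf values no longer depend on the (zero) Hessian.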

Related

#10144
