[c++] Add survival_cox objective for Cox proportional hazards modelling #7212

ohines wants to merge 28 commits into lightgbm-org:master
jameslamb left a comment
Thanks for your interest in LightGBM. Someone will review this when we have time.
Until then, please:
- update this branch to latest `master`
- fix all the linting issues with `pre-commit run --all-files`
```python
import lightgbm as lgb

# Load FLCHAIN dataset (serum free light chain and mortality)
data = fetch_openml("flchain", version=1, as_frame=True, parser="auto")
```
The AppVeyor builds are failing like this:
```
TypeError: fetch_openml() got an unexpected keyword argument 'parser'
```
https://ci.appveyor.com/project/guolinke/lightgbm/builds/53791302/job/oj3cfvbuifsjc7au?fullLog=true
Those jobs use a very old scikit-learn (1.0), which I guess must not have had that. Can you please figure out a more portable pattern? A different dataset, omitting the parser argument, something like that?
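One portable pattern, sketched here with a hypothetical stand-in function rather than scikit-learn itself, is to pass `parser` only when the installed `fetch_openml` actually accepts it:

```python
import inspect

# Hypothetical stand-in for an old sklearn.datasets.fetch_openml that
# predates the parser keyword; the real function would be used instead.
def fetch_openml_old(name, *, version, as_frame):
    return {"name": name, "kwargs": {"version": version, "as_frame": as_frame}}

def fetch_portably(fetch_fn, name):
    """Call an OpenML fetcher, passing parser= only if it is supported."""
    kwargs = {"version": 1, "as_frame": True}
    if "parser" in inspect.signature(fetch_fn).parameters:
        kwargs["parser"] = "auto"
    return fetch_fn(name, **kwargs)

data = fetch_portably(fetch_openml_old, "flchain")
```

With the real `fetch_openml`, the same signature check keeps the example working on both old and new scikit-learn without pinning.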
Thanks for pointing this out. I wasn't sure how to debug the failing AppVeyor tests.
You're right: `parser` is not required, so I removed it.
AppVeyor fails again with
sklearn.datasets._openml.OpenMLError: Dataset flchain with version 1 not found
I will look into it. It seems that `fetch_openml` is marked as experimental in the scikit-learn 1.0 docs. Is there a particular reason that we test this version, which was released in 2021?
In general, we try to support a wide range of lightgbm's main dependencies, for the benefit of users who can't easily upgrade to newer versions (e.g. they're using managed environments like Databricks notebooks or constrained to older operating systems).
We'd prefer to have a compelling reason to bump a runtime floor, and "makes this example in documentation easier to test" isn't that compelling, in my opinion.
That said we do already have a Linux job testing an even older scikit-learn:
So I wouldn't be opposed to updating the pin for Python 3.9 environments like the one on Appveyor. That could be done here:
LightGBM/.ci/conda-envs/ci-core-py39.txt
Line 31 in a7d00a9
I'd support trying to bump that up to a newer scikit-learn if you'd like. But it'll probably require pinning more than just scikit-learn, so might take a bit of trial and error.
I tested this on my local machine with `python=3.10.20` and `sklearn=1.0.2` and it works fine. Is it possible that the runner does not have access to OpenML?
These are the API calls that are made (generated by adding a print statement here):

```
downloading data from https://openml.org/api/v1/json/data/list/data_name/flchain/limit/2/data_version/1
downloading data from https://openml.org/api/v1/json/data/46161
downloading data from https://openml.org/api/v1/json/data/features/46161
downloading data from https://openml.org/api/v1/json/data/qualities/46161
downloading data from https://openml.org/data/v1/download/22120605
```
In any case I can just replace the example to use synthetic data.
> I tested this on my local machine with python=3.10.20 and sklearn=1.0.2 and it works fine. Is it possible that the runner does not have access to open ml?
That job uses Python 3.9, not 3.10. It's the standard Appveyor runner for open source projects and should have full access to the internet.
I suspect that maybe that "not found" error is actually from a broad try-catch and that something else in the environment (like some other dependency version) is causing it to fail.
The approach with synthetic data looks good to me!
Thanks for working through that.
jameslamb left a comment
Thanks, I went through this more thoroughly and left some more suggestions; please take a look at them.
I don't feel qualified to review the objective and metric implementations... once all my other suggestions are addressed, I can try to recruit another maintainer (or an outside reviewer) to look those over.
tests/python_package_test/utils.py (Outdated)

```python
@lru_cache(maxsize=None)
def load_survival():
```
```diff
-def load_survival():
+def make_survival(*, n_samples, random_state):
```
Since this is generating random data, not loading an existing dataset, let's follow the existing conventions here and call it make_* instead of load_*.
And can you please parameterize at least the number of samples and the random seed?
tests/python_package_test/utils.py (Outdated)

```python
n = 500
p = 5
censoring_rate = 0.3
rng = np.random.RandomState(seed=42)
X = rng.randn(n, p)
log_hazard = X[:, 0] + 0.5 * X[:, 1]
times = rng.exponential(np.exp(-log_hazard))
censor_times = rng.exponential(np.median(times) / censoring_rate, n)
```
```diff
-n = 500
-p = 5
-censoring_rate = 0.3
-rng = np.random.RandomState(seed=42)
-X = rng.randn(n, p)
-log_hazard = X[:, 0] + 0.5 * X[:, 1]
-times = rng.exponential(np.exp(-log_hazard))
-censor_times = rng.exponential(np.median(times) / censoring_rate, n)
+n_features = 5
+censoring_rate = 0.3
+rng = np.random.RandomState(seed=42)
+X = rng.randn(n_samples, n_features)
+log_hazard = X[:, 0] + 0.5 * X[:, 1]
+times = rng.exponential(np.exp(-log_hazard))
+censor_times = rng.exponential(np.median(times) / censoring_rate, n_samples)
```
Let's please use more informative variable names, and match the names used in other functions in this file.
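Applying those renames, a runnable sketch of the parameterized generator (the default values here are illustrative, not from the PR):

```python
import numpy as np

def make_survival(*, n_samples, n_features=5, censoring_rate=0.3, random_state=42):
    """Generate synthetic survival data with the signed-time label convention:
    a positive label is an observed event time, a negative label is a censoring time."""
    rng = np.random.RandomState(seed=random_state)
    X = rng.randn(n_samples, n_features)
    log_hazard = X[:, 0] + 0.5 * X[:, 1]
    times = rng.exponential(np.exp(-log_hazard))
    censor_times = rng.exponential(np.median(times) / censoring_rate, n_samples)
    observed = times <= censor_times
    y = np.where(observed, np.minimum(times, censor_times), -censor_times)
    return X.astype(np.float64), y.astype(np.float64)

# Callers can now vary the size and seed instead of relying on module-level constants.
X, y = make_survival(n_samples=200, random_state=0)
```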
```cpp
} else if (type == std::string("survival_cox_nll")) {
  Log::Warning("Metric survival_cox_nll is not implemented in cuda version. Fall back to evaluation on CPU.");
  return new CoxNLLMetric(config);
} else if (type == std::string("concordance_index") || type == std::string("c_index")) {
```
Please update this mapping in the R package as well:
LightGBM/R-package/R/metrics.R
Line 9 in a7d00a9
If you're comfortable writing R code we'd welcome new tests in the R package too, but at a minimum that mapping should be updated so the R package's early stopping behavior will be correct.
docs/Parameters.rst (Outdated)

```rst
- survival analysis application

  - ``survival_cox``, `Cox proportional hazards <https://en.wikipedia.org/wiki/Proportional_hazards_model>`__ partial likelihood with Breslow's method for ties, aliases: ``survival``, ``cox``, ``cox_ph``
```
Are these aliases used in other projects or research?
If not, let's please not use any aliases for this objective. Aliases add complexity and maintenance burden, and I'd especially like to avoid committing to a generic name like `survival` in case other survival objectives are added in the future.
I agree about removing the name `survival`.
- In XGBoost, the relevant objective and metric are `survival:cox` and `cox-nloglik`.
- In scikit-survival, the relevant function is `CoxPHSurvivalAnalysis`.
- In the R survival package, the relevant function is `coxph` with `ties="breslow"`.
- In the Lifelines package, the relevant class is `CoxPHFitter`.
- In statistical theory, the metric is sometimes referred to as a "partial likelihood".
Awesome, thanks for those links! That's exactly the type of thing I was looking for. Based on that, I'm happy with dropping survival but keeping cox and cox_ph.
```python
def load_survival():
    """Generate synthetic survival data with signed-time label convention."""
    n = 500
    p = 5
    censoring_rate = 0.3
    rng = np.random.RandomState(seed=42)
    X = rng.randn(n, p)
    log_hazard = X[:, 0] + 0.1 * X[:, 1]
    times = rng.exponential(np.exp(-log_hazard))
    censor_times = rng.exponential(np.median(times) / censoring_rate, n)
    observed = times <= censor_times
    y = np.where(observed, np.minimum(times, censor_times), -censor_times)
    return X.astype(np.float64), y.astype(np.float64)

X, y = load_survival()
```
```diff
-def load_survival():
-    """Generate synthetic survival data with signed-time label convention."""
-    n = 500
-    p = 5
-    censoring_rate = 0.3
-    rng = np.random.RandomState(seed=42)
-    X = rng.randn(n, p)
-    log_hazard = X[:, 0] + 0.1 * X[:, 1]
-    times = rng.exponential(np.exp(-log_hazard))
-    censor_times = rng.exponential(np.median(times) / censoring_rate, n)
-    observed = times <= censor_times
-    y = np.where(observed, np.minimum(times, censor_times), -censor_times)
-    return X.astype(np.float64), y.astype(np.float64)
-
-X, y = load_survival()
+def make_survival(*, n_samples, n_features, censoring_rate, random_state):
+    """Generate synthetic survival data with signed-time label convention."""
+    rng = np.random.RandomState(seed=random_state)
+    X = rng.randn(n_samples, n_features)
+    log_hazard = X[:, 0] + 0.1 * X[:, 1]
+    times = rng.exponential(np.exp(-log_hazard))
+    censor_times = rng.exponential(np.median(times) / censoring_rate, n_samples)
+    observed = times <= censor_times
+    y = np.where(observed, np.minimum(times, censor_times), -censor_times)
+    return X.astype(np.float64), y.astype(np.float64)
+
+X, y = make_survival(n_samples=500, n_features=5, censoring_rate=0.3, random_state=42)
```
Similar to my comments on the test code... let's please use more informative variable names, and let's make some of these things configurable so people can experiment with different configurations.
```diff
-assert "survival_cox_nll" in evals_result["val"]
-assert "concordance_index" in evals_result["val"]
+assert set(evals_result["val"].keys()) == {"survival_cox_nll", "concordance_index"}
```
Let's make this stricter and test for exact equivalence. As I think you noticed, LightGBM automatically adds a metric based on the loss function you choose. This stricter test could catch problems like the wrong metric accidentally being added when the survival_cox objective is used.
```python
assert "concordance_index" in evals_result["val"]
assert len(evals_result["val"]["survival_cox_nll"]) == 50
# concordance index should be above random (0.5) for this easy problem
assert evals_result["val"]["concordance_index"][-1] > 0.55
```
Can you please also add a test on the value of survival_cox_nll? If that metric just returned -1000 for every iteration right now, no test failure would alert us to that.
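To sanity-check the metric's values rather than just its presence, one option is to compare against an independent computation. This is my own minimal NumPy sketch of the Breslow negative log partial likelihood, not code from this PR; it assumes the signed-time label convention used in the tests:

```python
import numpy as np

def cox_nll_breslow(scores, y):
    """Negative log partial likelihood with Breslow's method for ties.

    scores: raw model predictions (log relative hazards).
    y: signed-time labels; y > 0 is an observed event at time y,
       y < 0 is censoring at time -y.
    """
    times = np.abs(y)
    events = y > 0
    nll = 0.0
    for i in np.flatnonzero(events):
        # Risk set: everyone still under observation at this event's time.
        # With Breslow's method, tied events all share the same risk set.
        at_risk = times >= times[i]
        nll += np.log(np.sum(np.exp(scores[at_risk]))) - scores[i]
    return nll

# With constant scores the likelihood only counts risk-set sizes: three
# events at distinct times give log(3) + log(2) + log(1) = log(6).
scores = np.zeros(3)
y = np.array([1.0, 2.0, 3.0])  # all events, no censoring
print(cox_nll_breslow(scores, y))  # ≈ 1.7918
```

A test could compare the reported `survival_cox_nll` against such a reference value on a tiny fixed dataset, which would catch a metric that silently returns a constant.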
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Overview:

Naming:

- `survival_cox` with `cox`, `cox_ph`, and `survival` as aliases.
- `survival_cox_nll` with aliases `cox_nll` and `survival_nll`.

Related: