[python-package] Added tests on Booster.shuffle_model() #7168

Open
daguirre11 wants to merge 1 commit into lightgbm-org:master from daguirre11:test-booster-shuffle-models

@daguirre11
Contributor

Contributes to #7031
BEFORE
Screenshot 2026-02-24 at 1 31 13 PM

AFTER
Screenshot 2026-02-24 at 1 23 14 PM

2 line difference in coverage for /lightgbm/basic.py

From my understanding, shuffle_models() just reorders the trees by calling the C API function LGBM_BoosterShuffleModels. The predictions will be the same but the serialized model will be different, which is why model_to_string() differs before and after (Tree=0 and Tree=1 switch positions). Please let me know if I am understanding this correctly.
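
A rough illustration of that reasoning, not LightGBM code: if a boosted model's raw score is the sum of per-tree outputs, reordering the trees changes their order but not the sum. The `toy_trees` callables and the `raw_score` helper are hypothetical stand-ins invented for this sketch; they are not part of the LightGBM API.

```python
import math
import random

# Hypothetical stand-ins for trees: each maps one feature value to a leaf output.
toy_trees = [
    lambda x: 0.5 if x > 1.0 else -0.5,
    lambda x: 0.25 if x > 2.0 else -0.25,
    lambda x: 0.125 if x > 3.0 else -0.125,
]


def raw_score(trees, x):
    """Sum the per-tree outputs, as an additive ensemble does."""
    return sum(tree(x) for tree in trees)


# Reorder a copy of the ensemble with a fixed seed.
shuffled = toy_trees[:]
random.Random(0).shuffle(shuffled)

inputs = [0.0, 1.5, 2.5, 3.5]
scores_before = [raw_score(toy_trees, x) for x in inputs]
scores_after = [raw_score(shuffled, x) for x in inputs]

# Same trees in a possibly different order: the sums should agree.
# isclose() is used because float addition is not strictly order-independent.
same_scores = all(math.isclose(a, b) for a, b in zip(scores_before, scores_after))
```

In a real test the comparison would be made on booster.predict() output before and after the shuffle, likely with a small tolerance for the same floating-point reason.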

Member

@jameslamb jameslamb left a comment

Thanks for starting this! This test should be made significantly stronger to give us high confidence that the behavior of shuffle_models() is correct.

I left some guidance in comments. But if you're feeling like it's too much for you to investigate right now, please let me know and we can close this so someone else can contribute it.

Comment on lines +1109 to +1115
    train_set = lgb.Dataset(X_train, label=y_train)
    booster = lgb.Booster(
        params={"objective": "binary", "verbose": -1},
        train_set=train_set,
    )
    for _ in range(2):
        booster.update()
Member

Suggested change
-    train_set = lgb.Dataset(X_train, label=y_train)
-    booster = lgb.Booster(
-        params={"objective": "binary", "verbose": -1},
-        train_set=train_set,
-    )
-    for _ in range(2):
-        booster.update()
+    booster = lgb.train(
+        params={
+            "objective": "binary",
+            "num_iterations": 10,
+            "num_leaves": 7,
+            "verbose": -1,
+        },
+        train_set=lgb.Dataset(X, label=y),
+    )

Let's use lgb.train() for this instead of a for loop and an update please, and let's make the model smaller so the test is faster.

    model_str_before = booster.model_to_string()
    booster.shuffle_models(start_iteration=0, end_iteration=-1)
    model_str_after = booster.model_to_string()
    assert model_str_before != model_str_after
Member

@jameslamb jameslamb Mar 1, 2026

This is not a very strong test. For example, it'd pass if shuffle_models() corrupted the model in some serious and incorrect way. This should be made much stricter.

To do that, you'll have to look a bit deeper into what the function is doing. Start with the docstring:

Parameters
----------
start_iteration : int, optional (default=0)
    The first iteration that will be shuffled.
end_iteration : int, optional (default=-1)
    The last iteration that will be shuffled.
    If <= 0, means the last available iteration.

The test should train 10 trees (for example) and:

  1. omit the first 2 trees, and confirm that their placement is not changed
  2. omit the final tree, and confirm that its placement isn't changed
  3. confirm that the set of trees is identical and only the ordering is different
  4. confirm that booster.predict() (with start_iteration left at its default) produces identical results before and after (ordering should not affect the predictions if you predict with all trees)
  5. check that the expected behavior happens if start_iteration is negative or end_iteration is larger than the number of trees in the model
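
Checks 1 to 3 above could be sketched roughly as follows. This is a standard-library-only illustration under stated assumptions, not the test itself: `check_partial_shuffle` is a hypothetical helper, the `tree-N` strings stand in for per-tree representations that a real test would have to extract from the booster, and a reversal stands in for the shuffle so the example is deterministic. The half-open `[start, stop)` window is also an assumption; how start_iteration and end_iteration map onto it should be confirmed against the implementation.

```python
from collections import Counter


def check_partial_shuffle(before, after, start, stop):
    """Report whether `after` is `before` with only positions [start, stop) reordered.

    ASSUMPTION: the shuffled window is half-open; this is a choice made for
    the sketch, not a statement about LightGBM's behavior.
    """
    return {
        # Trees before the window keep their positions.
        "prefix_unchanged": before[:start] == after[:start],
        # Trees from `stop` onward keep their positions.
        "suffix_unchanged": before[stop:] == after[stop:],
        # Nothing was added, dropped, or altered: same multiset of trees.
        "same_trees": Counter(before) == Counter(after),
        # The ordering actually changed somewhere.
        "order_changed": before != after,
    }


# Hypothetical stand-ins for 10 serialized trees.
trees_before = [f"tree-{i}" for i in range(10)]

# Leave the first 2 trees and the final tree alone; reorder the rest.
start, stop = 2, 9
window = trees_before[start:stop][::-1]  # reversal stands in for a shuffle
trees_after = trees_before[:start] + window + trees_before[stop:]

report = check_partial_shuffle(trees_before, trees_after, start, stop)

# A corrupted model (last tree replaced) should fail the multiset check.
corrupted = trees_after[:-1] + ["corrupt"]
corrupted_report = check_partial_shuffle(trees_before, corrupted, start, stop)
```

Checks 4 and 5 (identical predictions, and out-of-range start_iteration / end_iteration values) need a trained Booster and are not covered by this sketch.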

Comment on lines +1104 to +1108
    X_train, _, y_train, _ = train_test_split(
        *load_breast_cancer(return_X_y=True),
        test_size=0.1,
        random_state=42,
    )
Member

Suggested change
-    X_train, _, y_train, _ = train_test_split(
-        *load_breast_cancer(return_X_y=True),
-        test_size=0.1,
-        random_state=42,
-    )
+    X, y = load_breast_cancer(return_X_y=True)

The test isn't using the held-out validation data, let's skip the unnecessary train-test splitting.

Contributor Author

@jameslamb sorry for the late response, I will address all your responses on PRs today. Thank you for reviewing!

@jameslamb changed the title from "[python-package] Added unit test for Booster shuffle models" to "[python-package] Added tests on Booster.shuffle_model()" on Mar 1, 2026