[python-package] fix misleading redundant parameter warnings in Booster.refit() by arjunprakash027 · Pull Request #7124 · lightgbm-org/LightGBM

arjunprakash027 · 2026-01-11T21:45:57Z

Fix Misleading Redundant Parameter Warnings in `Booster.refit()`

Problem

As discussed in #6793, the Booster.refit() method raises misleading warnings about redundant parameters being passed to the internal Dataset constructor. The issue occurs when parameters like categorical_feature or label_column are stored in the Booster's internal params dictionary and then inadvertently passed to Dataset() both as keyword arguments and within the params dictionary, triggering false-positive warnings.

For additional context, see #6793 (comment)

Solution

This PR modifies Booster.refit() to implement parameter routing:

Parameter Inspection: Uses inspect.signature() to dynamically identify which parameters belong to the Dataset constructor by examining Dataset._lazy_init.
Parameter Routing: Checks whether a Dataset-related parameter (such as categorical_feature or label_column) exists in the Booster's internal params but was not explicitly overridden by the user in the refit() call. When a parameter qualifies for routing (i.e., it is a Dataset parameter whose value in the refit() call is still the default from the refit() signature), it is moved from new_params into the local variable scope used for the Dataset constructor. This ensures each parameter is passed exactly once to the internal Dataset object, eliminating the redundant warning.
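For readers unfamiliar with inspect.signature(), here is a minimal sketch of the "parameter inspection" step. The two functions are hypothetical stand-ins, not LightGBM's real Dataset._lazy_init() or Booster.refit() signatures; the helper collects the parameters that appear in both signatures and have a default value in the outer one.

```python
import inspect
from typing import Any, Callable, Dict


# Hypothetical stand-ins: these are NOT the real LightGBM signatures.
def dataset_lazy_init(data, label=None, reference=None, weight=None,
                      categorical_feature="auto", params=None):
    pass


def booster_refit(data, label, decay_rate=0.9, weight=None,
                  categorical_feature="auto", dataset_params=None):
    pass


def shared_defaults(inner: Callable[..., Any], outer: Callable[..., Any]) -> Dict[str, Any]:
    """Map each parameter of `inner` that `outer` also accepts with a default to that default."""
    outer_params = inspect.signature(outer).parameters
    shared = {}
    for name in inspect.signature(inner).parameters:
        param = outer_params.get(name)
        # Skip names `outer` does not accept, and required (no-default) parameters
        if param is not None and param.default is not inspect.Parameter.empty:
            shared[name] = param.default
    return shared


print(shared_defaults(dataset_lazy_init, booster_refit))
# {'weight': None, 'categorical_feature': 'auto'}
```

Note that this only answers which names overlap; deciding whether a particular call actually left a value at its default still requires looking at that call's arguments.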

I've tried to resolve the warning issue without changing the actual behavior of the refitting process.

jameslamb

Thanks for all the investigation you've done so far! Leaving a blocking review, as I'd like the opportunity to look into this and suggest a different fix.

This patch, with a doubly nested if block inside a for loop and its use of inspect and locals(), looks quite complex, and I'm worried it'd be difficult to modify correctly in the future.
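One further hazard with the locals() approach is illustrated by the minimal sketch below (a hypothetical function, not LightGBM code): inside a function, assigning into the dictionary returned by locals() does not rebind the local variable in CPython. As far as we can tell, that means a line like locals()[lazy_init_args] = ... in the patch would not change the value later passed to Dataset().

```python
def refit_like(categorical_feature="auto"):
    # Try to rebind the local variable by writing into the locals() mapping,
    # mirroring the patch's locals()[lazy_init_args] = ... line
    locals()["categorical_feature"] = [0, 2]
    # The real local variable is unaffected, so the default is returned
    return categorical_feature


print(refit_like())  # auto
```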

arjunprakash027 · 2026-02-01T18:52:22Z

Thanks @jameslamb!
I agree, the code I wrote is overly complex for this fix. I can try to write a simpler fix, and also investigate whether the problem is specific to Booster.refit() or a more general one.
Or, I can wait for your suggestion for a different fix.

jameslamb

Sorry for the long time to review... I'm ready now to work with you on this. Please see my suggestions and let me know if you have any questions.

jameslamb · 2026-04-02T02:59:46Z

python-package/lightgbm/basic.py

+        args_names = inspect.signature(Dataset._lazy_init).parameters.keys()
+        refit_signature = inspect.signature(self.__class__.refit).parameters
+        for lazy_init_args in args_names:
+            if lazy_init_args in refit_signature:
+                default_val = refit_signature[lazy_init_args].default
+                current_val = locals().get(lazy_init_args)
+                is_default = current_val is default_val
+
+                if is_default:
+                    locals()[lazy_init_args] = new_params.get(lazy_init_args, current_val)
+                    new_params.pop(lazy_init_args, None)


Suggested change

-        args_names = inspect.signature(Dataset._lazy_init).parameters.keys()
-        refit_signature = inspect.signature(self.__class__.refit).parameters
-        for lazy_init_args in args_names:
-            if lazy_init_args in refit_signature:
-                default_val = refit_signature[lazy_init_args].default
-                current_val = locals().get(lazy_init_args)
-                is_default = current_val is default_val
-
-                if is_default:
-                    locals()[lazy_init_args] = new_params.get(lazy_init_args, current_val)
-                    new_params.pop(lazy_init_args, None)
+        # 'categorical_feature' can end up in self.params when a Booster
+        # is created from a model string or file... pre-process to ensure it's passed
+        # via a keyword argument to the Dataset constructor instead of 'params'.
+        if "categorical_feature" in new_params:
+            cat_features_from_params = new_params.pop("categorical_feature")
+            if categorical_feature == "auto" or cat_features_from_params == categorical_feature:
+                categorical_feature = cat_features_from_params
+            else:
+                error_msg = (
+                    "'categorical_feature' value passed to Booster.refit() is different from "
+                    "'categorical_feature' value found in Booster.params. "
+                    "Preferring the value passed via keyword argument. "
+                    "Using refit() to change which columns are treated as categorical is not supported. "
+                    "If you have a valid use case for this, please open an issue at https://github.com/lightgbm-org/LightGBM/issues."
+                )
+                raise LightGBMError(error_msg)

I've investigated this; see #6793 (comment).

This code using locals() and inspect.signature() is quite complex, especially just to avoid a warning.

categorical_feature is the only parameter that can conflict between the two places it may appear: the model string and the signature of Dataset._lazy_init()... let's just add specific handling for that.

(note: we don't need to care about the parameter aliases like cat_feature, categorical_column, etc. listed at https://lightgbm.readthedocs.io/en/latest/Parameters.html#categorical_feature, because only the main non-alias parameters are written to model strings / files).
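A minimal standalone sketch of the same reconciliation logic, so it can be exercised without a trained Booster. The function name is hypothetical and LightGBMError here is a local stand-in class, not the one from lightgbm. Unlike the suggested change (which pops from new_params in place), this version works on a copy so that the caller's dict is left untouched.

```python
from typing import Any, Dict, Tuple


class LightGBMError(Exception):
    """Stand-in for LightGBM's error class so this sketch runs without LightGBM."""


def reconcile_categorical_feature(
    categorical_feature: Any, params: Dict[str, Any]
) -> Tuple[Any, Dict[str, Any]]:
    """Return (value to pass as a keyword argument, params without 'categorical_feature')."""
    new_params = dict(params)  # copy, so the caller's dict is not modified
    if "categorical_feature" in new_params:
        from_params = new_params.pop("categorical_feature")
        if categorical_feature == "auto" or from_params == categorical_feature:
            # Keyword left at its default, or it agrees with params: use the stored value
            categorical_feature = from_params
        else:
            raise LightGBMError(
                "'categorical_feature' passed to refit() differs from the value in params"
            )
    return categorical_feature, new_params


stored = {"categorical_feature": [0, 3], "num_leaves": 31}
print(reconcile_categorical_feature("auto", stored))  # ([0, 3], {'num_leaves': 31})
```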

Proposing the following:

1. Use the code I've suggested above, which reconciles the categorical_feature keyword argument and the entry in self.params.
2. Add a unit test called test_refit_does_not_warn_about_categorical_features in test_basic.py, testing the following:
    - no warnings are raised (use something like the reproducible example from #6793 (comment))
    - Booster.params["categorical_feature"] is unchanged
    - the expected error message is raised if the keyword argument and the value in params differ

refit()
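One way to check the "no warnings are raised" item is to escalate every warning to an exception for the duration of the call. The sketch below uses only the standard library; the helper call_without_warnings and the build_dataset stand-in (which mimics a "passed both as a keyword argument and in params" warning) are hypothetical, not LightGBM code.

```python
import warnings
from typing import Any, Callable, Dict, Optional


def call_without_warnings(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn, turning any warning it emits into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # warnings raised inside this block become errors
        return fn(*args, **kwargs)


def build_dataset(categorical_feature: Any = "auto", params: Optional[Dict[str, Any]] = None) -> Any:
    """Hypothetical stand-in that warns when categorical_feature is given twice."""
    params = params or {}
    if "categorical_feature" in params and categorical_feature != "auto":
        warnings.warn(
            "categorical_feature was passed both as a keyword argument and in params",
            UserWarning,
        )
    return categorical_feature
```

In a pytest test, wrapping the real refit() call in the same catch_warnings()/simplefilter("error") block would make any warning fail the test.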

Please let me know if you have any questions. If you don't have time or interest in continuing this, please let me know and I'll push the changes to your branch here. You've put so much effort into this already, and your investigation helped us find the root cause, so one way or another I want you to get credit for this commit to LightGBM.

Hey @jameslamb, this seems doable, and I have no questions so far.
I'll make the changes and let you know.
Thanks for the alternative suggestion for the fix!

I can also write the test for this, right?

Hey @jameslamb,

I've implemented the workaround we discussed to handle the categorical_feature parameter when loading a model from a string/text file.

Alongside the fix, I've added a new test (test_refit_does_not_warn_about_categorical_features). The test verifies three key things:

1. That the categorical_feature loaded from the text file matches the one passed during the initial Dataset creation.
2. That passing a conflicting categorical_feature list during refit() correctly raises a LightGBMError with the expected message.
3. That refit() functions normally (without warnings or errors) when relying on the default parameter fallback.

For the test setup, I used the existing rng fixture in conftest.py for seeded data generation, and pytest's built-in tmp_path fixture for automatic creation and cleanup of the sample_model.txt file.

Let me know if the coding style looks alright to you or if you need me to adjust anything!

Oh, and the following command runs the test:
pytest 'tests/python_package_test/test_basic.py::test_refit_does_not_warn_about_categorical_features'

Thanks for your work on this!

I went to review tonight and realized there were some additional changes needed, mainly to handle LightGBM's "parameter aliases". See https://lightgbm.readthedocs.io/en/latest/Parameters.html#categorical_feature ... any of categorical_feature, cat_feature, categorical_column, cat_column, categorical_features found in params all mean the same thing, and that has to be accounted for.
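A minimal sketch of collapsing those aliases before reconciling, assuming the alias list given above. The helper and constant names are hypothetical, this does not reproduce LightGBM's own internal alias handling, and the precedence rule (main name first, then the first alias in the listed order) is an assumption for illustration. Note that it also catches "categorical_column", the alias that lightgbm itself writes into params.

```python
from typing import Any, Dict, Optional, Tuple

# Aliases as listed in the LightGBM parameter docs; the main (non-alias) name comes first.
CATEGORICAL_FEATURE_ALIASES = (
    "categorical_feature",
    "cat_feature",
    "categorical_column",
    "cat_column",
    "categorical_features",
)


def pop_categorical_feature(params: Dict[str, Any]) -> Tuple[Optional[Any], Dict[str, Any]]:
    """Remove every alias from a copy of params; return (chosen value or None, cleaned params)."""
    cleaned = dict(params)
    value: Optional[Any] = None
    found = False
    for alias in CATEGORICAL_FEATURE_ALIASES:
        if alias in cleaned:
            alias_value = cleaned.pop(alias)
            # Assumed precedence: keep the first match in the order above
            if not found:
                value, found = alias_value, True
    return value, cleaned


print(pop_categorical_feature({"categorical_column": [2], "num_leaves": 31}))
# ([2], {'num_leaves': 31})
```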

There is also a surprising (and slightly inconsistent) behavior I stumbled over where lightgbm is manually updating params using an alias ("categorical_column"),

LightGBM/python-package/lightgbm/basic.py

Line 2170 in d14c4ba

params["categorical_column"] = sorted(categorical_indices)

...something we try hard not to do in the library, especially because the "main" (non-alias) parameter names are written to model strings / files.

The changes needed require a pretty deep understanding of LightGBM's internals, so I decided to just push them directly myself: 9500c4d

Please take a look when you have time, and ask any questions you have. If you don't see any issues, and if CI passes, I'll merge this. (I'd usually ask for a review from another maintainer, but the repo isn't very active right now and this is a small change in a relatively rarely-used part of the API, so I think it's ok).

Makes sense to me! Thank you for the change.
Rookie mistake: I didn't consider the alias handling and the column-name-to-index resolution that LightGBM does before working on the fix.
The test looks better now (I was a bit concerned about using a temp dir; it's fixed now, thank you).

No problem, thanks for looking and for getting us this far!

It just happened that fixing this required going fairly deep into LightGBM... the "parameter alias" thing is complex. It's a frequent source of bugs, and I even found another one while working through this (will address it in a follow-up PR).

If you're interested in contributing more after this, I'd be happy to suggest some other areas that would help you gain familiarity with the project more gradually.

Of course, James! I'm open to helping and contributing.
Thank you!


jameslamb · 2026-04-04T06:10:30Z

One other note... in the future when you contribute to git-based projects, create a new branch for each of your contributions instead of using the default branch (in this case, master).

When this PR is merged, master here and master on your fork will have incompatible histories, and you'll probably want to delete your fork and create a new one.

arjunprakash027 added 2 commits January 11, 2026 21:21

improve refit param handling in basic.py

b4a66f0

improve refit param handling in basic.py

3ea385c

arjunprakash027 requested review from StrikerRUS, borchero, guolinke, jameslamb, jmoralez and shiyu1994 as code owners January 11, 2026 21:45

jameslamb and others added 4 commits January 17, 2026 21:13

Merge branch 'master' into master

17bad8c

Merge branch 'master' into master

f3ab324

Merge branch 'master' into master

67a6a96

Merge branch 'master' into master

20eed43

jameslamb mentioned this pull request Feb 1, 2026

[python-package] Booster.refit() raises a misleading warning when using categorical features #6793

Open

Merge branch 'master' into master

3edef84

jameslamb added in progress fix labels Feb 1, 2026

jameslamb requested changes Feb 1, 2026


arjunprakash027 added 6 commits February 3, 2026 11:21

Merge branch 'microsoft:master' into master

f02b6b1

Merge branch 'master' into master

7a19f12

Merge branch 'master' into master

88f9ffd

Merge branch 'master' into master

27e6570

Merge branch 'master' into master

4ff5157

Merge branch 'master' into master

cc15929

jameslamb mentioned this pull request Mar 13, 2026

[python-package] Fix misleading warning in Booster.refit() with categorical features #7196

Closed

arjunprakash027 and others added 3 commits March 16, 2026 02:28

Merge branch 'lightgbm-org:master' into master

0d50493

Merge branch 'lightgbm-org:master' into master

d3f7897

Merge branch 'master' into master

c2bbbb6

jameslamb requested changes Apr 2, 2026


fix: prevent categorical_feature conflicts during Booster.refit() and add validation test case

0dc6eb4

arjunprakash027 and others added 2 commits April 2, 2026 20:59

Merge branch 'master' into master

22d5582

handle aliases, rework tests

9500c4d

keep already-updated params

fdf87da

jameslamb removed the in progress label Apr 4, 2026

-        args_names = inspect.signature(Dataset._lazy_init).parameters.keys()
-        refit_signature = inspect.signature(self.__class__.refit).parameters
-        for lazy_init_args in args_names:
-            if lazy_init_args in refit_signature:
-                default_val = refit_signature[lazy_init_args].default
-                current_val = locals().get(lazy_init_args)
-                is_default = current_val is default_val
-                if is_default:
-                    locals()[lazy_init_args] = new_params.get(lazy_init_args, current_val)
-                    new_params.pop(lazy_init_args, None)
+        # 'categorical_feature' can end up in self.params when a Booster
+        # is created from a model string or file... pre-process to ensure it's passed
+        # via a keyword argument to the Dataset constructor instead of 'params'.
+        if "categorical_feature" in new_params:
+            cat_features_from_params = new_params.pop("categorical_feature")
+            if categorical_feature == "auto" or cat_features_from_params == categorical_feature:
+                categorical_feature = cat_features_from_params
+            else:
+                error_msg = (
+                    "'categorical_feature' value passed to Booster.refit() is different from "
+                    "'categorical_feature' value found in Booster.params. "
+                    "Preferring the value passed via keyword argument. "
+                    "Using refit() to change which columns are treated as categorical is not supported. "
+                    "If you have a valid use case for this, please open an issue at https://github.com/lightgbm-org/LightGBM/issues."
+                )
+                raise LightGBMError(error_msg)

arjunprakash027 commented Jan 11, 2026 • edited by jameslamb


2 participants
