
Commit e405ef3

mmschlk and Copilot authored

1.4.0 release (#462)

* moves sparse_transform imports into function calls
* change ci pipeline
* removes override calls
* removes checkmarks because windows is sad and does not like colors :(
* moves ProxySPEX up in the README.md
* updated pyproject.toml
* updated CHANGELOG.md
* Update CHANGELOG.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

1 parent 0641cfd commit e405ef3

6 files changed (114 additions, 60 deletions)

.github/workflows/ci.yml (16 additions, 4 deletions)

```diff
@@ -31,20 +31,32 @@ jobs:
   # ----------------------------------------------------------------------------------------------
   install_and_import_shapiq:
     name: Install and import check shapiq
-    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - os: ubuntu-latest
+            python-version: "3.10"
+          - os: ubuntu-latest
+            python-version: "3.13"
+          - os: windows-latest
+            python-version: "3.12"
+          - os: macos-latest
+            python-version: "3.12"
+    runs-on: ${{ matrix.os }}
     steps:
       - uses: actions/checkout@v5
       - name: Set up Python and uv
         uses: astral-sh/setup-uv@v7
         with:
-          python-version: "3.12"
+          python-version: ${{ matrix.python-version }}
           enable-cache: true
       - name: Create uv virtual environment
         run: uv venv
       - name: Install shapiq package
         run: uv run --no-sync uv pip install .
       - name: Test import
-        run: uv run --no-sync python -c "import shapiq; print('shapiq imported successfully')"
+        run: uv run --no-sync python -c "import shapiq; print('shapiq imported successfully')"
   # ----------------------------------------------------------------------------------------------
   # Install and Import Check
   # ----------------------------------------------------------------------------------------------
@@ -65,7 +77,7 @@ jobs:
       - name: Install dependencies
         run: uv sync --no-dev --group all_ml
       - name: Test import of shapiq_games
-        run: uv run --no-sync python -c "import shapiq_games; print('shapiq_games imported successfully')"
+        run: uv run --no-sync python -c "import shapiq_games; print('shapiq_games imported successfully')"
   # ----------------------------------------------------------------------------------------------
   # Unit Tests with Matrix
   # ----------------------------------------------------------------------------------------------
```

CHANGELOG.md (39 additions, 23 deletions)

```diff
@@ -1,22 +1,31 @@
 # Changelog
 
-## Development
-
-### Introducing ProxySPEX
-Adds the ProxySPEX approximator for efficient computation of sparse interaction values using the new ProxySPEX algorithm.
-For further details refer to: Butler, L., Kang, J.S., Agarwal, A., Erginbas, Y.E., Yu, Bin, Ramchandran, K. (2025). ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs https://arxiv.org/pdf/2505.17495
-
-
-### Introducing ProductKernelExplainer
-The ProductKernelExplainer is a new model-specific explanation method for Product Kernel based machine learning model, such as Gaussian Processes or Support Vector Machines.
-For further details refer to: https://arxiv.org/abs/2505.16516
+## v1.4.0 (2025-10-31)
+
+### Introducing ProxySPEX [#442](https://github.com/mmschlk/shapiq/pull/442)
+Adds the [`ProxySPEX`](https://arxiv.org/pdf/2505.17495) [approximator](https://github.com/mmschlk/shapiq/blob/main/src/shapiq/approximator/sparse/proxyspex.py) for efficient computation of sparse interaction values using the new ProxySPEX algorithm.
+ProxySPEX is a direct extension of the [SPEX](https://openreview.net/pdf?id=UQpYmaBGwB) algorithm, which uses clever fourier representations of the value function and analysis to identify the most relevant interactions (in terms of `Moebius` coefficients) and transforms them into summary scores (Shapley interactions).
+One of the key innovations of ProxySPEX compared to SPEX is the use of a proxy model that approximates the original value function (uses a LightGBM model internally).
+**Notably,** to run ProxySPEX, users have to install the `lightgbm` package in their environment.
+For further details we refer to the paper, which will be presented at NeurIPS'2025: Butler, L., Kang, J.S., Agarwal, A., Erginbas, Y.E., Yu, Bin, Ramchandran, K. (2025). ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs. [arxiv](https://arxiv.org/pdf/2505.17495)
+
+### Introducing ProductKernelExplainer [#431](https://github.com/mmschlk/shapiq/pull/431)
+The `ProductKernelExplainer` is a new model-specific explanation method for machine learning models that utilize Product Kernels, such as Gaussian Processes and Support Vector Machines.
+Similar to the TreeExplainer, it uses a specific computation scheme that leverages the structure of the underlying product kernels to efficiently compute exact Shapley values.
+**Note**, this explainer is only able to compute Shapley values (not higher-order interactions yet).
+For further details we refer to the paper: Mohammadi, M., Chau, S.-L., Muandet, K. Computing Exact Shapley Values in Polynomial Time for Product-Kernel Methods. [arxiv](https://arxiv.org/abs/2505.16516)
+
+### New Conditional Imputation Methods [#435](https://github.com/mmschlk/shapiq/pull/435)
+Based on traditional statistical methods, we implemented two new conditional imputation methods named `GaussianImputer` and `GaussianCopulaImputer` within the `shapiq.imputer` module.
+Both imputation methods are designed to handle missing feature imputation in a way that respects the underlying data distribution with the assumption that the data follows a multivariate Gaussian distribution (`GaussianImputer`) or can be represented with Gaussian copulas (`GaussianCopulaImputer`).
+In practice, this assumption may often be violated, but these methods can still provide reasonable imputations in many scenarios and serve as a useful benchmark enabling easier research in the field of conditional imputation for Shapley value explanations.
 
 ### Shapiq Statically Typechecked [#430](https://github.com/mmschlk/shapiq/pull/430)
 We have introduced static type checking to `shapiq` using [Pyright](https://github.com/microsoft/pyright), and integrated it into our `pre-commit` hooks.
 This ensures that type inconsistencies are caught early during development, improving code quality and maintainability.
 Developers will now benefit from immediate feedback on type errors, making the codebase more robust and reliable as it evolves.
 
-### Separation of `shapiq` into `shapiq`, `shapiq_games`, and `shapiq-benchmark`
+### Separation of `shapiq` into `shapiq`, `shapiq_games`, and `shapiq-benchmark` [#459](https://github.com/mmschlk/shapiq/issues/459)
 We have begun the process of modularizing the `shapiq` package by splitting it into three distinct packages: `shapiq`, `shapiq_games`, and `shapiq-benchmark`.
 
 - The `shapiq` package now serves as the core library. It contains the main functionality, including approximators, explainers, computation routines, interaction value logic, and plotting utilities.
@@ -25,25 +34,32 @@ We have begun the process of modularizing the `shapiq` package by splitting it i
 
 This restructuring aims to improve maintainability and development scalability. The core `shapiq` package will continue to receive the majority of updates and enhancements, and keeping it streamlined ensures better focus and usability. Meanwhile, separating games and benchmarking functionality allows these components to evolve more independently while maintaining compatibility through clearly defined dependencies.
 
+### List of All New Features
+- adds the ProxySPEX (Proxy Sparse Explanation) module in `approximator.sparse` for even more efficient computation of sparse interaction values [#442](https://github.com/mmschlk/shapiq/pull/442)
+- uses `predict_logits` method of sklearn-like classifiers if available in favor of `predict_proba` to support models that also offer logit outputs like TabPFNClassifier for better interpretability of the explanations [#426](https://github.com/mmschlk/shapiq/issues/426)
+- adds the `shapiq.explainer.ProductKernelExplainer` for model-specific explanation of Product Kernel based models like Gaussian Processes and Support Vector Machines. [#431](https://github.com/mmschlk/shapiq/pull/431)
+- adds the `GaussianImputer` and `GaussianCopulaImputer` classes to the `shapiq.imputer` module for conditional imputation based on Gaussian assumptions. [#435](https://github.com/mmschlk/shapiq/pull/435)
+- speeds up the imputation process in `MarginalImputer` by dropping an unnecessary loop [#449](https://github.com/mmschlk/shapiq/pull/449)
+- makes `n_players` argument of `shapiq.ExactComputer` optional when a `shapiq.Game` object is passed [#388](https://github.com/mmschlk/shapiq/issues/388)
+
+### Removed Features and Breaking Changes
+- removes the ability to load `InteractionValues` from pickle files. This is now deprecated and will be removed in the next release. Use `InteractionValues.save(..., as_json=True)` to save interaction values as JSON files instead. [#413](https://github.com/mmschlk/shapiq/issues/413)
+- removes `coalition_lookup` and `value_storage` properties from `shapiq.Game` since the seperated view on game values and coalitions they belong to is now outdated. Use the `shapiq.Game.game_values` dictionary instead. [#430](https://github.com/mmschlk/shapiq/pull/430)
+- reorders the arguments of `shapiq.ExactComputer`'s constructor to have `n_players` be optional if a `shapiq.Game` object is passed. [#388](https://github.com/mmschlk/shapiq/issues/388)
+
+### Bugfixes
+- fixes a bug where RegressionFBII approximator was throwing an error when the index was `'BV'` or `'FBII'`. [#420](https://github.com/mmschlk/shapiq/pull/420)
+- allows subtraction and addition of `InteractionValues` objects with different `index` attributes by ignoring and raising a warning instead of an error. The resulting `InteractionValues` object will have the `index` of the first object. [#423](https://github.com/mmschlk/shapiq/pull/423)
+
 ### Maintenance and Development
 - refactored the `shapiq.Games` and `shapiq.InteractionValues` API by adding an interactions and game_values dictionary as the main data structure to store the interaction scores and game values. This allows for more efficient storage and retrieval of interaction values and game values, as well as easier manipulation of the data. [#419](https://github.com/mmschlk/shapiq/pull/419)
 - addition and subtraction of InteractionValues objects (via `shapiq.InteractionValues.__add__`) now also works for different indices, which will raise a warning and will return a new InteractionValues object with the index set of the first. [#422](https://github.com/mmschlk/shapiq/pull/422)
 - refactors the `shapiq.ExactComputer` to allow for initialization without passing n_players when a `shapiq.Game` object is passed [#388](https://github.com/mmschlk/shapiq/issues/388). Also introduces a tighter type hinting for the `index` parameter using `Literal` types. [#450](https://github.com/mmschlk/shapiq/pull/450)
+- removes zeros from the `InteractionValues.coalition_lookup` from the `MoebiusConverter` for better memory efficiency. [#369](https://github.com/mmschlk/shapiq/issues/369)
 
 ### Docs
 - added an example notebook for `InteractionValues`, highlighting *Initialization*, *Modification*, *Visualization* and *Save and Loading*.
-
-### Bugfixes
-- fixes a bug where RegressionFBII approximator was throwing an error when the index was `'BV'` or `'FBII'`.[#420](https://github.com/mmschlk/shapiq/pull/420)
-
-### All New Features
-- adds the ProxySPEX (Proxy Sparse Explanation) module in `approximator.sparse` for even more efficient computation of sparse interaction values [#442](https://github.com/mmschlk/shapiq/pull/442)
-- uses `predict_logits` method of sklearn-like classifiers if available in favor of `predict_proba` to support models that also offer logit outputs like TabPFNClassifier for better interpretability of the explanations [#426](https://github.com/mmschlk/shapiq/issues/426)
-- adds the `shapiq.explainer.ProductKernelExplainer` for model-specific explanation of Product Kernel based models like Gaussian Processes and Support Vector Machines. [#431](https://github.com/mmschlk/shapiq/pull/431)
-
-### Removed Features
-- removes the ability to load `InteractionValues` from pickle files. This is now deprecated and will be removed in the next release. Use `InteractionValues.save(..., as_json=True)` to save interaction values as JSON files instead. [#413](https://github.com/mmschlk/shapiq/issues/413)
-- removes `coalition_lookup` and `value_storage` properties from `shapiq.Game` since the seperated view on game values and coalitions they belong to is now outdated. Use the `shapiq.Game.game_values` dictionary instead. [#430](https://github.com/mmschlk/shapiq/pull/430)
+- makes API reference docs more consistent by adding missing docstrings and improving existing ones across the package. [#420](https://github.com/mmschlk/shapiq/pull/420), [#437](https://github.com/mmschlk/shapiq/issues/437), [#452](https://github.com/mmschlk/shapiq/issues/452) among others.
 
 ## v1.3.2 (2025-10-14)
 
```
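The ProxySPEX changelog entry above talks about sparsity in the `Moebius` coefficients of the value function. As a self-contained toy illustration (plain Python with a made-up 3-player game, not the shapiq API): the Moebius transform m(S) = Σ_{T ⊆ S} (−1)^{|S|−|T|} v(T) concentrates the game's mass on the few coalitions that genuinely interact, which is exactly the structure sparse approximators like SPEX and ProxySPEX exploit.

```python
from itertools import chain, combinations

def powerset(players):
    """All subsets of a collection of players, as frozensets."""
    s = list(players)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    ]

players = (0, 1, 2)

# Toy value function: each player contributes 1, and players 0 and 1
# earn a bonus of 2 when they cooperate.
def v(coalition):
    return len(coalition) + (2 if {0, 1} <= coalition else 0)

# Moebius transform: m(S) = sum over T subseteq S of (-1)^(|S|-|T|) * v(T)
moebius = {}
for S in powerset(players):
    moebius[S] = sum((-1) ** (len(S) - len(T)) * v(T) for T in powerset(S))

# Only the three singletons and the interacting pair {0, 1} carry
# non-zero mass -> the representation is sparse.
nonzero = {S for S, m in moebius.items() if abs(m) > 1e-12}
```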

README.md (22 additions, 22 deletions)

````diff
@@ -117,6 +117,28 @@ interaction_values.plot_force(feature_names=...)
 <img width="800px" src="https://raw.githubusercontent.com/mmschlk/shapiq/main/docs/source/_static/images/motivation_sv_and_si.png" alt="An example Force Plot for the California Housing Dataset with Shapley Interactions">
 </p>
 
+### Use ProxySPEX (Proxy SParse EXplainer) <img src="https://raw.githubusercontent.com/mmschlk/shapiq/main/docs/source/_static/images/spex_logo.png" alt="spex_logo" align="right" height="75px"/>
+For large-scale use-cases you can also check out the [👓``ProxySPEX``](https://shapiq.readthedocs.io/en/latest/api/shapiq.approximator.sparse.html#shapiq.approximator.sparse.SPEX) approximator.
+
+```python
+# load your data and model with large number of features
+data, model, n_features = ...
+
+# use the ProxySPEX approximator directly
+approximator = shapiq.ProxySPEX(n=n_features, index="FBII", max_order=2)
+fbii_scores = approximator.approximate(budget=2000, game=model.predict)
+
+# or use ProxySPEX with an explainer
+explainer = shapiq.Explainer(
+    model=model,
+    data=data,
+    index="FBII",
+    max_order=2,
+    approximator="proxyspex"  # specify ProxySPEX as approximator
+)
+explanation = explainer.explain(data[0])
+```
+
 ### Visualize feature interactions
 
 A handy way of visualizing interaction scores up to order 2 are network plots.
@@ -162,28 +184,6 @@ fsii_values.plot_force()  # plot the force plot
 <img width="800px" src="https://raw.githubusercontent.com/mmschlk/shapiq/main/docs/source/_static/images/fsii_tabpfn_force_plot_example.png" alt="Force Plot of FSII values as derived from the example tabpfn notebook">
 </p>
 
-### Use ProxySPEX (Proxy SParse EXplainer) <img src="https://raw.githubusercontent.com/mmschlk/shapiq/main/docs/source/_static/images/spex_logo.png" alt="spex_logo" align="right" height="75px"/>
-For large-scale use-cases you can also check out the [👓``ProxySPEX``](https://shapiq.readthedocs.io/en/latest/api/shapiq.approximator.sparse.html#shapiq.approximator.sparse.SPEX) approximator.
-
-```python
-# load your data and model with large number of features
-data, model, n_features = ...
-
-# use the ProxySPEX approximator directly
-approximator = shapiq.ProxySPEX(n=n_features, index="FBII", max_order=2)
-fbii_scores = approximator.approximate(budget=2000, game=model.predict)
-
-# or use ProxySPEX with an explainer
-explainer = shapiq.Explainer(
-    model=model,
-    data=data,
-    index="FBII",
-    max_order=2,
-    approximator="proxyspex"  # specify ProxySPEX as approximator
-)
-explanation = explainer.explain(data[0])
-```
-
 
 ## 📖 Documentation with tutorials
 The documentation of ``shapiq`` can be found at https://shapiq.readthedocs.io.
````

pyproject.toml (2 additions, 0 deletions)

```diff
@@ -26,11 +26,13 @@ dependencies = [
 ]
 authors = [
     {name = "Maximilian Muschalik", email = "Maximilian.Muschalik@lmu.de"},
+    {name = "Santo M. A. R. Thies", email = "S.Thies@campus.lmu.de"},
     {name = "Hubert Baniecki"},
     {name = "Fabian Fumagalli"},
 ]
 maintainers = [
     {name = "Maximilian Muschalik", email = "Maximilian.Muschalik@lmu.de"},
+    {name = "Santo M. A. R. Thies", email = "S.Thies@campus.lmu.de"},
 ]
 license = "MIT"
 classifiers = [
```

src/shapiq/imputer/gaussian_copula_imputer.py (20 additions, 2 deletions)

```diff
@@ -3,7 +3,6 @@
 from __future__ import annotations
 
 from typing import TYPE_CHECKING, cast
-from typing_extensions import override
 
 import numpy as np
 from scipy.stats import norm, rankdata
@@ -35,7 +34,6 @@ class GaussianCopulaImputer(GaussianImputer):
 
     More specifically, values will be clipped to the range ``[epsilon, 1 - epsilon]``."""
 
-    @override
     def __init__(
         self,
         model: (object | Game | Callable[[npt.NDArray[np.floating]], npt.NDArray[np.floating]]),
@@ -46,6 +44,26 @@ def __init__(
         random_state: int | None = None,
         verbose: bool = False,
     ) -> None:
+        """Initializes the GaussianCopulaImputer.
+
+        Args:
+            model: The model to explain as a callable function expecting a data points as input and
+                returning the model's predictions.
+            data: The background data to use for the explainer as a two-dimensional array with shape
+                ``(n_samples, n_features)``.
+            x: The explanation point as a ``np.ndarray`` of shape ``(1, n_features)`` or
+                ``(n_features,)``.
+            sample_size: The number of Monte Carlo samples to draw from the conditional background
+                data for imputation.
+            random_state: An optional random seed for reproducibility.
+            verbose: A flag to enable verbose imputation, which will print a progress bar for model
+                evaluation. Note that this can slow down the imputation process.
+        """
         super().__init__(
             model=model,
             data=data,
```

src/shapiq/imputer/gaussian_imputer.py (15 additions, 9 deletions)

```diff
@@ -3,7 +3,6 @@
 from __future__ import annotations
 
 from typing import TYPE_CHECKING, cast
-from typing_extensions import override
 
 import numpy as np
 from numpy.random import default_rng
@@ -47,14 +46,22 @@ def __init__(
         """Initializes the class.
 
         Args:
-            model: The model to explain as a callable function expecting data points as input and
+            model: The model to explain as a callable function expecting a data points as input and
                 returning the model's predictions.
-            data: The background data to use for the explainer as a ``np.ndarray`` of shape ``(n_samples, n_features)``.
-            x: The explanation point as a ``np.ndarray`` of shape ``(1, n_features)`` or ``(n_features,)``. Defaults to ``None``.
-            sample_size: Number of Monte Carlo samples for imputation. Defaults to ``100``.
-            random_state: The random state to use for sampling. Defaults to ``None``.
-            verbose: A flag to enable verbose imputation, which will print a progress bar for model evaluation.
-                Note that this can slow down the imputation process. Defaults to ``False``.
+            data: The background data to use for the explainer as a two-dimensional array with shape
+                ``(n_samples, n_features)``.
+            x: The explanation point as a ``np.ndarray`` of shape ``(1, n_features)`` or
+                ``(n_features,)``.
+            sample_size: The number of Monte Carlo samples to draw from the conditional background
+                data for imputation.
+            random_state: An optional random seed for reproducibility.
+            verbose: A flag to enable verbose imputation, which will print a progress bar for model
+                evaluation. Note that this can slow down the imputation process.
 
         Raises:
             CategoricalFeatureError: If the background data contains any categorical features.
@@ -207,7 +214,6 @@ def _sample_monte_carlo(
 
         return samples_all_coalitions
 
-    @override
     def value_function(self, coalitions: npt.NDArray[np.bool]) -> npt.NDArray[np.floating]:
         """Imputes the missing values of a data point and gets predictions for all coalitions.
```

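The `GaussianImputer` docstring above describes drawing Monte Carlo samples from the conditional background distribution. Under the multivariate-Gaussian assumption this conditioning has a closed form: for missing features m and observed features o, x_m | x_o is Gaussian with mean mu_m + C_mo C_oo^{-1} (x_o - mu_o) and covariance C_mm - C_mo C_oo^{-1} C_om. A minimal NumPy sketch of that mechanism (not the shapiq implementation; the function name and toy numbers are made up for illustration):

```python
import numpy as np

def conditional_gaussian_sample(mu, cov, x_obs, obs_idx, miss_idx, n_samples, seed=0):
    """Sample missing features conditional on observed ones under a joint Gaussian.

    Uses the standard conditional-Gaussian formulas:
        mu_cond  = mu_m + C_mo @ inv(C_oo) @ (x_obs - mu_o)
        cov_cond = C_mm - C_mo @ inv(C_oo) @ C_om
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    C_oo = cov[np.ix_(obs_idx, obs_idx)]
    C_mo = cov[np.ix_(miss_idx, obs_idx)]
    C_mm = cov[np.ix_(miss_idx, miss_idx)]
    # Solve instead of inverting C_oo explicitly (better conditioned).
    shift = np.linalg.solve(C_oo, np.asarray(x_obs, dtype=float) - mu[obs_idx])
    mu_cond = mu[miss_idx] + C_mo @ shift
    cov_cond = C_mm - C_mo @ np.linalg.solve(C_oo, C_mo.T)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mu_cond, cov_cond, size=n_samples), mu_cond

# Two correlated standard-normal features: observing x0 = 2.0 shifts the
# conditional mean of x1 to 0.8 * 2.0 = 1.6.
mu = [0.0, 0.0]
cov = [[1.0, 0.8], [0.8, 1.0]]
samples, mu_cond = conditional_gaussian_sample(
    mu, cov, x_obs=[2.0], obs_idx=[0], miss_idx=[1], n_samples=500
)
```

The correlation is what distinguishes this conditional imputation from marginal imputation, which would keep sampling x1 around 0 regardless of the observed x0.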