
[jvm-packages] XGBoost4j-Spark fails on Spark 4.0 with NoSuchMethodError in Param constructor #12135

@wbo4958

Description


XGBoost4j-Spark (3.3.0-SNAPSHOT, Scala 2.13 build) fails immediately when used with Apache Spark 4.0.0 due to a binary incompatibility in org.apache.spark.ml.param.Param. The issue is resolved when running against Spark 4.1.0 (which includes the upstream fix SPARK-52259).

Environment

  • XGBoost: xgboost4j-spark-gpu_2.13-3.3.0-SNAPSHOT (built 2025-03-31)
  • OS: Linux (Ubuntu)

Test Results

Spark Version | Scala   | Result | Notes
--------------|---------|--------|-----------------------------------------------------
4.0.0         | 2.13.16 | FAIL   | NoSuchMethodError on Param constructor (SPARK-52259)
4.1.0         | 2.13.17 | PASS   | XGBoostClassifier trains and predicts correctly

Error

Instantiating XGBoostClassifier fails with:

java.lang.NoSuchMethodError: 'void org.apache.spark.ml.param.Param.<init>(org.apache.spark.ml.util.Identifiable, java.lang.String, java.lang.String, scala.Function1)'
  at ml.dmlc.xgboost4j.scala.spark.params.TreeBoosterParams.$init$(TreeBoosterParams.scala:89)

Steps to Reproduce

// In spark-shell (Spark 4.0.0) with --jars xgboost4j-spark-gpu_2.13-3.3.0-SNAPSHOT.jar
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val est = new XGBoostClassifier()  // <-- Fails here

Root Cause Analysis

In Spark 4.0.0, the primary constructor of org.apache.spark.ml.param.Param was changed to include a ClassTag context bound (SPARK-51217):

// Spark 3.x
class Param[T](val parent: String, val name: String, val doc: String, val isValid: T => Boolean)

// Spark 4.0.0 - added ClassTag context bound, which changes the JVM constructor signature
class Param[T: ClassTag](val parent: String, val name: String, val doc: String, val isValid: T => Boolean)

The ClassTag context bound adds an implicit ClassTag[T] parameter to the JVM constructor, changing the bytecode signature from <init>(String, String, String, Function1)V to <init>(String, String, String, Function1, ClassTag)V. This also affects all auxiliary constructors including this(parent: Identifiable, ...).
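The signature change can be reproduced outside of Spark. The sketch below (`ParamV3`/`ParamV4` are hypothetical stand-ins, not Spark classes) defines two minimal classes mirroring the two constructor shapes and inspects their erased constructor signatures via reflection:

```scala
import scala.reflect.ClassTag

// Mirrors the Spark 3.x shape: four explicit parameters.
class ParamV3[T](val parent: String, val name: String, val doc: String,
                 val isValid: T => Boolean)

// Mirrors the Spark 4.0.0 shape: the `T: ClassTag` context bound desugars to
// an extra implicit ClassTag[T] parameter appended to the JVM constructor.
class ParamV4[T: ClassTag](val parent: String, val name: String, val doc: String,
                           val isValid: T => Boolean)

object CtorSignatures extends App {
  // Each class has a single constructor; print its erased parameter types.
  val v3 = classOf[ParamV3[_]].getConstructors.head.getParameterTypes.map(_.getSimpleName).toList
  val v4 = classOf[ParamV4[_]].getConstructors.head.getParameterTypes.map(_.getSimpleName).toList
  println(s"V3: $v3")  // List(String, String, String, Function1)
  println(s"V4: $v4")  // List(String, String, String, Function1, ClassTag)
}
```

The extra trailing `ClassTag` argument is exactly why code compiled against the four-argument constructor fails with `NoSuchMethodError` at link time.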

XGBoost's TreeBoosterParams (and other param traits) create params like:

final val samplingMethod = new Param[String](this, "sampling_method", "...",
    ParamValidators.inArray(Array("uniform", "gradient_based")))

This calls the Param(Identifiable, String, String, Function1) constructor, which no longer exists at the bytecode level in Spark 4.0.0.
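A library could also probe for the missing constructor defensively rather than crashing at link time. The sketch below is illustrative only (`ParamCtorProbe` is not an XGBoost API); the class and constructor names come from the stack trace above, and the probe returns true only when a Spark with the 3.x-style constructor is on the classpath:

```scala
import scala.util.Try

object ParamCtorProbe {
  // Returns true only if the Spark 3.x-style
  // Param(Identifiable, String, String, Function1) constructor is resolvable
  // on the current classpath. On Spark 4.0.0 the lookup fails (the constructor
  // gained a trailing ClassTag parameter); on 3.5.x and 4.0.1+ it succeeds.
  def hasLegacyParamCtor(): Boolean = Try {
    val param = Class.forName("org.apache.spark.ml.param.Param")
    val ident = Class.forName("org.apache.spark.ml.util.Identifiable")
    param.getConstructor(ident, classOf[String], classOf[String],
                         classOf[scala.Function1[_, _]])
  }.isSuccess
}
```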

This was acknowledged as a Spark bug and fixed in SPARK-52259, included in Spark 4.0.1 (released 2025-09-02). The fix removes the ClassTag context bound and restores the original constructor signatures.

Spark 4.1.0 Test Detail

Using the same XGBoost JAR (built against Spark 3.5) on Spark 4.1.0, XGBoostClassifier trained a multi-class classification model on the Iris dataset (100 rounds, 4 workers) and produced correct predictions:

scala> val est = new XGBoostClassifier().setNumWorkers(4).setNumRound(100).setLabelCol(labelCol)
val est: ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier = xgbc_0e9d45a85333

scala> val model = est.fit(trainDataset)
[17:00:50] [0]  train-mlogloss:0.73157925999933671
...
[17:00:50] [99] train-mlogloss:0.01364579657468224
val model: ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel = xgbc_0e9d45a85333

scala> model.transform(validationDataset).show()
+------------+-----------+------------+-----------+-----+-----------------+--------------------+--------------------+----------+
|sepal_length|sepal_width|petal_length|petal_width|class|         features|       rawPrediction|         probability|prediction|
+------------+-----------+------------+-----------+-----+-----------------+--------------------+--------------------+----------+
|         4.5|        2.3|         1.3|        0.3|  0.0|[4.5,2.3,1.3,0.3]|[2.66192507743835...|[0.99191308021545...|       0.0|
|         4.8|        3.1|         1.6|        0.2|  0.0|[4.8,3.1,1.6,0.2]|[2.66192507743835...|[0.99287110567092...|       0.0|
...

Request

  1. Document Spark 4.0 support status. Currently XGBoost builds against Spark 3.5.3 (spark.version in pom.xml). Our testing confirms the JAR works on Spark 4.1.0 (and should work on 4.0.1+), but users need official clarity on whether this is supported or if a dedicated Spark 4.0 build profile is needed.

  2. Consider adding a Spark 4.0 CI build to catch future incompatibilities, as Spark 4.0 introduces other changes (Java 17 minimum, Scala 2.13 only, Spark Connect ML, etc.).

  3. Warn users about Spark 4.0.0 specifically. Users must use Spark 4.0.1+ (or 4.1.0+) due to the upstream SPARK-52259 bug. Spark 4.0.0 is broken for all third-party ML libraries that use Param.
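The warning in point 3 could even be enforced at startup. A minimal sketch, assuming a fail-fast check is desirable (`SparkVersionGuard` and `requireCompatibleSpark` are hypothetical names, not part of XGBoost):

```scala
// Hypothetical fail-fast guard: refuse to start on the known-broken
// Spark 4.0.0 release and point users at the upstream fix.
object SparkVersionGuard {
  private val BrokenVersions = Set("4.0.0")

  def requireCompatibleSpark(sparkVersion: String): Unit =
    require(
      !BrokenVersions.contains(sparkVersion),
      s"Spark $sparkVersion cannot load XGBoost4j-Spark (SPARK-52259 changed " +
        "the Param constructor signature); upgrade to Spark 4.0.1+ or 4.1.0+."
    )
}
```

Called as `SparkVersionGuard.requireCompatibleSpark(spark.version)` during estimator construction, this would turn the opaque `NoSuchMethodError` into an actionable message.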

