[jvm-packages] XGBoost4j-Spark fails on Spark 4.0 with NoSuchMethodError in Param constructor #12135
Description
XGBoost4j-Spark (3.3.0-SNAPSHOT, Scala 2.13 build) fails immediately when used with Apache Spark 4.0.0 due to a binary incompatibility in org.apache.spark.ml.param.Param. The issue is resolved when running against Spark 4.1.0 (which includes the upstream fix SPARK-52259).
Environment
- XGBoost: xgboost4j-spark-gpu_2.13-3.3.0-SNAPSHOT (built 2025-03-31)
- OS: Linux (Ubuntu)
Test Results
| Spark Version | Scala | Result | Notes |
|---|---|---|---|
| 4.0.0 | 2.13.16 | FAIL | NoSuchMethodError on Param constructor (SPARK-52259) |
| 4.1.0 | 2.13.17 | PASS | XGBoostClassifier trains and predicts correctly |
Error
Instantiating XGBoostClassifier fails with:
```
java.lang.NoSuchMethodError: 'void org.apache.spark.ml.param.Param.<init>(org.apache.spark.ml.util.Identifiable, java.lang.String, java.lang.String, scala.Function1)'
	at ml.dmlc.xgboost4j.scala.spark.params.TreeBoosterParams.$init$(TreeBoosterParams.scala:89)
```
Steps to Reproduce
```scala
// In spark-shell (Spark 4.0.0) with --jars xgboost4j-spark-gpu_2.13-3.3.0-SNAPSHOT.jar
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
val est = new XGBoostClassifier() // <-- Fails here
```
Root Cause Analysis
In Spark 4.0.0, the primary constructor of org.apache.spark.ml.param.Param was changed to include a ClassTag context bound (SPARK-51217):
```scala
// Spark 3.x
class Param[T](val parent: String, val name: String, val doc: String, val isValid: T => Boolean)

// Spark 4.0.0 - added ClassTag context bound, which changes the JVM constructor signature
class Param[T: ClassTag](val parent: String, val name: String, val doc: String, val isValid: T => Boolean)
```
The ClassTag context bound adds an implicit `ClassTag[T]` parameter to the JVM constructor, changing the bytecode signature from `<init>(String, String, String, Function1)V` to `<init>(String, String, String, Function1, ClassTag)V`. This also affects all auxiliary constructors, including `this(parent: Identifiable, ...)`.
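The signature change can be demonstrated in isolation, without Spark on the classpath. The sketch below uses two hypothetical stand-in classes (`ParamV3`, `ParamV4`, not real Spark types) mirroring the two constructor shapes, and uses reflection to show that the context bound adds one hidden constructor parameter at the JVM level:

```scala
import scala.reflect.ClassTag

// Stand-in for the Spark 3.x shape: four explicit constructor parameters.
class ParamV3[T](val parent: String, val name: String, val doc: String, val isValid: T => Boolean)

// Stand-in for the Spark 4.0.0 shape: the context bound desugars to an
// extra implicit ClassTag[T] constructor parameter.
class ParamV4[T: ClassTag](val parent: String, val name: String, val doc: String, val isValid: T => Boolean)

object CtorArity extends App {
  // Reflection over the (only) public constructor of each class shows the
  // JVM-level arity difference that breaks binary compatibility.
  val v3 = classOf[ParamV3[_]].getConstructors.head.getParameterCount
  val v4 = classOf[ParamV4[_]].getConstructors.head.getParameterCount
  println(s"V3 ctor params: $v3") // 4
  println(s"V4 ctor params: $v4") // 5 (the hidden ClassTag)
}
```

Code compiled against the 4-argument shape therefore fails with `NoSuchMethodError` when linked against the 5-argument class at runtime.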
XGBoost's TreeBoosterParams (and other param traits) create params like:
```scala
final val samplingMethod = new Param[String](this, "sampling_method", "...",
  ParamValidators.inArray(Array("uniform", "gradient_based")))
```
This calls the `Param(Identifiable, String, String, Function1)` constructor, which no longer exists at the bytecode level in Spark 4.0.0.
This was acknowledged as a Spark bug and fixed in SPARK-52259, included in Spark 4.0.1 (released 2025-09-02). The fix removes the ClassTag context bound and restores the original constructor signatures.
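For anyone who needs to detect the broken Spark build at runtime rather than fail with a `NoSuchMethodError` mid-initialization, a small reflective probe works. The helper below (`ConstructorProbe` is a hypothetical name, not part of XGBoost or Spark) checks whether a class exposes a given public constructor; on a Spark 4.0.0 classpath, probing `org.apache.spark.ml.param.Param` for the `(Identifiable, String, String, Function1)` constructor would presumably return false, while on 4.0.1+/4.1.0 it would return true. The demo uses `java.lang.String` as a stand-in so it runs without Spark:

```scala
// Hypothetical helper: returns true if `cls` declares a public constructor
// with exactly these parameter types, false on NoSuchMethodException.
object ConstructorProbe {
  def hasConstructor(cls: Class[_], paramTypes: Class[_]*): Boolean =
    try { cls.getConstructor(paramTypes: _*); true }
    catch { case _: NoSuchMethodException => false }
}

object ProbeDemo extends App {
  // Stand-in illustration: String(String) exists, String(java.util.List) does not.
  println(ConstructorProbe.hasConstructor(classOf[String], classOf[String]))            // true
  println(ConstructorProbe.hasConstructor(classOf[String], classOf[java.util.List[_]])) // false
}
```

A library could run such a probe once at startup and raise a clear "upgrade to Spark 4.0.1+" error instead of an opaque linkage failure.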
Spark 4.1.0 Test Detail
Using the same XGBoost JAR (built against Spark 3.5) on Spark 4.1.0, XGBoostClassifier trained a multi-class classification model on the Iris dataset (100 rounds, 4 workers) and produced correct predictions:
```
scala> val est = new XGBoostClassifier().setNumWorkers(4).setNumRound(100).setLabelCol(labelCol)
val est: ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier = xgbc_0e9d45a85333

scala> val model = est.fit(trainDataset)
[17:00:50] [0] train-mlogloss:0.73157925999933671
...
[17:00:50] [99] train-mlogloss:0.01364579657468224
val model: ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel = xgbc_0e9d45a85333

scala> model.transform(validationDataset).show()
+------------+-----------+------------+-----------+-----+-----------------+--------------------+--------------------+----------+
|sepal_length|sepal_width|petal_length|petal_width|class|         features|       rawPrediction|         probability|prediction|
+------------+-----------+------------+-----------+-----+-----------------+--------------------+--------------------+----------+
|         4.5|        2.3|         1.3|        0.3|  0.0|[4.5,2.3,1.3,0.3]|[2.66192507743835...|[0.99191308021545...|       0.0|
|         4.8|        3.1|         1.6|        0.2|  0.0|[4.8,3.1,1.6,0.2]|[2.66192507743835...|[0.99287110567092...|       0.0|
...
```
Request
- Document Spark 4.0 support status. Currently XGBoost builds against Spark 3.5.3 (`spark.version` in `pom.xml`). Our testing confirms the JAR works on Spark 4.1.0 (and should work on 4.0.1+), but users need official clarity on whether this is supported or whether a dedicated Spark 4.0 build profile is needed.
- Consider adding a Spark 4.0 CI build to catch future incompatibilities, since Spark 4.0 introduces other changes (Java 17 minimum, Scala 2.13 only, Spark Connect ML, etc.).
- Warn users about Spark 4.0.0 specifically. Users must use Spark 4.0.1+ (or 4.1.0+) due to the upstream SPARK-52259 bug; Spark 4.0.0 is broken for all third-party ML libraries that use `Param`.