Problem Description
I built and ran this implementation on a Spark cluster (Spark 3.1.1, Scala 2.12). When I try to read the example CSV file with IOHelper, I get the error below. Do you have any idea how to fix this? Thanks in advance!
Code
import org.alitouka.spark.dbscan._
import org.alitouka.spark.dbscan.util.io._
val data = IOHelper.readDataset(sc, "/path/to/example.csv")
val clusteringSettings = new DbscanSettings().withEpsilon(25).withNumberOfPoints(30)
val model = Dbscan.train(data, clusteringSettings)
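For context: the serialization stack indicates that the lambda inside IOHelper.readDataset captures the IOHelper$ singleton itself, which is not serializable under the Scala 2.12 lambda encoding. As a possible workaround I can parse the CSV into an RDD of Point myself and skip IOHelper entirely. This is only a sketch: it assumes each line is a comma-separated list of numeric coordinates and that Point can be constructed from an Array[Double], which I have not verified against the actual API.

```scala
import org.alitouka.spark.dbscan._
import org.alitouka.spark.dbscan.spatial.Point

// Sketch of a workaround: read the CSV with sc.textFile instead of
// IOHelper.readDataset, so no library singleton is captured in the closure.
// Assumption (unverified): Point accepts an Array[Double] of coordinates.
val data: RawDataSet = sc.textFile("/path/to/example.csv")
  .map(line => new Point(line.split(",").map(_.trim.toDouble)))

val clusteringSettings = new DbscanSettings().withEpsilon(25).withNumberOfPoints(30)
val model = Dbscan.train(data, clusteringSettings)
```

If the Point constructor differs, only the body of the map function should need adjusting; the rest of the pipeline is unchanged from the original snippet.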
Stack Trace
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2465)
at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
at org.apache.spark.rdd.RDD.map(RDD.scala:421)
at org.alitouka.spark.dbscan.util.io.IOHelper$.readDataset(IOHelper.scala:23)
... 44 elided
Caused by: java.io.NotSerializableException: org.alitouka.spark.dbscan.util.io.IOHelper$
Serialization stack:
- object not serializable (class: org.alitouka.spark.dbscan.util.io.IOHelper$, value: org.alitouka.spark.dbscan.util.io.IOHelper$@7e66f6f6)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.alitouka.spark.dbscan.util.io.IOHelper$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/alitouka/spark/dbscan/util/io/IOHelper$.$anonfun$readDataset$1:(Lorg/alitouka/spark/dbscan/util/io/IOHelper$;Ljava/lang/String;)Lorg/alitouka/spark/dbscan/spatial/Point;, instantiatedMethodType=(Ljava/lang/String;)Lorg/alitouka/spark/dbscan/spatial/Point;, numCaptured=1])
- writeReplace data (class: java.lang.invoke.SerializedLambda)
- object (class org.alitouka.spark.dbscan.util.io.IOHelper$$$Lambda$2262/0x0000000840f0a840, org.alitouka.spark.dbscan.util.io.IOHelper$$$Lambda$2262/0x0000000840f0a840@5b9a19f5)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
... 53 more