
Compatibility with Spark 3.1.1 and Scala 2.12 #31

@JJorczik

Problem Description

I tried to build and run this implementation on a Spark cluster running Spark 3.1.1 and Scala 2.12. I get the following error when reading the example CSV file with IOHelper. Do you have any idea how to fix this? Thank you in advance!

Code

import org.alitouka.spark.dbscan._
import org.alitouka.spark.dbscan.util.io._
val data = IOHelper.readDataset(sc, "/path/to/example.csv")
val clusteringSettings = new DbscanSettings().withEpsilon(25).withNumberOfPoints(30)
val model = Dbscan.train(data, clusteringSettings)

Stack Trace

org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2465)
  at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
  at org.apache.spark.rdd.RDD.map(RDD.scala:421)
  at org.alitouka.spark.dbscan.util.io.IOHelper$.readDataset(IOHelper.scala:23)
  ... 44 elided
Caused by: java.io.NotSerializableException: org.alitouka.spark.dbscan.util.io.IOHelper$
Serialization stack:
	- object not serializable (class: org.alitouka.spark.dbscan.util.io.IOHelper$, value: org.alitouka.spark.dbscan.util.io.IOHelper$@7e66f6f6)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.alitouka.spark.dbscan.util.io.IOHelper$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/alitouka/spark/dbscan/util/io/IOHelper$.$anonfun$readDataset$1:(Lorg/alitouka/spark/dbscan/util/io/IOHelper$;Ljava/lang/String;)Lorg/alitouka/spark/dbscan/spatial/Point;, instantiatedMethodType=(Ljava/lang/String;)Lorg/alitouka/spark/dbscan/spatial/Point;, numCaptured=1])
	- writeReplace data (class: java.lang.invoke.SerializedLambda)
	- object (class org.alitouka.spark.dbscan.util.io.IOHelper$$$Lambda$2262/0x0000000840f0a840, org.alitouka.spark.dbscan.util.io.IOHelper$$$Lambda$2262/0x0000000840f0a840@5b9a19f5)
  at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
  ... 53 more
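For context on what the trace is showing: the SerializedLambda frame lists numCaptured=1 with the captured argument being the IOHelper$ module, which does not extend Serializable. Under Scala 2.12, a lambda defined inside an object that references a member of that object is compiled to capture the module instance itself, so Spark's ClosureCleaner fails to serialize it. Below is a minimal, Spark-free sketch of that mechanism; Helper, SerializationDemo, and the field names are hypothetical stand-ins, not part of this library.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for IOHelper: a plain Scala object, which is NOT serializable.
object Helper {
  val separator = ","

  // References the object's field directly, so under Scala 2.12 the compiled
  // lambda captures the Helper$ module instance (numCaptured=1, as in the trace).
  val capturing: String => Array[String] =
    line => line.split(separator)

  // Copying the field into a local val first means the lambda captures only
  // a String, which serializes fine.
  val nonCapturing: String => Array[String] = {
    val sep = separator
    line => line.split(sep)
  }
}

object SerializationDemo {
  // Mimics the check Spark's ClosureCleaner.ensureSerializable performs.
  def isSerializable(f: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

If this is indeed the cause, plausible fixes would be rebuilding the library so the lambda inside IOHelper.readDataset only closes over local values (as in nonCapturing above), or making the IOHelper object extend Serializable; I have not verified either against this codebase.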
