TL;DR
Docker on the Mac defaults to running with 2GB of memory; for Windows, it might be 1GB. With 2GB, the JVM for BeakerX notebooks appears to be launched with ~460MB (< 25%) of the available memory, which is just below what Spark considers the minimum, ~470MB. The workaround is to increase the memory allotted to Docker.
The ~470MB number is actually hard-coded in Spark (see below), so this isn't really a BeakerX bug, per se, but adding a warning to the Spark notebooks or another doc update might be helpful. If there's a more elegant solution possible in BeakerX, then even better. Note that I haven't observed this issue with the Jupyter project's Docker image jupyter/all-spark-notebook. It could be that they set spark.testing properties, which circumvent the 470MB threshold...
Details
Mac laptop running macOS High Sierra, using the simple Spark notebook provided with BeakerX. Evaluating this cell, with the Mac Docker environment set to use up to 2GB:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
.appName("Simple Application")
.master("local[4]")
.config("spark.ui.enabled", "false")
.getOrCreate()
... it throws the following:
java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:217)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:332)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
... 46 elided
Looking at the logic in UnifiedMemoryManager, it requires at least 1.5 * RESERVED_SYSTEM_MEMORY_BYTES, which is 1.5 * 300 * 1024 * 1024 = 1.5 * 314572800 = 471859200 bytes (~470MB).
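The check can be sketched like this (a paraphrase of the logic in UnifiedMemoryManager.getMaxMemory; the wrapper object and method names are hypothetical, only the 300MB constant and the 1.5 factor come from the Spark 2.x source):

```scala
// Sketch of Spark's minimum-memory check. The constant and factor mirror
// UnifiedMemoryManager.getMaxMemory in Spark 2.x; the object is illustrative.
object MemoryCheckSketch {
  val ReservedSystemMemoryBytes: Long = 300L * 1024 * 1024              // 314572800
  val MinSystemMemory: Long = (ReservedSystemMemoryBytes * 1.5).toLong  // 471859200

  def check(systemMemory: Long): Unit =
    require(systemMemory >= MinSystemMemory,
      s"System memory $systemMemory must be at least $MinSystemMemory. " +
      "Please increase heap size using the --driver-memory option or " +
      "spark.driver.memory in Spark configuration.")
}

// A 2GB Docker VM left the JVM with ~466092032 bytes, which fails the check:
// MemoryCheckSketch.check(466092032L)  // throws IllegalArgumentException
```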
Logging into the running Docker container (docker exec -it <container_id> bash) and running top shows that the "system" has 2GB of memory and java has about ~25% of it (though the exact size varies).
If I set Mac Docker to use 2.5GB or greater, then restart the BeakerX image, all works as expected.
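If the spark.testing guess above is right, an alternative to giving Docker more memory might be setting spark.testing.memory, which overrides the JVM heap size that UnifiedMemoryManager.getMaxMemory detects. This is an untested sketch; the property is intended for Spark's own test suite, so increasing Docker's memory is probably the safer fix:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Simple Application")
  .master("local[4]")
  .config("spark.ui.enabled", "false")
  // Overrides the systemMemory value read by UnifiedMemoryManager.getMaxMemory,
  // sidestepping the 471859200-byte minimum (intended for Spark's test suite).
  .config("spark.testing.memory", "512000000")
  .getOrCreate()
```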