Creating a SparkSession always fails with not enough memory... unless Docker has enough memory #7666

@deanwampler

Description

TL;DR

Docker on the Mac defaults to running with 2GB of memory; on Windows, the default may be 1GB. With 2GB, the JVM for BeakerX notebooks appears to be launched with ~460MB (less than 25% of the available memory), which is just below the ~470MB minimum that Spark requires. The workaround is to increase the memory allocated to Docker.

The ~470MB number is actually hard-coded in Spark (see below), so this isn't really a BeakerX bug per se, but adding a warning to the Spark notebooks or another doc update might be helpful. If a more elegant solution is possible in BeakerX, even better. Note that I haven't observed this issue with the Jupyter project's Docker image jupyter/all-spark-notebook. It could be that they set spark.testing properties, which circumvent the 470MB threshold...

Details

On a Mac laptop running macOS High Sierra, using the simple Spark notebook provided with BeakerX, with the Mac Docker environment set to use up to 2GB, evaluating this cell:

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
                        .appName("Simple Application")
                        .master("local[4]")
                        .config("spark.ui.enabled", "false")
                        .getOrCreate()

... it throws the following:

java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
  at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:217)
  at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:332)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
  ... 46 elided

Looking at the logic in UnifiedMemoryManager, it requires at least 1.5 * RESERVED_SYSTEM_MEMORY_BYTES, which is 1.5 * 300 * 1024 * 1024 = 1.5 * 314572800 = 471859200 bytes.
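The check can be sketched like this (the constant and error message mirror what UnifiedMemoryManager does; MemoryCheck is just an illustrative name, not Spark code):

```scala
import scala.util.Try

object MemoryCheck {
  // RESERVED_SYSTEM_MEMORY_BYTES is hard-coded to 300MB in Spark's UnifiedMemoryManager.
  val reservedSystemMemoryBytes: Long = 300L * 1024 * 1024               // 314572800
  val minSystemMemory: Long = (reservedSystemMemoryBytes * 1.5).toLong   // 471859200

  // Mirrors the check that produced the exception above.
  def check(systemMemory: Long): Unit =
    require(systemMemory >= minSystemMemory,
      s"System memory $systemMemory must be at least $minSystemMemory. " +
      "Please increase heap size using the --driver-memory option or " +
      "spark.driver.memory in Spark configuration.")

  def main(args: Array[String]): Unit = {
    println(minSystemMemory)                  // 471859200
    println(Try(check(466092032L)).isFailure) // true: the reported heap is too small
  }
}
```

So a JVM launched with ~460MB of heap fails the check by roughly 6MB.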

Logging into the running Docker container (docker exec -it <image_id> bash) and running top shows that the "system" has 2GB of memory and java has about 25% of it (though the number varies).

If I set Mac Docker to use 2.5GB or more and then restart the BeakerX image, everything works as expected.
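If increasing Docker's memory isn't an option, the check can also be sidestepped in-notebook by telling Spark how much memory to assume, via the spark.testing.memory property that getMaxMemory consults before falling back to the JVM's max heap. This may be what lets jupyter/all-spark-notebook work with less memory, though I haven't verified that; it is an escape hatch meant for Spark's own tests, so use with care:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
                        .appName("Simple Application")
                        .master("local[4]")
                        .config("spark.ui.enabled", "false")
                        // Treat system memory as 512MB, just above the hard-coded
                        // 471859200-byte minimum (test-only escape hatch).
                        .config("spark.testing.memory", (512L * 1024 * 1024).toString)
                        .getOrCreate()
```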
