Creating a SparkSession always fails with not enough memory... unless Docker has enough memory #7666

@deanwampler

Description

TL;DR

Docker on the Mac defaults to running with 2GB of memory; on Windows, the default may be 1GB. With 2GB, the JVM for BeakerX notebooks appears to be launched with ~460MB (less than 25% of the available memory), which is just below the ~470MB minimum that Spark requires. The workaround is to increase the memory allocated to Docker.

The ~470MB number is actually hard-coded in Spark (see below), so this isn't really a BeakerX bug per se, but adding a warning to the Spark notebooks or another doc update might be helpful. If a more elegant solution is possible in BeakerX, even better. Note that I haven't observed this issue with the Jupyter project's Docker image jupyter/all-spark-notebook. It could be that they set spark.testing properties, which circumvent the 470MB threshold...

Details

On a Mac laptop running macOS High Sierra, using the simple Spark notebook provided with BeakerX, with the Mac Docker environment set to use up to 2GB, evaluating this cell:

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
                        .appName("Simple Application")
                        .master("local[4]")
                        .config("spark.ui.enabled", "false")
                        .getOrCreate()

... it throws the following:

java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
  at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:217)
  at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:332)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
  ... 46 elided

Looking at the logic in UnifiedMemoryManager, it requires at least 1.5 * RESERVED_SYSTEM_MEMORY_BYTES, which is 1.5 * 300 * 1024 * 1024 = 1.5 * 314572800 = 471859200 bytes.
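The check can be sketched like this (the constant and error message mirror what UnifiedMemoryManager does; MemoryCheck is just an illustrative name, not Spark code):

```scala
import scala.util.Try

object MemoryCheck {
  // RESERVED_SYSTEM_MEMORY_BYTES is hard-coded to 300MB in Spark's UnifiedMemoryManager.
  val reservedSystemMemoryBytes: Long = 300L * 1024 * 1024               // 314572800
  val minSystemMemory: Long = (reservedSystemMemoryBytes * 1.5).toLong   // 471859200

  // Mirrors the check that produced the exception above.
  def check(systemMemory: Long): Unit =
    require(systemMemory >= minSystemMemory,
      s"System memory $systemMemory must be at least $minSystemMemory. " +
      "Please increase heap size using the --driver-memory option or " +
      "spark.driver.memory in Spark configuration.")

  def main(args: Array[String]): Unit = {
    println(minSystemMemory)                  // 471859200
    println(Try(check(466092032L)).isFailure) // true: the reported heap is too small
  }
}
```

So a JVM launched with ~460MB of heap fails the check by roughly 6MB.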

Logging into the running Docker container (docker exec -it <image_id> bash) and running top shows that the "system" has 2GB of memory and java has about 25% of it (though the number varies).

If I set Mac Docker to use 2.5GB or more and then restart the BeakerX image, everything works as expected.
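If increasing Docker's memory isn't an option, the check can also be sidestepped in-notebook by telling Spark how much memory to assume, via the spark.testing.memory property that getMaxMemory consults before falling back to the JVM's max heap. This may be what lets jupyter/all-spark-notebook work with less memory, though I haven't verified that; it is an escape hatch meant for Spark's own tests, so use with care:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
                        .appName("Simple Application")
                        .master("local[4]")
                        .config("spark.ui.enabled", "false")
                        // Treat system memory as 512MB, just above the hard-coded
                        // 471859200-byte minimum (test-only escape hatch).
                        .config("spark.testing.memory", (512L * 1024 * 1024).toString)
                        .getOrCreate()
```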
