Problem description
How can I increase the memory available to Apache Spark executor nodes?
I have a 2 GB file that is a good fit for loading into Apache Spark. I am currently running Apache Spark on a single machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.
When I try to count the lines of the file after setting the file to be cached in memory, I get this error:
2014-10-25 22:25:12 WARN CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
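For reference, the spark-shell session in question looks roughly like this (the file path is a placeholder, not quoted from the original question):

val lines = sc.textFile("/path/to/2gb-file.txt")   // load the 2 GB text file (hypothetical path)
lines.cache()                                       // ask Spark to keep the RDD in memory
lines.count()                                       // triggers the read; caching fails with the warning above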
I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf.
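Presumably the entry in spark-defaults.conf looks something like the following (my reconstruction, not quoted from the question):

spark.executor.memory 4g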
The UI shows this variable is set in the Spark Environment. You can find a screenshot here.
However, when I go to the Executor tab, the memory limit for my single executor is still set to 265.4 MB. I also still get the same error.
I tried the various things mentioned here, but I still get the error and don't have a clear idea of where I should change the setting.
I am running my code interactively from the spark-shell.
Recommended answer
Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason for this is that the worker "lives" within the driver JVM process that is started when you launch spark-shell, and the default memory used for that process is 512 MB. You can increase that by setting spark.driver.memory to something higher, for example 5g. You can do that either:
by setting it in the properties file (the default is $SPARK_HOME/conf/spark-defaults.conf):
spark.driver.memory 5g
or by supplying the configuration setting at runtime:
$ ./bin/spark-shell --driver-memory 5g
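The same property can also be passed through spark-shell's generic --conf flag, which is equivalent to the option above:

$ ./bin/spark-shell --conf spark.driver.memory=5g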
Note that this cannot be achieved by setting it in the application, because by then it is already too late: the process has already started with some amount of memory.
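As an illustrative sketch (not part of the original answer) of the pattern that does not work for the driver's own heap:

// Setting driver memory from inside the application is too late: the driver
// JVM is already running with its default heap by the time this code executes.
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("example").set("spark.driver.memory", "5g")
val sc = new SparkContext(conf)   // the already-started driver heap does not grow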
The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction to the total amount of storage memory, and by default these are 0.6 and 0.9:
512 MB * 0.6 * 0.9 ~ 265.4 MB
So be aware that not the whole amount of driver memory will be available for RDD storage.
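As a small sketch of the arithmetic, runnable in spark-shell: Runtime.maxMemory reports somewhat less than the configured 512 MB heap, which is why the result is about 265 MB rather than 276 MB. The fraction names mirror the legacy spark.storage.* settings; this is illustrative, not Spark's actual source.

val maxMemory = Runtime.getRuntime.maxMemory      // heap visible to the JVM, a bit under 512 MB
val memoryFraction = 0.6                          // spark.storage.memoryFraction (default)
val safetyFraction = 0.9                          // spark.storage.safetyFraction (default)
val storageBytes = (maxMemory * memoryFraction * safetyFraction).toLong
println(storageBytes / (1024.0 * 1024.0) + " MB available for RDD storage")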
But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.
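For example, when launching against a cluster, executor memory would typically be given like this (the master URL is a placeholder):

$ ./bin/spark-shell --master spark://master-host:7077 --executor-memory 4g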