This article explains how to change the memory per node for an Apache Spark worker; the question and recommended answer below may be a useful reference.

Problem Description

I am configuring an Apache Spark cluster.

When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:

Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)

I want to increase the used memory for the workers but I could not find the right config for this. I have changed spark-env.sh as below:

export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"

But the used memory is still the same. What should I do to change the used memory?

Recommended Answer

When using 1.0.0+ and spark-shell or spark-submit, use the --executor-memory option, e.g.:

spark-shell --executor-memory 8G ...
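
The same flag works with spark-submit; a quick sketch, where the master URL, main class, and jar name are placeholders for your own:

spark-submit --executor-memory 6G --master spark://master-host:7077 --class com.example.MyApp my-app.jar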

For 0.9.0 and below:

Set the memory when you start a job or start the shell. We had to modify the spark-shell script so that it carries command-line arguments through as arguments for the underlying Java application. In particular:

OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"

Then we can run our spark shell as follows:

spark-shell -Dspark.executor.memory=6g

When configuring it for a standalone jar, I set the system property programmatically before creating the SparkContext and pass the value in as a command-line argument (which keeps the invocation shorter than the long-winded system props).

System.setProperty("spark.executor.memory", valueFromCommandLine)

As for changing the cluster-wide default, sorry, I'm not entirely sure how to do it properly.
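
For a cluster-wide default, one option to look into (assuming Spark 1.0.0+) is setting the property in conf/spark-defaults.conf on the machine you submit from, for example:

# conf/spark-defaults.conf
spark.executor.memory   6g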

One final point - I'm a little worried by the fact you have 2 nodes with 2GB and one with 6GB. The memory you can use will be limited to the smallest node - so here 2GB.

That concludes this article on how to change the memory per node for an Apache Spark worker; hopefully the recommended answer helps.
