Question
I have an RStudio driver instance that is connected to a Spark cluster. I would like to know whether there is a way to connect to the Spark cluster from RStudio using an external configuration file that specifies the number of executors, memory, and other Spark parameters. I know we can do it with the command below:
sparkR.session(sparkConfig = list(spark.cores.max = "2", spark.executor.memory = "8g"))
I am specifically looking for a method that reads the Spark parameters from an external file when starting the SparkR session.
Answer
Spark uses a standardized configuration layout in which spark-defaults.conf is used for specifying configuration options. This file should be located in one of the following directories:
- SPARK_HOME/conf
- SPARK_CONF_DIR
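For example, the two settings from the question could be written in spark-defaults.conf as whitespace-separated key/value pairs (the values here are simply the ones used in the question):

spark.cores.max        2
spark.executor.memory  8g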
All you have to do is configure the SPARK_HOME or SPARK_CONF_DIR environment variable and put the configuration there.
Each Spark installation comes with template files that you can use as a starting point.
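For instance, conf/spark-defaults.conf.template ships with Spark and can be copied and edited, a step you can also do from R (paths assume a standard Spark layout):

# Copy the shipped template next to it as spark-defaults.conf, then edit it.
spark_home <- Sys.getenv("SPARK_HOME")
file.copy(file.path(spark_home, "conf", "spark-defaults.conf.template"),
          file.path(spark_home, "conf", "spark-defaults.conf"))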