Question
I'm trying to automatically include jars in my PySpark classpath. Right now I can type the following command and it works:
$ pyspark --jars /path/to/my.jar
I'd like to have that jar included by default, so that I only need to type pyspark
and can also use it in IPython Notebook.
I've read that I can include the argument by setting PYSPARK_SUBMIT_ARGS in the environment:
export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar"
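Note: on some Spark versions (1.4 and later, as far as I know; I have not verified this for 1.3.1) the launcher ignores PYSPARK_SUBMIT_ARGS unless the value ends with pyspark-shell, so a sketch of the export under that assumption would be:
export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar pyspark-shell"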
Unfortunately, the above doesn't work. I get the runtime error Failed to load class for data source.
Running Spark 1.3.1.
Edit
My workaround when using IPython Notebook is the following:
$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar
Recommended answer
You can add the jar files in the spark-defaults.conf file (located in the conf folder of your Spark installation). If there is more than one entry in the jars list, use : as the separator.
spark.driver.extraClassPath /path/to/my.jar
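For example, a sketch with two jars on the driver classpath (the second path is just an illustrative placeholder):
spark.driver.extraClassPath /path/to/my.jar:/path/to/another.jar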
This property is documented at https://spark.apache.org/docs/1.3.1/configuration.html#runtime-environment