Problem Description
This page inspired me to try out spark-csv for reading .csv files in PySpark. I found a couple of posts, such as this one, describing how to use spark-csv.
But I am not able to initialize the IPython instance by including either the .jar file or the package extension in the start-up command, the way it can be done through spark-shell.
That is, instead of

ipython notebook --profile=pyspark

I tried out

ipython notebook --profile=pyspark --packages com.databricks:spark-csv_2.10:1.0.3

but it is not supported.
Please advise.
Recommended Answer
You can pass the extra packages through the PYSPARK_SUBMIT_ARGS environment variable. For example:
export PACKAGES="com.databricks:spark-csv_2.11:1.3.0"
export PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"
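With these variables exported, launching the notebook the usual way (ipython notebook --profile=pyspark) should pick up the package: --packages is an argument to spark-submit, not to IPython, which is why appending it to the ipython notebook command line is not supported.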
These properties can also be set dynamically in your code, before the SparkContext / SparkSession and the corresponding JVM have been started:
import os

packages = "com.databricks:spark-csv_2.11:1.3.0"
# Must run before the SparkContext/SparkSession starts the JVM,
# since the submit arguments are only read once at start-up.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages {0} pyspark-shell".format(packages)
)
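Once the variable is set, a context created afterwards can use the connector to read a CSV file. Below is a minimal sketch for the Spark 1.x API; the app name and the file path people.csv are placeholders for illustration:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="spark-csv-example")
sqlContext = SQLContext(sc)

# spark-csv registers the "com.databricks.spark.csv" data source;
# "people.csv" is a placeholder path for illustration.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("people.csv"))
df.show()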