This article describes how to add Spark packages in the PyCharm IDE. It should be a useful reference for anyone facing the same problem; interested readers can follow along below.
Problem description
I have set up my PyCharm to link with my local Spark installation as per this link:
from pyspark import SparkContext, SQLContext, SparkConf
from operator import add
conf = SparkConf()
conf.setMaster("spark://localhost:7077")
conf.setAppName("Test")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(2012, 8, "Batman", 9.8), (2012, 8, "Hero", 8.7), (2012, 7, "Robot", 5.5), (2011, 7, "Git", 2.0)],["year", "month", "title", "rating"])
df.write.mode('overwrite').format("com.databricks.spark.avro").save("file:///Users/abhattac/PycharmProjects/WordCount/users")
This requires Databricks' spark-avro JAR to be shipped to the worker nodes. I can get it done using spark-submit from the shell, like the following:
/usr/local/Cellar/apache-spark/1.6.1/bin/pyspark AvroFile.py --packages com.databricks:spark-avro_2.10:2.0.1
I couldn't find out how to provide the --packages option when running it from inside the PyCharm IDE. Any help will be appreciated.
Solution
You can use the PYSPARK_SUBMIT_ARGS environment variable, either by setting it in the environment variables section of the PyCharm run configuration (the same place where you set SPARK_HOME), or by using os.environ directly in your code, as shown in "load external libraries inside pyspark code".
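A minimal sketch of the os.environ approach follows. It assumes the same spark-avro package coordinates as the question; the key points are that PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created, and that the value must end with the token pyspark-shell so that PySpark can launch its JVM gateway.

```python
import os

# Mirror the flags you would pass to spark-submit / pyspark on the
# command line. Set this BEFORE creating a SparkContext; the trailing
# "pyspark-shell" token is required for PySpark to start its gateway.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-avro_2.10:2.0.1 pyspark-shell"
)

# From here on, build the context exactly as in the question, e.g.:
# from pyspark import SparkContext, SparkConf
# conf = SparkConf().setMaster("spark://localhost:7077").setAppName("Test")
# sc = SparkContext(conf=conf)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

The same string can instead be entered as the value of PYSPARK_SUBMIT_ARGS in the run configuration's environment variables section, in which case no code change is needed.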
That concludes this article on adding Spark packages in the PyCharm IDE. We hope the answer above is helpful, and thank you for your continued support!