Problem description
New to Spark. I downloaded everything all right, but when I run pyspark I get the following errors:
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/05 20:46:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\bin\..\python\pyspark\shell.py", line 43, in <module>
    spark = SparkSession.builder\
  File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\pyspark\sql\session.py", line 179, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
Also, when I try (as recommended by http://spark.apache.org/docs/latest/quick-start.html)
textFile = sc.textFile("README.md")
I get:
NameError: name 'sc' is not defined
Any suggestions? Thanks!
Recommended answer
It looks like you've found the answer to the second part of your question in the answer above. For future users who land here because of the 'org.apache.spark.sql.hive.HiveSessionState' error: that class lives in the spark-hive jar file, which is not bundled with Spark unless Spark was built with Hive.
You can get this jar at:
http://central.maven.org/maven2/org/apache/spark/spark-hive_${SCALA_VERSION}/${SPARK_VERSION}/spark-hive_${SCALA_VERSION}-${SPARK_VERSION}.jar
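The `${SCALA_VERSION}` and `${SPARK_VERSION}` placeholders in that URL have to be filled in for your build. A minimal sketch of doing that in Python, assuming a Spark 2.1.0 download built against Scala 2.11 (check the `scala-library-*.jar` file in your `SPARK_HOME/jars` folder to confirm the Scala version). The helper name `spark_hive_jar_url` is hypothetical, and the optional `base_url` parameter is an addition of this sketch in case that host no longer serves the file (Maven Central is also reachable at https://repo1.maven.org/maven2):

```python
# Base URL taken from the answer above.
ORIGINAL_BASE = "http://central.maven.org/maven2"


def spark_hive_jar_url(scala_version, spark_version, base_url=ORIGINAL_BASE):
    """Hypothetical helper: build the spark-hive jar URL for the given versions.

    scala_version is the Scala binary version (e.g. "2.11"),
    spark_version is the full Spark version (e.g. "2.1.0").
    """
    artifact = f"spark-hive_{scala_version}"
    # Maven repository layout: group path / artifact / version / artifact-version.jar
    return (
        f"{base_url}/org/apache/spark/{artifact}/"
        f"{spark_version}/{artifact}-{spark_version}.jar"
    )


if __name__ == "__main__":
    print(spark_hive_jar_url("2.11", "2.1.0"))
```

The printed URL can then be opened in a browser or passed to any download tool.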
You'll have to put it into your SPARK_HOME/jars folder, and then Spark should be able to find all of the Hive classes required.
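A minimal sketch of that step in Python, assuming the jar has already been downloaded to a local path and that `SPARK_HOME` points at the unpacked Spark directory (in the traceback above, that is the innermost `spark-2.1.0-bin-hadoop2.6` folder, the one containing `bin` and `python`). `install_jar` and `has_spark_hive_jar` are hypothetical helper names, not part of Spark:

```python
import os
import shutil
from pathlib import Path


def install_jar(jar_path, spark_home=None):
    """Copy a downloaded jar into SPARK_HOME/jars and return the destination path.

    spark_home defaults to the SPARK_HOME environment variable.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set; pass spark_home explicitly")
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        raise FileNotFoundError(f"{jars_dir} does not exist; is SPARK_HOME correct?")
    dest = jars_dir / Path(jar_path).name
    shutil.copy2(jar_path, dest)  # keeps the original file's timestamps
    return dest


def has_spark_hive_jar(spark_home):
    """Return True if any spark-hive_*.jar is present in SPARK_HOME/jars."""
    return any((Path(spark_home) / "jars").glob("spark-hive_*.jar"))
```

Jars in that folder are picked up when Spark starts, so restart the `pyspark` shell after copying.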