PySpark installation error


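Problem description

I have followed instructions from various blog posts to install pyspark on my laptop. However, when I try to use pyspark from the terminal or from a Jupyter notebook, I keep getting the error shown below.

I have installed all the necessary software, as listed at the bottom of the question.

I have added the following to my .bashrc: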

    function sjupyter_init()
    {
        #Set anaconda3 as python
        export PATH=~/anaconda3/bin:$PATH

        #Spark path (based on your computer)
        SPARK_HOME=/opt/spark
        export PATH=$SPARK_HOME:$PATH

        export PYTHONPATH=$SPARK_HOME/python:/home/khurram/anaconda3/bin/python3
        export PYSPARK_DRIVER_PYTHON="jupyter"
        export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
        export PYSPARK_PYTHON=python3
    }

I run sjupyter_init followed by jupyter notebook from the terminal to launch Jupyter notebooks with pyspark.

In a notebook, the following runs without error:

    import findspark
    findspark.init('/opt/spark')
    from pyspark.sql import SparkSession

But when I run the line below:

    spark = SparkSession.builder.appName("test").getOrCreate()

it results in this error message:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    18/01/20 17:10:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
      File "/opt/spark/python/pyspark/context.py", line 334, in getOrCreate
        SparkContext(conf=conf or SparkConf())
      File "/opt/spark/python/pyspark/context.py", line 118, in __init__
        conf, jsc, profiler_cls)
      File "/opt/spark/python/pyspark/context.py", line 180, in _do_init
        self._jsc = jsc or self._initialize_context(self._conf._jconf)
      File "/opt/spark/python/pyspark/context.py", line 273, in _initialize_context
        return self._jvm.JavaSparkContext(jconf)
      File "/home/khurram/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1428, in __call__
        answer, self._gateway_client, None, self._fqn)
      File "/home/khurram/anaconda3/lib/python3.6/site-packages/py4j/protocol.py", line 320, in get_return_value
        format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.ExceptionInInitializerError
        at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:546)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:373)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.UnknownHostException: linux-0he7: linux-0he7: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:891)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:884)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:884)
        at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
        at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.localHostName(Utils.scala:941)
        at org.apache.spark.internal.config.package$.<init>(package.scala:204)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        ... 14 more
    Caused by: java.net.UnknownHostException: linux-0he7: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
        ... 23 more

My system details are as follows.

OS:

    OpenSuse Leap 42.2 64-bit

Java:

    khurram@linux-0he7:~> java -version
    openjdk version "1.8.0_151"

Scala:

    khurram@linux-0he7:~> scala -version
    Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.

Hadoop 3.0:

    khurram@linux-0he7:~> echo $HADOOP_HOME
    /opt/hadoop

Py4J:

    khurram@linux-0he7:~> pip show py4j
    Name: py4j
    Version: 0.10.6
    Summary: Enables Python programs to dynamically access arbitrary Java objects
    Home-page: https://www.py4j.org/
    Author: Barthelemy Dagenais
    Author-email: barthelemy@infobart.com
    License: BSD License
    Location: /home/khurram/anaconda3/lib/python3.6/site-packages
    Requires:
    khurram@linux-0he7:~>

I have run chmod 777 on the hadoop and spark directories.

    khurram@linux-0he7:~> ls -al /opt/
    total 8
    drwxr-xr-x 1 root    root   96 Jan 19 20:22 .
    drwxr-xr-x 1 root    root  222 Jan 20 14:54 ..
    lrwxrwxrwx 1 root    root   18 Jan 19 20:22 hadoop -> /opt/hadoop-3.0.0/
    drwxrwxrwx 1 khurram users 126 Dec  8 19:42 hadoop-3.0.0
    lrwxrwxrwx 1 root    root   30 Jan 19 19:40 spark -> /opt/spark-2.2.1-bin-hadoop2.7
    drwxrwxrwx 1 khurram users 150 Jan 19 19:33 spark-2.2.1-bin-hadoop2.7
    khurram@linux-0he7:~>

Contents of the hosts file:

    khurram@linux-0he7:> cat /etc/hosts

    127.0.0.1       localhost

    # special IPv6 addresses
    ::1             localhost ipv6-localhost ipv6-loopback

    fe00::0         ipv6-localnet

    ff00::0         ipv6-mcastprefix
    ff02::1         ipv6-allnodes
    ff02::2         ipv6-allrouters
    ff02::3         ipv6-allhosts

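Solution

UnknownHostException is thrown when the IP address of a host cannot be determined, and it appears at the bottom of your stack trace:

    Caused by: java.net.UnknownHostException: linux-0he7: Name or service not known

Judging by your shell prompt, the hostname of your machine is linux-0he7, and I assume you are running Spark in local mode. The exception means that linux-0he7 cannot be resolved, because your /etc/hosts does not contain an entry for it.

Adding

    127.0.0.1 linux-0he7

to /etc/hosts should resolve the problem.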
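
To confirm whether the hostname resolves before and after editing /etc/hosts, a small check along the following lines may help. This is only a sketch: can_resolve is a hypothetical helper name, and Python's resolver call is used as a rough stand-in for the lookup Java performs in InetAddress.getLocalHost, so the two may not behave identically on every system.

```python
import socket


def can_resolve(hostname):
    """Return True if `hostname` can be resolved to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the lookup fails
        return False


if __name__ == "__main__":
    name = socket.gethostname()  # e.g. linux-0he7 on the machine in the question
    status = "resolves" if can_resolve(name) else "does NOT resolve"
    print(f"{name} {status}")
```

If the script reports that the hostname does not resolve, the missing /etc/hosts entry is the likely cause of the exception.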

You can also set spark.driver.bindAddress and spark.driver.host to make the driver use a specific host IP.
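
As a rough illustration of that alternative, the two properties can be passed through the session builder. The helper names driver_network_conf and build_session are hypothetical, and 127.0.0.1 is an assumption that only makes sense for local mode on a single machine; pyspark is imported lazily so the configuration helper can be used without it.

```python
def driver_network_conf(host="127.0.0.1"):
    """Return Spark properties that pin the driver to one address.

    The default of 127.0.0.1 is an assumption suitable for local mode only.
    """
    return {
        "spark.driver.bindAddress": host,
        "spark.driver.host": host,
    }


def build_session(app_name="test", host="127.0.0.1"):
    """Create a SparkSession with the driver address set explicitly."""
    # Imported here so the module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in driver_network_conf(host).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In the notebook from the question, spark = build_session("test") would then take the place of the failing getOrCreate() call.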

Independent of the exception, Hadoop 3.0.0 is not supported yet. I would recommend using 2.x for the time being.

