This article explains how to resolve the error "NameError: name 'spark' is not defined" when using PySpark.

Problem Description

I have just installed pyspark 2.4.5 on my Ubuntu 18.04 laptop. When I run the following code,

# This is only part of the code.
import os
from glob import glob

import pubmed_parser as pp
from pyspark.sql import SparkSession
from pyspark.sql import Row

medline_files_rdd = spark.sparkContext.parallelize(glob('/mnt/hgfs/ShareDir/data/*.gz'), numSlices=1000)
parse_results_rdd = medline_files_rdd.\
    flatMap(lambda x: [Row(file_name=os.path.basename(x), **publication_dict)
                       for publication_dict in pp.parse_medline_xml(x)])

medline_df = parse_results_rdd.toDF()
# save to parquet
medline_df.write.parquet('raw_medline.parquet', mode='overwrite')


medline_df = spark.read.parquet('raw_medline.parquet')

I get an error like this:

medline_files_rdd = spark.sparkContext.parallelize(glob('/mnt/hgfs/ShareDir/data/*.gz'), numSlices=1000)
NameError: name 'spark' is not defined

I have seen similar questions on StackOverflow, but none of them solve my problem. Can anyone help me? Thanks a lot.

By the way, I am new to Spark. If I just want to use Spark from Python, is it enough to install pyspark with pip install pyspark? Is there anything else I should do? Do I need to install Hadoop or anything else?

Recommended Answer

Just create the Spark session at the start of your script:

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; this defines the `spark` variable used later.
spark = SparkSession.builder.appName('abc').getOrCreate()
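
Once the session exists, the name spark in the original snippet resolves. Below is a minimal sketch of how the beginning of the script could look with this fix applied; the pubmed_parser import, the appName 'abc', and the /mnt/hgfs/ShareDir/data path are taken from the question and assumed to be valid on your machine.

import os
from glob import glob

import pubmed_parser as pp
from pyspark.sql import SparkSession, Row

# Define `spark` before any use of spark.sparkContext or spark.read.
spark = SparkSession.builder.appName('abc').getOrCreate()

# Same call that previously raised NameError; it now works because `spark` exists.
medline_files_rdd = spark.sparkContext.parallelize(
    glob('/mnt/hgfs/ShareDir/data/*.gz'), numSlices=1000)

The appName is just a label shown in the Spark UI, so any string works; getOrCreate() returns an existing session if one has already been started in the same process.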

