Question
I just got access to Spark 2.0; up until this point I have been using Spark 1.6.1. Can someone please help me set up a SparkSession using pyspark (Python)? I know the Scala examples available online are similar (here), but I was hoping for a direct walkthrough in Python.
My specific case: I am loading avro files from S3 in a Zeppelin Spark notebook, then building DataFrames and running various pyspark & SQL queries against them. All of my old queries use sqlContext. I know this is poor practice, but I started my notebook with
sqlContext = SparkSession.builder.enableHiveSupport().getOrCreate()
I am able to use
mydata = sqlContext.read.format("com.databricks.spark.avro").load("s3:...
and build dataframes with no issues. But once I start querying the dataframes/temp tables, I keep getting the "java.lang.NullPointerException" error. I think that is indicative of a translational error (e.g. old queries worked in 1.6.1 but need to be tweaked for 2.0). The error occurs regardless of query type. So I am assuming
1.) the sqlContext alias is a bad idea
and
2.) I need to properly set up a sparkSession.
So if someone could show me how this is done, or perhaps explain the discrepancies they know of between the different versions of spark, I would greatly appreciate it. Please let me know if I need to elaborate on this question. I apologize if it is convoluted.
Answer
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('abc').getOrCreate()
Now, to import a .csv file, you can use
df = spark.read.csv('filename.csv', header=True)
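For the specific case described in the question (reading avro from S3 and running SQL queries against temp tables), a minimal sketch could look like the following. The S3 path, view name, and query are placeholders, and it assumes the com.databricks.spark.avro package mentioned in the question is on the classpath:

from pyspark.sql import SparkSession

# Build a SparkSession with Hive support, replacing the old sqlContext alias
spark = SparkSession.builder.appName('avro-example').enableHiveSupport().getOrCreate()

# Load the avro data (placeholder S3 path)
mydata = spark.read.format("com.databricks.spark.avro").load("s3://bucket/path")

# Register a temporary view and query it through the session
mydata.createOrReplaceTempView("mydata")
spark.sql("SELECT COUNT(*) FROM mydata").show()

In Spark 2.0 the SparkSession subsumes the old SQLContext/HiveContext, so once the session is set up this way, queries issued through spark.sql should behave like the old sqlContext.sql calls.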