Problem Description
- What is the difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?
- Is there any method to convert or create a Context using a SparkSession?
- Can I completely replace all the Contexts using one single entry, SparkSession?
- Are all the functions in SQLContext, SparkContext, and JavaSparkContext also in SparkSession?
- Some functions like parallelize have different behaviors in SparkContext and JavaSparkContext. How do they behave in SparkSession?
- How can I create the following using a SparkSession?
  - RDD
  - JavaRDD
  - JavaPairRDD
  - Dataset
Is there a method to transform a JavaPairRDD into a Dataset, or a Dataset into a JavaPairRDD?
Recommended Answer
SparkContext is the entry point of the Scala implementation, and JavaSparkContext is a Java wrapper around SparkContext.
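To make the wrapper relationship concrete, here is a minimal Java sketch; the app name and local master are assumptions for illustration:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;

public class ContextWrapper {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("example").setMaster("local[*]");

        // The Scala entry point
        SparkContext sc = new SparkContext(conf);

        // JavaSparkContext simply wraps an existing SparkContext
        JavaSparkContext jsc = new JavaSparkContext(sc);

        // The wrapped SparkContext is still accessible
        System.out.println(jsc.sc() == sc); // prints true

        jsc.stop();
    }
}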
SQLContext is the entry point of Spark SQL, and it can be obtained from a SparkContext. Prior to 2.x.x, RDD, DataFrame, and Dataset were three different data abstractions. Since Spark 2.x.x, all three data abstractions are unified, and SparkSession is the unified entry point of Spark.
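As a minimal sketch of the unified entry point (the app name and local master are assumptions for illustration):

import org.apache.spark.sql.SparkSession;

public class UnifiedEntryPoint {
    public static void main(String[] args) {
        // Since Spark 2.x, SparkSession is the single entry point
        SparkSession spark = SparkSession.builder()
                .appName("example")   // hypothetical app name
                .master("local[*]")   // assumes a local run
                .getOrCreate();

        // SQL, DataFrame, and Dataset functionality hang off the session
        spark.sql("SELECT 1 AS id").show();

        spark.stop();
    }
}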
An additional note: RDDs are meant for unstructured, strongly typed data, while DataFrames are for structured, loosely typed data.
Yes. It is sparkSession.sparkContext(), and for SQL, sparkSession.sqlContext().
Yes. You can get the respective contexts from a SparkSession.
Not directly. You have to get the respective context and make use of it, something like backward compatibility.
Get the respective context and make use of it.
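A minimal Java sketch of pulling the respective contexts out of a SparkSession (the app name and local master are assumptions for illustration):

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class ContextsFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("example")   // hypothetical app name
                .master("local[*]")   // assumes a local run
                .getOrCreate();

        // The underlying contexts are exposed by the session
        SparkContext sc = spark.sparkContext();
        SQLContext sqlContext = spark.sqlContext();

        // A JavaSparkContext can be wrapped around the session's SparkContext
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sc);

        spark.stop();
    }
}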
- RDD can be created from sparkSession.sparkContext.parallelize(???) (see the sketch after this list)
- JavaRDD: the same applies, but through the Java implementation
- JavaPairRDD: sparkSession.sparkContext.parallelize(???).map(//making your data a key-value pair here is one way)
- Dataset: what sparkSession returns is a Dataset if it is structured data
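Putting the list together, a minimal Java sketch (the sample data, app name, and local master are assumptions for illustration):

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class CreateFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("example")   // hypothetical app name
                .master("local[*]")   // assumes a local run
                .getOrCreate();

        // Wrap the session's SparkContext for the Java RDD API
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // JavaRDD via parallelize
        JavaRDD<Integer> javaRdd = jsc.parallelize(Arrays.asList(1, 2, 3));

        // JavaPairRDD by mapping each element to a key-value pair
        JavaPairRDD<Integer, Integer> pairRdd =
                javaRdd.mapToPair(x -> new Tuple2<>(x, x * x));

        // Dataset for structured data, created from the session itself
        Dataset<Integer> ds = spark.createDataset(Arrays.asList(1, 2, 3), Encoders.INT());
        ds.show();

        spark.stop();
    }
}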