Question
My Spark code is cluttered with code like this:
import org.apache.spark.sql.DataFrame

object Transformations {
  def selectI(df: DataFrame): DataFrame = {
    // needed to use $ to generate ColumnName
    import df.sparkSession.implicits._
    df.select($"i")
  }
}
or
import org.apache.spark.sql.{DataFrame, SparkSession}

object Transformations {
  def selectI(df: DataFrame)(implicit spark: SparkSession): DataFrame = {
    // needed to use $ to generate ColumnName
    import spark.implicits._
    df.select($"i")
  }
}
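A common way to cut down the repeated imports (a community pattern, not part of the Spark API; the trait name here is my own) is a small trait that owns the session, so the implicits are imported once per object:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper trait: mix it in wherever transformations live.
trait SparkSessionWrapper {
  lazy val spark: SparkSession = SparkSession.builder.getOrCreate()
}

object Transformations extends SparkSessionWrapper {
  import spark.implicits._ // one import for the whole object

  def selectI(df: DataFrame): DataFrame =
    df.select($"i")
}
```

This still ties the implicits to a concrete session instance, which is exactly what the question is trying to avoid, but it confines the boilerplate to one line.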
I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather do something like:
import org.apache.spark.sql.DataFrame

object Transformations {
  import org.apache.spark.sql.SQLImplicits._ // does not work
  def selectI(df: DataFrame): DataFrame = {
    df.select($"i")
  }
}
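For the $ part specifically, one session-free alternative does exist: the col function in org.apache.spark.sql.functions builds a Column without any implicits, so no session instance is needed (this sidesteps the implicits rather than importing them):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object Transformations {
  // col("i") constructs the Column directly; no session-bound implicits required
  def selectI(df: DataFrame): DataFrame =
    df.select(col("i"))
}
```

This does not help with Encoders or .toDF(), which genuinely depend on a session.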
Is there an elegant solution to this problem? My use of the implicits is not limited to $ but also includes Encoders, .toDF(), etc.
Answer
Because every Dataset exists in the scope of a specific SparkSession, and a single Spark application can have multiple active SparkSessions.
Theoretically, some of the SparkSession.implicits._ could exist separately from the session instance, for example:
import org.apache.spark.sql.implicits._ // For let's say `$` or `Encoders`
import org.apache.spark.sql.SparkSession.builder.getOrCreate.implicits._ // For toDF
but that would have a significant impact on user code.
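The multiple-sessions point can be sketched as follows. This is a minimal illustration, assuming a local master; spark.newSession() shares the underlying SparkContext but has its own SQL configuration, temporary views, and UDF registry, which is why the implicits are tied to a concrete session rather than being global:

```scala
import org.apache.spark.sql.SparkSession

object MultiSessionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("multi-session-sketch") // hypothetical app name
      .getOrCreate()

    // A second, independent session in the same application:
    // shared SparkContext, separate SQL state.
    val other = spark.newSession()

    // Encoders, $ and toDF come from the session the data belongs to.
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("i")
    df.select($"i").show()

    spark.stop()
  }
}
```

If implicits were importable without an instance, Spark would have to pick one of these sessions behind your back, which is the impact on user code the answer alludes to.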