This article discusses how to import Spark's implicit conversions without a SparkSession instance; it may be a useful reference for anyone hitting the same problem.

Problem Description

My Spark code is cluttered with code like this:

import org.apache.spark.sql.DataFrame

object Transformations {
  def selectI(df: DataFrame): DataFrame = {
    // needed to use $ to generate a ColumnName
    import df.sparkSession.implicits._

    df.select($"i")
  }
}

or

import org.apache.spark.sql.{DataFrame, SparkSession}

object Transformations {
  def selectI(df: DataFrame)(implicit spark: SparkSession): DataFrame = {
    // needed to use $ to generate a ColumnName
    import spark.implicits._ // the implicit parameter is named spark, not sparkSession

    df.select($"i")
  }
}

I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather do something like this:

import org.apache.spark.sql.DataFrame

object Transformations {
  import org.apache.spark.sql.SQLImplicits._ // does not work: SQLImplicits is a class, not an object

  def selectI(df: DataFrame): DataFrame = {
    df.select($"i")
  }
}

Is there an elegant solution to this problem? My use of the implicits is not limited to $ but also includes Encoders, .toDF(), etc.

Answer

Because every Dataset exists in the scope of a specific SparkSession, and a single Spark application can have multiple active SparkSessions.
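
To make the session scoping concrete, here is a minimal sketch (assuming a local Spark 2.x+ setup; the object name MultiSessionDemo is purely illustrative) showing that toDF binds its result to whichever session's implicits are in scope:

import org.apache.spark.sql.SparkSession

object MultiSessionDemo {
  def main(args: Array[String]): Unit = {
    val spark1 = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()
    val spark2 = spark1.newSession() // a second active session in the same application

    // Each session exposes its own implicits object; importing
    // spark2.implicits._ ties the created DataFrame to spark2.
    import spark2.implicits._
    val df = Seq(1, 2, 3).toDF("i")

    assert(df.sparkSession eq spark2) // the DataFrame belongs to spark2, not spark1

    spark1.stop()
  }
}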

Theoretically, some of the SparkSession.implicits._ could exist separately from a session instance, for example:

import org.apache.spark.sql.implicits._   // For let's say `$` or `Encoders`
import org.apache.spark.sql.SparkSession.builder.getOrCreate.implicits._  // For toDF

but this would have a significant impact on user code.
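
As a partial mitigation not mentioned in the answer above (so treat it as a suggestion rather than the recommended approach), the $ usage specifically can be replaced by org.apache.spark.sql.functions.col, which builds a Column without any session-bound implicits; Encoders and .toDF(), however, genuinely require a session in scope:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object Transformations {
  // col("i") builds the same Column as $"i", but requires no
  // implicits import and therefore no SparkSession instance.
  def selectI(df: DataFrame): DataFrame = df.select(col("i"))
}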

This concludes the discussion of importing implicit conversions without a SparkSession instance; hopefully the answer above is helpful.
