Problem description
I have defined the following function to register as a Spark SQL UDF:
import scala.collection.mutable.WrappedArray
def array_sum(x: WrappedArray[Long]): Long = {
  x.sum
}
I would like this function to work with any numeric type it receives as an argument. I tried the following:
import Numeric.Implicits._
import scala.reflect.ClassTag
def array_sum(x: WrappedArray[NumericType]) = {
x.sum
}
But it does not work. Any ideas? Thank you!
Recommended answer
NumericType is Spark SQL specific and is never exposed to UDFs, which receive standard Scala objects. So most likely you want something like this:
def array_sum[T : Numeric : ClassTag](x: Seq[T]) = x.sum
udf[Double, Seq[Double]](array_sum _)
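To see what the `Numeric` context bound does on its own, here is a minimal plain-Scala sketch that needs no Spark. The names `ArraySumSketch`, `arraySum`, and `doubleSum` are made up for illustration and are not part of the original answer. It also drops the `ClassTag` bound, on the assumption that `sum` only needs the `Numeric` instance.

```scala
object ArraySumSketch {
  // The Numeric context bound supplies the implicit evidence that Seq.sum requires.
  def arraySum[T: Numeric](xs: Seq[T]): T = xs.sum

  // A function value has to settle on one concrete element type,
  // much like a UDF created for Seq[Double].
  val doubleSum: Seq[Double] => Double = xs => arraySum(xs)

  def main(args: Array[String]): Unit = {
    val longTotal: Long = arraySum(Seq(1L, 2L, 3L))
    val floatTotal: Float = arraySum(Seq(1.0f, 2.0f))
    val doubleTotal: Double = doubleSum(Seq(1.0, 2.0))
    println(s"$longTotal $floatTotal $doubleTotal") // prints "6 3.0 3.0"
  }
}
```

The method itself stays generic, but each function value built from it is fixed to one element type, so a separate instance is needed for each numeric type.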
That said, it does not look like there is much to gain here. To build something like this the right way, you should probably implement a custom expression.
Example:
val rddDouble: RDD[(Long, Array[Double])] = sc.parallelize(Seq((1L, Array(1.0, 2.0))))
val double_array_sum = udf[Double, Seq[Double]](array_sum _)
rddDouble.toDF("k", "v").select(double_array_sum($"v")).show
// +------+
// |UDF(v)|
// +------+
// |   3.0|
// +------+
val rddFloat: RDD[(Long, Array[Float])] = sc.parallelize(Seq(
(1L, Array(1.0f, 2.0f))
))
val float_array_sum = udf[Float, Seq[Float]](array_sum _)
rddFloat.toDF("k", "v").select(float_array_sum($"v")).show
// +------+
// |UDF(v)|
// +------+
// |   3.0|
// +------+