This article covers how to define a UDF with a generic element type and an additional parameter in Scala Spark; the accepted answer below may be a useful reference.

Problem Description

I want to define a UDF in Scala Spark like the pseudocode below:

def transformUDF(size: Int): UserDefinedFunction = udf((input: Seq[T]) => {

  if (input != null)
    Vectors.dense(input.map(_.toDouble).toArray)
  else
    Vectors.dense(Array.fill[Double](size)(0.0))

})

If input is not null, cast every element to Double.
If input is null, return an all-zero vector.

And I want T to be limited to a numeric type, like java.lang.Number in Java. But it seems that Seq[java.lang.Number] cannot work with toDouble.
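In plain Scala, the usual way to bound a type parameter to "numeric" is the Numeric type class. A minimal sketch of that bound (note this does not carry over directly to udf, which needs a concrete, serializable argument type; the function name toDoubles is illustrative):

```scala
// Generic numeric conversion via the Numeric type class;
// shown only to illustrate the kind of bound the question asks for.
def toDoubles[T](input: Seq[T])(implicit num: Numeric[T]): Seq[Double] =
  input.map(num.toDouble)
```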

Is there a suitable way to do this?

Recommended Answer

As mentioned in my comment, a working version is:

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

def transformUDF: UserDefinedFunction = udf((size: Int, input: Seq[java.lang.Number]) => {
  if (input != null)
    Vectors.dense(input.map(_.doubleValue()).toArray)  // java.lang.Number has doubleValue(), not toDouble
  else
    Vectors.dense(Array.fill[Double](size)(0.0))       // all-zero vector of the requested size
})

You don't need to create a new column for the size; you can just pass it to the udf function as a literal (lit comes from org.apache.spark.sql.functions):

dataframe.withColumn("newCol", transformUDF(lit(the size you want), dataframe("the column you want to transform")))
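The body of the answer's UDF can be exercised without a Spark session. A minimal sketch mirroring its logic, minus the Vectors.dense wrapper, with a null Seq standing in for a null column value (the function name transform is illustrative):

```scala
// Same branching as the udf body: map java.lang.Number elements to Double,
// or fill with zeros when the input is null.
def transform(size: Int, input: Seq[java.lang.Number]): Array[Double] =
  if (input != null) input.map(_.doubleValue()).toArray
  else Array.fill[Double](size)(0.0)
```

This also shows why doubleValue() is the right call: it is defined on java.lang.Number itself, so it works uniformly for boxed Integer, Long, and Double elements.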

