Problem description
I want to get the last element from the array returned by the Spark SQL split() function.
split('4:3-2:3-5:4-6:4-5:2', '-')
I know I can get it with a fixed (0-based) index:
split('4:3-2:3-5:4-6:4-5:2', '-')[4]
But I want another way that works when I don't know the length of the array. Please help me.
Recommended answer
You could use a UDF to do that, as follows:
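import org.apache.spark.sql.functions.{col, split, udf}
import scala.util.Try
// spark-shell already provides sc and imports spark.implicits._ (needed for toDF)
val df = sc.parallelize(Seq((1L, "one-last1"), (2L, "two-last2"), (3L, "three-last3"))).toDF("key", "Value")
df.show()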
+---+-----------+
|key|Value      |
+---+-----------+
|1  |one-last1  |
|2  |two-last2  |
|3  |three-last3|
+---+-----------+
val get_last = udf((xs: Seq[String]) => Try(xs.last).toOption)
val with_just_last = df.withColumn("Last", get_last(split(col("Value"), "-")))
with_just_last.show()
+---+-----------+--------+
|key|Value      |Last    |
+---+-----------+--------+
|1  |one-last1  |last1   |
|2  |two-last2  |last2   |
|3  |three-last3|last3   |
+---+-----------+--------+
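The UDF wraps `xs.last` in `Try` because `Seq.last` throws on an empty sequence, and a null `Value` reaches the UDF as a null `Seq`. `Try(...).toOption` turns both failures into `None`, which Spark writes out as a null `Last`. The sketch below is a minimal, Spark-free illustration of the same logic on plain Scala collections. The object `LastElementSketch` and the helpers `lastOrNone` and `lastPart` are hypothetical names used only here. The sketch assumes that Spark's `split` keeps trailing empty strings by default (limit -1), so the helper passes `-1` to `String.split` to mimic it.

```scala
import scala.util.Try

object LastElementSketch {
  // Same body as the get_last UDF: Seq.last throws NoSuchElementException on an
  // empty Seq and NullPointerException on a null one; Try(...).toOption maps both to None.
  def lastOrNone(xs: Seq[String]): Option[String] = Try(xs.last).toOption

  // Hypothetical plain-String stand-in for split(col("Value"), "-").
  // String.split also takes a regex ("-" matches a literal hyphen); limit -1
  // keeps trailing empty strings, assumed here to match Spark's default.
  def lastPart(s: String): Option[String] = lastOrNone(s.split("-", -1).toSeq)

  def main(args: Array[String]): Unit = {
    println(lastPart("three-last3")) // Some(last3)
    println(lastOrNone(Seq.empty))   // None
    println(lastOrNone(null))        // None
  }
}
```

Returning `Option[String]` rather than `String` lets Spark map `None` to SQL `NULL` instead of failing the whole job on one bad row.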
Note that Spark SQL's split function can be applied directly to a DataFrame column, so its array result can be passed straight to the UDF.
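A UDF is not the only option. The following is an alternative to the answer above, sketched under the assumption of Spark 2.4 or later, which adds the `element_at` function. `element_at` takes a 1-based index, and a negative index counts from the end, so `-1` selects the last element of the array produced by `split`. `substring_index` with a count of `-1` returns the text after the last delimiter without building an array at all. Note that it matches the delimiter literally, whereas `split` treats its pattern as a regex. The names `LastElementBuiltins` and `withLastColumns`, the view name `t`, and the two output column names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, element_at, split, substring_index}

object LastElementBuiltins {
  // Adds the last "-"-separated token of Value two ways, without a UDF.
  def withLastColumns(df: DataFrame): DataFrame =
    df
      // element_at is 1-based; index -1 means the last element (Spark 2.4+).
      .withColumn("LastViaElementAt", element_at(split(col("Value"), "-"), -1))
      // count -1 keeps everything after the last "-" (delimiter matched literally).
      .withColumn("LastViaSubstringIndex", substring_index(col("Value"), "-", -1))

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell, use the existing `spark` instead.
    val spark = SparkSession.builder().master("local[*]").appName("last-element").getOrCreate()
    import spark.implicits._

    val df = Seq((1L, "one-last1"), (2L, "two-last2"), (3L, "three-last3")).toDF("key", "Value")
    withLastColumns(df).show()

    // The same element_at call written in Spark SQL syntax.
    df.createOrReplaceTempView("t")
    spark.sql("SELECT key, element_at(split(Value, '-'), -1) AS Last FROM t").show()

    spark.stop()
  }
}
```

Built-in functions are generally preferable to a UDF here because Catalyst can optimize them and no per-row conversion to Scala objects is needed. On Spark versions before 2.4, the UDF approach above still works.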