Question
I'm trying to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame using the Scala API. Starting DataFrame:
color
Red
Green
Blue
Desired DataFrame (SQL syntax: CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool):
color bool
Red 0
Green 1
Blue 0
How should I implement this logic?
Answer
In Spark 1.4.0 and later, you can use the when/otherwise syntax:
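// Assumes a SparkSession named spark (Spark 2.0+); on Spark 1.4, import sqlContext.implicits._ instead
import org.apache.spark.sql.functions.when
import spark.implicits._
// Create the DataFrame
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")
// Use when/otherwise: 1 where color is "Green", 0 otherwise
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))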
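when calls can also be chained to express a CASE with several WHEN branches. The sketch below assumes Spark 2.0+ (SparkSession) and a local master; the object name MultiBranchCaseWhen, the withColorCode helper, and the Blue -> 2 mapping are hypothetical, chosen only for illustration. If otherwise is omitted, rows that match no branch get null, just like a SQL CASE without ELSE.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object MultiBranchCaseWhen {
  // Hypothetical helper, equivalent to:
  // CASE WHEN color = 'Green' THEN 1 WHEN color = 'Blue' THEN 2 ELSE 0 END AS color_code
  def withColorCode(df: DataFrame): DataFrame =
    df.withColumn(
      "color_code",
      when(col("color") === "Green", 1)
        .when(col("color") === "Blue", 2) // Column.when adds another WHEN branch
        .otherwise(0)                     // ELSE value; without it, unmatched rows are null
    )

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only
    val spark = SparkSession.builder().master("local[1]").appName("case-when-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("Red", "Green", "Blue").toDF("color")
    withColorCode(df).show()
    spark.stop()
  }
}
```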
If you are using Spark 1.3.0, you can use a UDF instead:
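import org.apache.spark.sql.functions.udf
// Define the UDF: returns 1 for "Green" and 0 for any other value
val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
})
// Apply the UDF to the color column
val df2 = df.withColumn("Green_Ind", isGreen($"color"))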
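If you would rather keep the question's SQL syntax, selectExpr accepts the CASE WHEN expression as a string. A minimal sketch, again assuming Spark 2.0+ and a local session; SqlCaseWhen and withBool are hypothetical names:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlCaseWhen {
  // Hypothetical helper: keeps all existing columns ("*") and appends
  // the question's CASE WHEN expression as an integer "bool" column
  def withBool(df: DataFrame): DataFrame =
    df.selectExpr("*", "CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool")

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only
    val spark = SparkSession.builder().master("local[1]").appName("sql-case-when-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("Red", "Green", "Blue").toDF("color")
    withBool(df).show()
    spark.stop()
  }
}
```

Both this and when/otherwise are built-in expressions that Spark's optimizer can inspect, whereas a UDF is a black box to it, so the UDF route is mostly worth it when no built-in expression fits.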