Row- and column-level operations on a Spark DataFrame in Scala

Problem description

Original DataFrame
0.2 0.3

+------+---------+
|name  |country  |
+------+---------+
|Raju  |UAS      |
|Ram   |Pak.     |
|null  |China    |
|null  |null     |
+------+---------+

I need this:
+------+--------+
|Nwet  |wetCon  |
+------+--------+
|0.2   |0.3     |
|0.2   |0.3     |
|0.0   |0.3     |
|0.0   |0.0     |
+------+--------+

I want to create one UDF for both columns. Applied to the name column, it checks whether the value is null: if it is not null it returns 0.2, otherwise 0.0. The same UDF applied to the country column returns 0.0 if the value is null and 0.3 if it is not.

Recommended answer

Using Apache's StringUtils, with one UDF per column:

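// Assumes commons-lang3 is on the classpath
import org.apache.commons.lang3.StringUtils
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.DoubleType

// StringUtils.isBlank is true for null, empty, and whitespace-only strings
val transcodificationName: UserDefinedFunction =
    udf { (name: String) =>
        if (StringUtils.isBlank(name)) 0.0
        else 0.2
    }
val transcodificationCountry: UserDefinedFunction =
    udf { (country: String) =>
        if (StringUtils.isBlank(country)) 0.0
        else 0.3
    }

// cast belongs on the Column, not on the DataFrame returned by withColumn
dataframe
    .withColumn("Nwet", transcodificationName(col("name")).cast(DoubleType))
    .withColumn("wetCon", transcodificationCountry(col("country")).cast(DoubleType))
    .select("Nwet", "wetCon")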

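Or, with a single UDF that takes the column name as a second argument:

import org.apache.spark.sql.functions.lit

val transcodificationColumns: UserDefinedFunction =
    udf { (input: String, columnName: String) =>
        // Check the value itself, then pick the weight by column name
        if (StringUtils.isBlank(input)) 0.0
        else if (columnName == "name") 0.2
        else if (columnName == "country") 0.3
        else 0.0
    }

// A UDF accepts only Column arguments, so the name is wrapped in lit(...)
dataframe
    .withColumn("Nwet", transcodificationColumns(col("name"), lit("name")).cast(DoubleType))
    .withColumn("wetCon", transcodificationColumns(col("country"), lit("country")).cast(DoubleType))
    .select("Nwet", "wetCon")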
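
As an aside, the same mapping can probably be written without a UDF, using Spark's built-in when / otherwise column functions. The sketch below is not part of the original answer. The helper names weightIfPresent and withWeights are hypothetical, and the code assumes a DataFrame with string columns "name" and "country". Unlike StringUtils.isBlank, isNotNull only tests for null, so empty or whitespace-only strings would still receive the non-zero weight.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical helper: `weight` when the column value is non-null, 0.0 otherwise.
def weightIfPresent(c: Column, weight: Double): Column =
  when(c.isNotNull, lit(weight)).otherwise(lit(0.0))

// Hypothetical wrapper: assumes `df` has columns "name" and "country".
def withWeights(df: DataFrame): DataFrame =
  df.select(
    weightIfPresent(col("name"), 0.2).as("Nwet"),
    weightIfPresent(col("country"), 0.3).as("wetCon")
  )
```

Calling withWeights(dataframe) should yield the two columns Nwet and wetCon. Built-in column expressions like these are visible to Spark's optimizer, whereas a UDF is opaque to it, which is the usual reason to prefer them when the logic is this simple.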
