This article describes how to coalesce duplicate columns in a Spark DataFrame; the question and recommended answer below may serve as a useful reference for anyone facing the same problem.
Problem Description
I have a Spark data frame which can contain duplicate columns with different row values. Is it possible to coalesce those duplicate columns and get a dataframe without any duplicate columns?
Example:
+-----+------+-----+-------+
| name|upload| name|upload1|
+-----+------+-----+-------+
| null|  null|alice|    101|
| null|  null|  bob|    231|
|alice|   100| null|   null|
|  bob|    23| null|   null|
+-----+------+-----+-------+
should become:
+-----+------+-------+
| name|upload|upload1|
+-----+------+-------+
|alice|  null|    101|
|  bob|  null|    231|
|alice|   100|   null|
|  bob|    23|   null|
+-----+------+-------+
Recommended Answer
// Assumes a SparkSession named `spark` is in scope (as in spark-shell),
// which provides toDF and the $-column syntax via its implicits.
import org.apache.spark.sql.functions.coalesce
import spark.implicits._

// Sample data: the duplicate "name" column is created here as "name1"
val DF1 = Seq(
  (None, None, Some("alice"), Some(101)),
  (None, None, Some("bob"), Some(231)),
  (Some("alice"), Some(100), None, None),
  (Some("bob"), Some(23), None, None)
).toDF("name", "upload", "name1", "upload1")

// Take the first non-null value of "name"/"name1", then drop the duplicate column
DF1.withColumn("name", coalesce($"name", $"name1")).drop("name1").show
+-----+------+-------+
| name|upload|upload1|
+-----+------+-------+
|alice| null| 101|
| bob| null| 231|
|alice| 100| null|
| bob| 23| null|
+-----+------+-------+
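If several columns come in duplicate pairs, the same coalesce-and-drop step can simply be repeated per pair. Below is a minimal sketch, not part of the original answer, assuming the duplicates have already been disambiguated with a suffix (e.g. "name"/"name1"); the helper name coalescePairs is hypothetical and introduced only for illustration.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col}

// Hypothetical helper: for each (keep, dup) pair, keep the first non-null
// value of the two columns and drop the duplicate column.
def coalescePairs(df: DataFrame, pairs: Seq[(String, String)]): DataFrame =
  pairs.foldLeft(df) { case (acc, (keep, dup)) =>
    acc.withColumn(keep, coalesce(col(keep), col(dup))).drop(dup)
  }

// Usage with the example above:
// coalescePairs(DF1, Seq("name" -> "name1")).show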
That concludes this article on coalescing duplicate columns in a Spark DataFrame. We hope the recommended answer is helpful.