Problem description
I am using Scala and Spark. I have two data frames.
The first one is as follows:
+------+------+-----------+
| num1 | num2 | arr       |
+------+------+-----------+
| 25   | 10   | [a,c]     |
| 35   | 15   | [a,b,d]   |
+------+------+-----------+
In the second one, the data frame headers are:
num1, num2, a, b, c, d
I have created a case class containing all the possible header columns.
Now, for each row identified by the columns num1 and num2, I need to check whether the array in the arr column contains each header of the second data frame. If it does, the value should be 1; otherwise it should be 0.
So the required output is:
+------+------+---+---+---+---+
| num1 | num2 | a | b | c | d |
+------+------+---+---+---+---+
| 25   | 10   | 1 | 0 | 1 | 0 |
| 35   | 15   | 1 | 1 | 0 | 1 |
+------+------+---+---+---+---+
Recommended answer
If I understand correctly, you want to transform the array column arr into one column per possible value, each indicating whether the array contains that value.
If so, you can use the array_contains function like this:
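import org.apache.spark.sql.functions.{array_contains, col}
import spark.implicits._ // needed for toDF outside spark-shell

val df = Seq((25, 10, Seq("a","c")), (35, 15, Seq("a","b","d")))
  .toDF("num1", "num2", "arr")
val values = Seq("a", "b", "c", "d")
df
  .select(Seq("num1", "num2").map(col) ++
    // array_contains returns a boolean; cast it to int to get 1/0
    values.map(x => array_contains(col("arr"), x).cast("int") as x) : _*)
  .show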
+----+----+---+---+---+---+
|num1|num2|  a|  b|  c|  d|
+----+----+---+---+---+---+
|  25|  10|  1|  0|  1|  0|
|  35|  15|  1|  1|  0|  1|
+----+----+---+---+---+---+
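The list of values above is hardcoded. Since the question says the possible values are the headers of the second data frame, they could instead be read from its columns. The following is only a sketch under assumptions: `df2` is a hypothetical empty stand-in for the second data frame (only its headers are used), and `HeaderFlags` and `flagsFromHeaders` are illustrative names that do not come from the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col}

object HeaderFlags {
  // Hypothetical helper: one 0/1 column per non-key header of `headerSource`.
  // Assumes `df` has an array column named "arr", as in the question.
  def flagsFromHeaders(df: DataFrame, headerSource: DataFrame, keys: Seq[String]): DataFrame = {
    // Every header that is not a key column is treated as a possible array value.
    val values = headerSource.columns.filterNot(c => keys.contains(c)).toSeq
    // array_contains yields a boolean, so cast it to int to get 1/0.
    val flags = values.map(v => array_contains(col("arr"), v).cast("int").as(v))
    df.select(keys.map(col) ++ flags: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("header-flags").getOrCreate()
    import spark.implicits._

    val df = Seq((25, 10, Seq("a", "c")), (35, 15, Seq("a", "b", "d")))
      .toDF("num1", "num2", "arr")

    // Assumed shape of the second data frame; no rows are needed for this sketch.
    val df2 = Seq.empty[(Int, Int, Int, Int, Int, Int)]
      .toDF("num1", "num2", "a", "b", "c", "d")

    flagsFromHeaders(df, df2, Seq("num1", "num2")).show()
    spark.stop()
  }
}
```

This only uses the headers of the second data frame. If its rows also had to be matched on num1 and num2, a join on those two columns would be an additional step that this sketch does not cover.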