This article covers how to explode a map in a Spark DataFrame so that its keys become members of the output rows; it may be a useful reference for anyone facing the same problem.
Problem description
I saw this example in a Databricks blog post:
// input
{
"a": {
"b": 1,
"c": 2
}
}
Python: events.select(explode("a").alias("x", "y"))
Scala: events.select(explode('a) as Seq("x", "y"))
SQL: select explode(a) as (x, y) from events
// output
[{ "x": "b", "y": 1 }, { "x": "c", "y": 2 }]
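The shape of that transformation is easy to mimic in plain Python (no Spark); `event` here is just a stand-in for one input record:

```python
# Plain-Python sketch (no Spark) of what explode("a") does to a map column.
event = {"a": {"b": 1, "c": 2}}

# Each map entry becomes its own row, with the key as "x" and the value as "y".
rows = [{"x": k, "y": v} for k, v in event["a"].items()]
# rows == [{"x": "b", "y": 1}, {"x": "c", "y": 2}]
```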
However, I can't see how this leads me to turn my map into an array whose elements contain the flattened key, which could then be exploded:
// input
{
"id": 0,
"a": {
"b": {"d": 1, "e": 2},
"c": {"d": 3, "e": 4}
}
}
// Schema
struct<id:bigint,a:map<string,struct<d:bigint,e:bigint>>>
root
|-- id: long (nullable = true)
|-- a: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = true)
| | |-- d: long (nullable = true)
| | |-- e: long (nullable = true)
// Imagined process
Python: …
Scala: events.select('id, explode('a) as Seq("x", "*")) //? "*" ?
SQL: …
// Desired output
[{ "id": 0, "x": "b", "d": 1, "e": 2 }, { "id": 0, "x": "c", "d": 3, "e": 4 }]
Is there some obvious way to take such input and produce a table like:
id | x | d | e
---|---|---|---
0 | b | 1 | 2
0 | c | 3 | 4
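For reference, the desired reshaping is easy to state in plain Python (no Spark); `record` is a hypothetical stand-in for one input row:

```python
# Plain-Python sketch of the desired output: each map entry becomes a row
# carrying the outer id, the map key as "x", and the struct fields d and e.
record = {"id": 0, "a": {"b": {"d": 1, "e": 2}, "c": {"d": 3, "e": 4}}}

table = [
    {"id": record["id"], "x": key, **inner}
    for key, inner in record["a"].items()
]
# table == [{"id": 0, "x": "b", "d": 1, "e": 2},
#           {"id": 0, "x": "c", "d": 3, "e": 4}]
```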
Recommended answer
Although I don't know whether it's possible to explode the map with one single explode, there is a way to do it with a UDF. The trick is to use Row#schema.fields(i).name to get the name of the "key":
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{explode, udf}

// Turn the value into an array of (key, d, e) tuples, reading the
// "key" names from the Row's schema.
def mapStructs = udf((r: Row) => {
  r.schema.fields.map(f => (
    f.name,
    r.getAs[Row](f.name).getAs[Long]("d"),
    r.getAs[Row](f.name).getAs[Long]("e"))
  )
})

df
  .withColumn("udfResult", explode(mapStructs($"a")))
  .withColumn("x", $"udfResult._1")
  .withColumn("d", $"udfResult._2")
  .withColumn("e", $"udfResult._3")
  .drop($"udfResult")
  .drop($"a")
  .show
which gives:
+---+---+---+---+
| id| x| d| e|
+---+---+---+---+
| 0| b| 1| 2|
| 0| c| 3| 4|
+---+---+---+---+