How to explode a StructType from a JSON DataFrame in Spark into rows instead of columns
Problem description
I read a nested JSON with this schema:
root
 |-- company: struct (nullable = true)
 |    |-- 0: string (nullable = true)
 |    |-- 1: string (nullable = true)
 |    |-- 10: string (nullable = true)
 |    |-- 100: string (nullable = true)
 |    |-- 101: string (nullable = true)
 |    |-- 102: string (nullable = true)
 |    |-- 103: string (nullable = true)
 |    |-- 104: string (nullable = true)
 |    |-- 105: string (nullable = true)
 |    |-- 106: string (nullable = true)
 |    |-- 107: string (nullable = true)
 |    |-- 108: string (nullable = true)
 |    |-- 109: string (nullable = true)
When I try:
df.select(col("company.*"))
I get every field of the struct "company" as a column, but I want them as rows: one row per field, with the id in one column and the string in another. Currently I get:
0 1 10 100 101 102
"hey" "yooyo" "yuyu" "hey" "yooyo" "yuyu"
Instead, I would like to get something like:
id name
0 "hey"
1 "yoooyo"
10 "yuuy"
100 "hey"
101 "yooyo"
102 "yuyu"
Thanks in advance for your help,
Tricky
Recommended answer
Try this using union:
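import org.apache.spark.sql.functions.{col, lit}

// Flatten the struct so each field becomes a top-level column
val dfExpl = df.select("company.*")
// Build one two-column (id, name) DataFrame per field, then union them all
dfExpl.columns
  .map(name => dfExpl.select(lit(name).as("id"), col(name).as("name")))
  .reduce(_ union _)
  .show()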
Or alternatively, using array/explode:
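import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Flatten the struct so each field becomes a top-level column
val dfExpl = df.select("company.*")

// One struct(id, value) expression per field
val selectExpr = dfExpl
  .columns
  .map(name =>
    struct(
      lit(name).as("id"),
      col(name).as("value")
    ).as("col")
  )

// Pack the structs into an array, explode it into rows, then unpack each struct
dfExpl
  .select(
    explode(array(selectExpr: _*))
  )
  .select("col.*")
  .show()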
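To try the array/explode approach end to end, here is a minimal self-contained sketch. It assumes a local Spark session (Spark 2.2 or later, for `spark.read.json` on a `Dataset[String]`) and a made-up one-record JSON sample shaped like the schema in the question; the object name `ExplodeStructToRows`, the alias `kv`, and the sample values are only illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object ExplodeStructToRows {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master/appName for a real cluster
    val spark = SparkSession.builder()
      .appName("explode-struct-to-rows")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample record, not the asker's real data
    val json = """{"company": {"0": "hey", "1": "yooyo", "10": "yuyu"}}"""
    val df = spark.read.json(Seq(json).toDS())

    // Flatten the struct: one top-level column per field
    val dfExpl = df.select("company.*")

    // One struct(id, name) expression per field
    val pairs = dfExpl.columns.map { name =>
      struct(lit(name).as("id"), col(name).as("name"))
    }

    // Explode the array of structs into rows, then unpack id and name
    dfExpl
      .select(explode(array(pairs: _*)).as("kv"))
      .select("kv.*")
      .show()

    spark.stop()
  }
}
```

With this sample, the result should contain one row per field of `company`, with the field name in `id` and its string in `name`. Compared with the union version, this builds a single projection over `dfExpl` rather than one `select` per field, which tends to matter when the struct has many fields.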