This article describes how to explode an array of structs into columns in Spark; it should be a useful reference for anyone facing the same problem.
Problem description
I'd like to explode an array of structs to columns (as defined by the struct fields). E.g.
root
|-- arr: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: long (nullable = false)
| | |-- name: string (nullable = true)
should be transformed into
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
I can do that with:
df
.select(explode($"arr").as("tmp"))
.select($"tmp.*")
How can I do that in a single select statement?
I thought this could work, but unfortunately it does not:
df.select(explode($"arr")(".*"))
Recommended answer
A single-step solution is available only for MapType columns:
import spark.implicits._                       // already in scope in spark-shell
import org.apache.spark.sql.functions.explode  // ditto
val df = Seq(Tuple1(Map((1L, "bar"), (2L, "foo")))).toDF
df.select(explode($"_1") as Seq("foo", "bar")).show
+---+---+
|foo|bar|
+---+---+
| 1|bar|
| 2|foo|
+---+---+
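For context, explode on a MapType column generates two columns, named key and value by default; the multi-column alias as Seq("foo", "bar") above simply renames them. A quick check, assuming the same df:
df.select(explode($"_1")).printSchema()
// without the alias, the generated columns are named key and value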
With arrays you can use flatMap:
val df = Seq(Tuple1(Array((1L, "bar"), (2L, "foo")))).toDF
df.as[Seq[(Long, String)]].flatMap(identity)
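To end up with the id and name column names from the question, the flattened Dataset can be renamed with toDF. A minimal sketch, assuming the same df and spark.implicits._ in scope; the names id and name are chosen here to match the target schema:
val flattened = df
  .as[Seq[(Long, String)]]  // read the single array column as a Seq of tuples
  .flatMap(identity)        // one output row per struct element: Dataset[(Long, String)]
  .toDF("id", "name")       // assign the desired column names
flattened.show()
// expected rows: (1, bar) and (2, foo)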
A single SELECT statement can be written in SQL:
df.createOrReplaceTempView("df")
spark.sql("SELECT x._1, x._2 FROM df LATERAL VIEW explode(_1) t AS x")
That concludes this article on exploding an array of structs into columns in Spark. Hopefully the recommended answer is helpful.