Problem description
Suppose I have a table:
101 | [{"system":"x","value":"1"},{"system":"y","value":"2"},{"system":"z","value":"3"}] |
Here array_col contains an array of structs:
0: {"system": "x", "value": "1"}
1: {"system": "y", "value": "2"}
2: {"system": "z", "value": "3"}
I need output like the following table:
101 | x | 1 |
101 | y | 2 |
101 | z | 3 |
Right now I'm using explode in subqueries (since a single select statement can't contain multiple explode calls) and then joining the results on id. But that pairs every system with every value, so instead of 3 rows I get 9:
101 | x | 1 |
101 | x | 2 |
101 | x | 3 |
101 | y | 1 |
101 | y | 2 |
101 | y | 3 |
101 | z | 1 |
101 | z | 2 |
101 | z | 3 |
How can I get the output with 3 rows instead of 9?
Recommended answer
Try inline:
df.selectExpr('id', 'inline(array_col)').show()
+---+------+-----+
| id|system|value|
+---+------+-----+
|101|     x|    1|
|101|     y|    2|
|101|     z|    3|
+---+------+-----+
The above assumes that the array contains structs, not structs encoded as strings. If your structs are strings, you need to parse them with from_json first:
df2 = df.selectExpr(
    'id', 'explode(array_col) array_col'
).selectExpr(
    'id', "inline(array(from_json(array_col, 'struct<system:string, value:string>')))"
)
df2.show()
+---+------+-----+
| id|system|value|
+---+------+-----+
|101|     x|    1|
|101|     y|    2|
|101|     z|    3|
+---+------+-----+
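As for why the explode-and-join approach in the question returned 9 rows: the two exploded subqueries share only id as a key, so the join matches every system with every value for that id. The following is a plain-Python model of that behaviour, not Spark code; the variable names are made up for illustration.

```python
# Plain-Python model (not Spark): compare join-after-explode with per-struct expansion.
row_id = 101
array_col = [
    {"system": "x", "value": "1"},
    {"system": "y", "value": "2"},
    {"system": "z", "value": "3"},
]

# Approach from the question: explode each field on its own, then join on id.
systems = [(row_id, s["system"]) for s in array_col]
values = [(row_id, s["value"]) for s in array_col]
joined = [(i, s, v) for (i, s) in systems for (j, v) in values if i == j]

# What inline does conceptually: one output row per struct, fields kept together.
inlined = [(row_id, s["system"], s["value"]) for s in array_col]

print(len(joined))   # 9: every system paired with every value
print(len(inlined))  # 3: one row per array element
```

Since id is the same for all three elements, the join has nothing to tell them apart, which is why expanding each struct as a unit avoids the problem.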