Problem description
For example, I have this DataFrame:
+----+------+------+------+
| id | val1 | val2 | val3 |
+----+------+------+------+
| 1  | a    | b    | c    |
| 2  | d    | e    | f    |
| 3  | g    | h    | i    |
+----+------+------+------+
and I want the columns val1, val2 and val3 transposed into rows in this way:
+----+------+-------+
| id | key  | value |
+----+------+-------+
| 1  | val1 | a     |
| 1  | val2 | b     |
| 1  | val3 | c     |
| 2  | val1 | d     |
| 2  | val2 | e     |
| 2  | val3 | f     |
| 3  | val1 | g     |
| 3  | val2 | h     |
| 3  | val3 | i     |
+----+------+-------+
How can I achieve that?
Recommended answer
Such a transformation can be obtained by exploding a Map structure. First turn the three columns into a single Map column (based on https://stackoverflow.com/a/41291156/4725074), then explode it:
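from itertools import chain
import pyspark.sql.functions as F

# Columns to transpose into rows
cols = ["val1", "val2", "val3"]
# Build a map column: literal column name -> value of that column
m = F.create_map(list(chain(*(
    (F.lit(c), F.col(c)) for c in df.columns if c in cols))))
# explode() on a map yields two columns (key, value), hence select()
df.withColumn('map', m)\
    .select('*', F.explode('map')).drop('map')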
It is important to use select() and not withColumn(): exploding a Map column generates two new columns, key and value, whereas withColumn() can add only one column.
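To see why a single Map column turns into two output columns, the map-then-explode step can be modeled in plain Python. This is only a sketch of the idea, not Spark code: the rows are ordinary dicts, and the helper name explode_map is a hypothetical one introduced for illustration.

```python
# Plain-Python model of "build a map per row, then explode it".
# The dict-based rows and the explode_map helper are illustrative
# assumptions; they are not part of the PySpark API.
rows = [
    {"id": 1, "val1": "a", "val2": "b", "val3": "c"},
    {"id": 2, "val1": "d", "val2": "e", "val3": "f"},
    {"id": 3, "val1": "g", "val2": "h", "val3": "i"},
]
value_columns = ["val1", "val2", "val3"]


def explode_map(rows, id_column, value_columns):
    """Return one output row per (input row, value column) pair."""
    out = []
    for row in rows:
        # Step 1: the per-row map, analogous to the create_map column.
        row_map = {c: row[c] for c in value_columns}
        # Step 2: every map entry becomes its own row with two fields,
        # which is why the exploded result has both a key and a value.
        for key, value in row_map.items():
            out.append({id_column: row[id_column], "key": key, "value": value})
    return out


long_rows = explode_map(rows, "id", value_columns)
for r in long_rows:
    print(r)
```

Each input row contributes one output row per map entry, so three rows with three value columns give nine rows, each carrying id, key and value.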
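The result is the selected columns transposed into rows, as in the example. Note that select('*', ...) also keeps the original val1, val2 and val3 columns; drop them, or select only id together with the exploded columns, to match the expected output exactly.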