Problem Description
I have a problem statement at hand wherein I want to unpivot a table in spark-sql/pyspark. I have gone through the documentation and can see there is support only for pivot, but no support for unpivot so far. Is there a way I can achieve this?
Say my initial table has three columns: A, B and C.
When I pivot it in pyspark using the command below:
df.groupBy("A").pivot("B").sum("C")
I get back a pivoted table with one column per distinct value of B, holding the summed values of C.
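For concreteness, here is a minimal pyspark sketch of that pivot, assuming a running pyspark shell where spark is available. The sample rows are made up for illustration; they are chosen to be consistent with the unpivoted table in the answer below.

# Hypothetical sample data (columns A, B, C); the rows are assumed
# for illustration, consistent with the answer's unpivoted table.
df = spark.createDataFrame(
    [("G", "X", 4), ("G", "Y", 2), ("H", "Y", 4), ("H", "Z", 5)],
    ["A", "B", "C"])

df.groupBy("A").pivot("B").sum("C").show()
# +---+----+---+----+
# |  A|   X|  Y|   Z|
# +---+----+---+----+
# |  G|   4|  2|null|
# |  H|null|  4|   5|
# +---+----+---+----+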
Now I want to unpivot the pivoted table. In general this operation may or may not yield the original table, depending on how the original was pivoted: the sum aggregation collapses any duplicate (A, B) pairs, so the individual original rows are not always recoverable.
Spark SQL as of now doesn't provide out-of-the-box support for unpivot. Is there a way I can achieve this?
Recommended Answer
You can use the built-in stack function, for example in Scala:
scala> val df = Seq(("G",Some(4),2,None),("H",None,4,Some(5))).toDF("A","X","Y", "Z")
df: org.apache.spark.sql.DataFrame = [A: string, X: int ... 2 more fields]
scala> df.show
+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+
scala> df.select($"A", expr("stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)")).where("C is not null").show
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+
Or in pyspark:
In [1]: df = spark.createDataFrame([("G",4,2,None),("H",None,4,5)],list("AXYZ"))
In [2]: df.show()
+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+
In [3]: df.selectExpr("A", "stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)").where("C is not null").show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+
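Here stack(3, 'X', X, 'Y', Y, 'Z', Z) produces three rows per input row, one per ('name', value) pair, and the where clause drops the null entries that the pivot introduced. If there are many columns to unpivot, the stack expression can be built from the column list rather than written by hand; a minimal pyspark sketch (the string-building below is a hand-rolled assumption, not a Spark API), reusing df from above:

# Build the stack() expression from the columns to unpivot
# (a sketch; Spark itself provides no helper for this here).
cols = ["X", "Y", "Z"]
pairs = ", ".join("'{0}', {0}".format(c) for c in cols)
stack_expr = "stack({}, {}) as (B, C)".format(len(cols), pairs)
# stack_expr is now: stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)

df.selectExpr("A", stack_expr).where("C is not null").show()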