Problem Description

I want to create a "Total" column in a DataFrame.

For each row, it should add up every column EXCEPT the uid column.

uid  val1 val2 val3
3213 1    2    3

The desired result is:

uid  val1 val2 val3 Total
3213 1    2    3     6

So I need to exclude uid and then sum. However, if I drop uid before summing, I won't be able to rejoin the tables afterward (the join would have to be on uid).

I tried using filter, but I can't find a way to get the column name inside the filter.

So what I have so far is:

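import org.apache.spark.sql.functions.col

// columns is an Array[String]: each element already is the column name, so compare it to "uid" directly
val dfvReducedTotalled = dfvReduced.withColumn("TOTAL", dfvReduced.columns
  .filter(c => c != "uid")
  .map(c => col(c)).reduce((c1, c2) => c1 + c2))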
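Because Spark's DataFrame.columns returns an Array[String], the value the filter lambda receives is already the column name, so no accessor is needed. A minimal plain-Scala sketch (no Spark required), assuming a hard-coded array as a stand-in for dfvReduced.columns; the object name ColumnNameFilter is hypothetical:

```scala
object ColumnNameFilter {
  // Hypothetical stand-in for dfvReduced.columns (Spark returns Array[String])
  val columns: Array[String] = Array("uid", "val1", "val2", "val3")

  // Each element is the column name itself, so a plain string comparison works
  val toSum: Array[String] = columns.filter(name => name != "uid")

  def main(args: Array[String]): Unit =
    println(toSum.mkString(", ")) // prints "val1, val2, val3"
}
```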

Recommended Answer

You can first collect the names of all columns other than uid, build the sum expression with reduce, and then create the Total column:

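import org.apache.spark.sql.functions.col
// Turn every column except uid into a Column, then fold them with + into one expression
val row_sum_expr = df.columns.collect { case x if x != "uid" => col(x) }.reduce(_ + _)
df.withColumn("Total", row_sum_expr).show()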
+----+----+----+----+-----+
| uid|val1|val2|val3|Total|
+----+----+----+----+-----+
|3213|   1|   2|   3|    6|
+----+----+----+----+-----+
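Why no rejoin is needed: in Spark, Column + Column returns a new Column expression, so reduce only builds one expression over the selected columns, and withColumn appends it to the existing rows, uid included. The plain-Scala sketch below mimics that shape with a hypothetical mini expression type (Expr, Col, Add are illustrative names, not Spark's API) and a row modeled as a Map, assuming the sample values from the question:

```scala
object RowSumSketch {
  // Hypothetical miniature of Spark's Column: an expression tree, not the real API
  sealed trait Expr {
    def +(other: Expr): Expr = Add(this, other)
  }
  final case class Col(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Render the tree so the nesting produced by reduce is visible
  def render(e: Expr): String = e match {
    case Col(n)    => n
    case Add(l, r) => s"(${render(l)} + ${render(r)})"
  }

  // Evaluate the expression against one row (column name -> value)
  def eval(e: Expr, row: Map[String, Int]): Int = e match {
    case Col(n)    => row(n)
    case Add(l, r) => eval(l, row) + eval(r, row)
  }

  val columns: Array[String] = Array("uid", "val1", "val2", "val3")

  // Same shape as the answer: collect non-uid names into expressions, then reduce with +
  val rowSumExpr: Expr =
    columns.collect { case x if x != "uid" => Col(x): Expr }.reduce(_ + _)

  val row: Map[String, Int] = Map("uid" -> 3213, "val1" -> 1, "val2" -> 2, "val3" -> 3)

  // Like withColumn: Total is added to the existing row, uid is never dropped
  val withTotal: Map[String, Int] = row + ("Total" -> eval(rowSumExpr, row))

  def main(args: Array[String]): Unit = {
    println(render(rowSumExpr))  // prints "((val1 + val2) + val3)"
    println(withTotal("Total"))  // prints 6
  }
}
```

The left-nested result reflects that reduce on an array folds from the left; for addition the grouping does not change the total.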

08-13 16:43