
Problem description

I'd like to use data.table to do some wrangling, and I would like the resulting data table not to include the grouping variable.

Here is a minimal working example (MWE):

library("data.table")
DT <- data.table(x = 1:10, grp = rep(1:2,5))
DT[, .(mmm = mean(x)), by = grp]

This produces:

   grp mmm
1:   1   5
2:   2   6

That is all fine. However, I'd prefer grp not to be in the result. This can be fixed by chaining the data.table calls and setting grp := NULL, or by simply discarding the variable afterwards, but can I prevent it in the first call so that only mmm is returned?

Recommended answer

It isn't clear why you don't want to use chaining. Using DT[, .(mmm = mean(x)), by = grp][, grp := NULL][] would be my first choice.
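For reference, here is a minimal sketch of the chained form next to two other ways of "throwing the variable away" that the question alludes to. The names res1, res2 and res3 are only illustrative and do not come from the original post. The trailing [] in the first option matters only when the result is printed directly, because := updates by reference and suppresses printing.

```r
library(data.table)

DT <- data.table(x = 1:10, grp = rep(1:2, 5))

# Option 1: aggregate by grp, then delete grp by reference.
# The trailing [] makes the result print when the call is not assigned.
res1 <- DT[, .(mmm = mean(x)), by = grp][, grp := NULL][]

# Option 2: aggregate by grp, then keep only the wanted column.
res2 <- DT[, .(mmm = mean(x)), by = grp][, .(mmm)]

# Option 3: aggregate by grp, then deselect grp by name.
res3 <- DT[, .(mmm = mean(x)), by = grp][, !"grp"]

print(res1)
```

All three return a data.table with the single column mmm. Which one reads best is largely a matter of taste.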

Although I wouldn't advise it, you can also use:

DT[, .(mmm = DT[, .(mmm = mean(x)), by = grp]$mmm)]

which will give you the desired result as well:

   mmm
1:   5
2:   6

Although you will get the same result, it is better not to use this method. Its major drawback is that it makes your code unnecessarily complicated when you want to summarise more than one value column (say, x and a second column y). You would then get something like:

DT[, .(mx = DT[, .(mx = mean(x)), by = grp]$mx, my = DT[, .(my = mean(y)), by = grp]$my)]

whereas the regular data.table way would be:

DT[, .(mx = mean(x), my = mean(y)), by = grp][, grp := NULL][]
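The DT from the question has no y column, so the two-column calls above cannot be run as they stand. The sketch below assumes a hypothetical table DT2 with an added y column (both the name DT2 and the values of y are made up for illustration). It also shows a variant using lapply(.SD, mean) with .SDcols, which is not part of the original answer but avoids typing one expression per column.

```r
library(data.table)

# DT2 and its y column are hypothetical; the original DT only has x and grp.
DT2 <- data.table(x = 1:10, y = 11:20, grp = rep(1:2, 5))

# Regular chained form: summarise both columns, then drop grp.
res_chain <- DT2[, .(mx = mean(x), my = mean(y)), by = grp][, grp := NULL][]

# Variant: apply mean to every column listed in .SDcols, then drop grp.
# The result columns keep their original names (x and y).
res_sd <- DT2[, lapply(.SD, mean), by = grp, .SDcols = c("x", "y")][, grp := NULL][]

print(res_chain)
print(res_sd)
```

In both cases the chain stays a single readable line however many value columns are summarised, which is the point the answer is making.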

In summary:

Using the DT[, .(mmm = mean(x)), by = grp][, grp := NULL][] method would thus be your best choice.
