Problem description
Task: For all rows where condition == FALSE, set groupmean to the mean of all numbers by group. For all rows where condition == TRUE, set groupmean to the mean of numbers only where condition == TRUE, by group.
I would like to have a solution which does not require copying the whole data.table but adds the desired column in place. I bet there's a plain, simple solution, but I got a little lost...
My attempts so far:
set.seed(42)
library(data.table)
DT <- data.table(condition = sample(c(TRUE, FALSE), 50, replace = TRUE),
                 group = rep(LETTERS[1:4], times = 25),
                 numbers = 1:100)
# modifies the right rows, but wrong value
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]
# right values, but not only rows where condition=FALSE
DT[, groupmean_2 := mean(numbers), by=group]
head(DT)
   condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50
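To see why the first attempt assigns to the right rows but with the wrong value, note that the filter in i is applied before j is evaluated per group, so mean(numbers) only ever sees the rows that passed the filter. A minimal sketch on a hypothetical toy table (toy, subset_mean and full_mean are names made up for illustration, not part of the question's data):

```r
library(data.table)

# Hypothetical toy data, small enough to verify by hand.
toy <- data.table(
  condition = c(TRUE, FALSE, TRUE, FALSE),
  group     = c("A", "A", "B", "B"),
  numbers   = c(1, 3, 10, 30)
)

# i is applied first, so mean() only sees the condition == FALSE rows
# of each group; the remaining rows are left as NA.
toy[condition == FALSE, subset_mean := mean(numbers), by = group]

# Without i, mean() sees every row of the group and every row is assigned.
toy[, full_mean := mean(numbers), by = group]

print(toy)
```

In this sketch the FALSE rows get 3 and 30 in subset_mean (the means of the FALSE rows only), whereas full_mean holds 2 and 20 (the means over all rows of each group).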
Recommended answer
You should reverse the sequence in which you define groupmean. Compute it as the group average for all rows, and substitute the rows where condition == TRUE afterwards.
DT[, groupmean:=mean(numbers), by=group]
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']
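The two-step approach can be checked by hand on a small table. The following is only a sketch on hypothetical data (not the DT from the question), and it assumes that grouping by group alone is enough in the second step, since i has already restricted the rows to condition == TRUE:

```r
library(data.table)

# Hypothetical data set chosen so the expected means are easy to compute.
DT <- data.table(
  condition = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  group     = c("A", "A", "A", "B", "B", "B"),
  numbers   = c(1, 2, 6, 10, 20, 60)
)

# Step 1: group mean over all rows, added by reference (no copy of DT).
DT[, groupmean := mean(numbers), by = group]

# Step 2: overwrite only the condition == TRUE rows with the mean of
# the TRUE rows of each group.
DT[condition == TRUE, groupmean := mean(numbers), by = group]

print(DT)
```

Here the FALSE rows keep the overall group means (3 for A, 30 for B), while the TRUE rows end up with the means of the TRUE rows only (3.5 for A, 20 for B). Both assignments use :=, so the column is added and updated in place.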
I hope this helps.