Question
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
I'd like to add a new column to the above data frame that holds the cumulative mean of the value column, ignoring NAs. Is it possible to do this with dplyr? I've tried:
df <- df %>% group_by(category) %>% mutate(new_col=cummean(value))
but cummean doesn't know what to do with NAs: every result after the first NA in a group is also NA. I do not want to count NAs as 0.
Recommended answer
You could use ifelse to treat NAs as 0 for the cummean call:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
group_by(category) %>%
mutate(new_col = cummean(ifelse(is.na(value), 0, value)))
Output:
# A tibble: 8 x 3
# Groups: category [2]
category value new_col
<fct> <dbl> <dbl>
1 cat1 NA 0.
2 cat1 2. 1.00
3 cat2 3. 3.00
4 cat1 4. 2.00
5 cat2 5. 4.00
6 cat2 NA 2.67
7 cat1 7. 3.25
8 cat2 8. 4.00
Now I see this isn't the same as ignoring NAs.
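A quick check in base R shows where the two notions diverge. It uses only the cat1 values from the example data; the variable names `zero_filled` and `ignoring_na` are made up for this illustration. Zero-filling still counts the NA row in the denominator, while ignoring NAs divides by the number of non-NA values seen so far:

```r
# cat1 values from the example data, in row order
x <- c(NA, 2, 4, 7)

# Running sum with NAs replaced by 0 (shared by both variants)
running_sum <- cumsum(ifelse(is.na(x), 0, x))

# Zero-filling: divide by the number of rows seen so far
zero_filled <- running_sum / seq_along(x)

# Ignoring NAs: divide by the number of non-NA values seen so far
ignoring_na <- running_sum / cumsum(!is.na(x))

zero_filled   # 0, 1, 2, 3.25
ignoring_na   # NaN, 2, 3, 4.333333 (0/0 on the leading NA row)
```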
Try this one instead. I group by an extra column that flags whether the value is NA, so cummean runs on the non-NA rows of each category without encountering any NAs:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = ifelse(isna, NA, cummean(value)))
Output:
# A tibble: 8 x 4
# Groups: category, isna [4]
category value isna new_col
<fct> <dbl> <lgl> <dbl>
1 cat1 NA TRUE NA
2 cat1 2. FALSE 2.00
3 cat2 3. FALSE 3.00
4 cat1 4. FALSE 3.00
5 cat2 5. FALSE 4.00
6 cat2 NA TRUE NA
7 cat1 7. FALSE 4.33
8 cat2 8. FALSE 5.33
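If you would rather keep the grouping on category alone, the same result can probably be obtained by dividing a running sum by a running count of non-NA values. This is only a sketch of an alternative, not part of the approach above; `running_sum` and `running_n` are hypothetical helper column names, and `res` is just a name for the result:

```r
library(dplyr)

df <- data.frame(
  category = c("cat1", "cat1", "cat2", "cat1", "cat2", "cat2", "cat1", "cat2"),
  value = c(NA, 2, 3, 4, 5, NA, 7, 8)
)

res <- df %>%
  group_by(category) %>%
  mutate(
    # Running sum, with NA contributing nothing
    running_sum = cumsum(coalesce(value, 0)),
    # Number of non-NA values seen so far in the group
    running_n = cumsum(!is.na(value)),
    # Keep NA on rows where value itself is NA
    new_col = ifelse(is.na(value), NA_real_, running_sum / running_n)
  ) %>%
  ungroup() %>%
  select(-running_sum, -running_n)

res$new_col   # NA, 2, 3, 3, 4, NA, 4.333333, 5.333333
```

Replacing the ifelse with plain `running_sum / running_n` would instead carry the previous running mean through NA rows (and give NaN when a group starts with NA), which may or may not be what you want.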