This article covers how to use cummean together with group_by while ignoring NAs.

Problem description

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

I'd like to add a new column to the above dataframe which takes the cumulative mean of the value column, not taking into account NAs. Is it possible to do this with dplyr? I've tried

df <- df %>% group_by(category) %>% mutate(new_col=cummean(value))

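but cummean just doesn't know what to do with NAs: it has no na.rm argument, so once it meets an NA, every later running mean in that group is NA as well.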

I do not want to count NAs as 0.
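To make the problem concrete, here is a small sketch using the cat1 values from the question (it assumes dplyr is installed). cummean() propagates the NA, whereas "ignoring NAs" means taking the running mean of the observed values only.

```r
library(dplyr)

# cat1's values from the question, in row order
x <- c(NA, 2, 4, 7)

# cummean() has no na.rm argument: the running sum becomes NA at the
# first NA, so every running mean from there on is missing too
cummean(x)

# Ignoring NAs means averaging only the observed values: 2, 3, 4.333...
cummean(x[!is.na(x)])
```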

Recommended answer

You could use ifelse to treat NAs as 0 for the cummean call:

library(dplyr)

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

df %>%
  group_by(category) %>%
  mutate(new_col = cummean(ifelse(is.na(value), 0, value)))

Output:

# A tibble: 8 x 3
# Groups:   category [2]
  category value new_col
  <fct>    <dbl>   <dbl>
1 cat1       NA     0.  
2 cat1        2.    1.00
3 cat2        3.    3.00
4 cat1        4.    2.00
5 cat2        5.    4.00
6 cat2       NA     2.67
7 cat1        7.    3.25
8 cat2        8.    4.00

Now I see this isn't the same as ignoring NAs: each NA still counts toward the denominator as a 0, which drags the running mean down.
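A quick base-R check of the difference, using cat1's values up to row 4 (NA, 2, 4). Treating the NA as 0 keeps it in the denominator, which is why the first output shows 2.00 for row 4 where ignoring the NA would give 3.

```r
# cat1's values up to row 4 of the question's data
x <- c(NA, 2, 4)

# NA counted as 0: (0 + 2 + 4) / 3
as_zero <- mean(ifelse(is.na(x), 0, x))

# NA ignored: (2 + 4) / 2
ignored <- mean(x, na.rm = TRUE)

c(as_zero = as_zero, ignored = ignored)  # 2 and 3
```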

Try this one instead. I also group by a column that flags whether the value is NA, so the non-NA rows of each category form their own group and cummean runs without encountering any NAs:

library(dplyr)

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = ifelse(isna, NA, cummean(value)))

Output:

# A tibble: 8 x 4
# Groups:   category, isna [4]
  category value isna  new_col
  <fct>    <dbl> <lgl>   <dbl>
1 cat1       NA  TRUE    NA   
2 cat1        2. FALSE    2.00
3 cat2        3. FALSE    3.00
4 cat1        4. FALSE    3.00
5 cat2        5. FALSE    4.00
6 cat2       NA  TRUE    NA   
7 cat1        7. FALSE    4.33
8 cat2        8. FALSE    5.33
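If you would rather not add a helper grouping column, one alternative (not part of the original answer) is to divide a running sum of the observed values by a running count of them. The function name cummean_na_rm below is hypothetical. Note the semantics differ slightly from the answer above: an NA row gets the running mean so far instead of NA, and a group with no observed value yet gets NA.

```r
library(dplyr)

df <- data.frame(category = c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value = c(NA, 2, 3, 4, 5, NA, 7, 8))

# Hypothetical helper: running mean that skips NAs.
cummean_na_rm <- function(x) {
  s <- cumsum(ifelse(is.na(x), 0, x))  # running sum, NAs contribute 0
  n <- cumsum(!is.na(x))               # running count of non-NA values
  ifelse(n == 0, NA_real_, s / n)      # nothing observed yet -> NA
}

res <- df %>%
  group_by(category) %>%
  mutate(new_col = cummean_na_rm(value)) %>%
  ungroup()

res
```

To get NA on the NA rows instead, matching the output above, change the condition to `is.na(x) | n == 0`.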

09-22 20:36