r - Using group_by and summarise from dplyr for all rows that do not contain the group_by variable

I have a data.frame:

library(dplyr)

df1 <- data.frame(id = c("A", "A", "B", "B", "B"),
                  cost = c(100, 10, 120, 102, 102))

I know that I can use

df1.a <- group_by(df1, id) %>%
    summarise(no.c = n(),
              m.costs = mean(cost))

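to count the observations and compute the mean cost per id. What should I do if I want the number of observations and the mean over all rows whose id is not equal to the group's id? For example, it should give me 3 as the number of non-A observations and 2 as the number of non-B observations.

I want to use the dplyr package and the group_by function, because I have to do this for a large number of very large data frames.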

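Best answer

You can use . to refer to the whole data.frame, which lets you compute each "other" value as the difference between the whole and the group: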

df1 %>% group_by(id) %>%
    summarise(n = n(),
              n_other = nrow(.) - n,
              mean_cost = mean(cost),
              mean_other = (sum(.$cost) - sum(cost)) / n_other)

## # A tibble: 2 × 5
##       id     n n_other mean_cost mean_other
##   <fctr> <int>   <int>     <dbl>      <dbl>
## 1      A     2       3        55        108
## 2      B     3       2       108         55

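As the result shows, with only two groups you could simply use rev on the per-group columns, but this approach extends easily to more groups or other calculations.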
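To illustrate the extension to more groups, here is a minimal sketch. It assumes a hypothetical data frame df2 with a third group C, which is not part of the original question. It precomputes the overall totals (the illustrative names total_n and total_cost) instead of using ., which should give the same arithmetic as the answer above.

```r
library(dplyr)

# Hypothetical data with three groups (not from the original question)
df2 <- data.frame(id = c("A", "A", "B", "B", "B", "C"),
                  cost = c(100, 10, 120, 102, 102, 50))

# Totals over the whole data frame, computed once up front
total_n <- nrow(df2)
total_cost <- sum(df2$cost)

res <- df2 %>%
    group_by(id) %>%
    summarise(n = n(),                          # rows in this group
              n_other = total_n - n,            # rows in all other groups
              mean_cost = mean(cost),           # mean within this group
              mean_other = (total_cost - sum(cost)) / n_other)  # mean of the rest

print(res)
```

Computing the totals once outside the pipeline avoids re-evaluating nrow(.) and sum(.$cost) for every group, which may matter on very large data frames. Note that a group containing every row would have n_other equal to 0, so its mean_other would be NaN.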

Regarding "r - Using group_by and summarise from dplyr for all rows that do not contain the group_by variable", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40699053/