I want to compute sums over a sliding window on grouped data.

Since I would like to stick to established functions as far as possible, I started with rollapplyr, as follows:

library(tidyverse)
library(reshape2)
library(zoo)

data = data.frame(Count=seq(1,10,1),
                  group=c("A","B","A","A","B","B","B","B","A","A"))


window_size <- 3

data_rolling <- data %>%
  arrange(group) %>%
  group_by(group) %>%
  mutate(Rolling_Count = rollapplyr(Count, width=window_size, FUN=sum, fill = NA)) %>%
  ungroup()

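For the first entries of each group, where fewer than width values (3 in this case) are available, it fills in NA as documented. What I actually want there is the sum of whatever data is available, like this: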
 Count group Rolling_Count expected_Result
 1     A            NA    1
 3     A            NA    4
 4     A            8     8
 9     A            16    16
10     A            23    23
 2     B            NA    2
 5     B            NA    7
 6     B            13    13
 7     B            18    18
 8     B            21    21

I know that I can replace width=window_size with:
c(rep(1:window_size,1),rep(window_size:window_size,(n()-window_size)))

to get what I want, but this is really slow. In addition, this approach assumes that n() is greater than window_size.
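As a side note on the second point, the width vector can be built so that it no longer depends on n() exceeding window_size. The following is only a minimal sketch on a plain vector, assuming zoo is installed; the names count and widths are made up for illustration, and it does nothing about the speed problem.

```r
library(zoo)

window_size <- 3
count <- c(1, 3, 4, 9, 10)   # the Count values of group A

# One width per element: 1, 2, 3, 3, 3. pmin() caps the growing index at
# window_size, so a group shorter than window_size still gets a valid vector.
widths <- pmin(seq_along(count), window_size)

rolled <- rollapplyr(count, width = widths, FUN = sum)
rolled
```

Inside the mutate() call from the question, the same expression would be width = pmin(seq_along(Count), window_size).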

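So: is there already an R / zoo function that handles grouped data as above, also copes with groups that have fewer than window_size entries, and is faster than the approach above?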

Thanks for any hints!

Best answer

A solution based on data.table and RcppRoll should be more performant.

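It is not as clean as I would like. RcppRoll::roll_sum() actually has a partial argument that is not yet implemented, which in theory would solve this cleanly, but it does not look like it will be available soon (see GH Issue #18).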

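In any event, until someone implements a rolling sum in R that allows what you need here, adding a cumsum on the first n - 1 rows of each group seems to be a sensible solution.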

library(data.table)
library(RcppRoll)

data = data.frame(Count=seq(1,10,1),
                  group=c("A","B","A","A","B","B","B","B","A","A"))

## Convert to a `data.table` by reference
setDT(data)
window_size <- 3

## Add a counter row so that we can go back and fill in rows
## 1 & 2 of each group
data[,Group_RowNumber := seq_len(.N), keyby = .(group)]

## Do a rolling window -- this won't fill in the first 2 rows
data[,Rolling_Count := RcppRoll::roll_sum(Count,
                                          n = window_size,
                                          align = "right",
                                          fill = NA), keyby = .(group)]

## Go back and fill in the ones we missed
data[Group_RowNumber < window_size, Rolling_Count := cumsum(Count), by = .(group)]

data

#     Count group Group_RowNumber Rolling_Count
#  1:     1     A               1             1
#  2:     3     A               2             4
#  3:     4     A               3             8
#  4:     9     A               4            16
#  5:    10     A               5            23
#  6:     2     B               1             2
#  7:     5     B               2             7
#  8:     6     B               3            13
#  9:     7     B               4            18
# 10:     8     B               5            21
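
The cumsum idea can also be taken one step further so that no separate fill-in pass is needed: a rolling sum is the difference between the cumulative sum now and the cumulative sum window_size rows earlier. The following is a minimal base R sketch of that, not part of the solution above; the helper name rolling_sum_partial is hypothetical. It assumes there are no NA values in Count (an NA would propagate through cumsum to every later row), and with non-integer data the subtraction can pick up small floating-point differences.

```r
# Rolling sum over the last k values; windows at the start of x are
# shortened to whatever values are available (hypothetical helper name).
rolling_sum_partial <- function(x, k) {
  cs <- cumsum(x)
  # Cumulative sum as it stood k positions earlier, 0 for the first k rows.
  lagged <- c(rep(0, min(k, length(x))), head(cs, -k))
  cs - lagged
}

data <- data.frame(Count = seq(1, 10, 1),
                   group = c("A","B","A","A","B","B","B","B","A","A"))
window_size <- 3

# ave() applies the helper within each group and keeps the original row order.
data$Rolling_Count <- ave(data$Count, data$group,
                          FUN = function(x) rolling_sum_partial(x, window_size))

data[order(data$group), ]
```

Because the helper pads with at most length(x) zeros, it also works for groups that have fewer than window_size rows.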
