


I'm trying to learn R and there are a few things I've done for 10+ years in SAS that I cannot quite figure out the best way to do in R. Take this data:

 id  class           t  count  desired
 --  -----  ----------  -----  -------
  1      A  2010-01-15      1        1
  1      A  2010-02-15      2        3
  1      B  2010-04-15      3        3
  1      B  2010-09-15      4        4
  2      A  2010-01-15      5        5
  2      B  2010-06-15      6        6
  2      B  2010-08-15      7       13
  2      B  2010-09-15      8       21


I want to calculate the column desired as a rolling sum of count by id and class, within a 4-month rolling window (the current month plus the three months before it). Notice that not all months are present for each combination of id and class.


In SAS I'd typically do this in one of 2 ways:

  1. RETAIN plus a BY on id & class.
  2. PROC SQL with a left join from df as df1 to df as df2 on id and class, keeping the rows where df1.d-df2.d falls within the appropriate window


What is the best R approach to this type of problem?

t <- as.Date(c("2010-01-15","2010-02-15","2010-04-15","2010-09-15",
               "2010-01-15","2010-06-15","2010-08-15","2010-09-15"))
class <- c("A","A","B","B","A","B","B","B")
id <- c(1,1,1,1,2,2,2,2)
count <- seq(1,8,length.out=8)
desired <- c(1,3,3,4,5,6,13,21)
df <- data.frame(id,class,t,count,desired)



I'm almost embarrassed to post this. I'm usually pretty good at these, but there's got to be a better way.


This first uses zoo's as.yearmon to get the dates in terms of just month and year, then reshapes it to get one column for each id/class combination, then fills in with zeros before, after, and for missing months, then uses zoo to get the rolling sum, then pulls out just the desired months and merges back with the original data frame.

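library(zoo)       # as.yearmon(), rollsumr()
library(reshape2)  # dcast(), melt()

# Reduce each date to its month and year
df$yearmon <- as.yearmon(df$t)
# One row per id/class combination, one column per observed month
dfa <- dcast(id + class ~ yearmon, data=df, value.var="count")
ida <- dfa[,1:2]
# Transpose so months are rows and id/class combinations are columns
dfa <- t(as.matrix(dfa[,-c(1:2)]))
# Full monthly sequence, padded by three months on each side
months <- with(df, seq(min(yearmon)-3/12, max(yearmon)+3/12, by=1/12))
dfb <- array(dim=c(length(months), ncol(dfa)),
             dimnames=list(paste(months), colnames(dfa)))
dfb[rownames(dfa),] <- dfa
# Padding and missing months contribute zero to the sum
dfb[is.na(dfb)] <- 0
# Right-aligned rolling sum over 4 months
dfb <- rollsumr(dfb,4, fill=0)
rownames(dfb) <- paste(months)
# Keep only the months that actually appear in the data
dfb <- dfb[rownames(dfa),]
dfc <- cbind(ida, t(dfb))
# Back to long format, then merge onto the original data frame
dfc <- melt(dfc, id.vars=c("class", "id"))
names(dfc)[3:4] <- c("yearmon", "desired2")
dfc$yearmon <- as.yearmon(dfc$yearmon)
out <- merge(df,dfc)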

> out
  id class  yearmon          t count desired desired2
1  1     A Feb 2010 2010-02-15     2       3        3
2  1     A Jan 2010 2010-01-15     1       1        1
3  1     B Apr 2010 2010-04-15     3       3        3
4  1     B Sep 2010 2010-09-15     4       4        4
5  2     A Jan 2010 2010-01-15     5       5        5
6  2     B Aug 2010 2010-08-15     7      13       13
7  2     B Jun 2010 2010-06-15     6       6        6
8  2     B Sep 2010 2010-09-15     8      21       21
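
For comparison, here is a rough base-R sketch of the self-join idea from the question (the PROC SQL route): join the data to itself on id and class, keep the pairs that fall inside the window, and sum. It is only a sketch under some assumptions: the window is read as "current month plus the three before it", months are compared by calendar month rather than by exact day, and the names month_index, pairs, res, out2 and the helper column m are made up for illustration. It builds every within-group pair, so it may not scale well to large groups.

```r
# Rebuild the example data from the question (without the desired column)
df <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  class = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t     = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                    "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count = 1:8
)

# Hypothetical helper: turn a date into a running month number so that
# differences count whole calendar months
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon
}
df$m <- month_index(df$t)

# Self-join on id and class: each row is paired with every row of its group
pairs <- merge(df, df[, c("id", "class", "m", "count")],
               by = c("id", "class"), suffixes = c("", ".prev"))

# Keep only pairs where the second row is 0 to 3 months before the first
lag <- pairs$m - pairs$m.prev
pairs <- pairs[lag >= 0 & lag <= 3, ]

# Sum the counts inside each window and attach the result to the data
res <- aggregate(count.prev ~ id + class + m, data = pairs, FUN = sum)
names(res)[names(res) == "count.prev"] <- "desired2"
out2 <- merge(df, res, by = c("id", "class", "m"))
out2 <- out2[order(out2$id, out2$class, out2$m), ]
```

On the example data, out2$desired2 should match the desired column from the question (1, 3, 3, 4, 5, 6, 13, 21).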

