Problem description
I'm trying to learn R, and there are a few things I've done in SAS for 10+ years that I can't quite figure out how best to do in R. Take this data:
id class t count desired
-- ----- ---------- ----- -------
1 A 2010-01-15 1 1
1 A 2010-02-15 2 3
1 B 2010-04-15 3 3
1 B 2010-09-15 4 4
2 A 2010-01-15 5 5
2 B 2010-06-15 6 6
2 B 2010-08-15 7 13
2 B 2010-09-15 8 21
I want to calculate the column desired as a rolling sum by id and class within a 4-month rolling window. Notice that not all months are present for each combination of id and class.
In SAS I'd typically do this in one of 2 ways:
- RETAIN plus a BY id & class
- PROC SQL with a left join from df as df1 to df as df2 on id and class, with df1.d - df2.d within the appropriate window
What is the best R approach to this type of problem?
t <- as.Date(c("2010-01-15","2010-02-15","2010-04-15","2010-09-15",
"2010-01-15","2010-06-15","2010-08-15","2010-09-15"))
class <- c("A","A","B","B","A","B","B","B")
id <- c(1,1,1,1,2,2,2,2)
count <- seq(1,8,length.out=8)
desired <- c(1,3,3,4,5,6,13,21)
df <- data.frame(id,class,t,count,desired)
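The PROC SQL idea maps onto base R fairly directly. Below is a minimal sketch of it. It assumes months can be numbered consecutively as year * 12 + month. The helper columns `m`, `row`, `m2`, `count2` and the output column `desired2` are names chosen only for illustration. The code self-joins `df` on `id` and `class` with `merge()`, keeps the pairs whose month falls in the 4-month window ending at the left-hand row's month, and sums per left-hand row.

```r
# Sample data from the question
df <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 2),
  class   = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t       = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                      "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count   = seq(1, 8, length.out = 8),
  desired = c(1, 3, 3, 4, 5, 6, 13, 21)
)

# Assumption: consecutive month number (year * 12 + month), so gaps are plain integers
df$m   <- as.integer(format(df$t, "%Y")) * 12L + as.integer(format(df$t, "%m"))
df$row <- seq_len(nrow(df))  # identifies each row on the "df1" side

# Self-join on id and class: every df1 row paired with every df2 row in its group.
# Each row matches at least itself, so this inner merge drops no df1 rows.
lhs <- df[, c("row", "id", "class", "m")]
rhs <- data.frame(id = df$id, class = df$class, m2 = df$m, count2 = df$count)
j   <- merge(lhs, rhs, by = c("id", "class"))

# Keep df2 rows whose month is in the 4-month window ending at the df1 month
j <- j[j$m2 <= j$m & j$m2 > j$m - 4L, ]

# Sum per df1 row, then put the sums back in the original row order
sums <- tapply(j$count2, j$row, sum)
df$desired2 <- as.numeric(sums[as.character(df$row)])
df[, c("id", "class", "t", "count", "desired", "desired2")]
```

As in the SQL version, the join builds every within-group pair, so the intermediate table grows with the square of the largest id/class group.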
Recommended answer
I'm almost embarrassed to post this. I'm usually pretty good at these, but there's got to be a better way.
This first uses zoo's as.yearmon to reduce the dates to month and year. It then reshapes the data to get one column for each id/class combination and fills in zeros for the months before and after the observed range and for missing months. Next it uses zoo's rollsumr to get the rolling sum, keeps only the months that were actually observed, and merges the result back with the original data frame.
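The zero-filled monthly grid is the crucial step, because the window has to count calendar months rather than rows. Below is a simplified single-group illustration. The vector `x` holds the counts for id 1, class B laid out by hand on a monthly grid. As an assumption made to keep the sketch dependency-free, base R's `stats::filter()` with `sides = 1` (a trailing sum over the current and three previous values) stands in for zoo's `rollsumr`.

```r
# id = 1, class = "B": observed in April (3) and September (4) 2010 only;
# the months in between are filled with 0, as the answer does with dfb[is.na(dfb)] <- 0
x <- c(Apr = 3, May = 0, Jun = 0, Jul = 0, Aug = 0, Sep = 4)

# Pad with three leading zero months so the window ending in April is complete
padded <- c(0, 0, 0, x)

# Trailing 4-month sum: y[i] = x[i] + x[i-1] + x[i-2] + x[i-3]
rs <- as.numeric(stats::filter(padded, rep(1, 4), sides = 1))
rs <- rs[-(1:3)]          # drop the positions that belong to the padding
names(rs) <- names(x)
rs
```

This gives 3, 3, 3, 3, 0 and 4 for April through September. Keeping only the observed months leaves 3 and 4, the desired values for id 1, class B. A window over the last four observed rows would instead have returned 3 + 4 = 7 for September, because the gap between April and September would be invisible.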
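library(reshape2)
library(zoo)
# Reduce each date to its month
df$yearmon <- as.yearmon(df$t)
# Wide layout: one row per id/class, one column per month present in the data
dfa <- dcast(id + class ~ yearmon, data=df, value.var="count")
ida <- dfa[,1:2]
# Transpose so months are rows and id/class combinations are columns
dfa <- t(as.matrix(dfa[,-c(1:2)]))
# Full monthly grid, padded by 3 months on each side of the observed range
months <- with(df, seq(min(yearmon)-3/12, max(yearmon)+3/12, by=1/12))
dfb <- array(dim=c(length(months), ncol(dfa)),
             dimnames=list(paste(months), colnames(dfa)))
dfb[rownames(dfa),] <- dfa
# Months with no observation count as zero
dfb[is.na(dfb)] <- 0
# Right-aligned 4-month rolling sum down each column
dfb <- rollsumr(dfb,4, fill=0)
rownames(dfb) <- paste(months)
# Keep only the observed months and reshape back to long format
dfb <- dfb[rownames(dfa),]
dfc <- cbind(ida, t(dfb))
dfc <- melt(dfc, id.vars=c("class", "id"))
names(dfc)[3:4] <- c("yearmon", "desired2")
dfc$yearmon <- as.yearmon(dfc$yearmon)
# Join the rolling sums back onto the original rows
out <- merge(df,dfc)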
> out
id class yearmon t count desired desired2
1 1 A Feb 2010 2010-02-15 2 3 3
2 1 A Jan 2010 2010-01-15 1 1 1
3 1 B Apr 2010 2010-04-15 3 3 3
4 1 B Sep 2010 2010-09-15 4 4 4
5 2 A Jan 2010 2010-01-15 5 5 5
6 2 B Aug 2010 2010-08-15 7 13 13
7 2 B Jun 2010 2010-06-15 6 6 6
8 2 B Sep 2010 2010-09-15 8 21 21
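A shorter base-R route to the same column is sketched below. It is an alternative offered for comparison, not the answerer's code. It assumes the same year * 12 + month numbering, and `window_sum` is a hypothetical helper. `ave()` splits the row indices by id and class and reassembles the results in the original order, loosely like BY-group processing in SAS. Each row then sums the counts of its own group that fall in its 4-month window, so no month grid, reshaping or merging is needed.

```r
# Sample data from the question
df <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 2),
  class   = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t       = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                      "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count   = seq(1, 8, length.out = 8),
  desired = c(1, 3, 3, 4, 5, 6, 13, 21)
)

# Assumption: consecutive month number (year * 12 + month)
m <- as.integer(format(df$t, "%Y")) * 12L + as.integer(format(df$t, "%m"))

# Hypothetical helper: for one id/class group (given as row indices), sum the
# counts whose month lies in the 4-month window ending at each row's month
window_sum <- function(idx) {
  vapply(idx, function(i) {
    in_window <- m[idx] <= m[i] & m[idx] > m[i] - 4L
    sum(df$count[idx][in_window])
  }, numeric(1))
}

# ave() applies window_sum within each id/class group and keeps the row order
df$desired3 <- ave(seq_len(nrow(df)), df$id, df$class, FUN = window_sum)
df
```

Like the self-join, this compares every pair of rows within a group, so it suits groups of modest size.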