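I have the following dataset. For each observation I want to add (termination year - filing year) copies, with filing_year increasing by 1 in each copy until it reaches termination_year.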
l_c_final
id filing_year termination_year
1 1992 1995
2 2005 2009
3 1995 1997
Expected output:
id filing_year termination_year
1 1992 1995
2 1993 1995
3 1994 1995
4 1995 1995
5 2005 2009
6 2006 2009
7 2007 2009
8 2008 2009
9 2009 2009
10 1995 1997
11 1996 1997
12 1997 1997
What I tried:
l_c_fin_curr1 = l_c_final
l_c_fin_curr = l_c_fin_curr1[]
l_c_fin_curr = subset(l_c_fin_curr,filing_year==99999) # creating empty dataframe
for (i in 1:length(l_c_fin_curr1[,1])) {
  cur_yr = l_c_fin_curr1$filing_year[i]
  ter_yr = l_c_fin_curr1$termination_year[i]
  n = as.numeric(ter_yr - cur_yr)
  dim = dim(l_c_fin_curr)[1]
  l_c_fin_curr[(dim+1):(dim+n+1),] = l_c_fin_curr1[i,]
  l_c_fin_curr$filing_year[(dim+1):(dim+n+1)] = l_c_fin_curr$filing_year[(dim+1):(dim+n+1)] + (0:n)
}
The code above gives me the expected answer, but my dataset has 4 million records and the loop takes more than 48 hours to run.
Best answer
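We can replicate the sequence of rows by the difference between the 3rd and 2nd columns (plus 1, so that the original row is included) to create 'dfN'. Here 'df1' stands for the input data, i.e. the question's l_c_final. Convert the 'data.frame' to a 'data.table' (setDT(dfN)); grouped by 'id', we assign (:=) 'filing_year' by adding the sequence 0:(.N-1) to the first observation of 'filing_year'. Finally, change 'id' to the sequence of rows. Note that grouping by 'id' relies on 'id' being unique in the input, as it is in the sample data.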
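To make the row replication step concrete, here is a minimal sketch on the question's sample data. It assumes 'df1' is a plain data.frame copy of l_c_final (the answer does not show how 'df1' was created); the names 'times' and 'idx' are introduced here for illustration only.

```r
# Assumption: df1 mirrors the question's sample data (l_c_final)
df1 <- data.frame(
  id = 1:3,
  filing_year = c(1992L, 2005L, 1995L),
  termination_year = c(1995L, 2009L, 1997L)
)

# Rows needed per observation: one per year from filing_year to
# termination_year, inclusive (4, 5 and 3 for the sample data)
times <- df1$termination_year - df1$filing_year + 1L

# Row index vector: row 1 repeated 4 times, row 2 five times, row 3 three times
idx <- rep(seq_len(nrow(df1)), times)

# Indexing with idx stacks the repeated rows; filing_year is still unchanged here
df1[idx, ]
```

Indexing with 'idx' is the same operation as the first line of the answer's code; only the adjustment of 'filing_year' remains after that.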
dfN <- df1[rep(seq_len(nrow(df1)), (df1[,3]- df1[,2]+1L)),]
library(data.table)
setDT(dfN)[, filing_year:=filing_year[1L]+0:(.N-1) ,id][, id:= 1:.N]
dfN
# id filing_year termination_year
# 1: 1 1992 1995
# 2: 2 1993 1995
# 3: 3 1994 1995
# 4: 4 1995 1995
# 5: 5 2005 2009
# 6: 6 2006 2009
# 7: 7 2007 2009
# 8: 8 2008 2009
# 9: 9 2009 2009
#10: 10 1995 1997
#11: 11 1996 1997
#12: 12 1997 1997
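For comparison, the same expansion can be sketched in base R without data.table. This is an alternative to the answer's approach, not part of it; it assumes l_c_final holds the question's sample data, and 'times' and 'expanded' are illustrative names. It uses sequence() to build the per-row year offsets, so it does not depend on 'id' being unique.

```r
# Assumption: l_c_final holds the question's sample data
l_c_final <- data.frame(
  id = 1:3,
  filing_year = c(1992L, 2005L, 1995L),
  termination_year = c(1995L, 2009L, 1997L)
)

# Rows needed per observation: every year from filing_year to
# termination_year, inclusive
times <- l_c_final$termination_year - l_c_final$filing_year + 1L

# Repeat each row `times` times
expanded <- l_c_final[rep(seq_len(nrow(l_c_final)), times), ]

# sequence(times) yields 1..times[1], 1..times[2], ...; subtracting 1 gives
# the year offset of each copy within its original observation
expanded$filing_year <- expanded$filing_year + sequence(times) - 1L

# Renumber id and drop the row names left over from the indexing
expanded$id <- seq_len(nrow(expanded))
rownames(expanded) <- NULL
expanded
```

Both steps are vectorised over the whole table, so no per-row loop or repeated growing of the data frame is involved.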
Regarding "r - How to replicate each observation a different number of times with a change in one variable in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34260783/