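I have the following dataset. I want to replicate each observation so that it yields one row per year from filing_year through termination_year, that is, (termination_year - filing_year + 1) rows, with filing_year increasing by 1 in each successive copy until it equals termination_year.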

l_c_final

id filing_year termination_year
1  1992         1995
2  2005         2009
3  1995         1997


Expected output:

id  filing_year  termination_year
1   1992         1995
2   1993         1995
3   1994         1995
4   1995         1995
5   2005         2009
6   2006         2009
7   2007         2009
8   2008         2009
9   2009         2009
10  1995         1997
11  1996         1997
12  1997         1997


What I tried:

l_c_fin_curr1 = l_c_final

l_c_fin_curr = l_c_fin_curr1[]
l_c_fin_curr = subset(l_c_fin_curr,filing_year==99999) # creating empty dataframe
for (i in 1:length(l_c_fin_curr1[,1])) {
    cur_yr = l_c_fin_curr1$filing_year[i]
    ter_yr = l_c_fin_curr1$termination_year[i]
    n = as.numeric(ter_yr - cur_yr)
    dim = dim(l_c_fin_curr)[1]
    l_c_fin_curr[(dim+1):(dim+n+1),] = l_c_fin_curr1[i,]
    l_c_fin_curr$filing_year[(dim+1):(dim+n+1)] = l_c_fin_curr$filing_year[(dim+1):(dim+n+1)] + (0:n)
}


The code above gives the expected answer, but my dataset has 4 million records and the loop takes more than 48 hours to run.

Best answer

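We can replicate the sequence of rows by the difference between the 3rd and 2nd columns (plus one) to create 'dfN'. Convert the 'data.frame' to a 'data.table' (setDT(dfN)) and, grouped by 'id', assign (:=) 'filing_year' by adding the sequence 0:(.N-1) to the first observation of 'filing_year'. Finally, change 'id' to the sequence of rows.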

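# df1 is the input data.frame (l_c_final in the question).
# Repeat each row index (termination_year - filing_year + 1) times.
dfN <- df1[rep(seq_len(nrow(df1)), (df1[,3]- df1[,2]+1L)),]
library(data.table)
# Within each original id, count filing_year up from its first value, then renumber id.
setDT(dfN)[, filing_year:=filing_year[1L]+0:(.N-1) ,id][, id:= 1:.N]
dfN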
#     id filing_year termination_year
#  1:  1        1992             1995
#  2:  2        1993             1995
#  3:  3        1994             1995
#  4:  4        1995             1995
#  5:  5        2005             2009
#  6:  6        2006             2009
#  7:  7        2007             2009
#  8:  8        2008             2009
#  9:  9        2009             2009
# 10: 10        1995             1997
# 11: 11        1996             1997
# 12: 12        1997             1997
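
For comparison, the same row expansion can be sketched in base R alone, without data.table. This is only an illustration on the three-row toy data from the question, not the answer's own method: the names n_rep, idx and expanded are made up here, and the year columns are assumed to be integers. It uses sequence() to build the per-row offsets 0..n-1 in one vectorized step instead of grouping by 'id'.

```r
# Toy data copied from the question
l_c_final <- data.frame(
  id               = 1:3,
  filing_year      = c(1992L, 2005L, 1995L),
  termination_year = c(1995L, 2009L, 1997L)
)

# Rows per observation: one for each year from filing_year to termination_year
n_rep <- l_c_final$termination_year - l_c_final$filing_year + 1L

# Repeat each row index n_rep times and pull out those rows
idx <- rep(seq_len(nrow(l_c_final)), times = n_rep)
expanded <- l_c_final[idx, ]

# sequence(n_rep) counts 1..n within each original row; subtract 1 for offsets 0..n-1
expanded$filing_year <- expanded$filing_year + sequence(n_rep) - 1L

# Renumber id as the row sequence and drop the duplicated-row names
expanded$id <- seq_len(nrow(expanded))
rownames(expanded) <- NULL

expanded
```

Because this never groups by 'id', it does not depend on 'id' being unique in the input, and it avoids growing a data frame inside a loop, which is the likely cause of the long run time in the question.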

Source: a similar question on Stack Overflow, "r - How to replicate each observation a different number of times while one variable changes in R?": https://stackoverflow.com/questions/34260783/
