Question
I want to create multiple lags of multiple variables, so I thought writing a function would be helpful. My code throws a warning ("Truncating vector to length 1") and gives wrong results:
library(dplyr)
time <- c(2000:2009, 2000:2009)
x <- c(1:10, 10:19)
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
df <- data.frame(id, time, x)
three_lags <- function (data, column, group, ordervar) {
  data <- data %>%
    group_by_(group) %>%
    mutate(a = lag(column, 1L, NA, order_by = ordervar),
           b = lag(column, 2L, NA, order_by = ordervar),
           c = lag(column, 3L, NA, order_by = ordervar))
}
df_lags <- three_lags(data=df, column=x, group=id, ordervar=time) %>%
  arrange(id, time)
I also wondered whether there might be a more elegant solution using mutate_each, but I didn't get that to work either. I can of course just write long code with one line for each new lagged variable, but I'd like to avoid that.
Edit:
akrun's dplyr answer works, but takes a long time to compute for large data frames. The solution using data.table seems to be more efficient. So a dplyr or other solution that can also be applied to several columns and several lags is still to be found.
Edit 2:
For multiple columns and no groups (e.g. no "ID"), the following solution seems well suited to me because of its simplicity. The code could of course be shortened, but step by step:
library(data.table)  # provides shift()
df <- arrange(df, time)
# df[, startcol:endcol] selects the columns to be lagged;
# n = 1:3 specifies the lags (lag 1, lag 2 and lag 3 in this case)
df.lag <- shift(df[, 1:24], n = 1:3, give.names = TRUE)
df.result <- bind_cols(df, df.lag)
Recommended answer
We can use shift from data.table, which accepts multiple values for 'n':
library(data.table)
setDT(df)[order(time), c("a", "b", "c") := shift(x, 1:3) , id][order(id, time)]
Suppose we need to do this on multiple columns:
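df$y <- df$x
# Name the columns explicitly: a positional range such as x:y would also
# pick up the columns a, b, c that were added between x and y above
setDT(df)[order(time), paste0(rep(c("x", "y"), each = 3),
                              c("a", "b", "c")) := shift(.SD, 1:3), id, .SDcols = c("x", "y")]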
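The multi-column call above could be wrapped in a small function so that the columns and lags are passed as arguments. The following is only a sketch: the helper name add_lags, its arguments, and the "_lag" naming scheme are hypothetical and not part of the original answer, and it works on a copy of the data rather than modifying the input by reference.

```r
library(data.table)

# Hypothetical helper (not from the original answer): adds the requested
# lags of several columns within groups, ordered by `ordervar`.
add_lags <- function(data, cols, lags, group, ordervar) {
  dt <- as.data.table(data)           # copy, so the caller's object is left untouched
  setorderv(dt, c(group, ordervar))   # sort by group, then by the ordering variable
  # shift() on a list returns all lags of the first column, then all lags
  # of the second column, and so on; the names are built in the same order
  new_names <- paste0(rep(cols, each = length(lags)), "_lag", lags)
  dt[, (new_names) := shift(.SD, n = lags), by = group, .SDcols = cols]
  dt[]
}

# Example data modelled on the question, with a second column y
df <- data.frame(id = rep(1:2, each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))
df$y <- df$x * 2

res <- add_lags(df, cols = c("x", "y"), lags = 1:3,
                group = "id", ordervar = "time")
```

Here res contains the original columns plus x_lag1, x_lag2, x_lag3, y_lag1, y_lag2 and y_lag3, with the lags computed separately for each id.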
shift can also be used within dplyr:
library(dplyr)
df %>%
group_by(id) %>%
arrange(id, time) %>%
do(data.frame(., setNames(shift(.$x, 1:3), c("a", "b", "c"))))
#       id  time     x     a     b     c
#    <dbl> <int> <int> <int> <int> <int>
# 1      1  2000     1    NA    NA    NA
# 2      1  2001     2     1    NA    NA
# 3      1  2002     3     2     1    NA
# 4      1  2003     4     3     2     1
# 5      1  2004     5     4     3     2
# 6      1  2005     6     5     4     3
# 7      1  2006     7     6     5     4
# 8      1  2007     8     7     6     5
# 9      1  2008     9     8     7     6
# 10     1  2009    10     9     8     7
# 11     2  2000    10    NA    NA    NA
# 12     2  2001    11    10    NA    NA
# 13     2  2002    12    11    10    NA
# 14     2  2003    13    12    11    10
# 15     2  2004    14    13    12    11
# 16     2  2005    15    14    13    12
# 17     2  2006    16    15    14    13
# 18     2  2007    17    16    15    14
# 19     2  2008    18    17    16    15
# 20     2  2009    19    18    17    16
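For completeness, here is a dplyr-only sketch. It is not part of the original answer and assumes dplyr 1.0.0 or later. The function in the question receives bare column names as ordinary arguments, which dplyr verbs do not resolve on their own inside a function; embracing the arguments with {{ }} is one way to forward them. The second part uses across() with a named list of lag functions to cover several columns and several lags at once. The names lag_funs and the lag1/lag2/lag3 suffixes are chosen here for illustration only.

```r
library(dplyr)

df <- data.frame(id = rep(1:2, each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))
df$y <- df$x * 2

# Part 1: the question's function, with the column arguments embraced
three_lags <- function(data, column, group, ordervar) {
  data %>%
    group_by({{ group }}) %>%
    mutate(a = lag({{ column }}, 1L, order_by = {{ ordervar }}),
           b = lag({{ column }}, 2L, order_by = {{ ordervar }}),
           c = lag({{ column }}, 3L, order_by = {{ ordervar }})) %>%
    ungroup()
}

df_lags <- three_lags(data = df, column = x, group = id, ordervar = time) %>%
  arrange(id, time)

# Part 2: several columns and several lags via across().
# One function per lag; the list names become the column suffixes.
lag_funs <- lapply(1:3, function(k) {
  force(k)                               # fix k for this closure
  function(v) dplyr::lag(v, n = k)
})
names(lag_funs) <- paste0("lag", 1:3)

df_multi <- df %>%
  arrange(id, time) %>%                  # sort first so lag() follows time order
  group_by(id) %>%
  mutate(across(c(x, y), lag_funs)) %>%  # creates x_lag1, ..., y_lag3
  ungroup()
```

Whether this is fast enough for large data frames is not something this sketch establishes; the data.table version above may still be the better choice there.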