Problem description
Say I have a data frame in R as follows:
> set.seed(1)
> X <- runif(50, 0, 1)
> Y <- runif(50, 0, 1)
> df <- data.frame(X,Y)
> head(df)
          X          Y
1 0.2655087 0.47761962
2 0.3721239 0.86120948
3 0.5728534 0.43809711
4 0.9082078 0.24479728
5 0.2016819 0.07067905
6 0.8983897 0.09946616
How do I perform a recursive regression of Y on X, starting with, say, the first 20 observations and increasing the regression window by one observation at a time until it covers the full sample?
There is a lot of information out there on how to perform a rolling regression with a fixed window length (e.g. using rollapply in the zoo package). However, my search has come up empty when it comes to finding a simple recursive option, where the starting point is fixed and the window size increases instead. An exception is the lm.fit.recursive function from the quantreg package. It works perfectly, except that it doesn't record any information about standard errors, which I need for constructing recursive confidence intervals.
I can of course use a loop to achieve this. However, my actual data frame is very large and also grouped by id, which complicates things. So I'm hoping to find a more efficient option. Basically, I'm looking for the R equivalent of the "rolling [...], recursive" command in Stata.
Recommended answer
Maybe this helps:
set.seed(1)
X1 <- runif(50, 0, 1)
X2 <- runif(50, 0, 10) # I included another variable just for a better demonstration
Y <- runif(50, 0, 1)
df <- data.frame(X1,X2,Y)
rolling_lms <- lapply( seq(20,nrow(df) ), function(x) lm( Y ~ X1+X2, data = df[1:x , ]) )
The lapply call above gives you the recursive regression you want, with full information for every window.
For example, for the first regression with 20 observations:
> summary(rolling_lms[[1]])
Call:
lm(formula = Y ~ X1 + X2, data = df[1:x, ])
Residuals:
Min 1Q Median 3Q Max
-0.45975 -0.19158 -0.05259 0.13609 0.67775
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.61082 0.17803 3.431 0.00319 **
X1 -0.37834 0.23151 -1.634 0.12060
X2 0.01949 0.02541 0.767 0.45343
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2876 on 17 degrees of freedom
Multiple R-squared: 0.1527, Adjusted R-squared: 0.05297
F-statistic: 1.531 on 2 and 17 DF, p-value: 0.2446
Each fitted model has all the info you need, including the standard errors.
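Since the question asks for recursive confidence intervals, here is a minimal sketch of pulling the standard error and a 95% interval for X1 out of every window. It rebuilds the same data and list as above so it runs on its own; the names window_ends and x1_ci are introduced here for illustration only and are not part of the original answer.

```r
set.seed(1)
X1 <- runif(50, 0, 1)
X2 <- runif(50, 0, 10)
Y  <- runif(50, 0, 1)
df <- data.frame(X1, X2, Y)

# Same expanding-window fits as in the answer
window_ends <- seq(20, nrow(df))
rolling_lms <- lapply(window_ends, function(n) lm(Y ~ X1 + X2, data = df[1:n, ]))

# One row per window: estimate, standard error and 95% confidence bounds for X1
x1_ci <- do.call(rbind, lapply(seq_along(rolling_lms), function(i) {
  m  <- rolling_lms[[i]]
  ct <- coef(summary(m))                 # coefficient table of the fit
  ci <- confint(m, "X1", level = 0.95)   # 1 x 2 matrix: lower, upper
  data.frame(n_obs     = window_ends[i],
             estimate  = ct["X1", "Estimate"],
             std_error = ct["X1", "Std. Error"],
             lower     = ci[1, 1],
             upper     = ci[1, 2])
}))

head(x1_ci)
```

Plotting estimate, lower and upper against n_obs then gives the recursive confidence band.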
> length(rolling_lms)
[1] 31
It performed 31 linear regressions, starting with 20 observations and adding one at a time until it reached 50. Every regression, with all its information, is stored as an element of the rolling_lms list.
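Each of those 31 fits is recomputed from scratch, which may get slow on a very large data frame. One possible alternative, sketched below under the assumption that the problem is well conditioned, is to update the cross-products X'X and X'y one row at a time and derive coefficients and standard errors from them. This is not what lm does internally (lm uses a QR decomposition, which is numerically more stable), and the names X, est and se are illustrative only.

```r
set.seed(1)
X1 <- runif(50, 0, 1)
X2 <- runif(50, 0, 10)
Y  <- runif(50, 0, 1)

X <- cbind(Intercept = 1, X1 = X1, X2 = X2)  # design matrix
p <- ncol(X)
start <- 20                                  # size of the first window

# Cross-products over the first (start - 1) rows; later rows are added once each
init <- seq_len(start - 1)
XtX <- crossprod(X[init, , drop = FALSE])
Xty <- crossprod(X[init, , drop = FALSE], Y[init])
yty <- sum(Y[init]^2)

ends <- start:nrow(X)
est <- se <- matrix(NA_real_, length(ends), p,
                    dimnames = list(ends, colnames(X)))

for (i in seq_along(ends)) {
  n   <- ends[i]
  XtX <- XtX + tcrossprod(X[n, ])      # add the outer product of the new row
  Xty <- Xty + X[n, ] * Y[n]
  yty <- yty + Y[n]^2

  XtX_inv <- solve(XtX)
  beta    <- XtX_inv %*% Xty           # OLS coefficients for rows 1..n
  rss     <- yty - sum(beta * Xty)     # residual sum of squares
  est[i, ] <- beta
  se[i, ]  <- sqrt(diag(XtX_inv) * rss / (n - p))
}

head(cbind(est[, "X1"], se[, "X1"]))
```

The rows of est and se line up with the elements of rolling_lms, so the results can be checked against summary() for any window.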
Edit
As per Carl's comment below, to get a vector of the slope from every regression (here for the X1 variable), this is a very good technique, as Carl suggested:
# Extract the X1 slope from every fitted model, without hard-coding the list length
all_slopes <- sapply(rolling_lms, function(m) coef(m)["X1"])
Output:
> all_slopes
X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
-0.37833614 -0.23231852 -0.20465589 -0.20458938 -0.11796060 -0.14621369 -0.13861210 -0.11906724 -0.10149900 -0.14045509
X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
-0.14331323 -0.14450837 -0.16214836 -0.15715630 -0.17388457 -0.11427933 -0.10624746 -0.09767893 -0.10111773 -0.06415914
X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
-0.06432559 -0.04492075 -0.04122131 -0.06138768 -0.06287532 -0.06305953 -0.06491377 -0.01389334 -0.01703270 -0.03683358
X1
-0.02039574
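The question also mentions that the real data are grouped by id. A minimal sketch of one way to handle that is to split the data frame by id and run the same expanding-window fit inside each group. The data frame gdf, the helper recursive_fit and its min_obs argument are hypothetical names made up for this illustration, and the code assumes the rows within each id are already in the desired order.

```r
set.seed(1)
# Hypothetical grouped data: 3 ids with 30 observations each
gdf <- data.frame(id = rep(c("a", "b", "c"), each = 30),
                  X  = runif(90),
                  Y  = runif(90))

# Expanding-window regression of Y on X for one group
recursive_fit <- function(d, min_obs = 20) {
  if (nrow(d) < min_obs) return(NULL)   # skip groups that are too short
  ends <- seq(min_obs, nrow(d))
  do.call(rbind, lapply(ends, function(n) {
    ct <- coef(summary(lm(Y ~ X, data = d[seq_len(n), ])))
    data.frame(id        = d$id[1],
               n_obs     = n,
               slope     = ct["X", "Estimate"],
               std_error = ct["X", "Std. Error"])
  }))
}

# Apply per id and stack the results into one data frame
by_id <- do.call(rbind, lapply(split(gdf, gdf$id), recursive_fit))
rownames(by_id) <- NULL

head(by_id)
```

This still refits every window, so it is a convenience for the grouping rather than a speed-up.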