Problem description
There are many questions about rolling regression in R, but here I am specifically looking for something that uses dplyr, broom and (if needed) purrr.
This is what makes this question different. I want to be tidyverse consistent. Is it possible to do a proper running regression with tidy tools such as purrr::map and dplyr?
Please consider this simple example:
library(dplyr)
library(purrr)
library(broom)
library(zoo)
library(lubridate)
# data_frame() is deprecated; tibble() (re-exported by dplyr) replaces it
mydata <- tibble(group = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'),
                 y = c(1, 2, 3, 4, 2, 3, 4, 5),
                 x = c(2, 4, 6, 8, 6, 9, 12, 15),
                 date = ymd(c('2016-06-01', '2016-06-02', '2016-06-03', '2016-06-04',
                              '2016-06-03', '2016-06-04', '2016-06-05', '2016-06-06')))
  group     y     x date
  <chr> <dbl> <dbl> <date>
1 a      1.00  2.00 2016-06-01
2 a      2.00  4.00 2016-06-02
3 a      3.00  6.00 2016-06-03
4 a      4.00  8.00 2016-06-04
5 b      2.00  6.00 2016-06-03
6 b      3.00  9.00 2016-06-04
7 b      4.00  12.0 2016-06-05
8 b      5.00  15.0 2016-06-06
For each group (in this example, a or b):
- compute the rolling regression of y on x over the last 2 observations.
- store the coefficient of that rolling regression in a column of the dataframe.
Of course, as you can see, the rolling regression can only be computed for the last 2 rows in each group.
I have tried to use the following, but without success.
data %>% group_by(group) %>%
mutate(rolling_coef = do(tidy(rollapply(. ,
width=2,
FUN = function(df) {t = lm(formula=y ~ x,
data = as.data.frame(df),
na.rm=TRUE);
return(t$coef) },
by.column=FALSE, align="right"))))
Error in mutate_impl(.data, dots) :
Evaluation error: subscript out of bounds.
In addition: There were 21 warnings (use warnings() to see them)
Any ideas?
Expected output for the last two rows of the first a group is 0.5 and 0.5 (there is indeed a perfect linear correlation between y and x in this example).
More specifically:
mydata_1 <- mydata %>% filter(group == 'a',
row_number() %in% c(1,2))
# A tibble: 2 x 3
group y x
<chr> <dbl> <dbl>
1 a 1.00 2.00
2 a 2.00 4.00
> tidy(lm(y ~ x, mydata_1))['estimate'][2,]
[1] 0.5
and
mydata_2 <- mydata %>% filter(group == 'a',
row_number() %in% c(2,3))
# A tibble: 2 x 3
group y x
<chr> <dbl> <dbl>
1 a 2.00 4.00
2 a 3.00 6.00
> tidy(lm(y ~ x, mydata_2))['estimate'][2,]
[1] 0.5
Recommended answer
Define a function Coef whose argument is formed from cbind(y, x) and which regresses y on x with an intercept, returning the coefficients. Then apply rollapplyr using the current and prior rows over each group. If by "last" you meant the 2 rows prior to the current row, i.e. excluding the current row, then replace 2 with list(-seq(2)) as the width argument to rollapplyr.
Coef <- . %>% as.data.frame %>% lm %>% coef
mydata %>%
group_by(group) %>%
do(cbind(reg_col = select(., y, x) %>% rollapplyr(2, Coef, by.column = FALSE, fill = NA),
date_col = select(., date))) %>%
ungroup
giving:
# A tibble: 8 x 4
  group `reg_col.(Intercept)` reg_col.x date
  <chr>                 <dbl>     <dbl> <date>
1 a                        NA        NA 2016-06-01
2 a                         0     0.500 2016-06-02
3 a                         0     0.500 2016-06-03
4 a                         0     0.500 2016-06-04
5 b                        NA        NA 2016-06-03
6 b       0.00000000000000126     0.333 2016-06-04
7 b      -0.00000000000000251     0.333 2016-06-05
8 b                         0     0.333 2016-06-06
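The list(-seq(2)) alternative mentioned in the answer can be sketched as follows. This is a minimal, slope-only illustration that regresses on the two rows before the current row rather than the current and previous row. The helper name slope_fun and the column name slope_prior are assumptions made for this sketch and are not part of the original answer.

```r
library(dplyr)
library(zoo)

mydata <- tibble(
  group = rep(c("a", "b"), each = 4),
  y = c(1, 2, 3, 4, 2, 3, 4, 5),
  x = c(2, 4, 6, 8, 6, 9, 12, 15)
)

# Hypothetical helper: the window is a matrix with y in column 1 and x in column 2.
slope_fun <- function(m) cov(m[, 2], m[, 1]) / var(m[, 2])

# width = list(-seq(2)) gives offsets -1 and -2, so each window holds the
# two rows before the current one and excludes the current row.
lagged <- mydata %>%
  group_by(group) %>%
  mutate(slope_prior = rollapplyr(cbind(y, x), list(-seq(2)), slope_fun,
                                  by.column = FALSE, fill = NA)) %>%
  ungroup()

print(lagged)
```

With this width the first two rows of each group have no complete window, so they should come back as NA.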
Variation
A variation of the above would be:
mydata %>%
group_by(group) %>%
do(select(., date, y, x) %>%
read.zoo %>%
rollapplyr(2, Coef, by.column = FALSE, fill = NA) %>%
fortify.zoo(names = "date")
) %>%
ungroup
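Since the question asked specifically about purrr, here is a rough sketch of the same window-of-2 regression done with purrr::map_dbl instead of zoo. It is not part of the original answer; roll_slope and rolling_coef are hypothetical names, and fitting lm once per row is assumed to be acceptable for small data.

```r
library(dplyr)
library(purrr)

mydata <- tibble(
  group = rep(c("a", "b"), each = 4),
  y = c(1, 2, 3, 4, 2, 3, 4, 5),
  x = c(2, 4, 6, 8, 6, 9, 12, 15)
)

# Hypothetical helper: slope of y ~ x over rows (i - width + 1):i,
# NA while the window is not yet complete.
roll_slope <- function(y, x, width = 2) {
  map_dbl(seq_along(y), function(i) {
    if (i < width) return(NA_real_)
    idx <- (i - width + 1):i
    unname(coef(lm(y[idx] ~ x[idx]))[2])
  })
}

result <- mydata %>%
  group_by(group) %>%
  mutate(rolling_coef = roll_slope(y, x)) %>%
  ungroup()

print(result)
```

Because mutate runs per group, the window never crosses a group boundary. For long series the zoo-based versions are likely to be more convenient.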
Slope only
If only the slope is needed, this can be simplified further. We use the fact that the slope equals cov(x, y) / var(x).
slope <- . %>% { cov(.[, 2], .[, 1]) / var(.[, 2])}
mydata %>%
group_by(group) %>%
mutate(slope = rollapplyr(cbind(y, x), 2, slope, by.column = FALSE, fill = NA)) %>%
ungroup
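The identity used above can be checked directly in base R. The short sketch below compares cov(x, y) / var(x) with the slope reported by lm on the group b values from the example; the variable names slope_cov and slope_lm are illustrative only.

```r
# Group b values from the example data
x <- c(6, 9, 12, 15)
y <- c(2, 3, 4, 5)

# Slope from the covariance identity
slope_cov <- cov(x, y) / var(x)

# Slope from an ordinary least squares fit with an intercept
slope_lm <- unname(coef(lm(y ~ x))["x"])

# The two agree up to floating-point tolerance
print(all.equal(slope_cov, slope_lm))
```

The identity holds for simple regression with an intercept, which is the model the Coef function fits.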