Question
I would like to fit a model for each hour (the factor variable) using dplyr, but I'm getting an error and I'm not quite sure what's wrong.
library(dplyr)

df.h <- data.frame(
  hour = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind = runif(504, min = 0, max = 2500),
  temp = runif(504, min = -10, max = 25)
)
df.h <- as_tibble(df.h) # tbl_df() is deprecated in current dplyr
df.h <- group_by(df.h, hour)
group_size(df.h) # checks out, 21 obs. for each level of hour
# different attempts:
reg.models <- do(df.h, formula = price ~ wind + temp)
reg.models <- do(df.h, .f = lm(price ~ wind + temp, data = df.h))
I've tried several variations, but I can't get any of them to work.
Recommended answer
As of mid-2020, tchakravarty's answer fails. To work around the way broom and dplyr now seem to interact, the following combination of broom::tidy, broom::augment and broom::glance can be used. We just have to call them inside do() and then unnest() the resulting tibble.
library(dplyr)
library(tidyr) # provides unnest()
library(broom)
df.h = data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = - 10, max = 25)
)
df.h %>% group_by(hour) %>%
do(fitHour = tidy(lm(price ~ wind + temp, data = .))) %>%
unnest(fitHour)
# # A tibble: 72 x 6
# hour term estimate std.error statistic p.value
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 (Intercept) 82.4 18.1 4.55 0.000248
# 2 1 wind -0.0212 0.0108 -1.96 0.0655
# 3 1 temp -1.01 0.792 -1.28 0.218
# 4 2 (Intercept) 25.9 19.7 1.31 0.206
# 5 2 wind 0.0204 0.0131 1.57 0.135
# 6 2 temp 0.680 1.01 0.670 0.511
# 7 3 (Intercept) 88.3 15.5 5.69 0.0000214
# 8 3 wind -0.0188 0.00998 -1.89 0.0754
# 9 3 temp -0.669 0.653 -1.02 0.319
# 10 4 (Intercept) 73.4 14.2 5.17 0.0000639
df.h %>% group_by(hour) %>%
do(fitHour = glance(lm(price ~ wind + temp, data = .))) %>%
unnest(fitHour)
# # A tibble: 24 x 13
# hour r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.246 0.162 39.0 2.93 0.0790 2 -105. 218. 222. 27334.
# 2 2 0.161 0.0674 43.5 1.72 0.207 2 -107. 223. 227. 34029.
# 3 3 0.192 0.102 33.9 2.14 0.147 2 -102. 212. 217. 20739.
# 4 4 0.0960 -0.00445 34.3 0.956 0.403 2 -102. 213. 217. 21169.
# 5 5 0.230 0.144 31.7 2.68 0.0955 2 -101. 210. 214. 18088.
# 6 6 0.0190 -0.0900 39.8 0.174 0.842 2 -106. 219. 223. 28507.
# 7 7 0.0129 -0.0967 37.1 0.118 0.889 2 -104. 216. 220. 24801.
# 8 8 0.197 0.108 35.3 2.21 0.139 2 -103. 214. 218. 22438.
# 9 9 0.0429 -0.0634 39.4 0.403 0.674 2 -105. 219. 223. 27918.
# 10 10 0.0943 -0.00633 35.6 0.937 0.410 2 -103. 214. 219. 22854.
# # … with 14 more rows, and 2 more variables: df.residual <int>, nobs <int>
df.h %>% group_by(hour) %>%
do(fitHour = augment(lm(price ~ wind + temp, data = .))) %>%
unnest(fitHour)
# # A tibble: 504 x 10
# hour price wind temp .fitted .resid .std.resid .hat .sigma .cooksd
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 94.2 883. -6.64 70.4 23.7 0.652 0.129 39.6 0.0209
# 2 1 19.3 2107. 2.40 35.4 -16.0 -0.431 0.0864 39.9 0.00584
# 3 1 60.5 2161. 18.3 18.1 42.5 1.18 0.146 38.5 0.0795
# 4 1 116. 1244. 12.0 44.0 71.9 1.91 0.0690 35.8 0.0902
# 5 1 117. 1624. -8.05 56.1 60.6 1.67 0.128 36.9 0.136
# 6 1 75.0 220. -0.838 78.6 -3.58 -0.101 0.175 40.1 0.000724
# 7 1 106. 765. 6.15 60.0 45.7 1.22 0.0845 38.4 0.0461
# 8 1 -9.89 2055. 12.3 26.5 -36.4 -0.979 0.0909 39.0 0.0319
# 9 1 96.1 215. -8.36 86.3 9.82 0.287 0.232 40.0 0.00830
# 10 1 27.2 323. 22.4 52.9 -25.7 -0.777 0.278 39.4 0.0774
# # … with 494 more rows
Thanks to Bob Muenchen's blog for the inspiration.
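As a possible alternative that avoids do() and unnest() altogether, the same per-hour fits can be sketched with dplyr::group_modify(). This is not part of the answer above. It assumes a dplyr version that provides group_modify() (0.8.1 or later), and the set.seed() call and the names coefs and fits are added here purely for illustration.

```r
library(dplyr)
library(broom)

set.seed(1) # assumption: seed added only to make the sketch reproducible

df.h <- data.frame(
  hour  = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind  = runif(504, min = 0, max = 2500),
  temp  = runif(504, min = -10, max = 25)
)

# group_modify() calls the function once per group. Inside the formula,
# .x is the data for the current group (without the grouping column),
# and the data frames returned for each group are row-bound together.
coefs <- df.h %>%
  group_by(hour) %>%
  group_modify(~ tidy(lm(price ~ wind + temp, data = .x))) %>%
  ungroup()

# One row of model-level statistics per hour.
fits <- df.h %>%
  group_by(hour) %>%
  group_modify(~ glance(lm(price ~ wind + temp, data = .x))) %>%
  ungroup()
```

Here coefs holds three coefficient rows per hour (intercept, wind, temp) and fits holds one summary row per hour. Replacing glance() with augment() in the same pattern should give the observation-level fitted values and residuals.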