我想使用dplyr每小时拟合一个模型(因子变量),但出现错误,并且我不太确定出什么问题了。
df.h <- data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = - 10, max = 25)
)
df.h <- tbl_df(df.h)
df.h <- group_by(df.h, hour)
group_size(df.h) # checks out, 21 obs. for each factor variable
# different attempts:
reg.models <- do(df.h, formula = price ~ wind + temp)
reg.models <- do(df.h, .f = lm(price ~ wind + temp, data = df.h))
我尝试了各种变体,但无法使用。
最佳答案
从2020年中期开始,tchakravarty's answer将失败。为了规避broom
和dpylr
似乎相互作用的新方法,可以使用以下broom::tidy
,broom::augment
和broom::glance
的用法。我们只需要在do()
中使用它们,然后在unnest()
中使用小标题即可。
library(dplyr)
library(broom)
df.h = data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = - 10, max = 25)
)
df.h %>% group_by(hour) %>%
do(fitHour = tidy(lm(price ~ wind + temp, data = .))) %>%
unnest(fitHour)
# # A tibble: 72 x 6
# hour term estimate std.error statistic p.value
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 (Intercept) 82.4 18.1 4.55 0.000248
# 2 1 wind -0.0212 0.0108 -1.96 0.0655
# 3 1 temp -1.01 0.792 -1.28 0.218
# 4 2 (Intercept) 25.9 19.7 1.31 0.206
# 5 2 wind 0.0204 0.0131 1.57 0.135
# 6 2 temp 0.680 1.01 0.670 0.511
# 7 3 (Intercept) 88.3 15.5 5.69 0.0000214
# 8 3 wind -0.0188 0.00998 -1.89 0.0754
# 9 3 temp -0.669 0.653 -1.02 0.319
# 10 4 (Intercept) 73.4 14.2 5.17 0.0000639
df.h %>% group_by(hour) %>%
do(fitHour = augment(lm(price ~ wind + temp, data = .))) %>%
unnest(fitHour)
# # A tibble: 24 x 13
# hour r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.246 0.162 39.0 2.93 0.0790 2 -105. 218. 222. 27334.
# 2 2 0.161 0.0674 43.5 1.72 0.207 2 -107. 223. 227. 34029.
# 3 3 0.192 0.102 33.9 2.14 0.147 2 -102. 212. 217. 20739.
# 4 4 0.0960 -0.00445 34.3 0.956 0.403 2 -102. 213. 217. 21169.
# 5 5 0.230 0.144 31.7 2.68 0.0955 2 -101. 210. 214. 18088.
# 6 6 0.0190 -0.0900 39.8 0.174 0.842 2 -106. 219. 223. 28507.
# 7 7 0.0129 -0.0967 37.1 0.118 0.889 2 -104. 216. 220. 24801.
# 8 8 0.197 0.108 35.3 2.21 0.139 2 -103. 214. 218. 22438.
# 9 9 0.0429 -0.0634 39.4 0.403 0.674 2 -105. 219. 223. 27918.
# 10 10 0.0943 -0.00633 35.6 0.937 0.410 2 -103. 214. 219. 22854.
# # … with 14 more rows, and 2 more variables: df.residual <int>, nobs <int>
df.h %>% group_by(hour) %>%
do(fitHour = glance(lm(price ~ wind + temp, data = .))) %>%
unnest(fitHour)
# # A tibble: 504 x 10
# hour price wind temp .fitted .resid .std.resid .hat .sigma .cooksd
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 94.2 883. -6.64 70.4 23.7 0.652 0.129 39.6 0.0209
# 2 1 19.3 2107. 2.40 35.4 -16.0 -0.431 0.0864 39.9 0.00584
# 3 1 60.5 2161. 18.3 18.1 42.5 1.18 0.146 38.5 0.0795
# 4 1 116. 1244. 12.0 44.0 71.9 1.91 0.0690 35.8 0.0902
# 5 1 117. 1624. -8.05 56.1 60.6 1.67 0.128 36.9 0.136
# 6 1 75.0 220. -0.838 78.6 -3.58 -0.101 0.175 40.1 0.000724
# 7 1 106. 765. 6.15 60.0 45.7 1.22 0.0845 38.4 0.0461
# 8 1 -9.89 2055. 12.3 26.5 -36.4 -0.979 0.0909 39.0 0.0319
# 9 1 96.1 215. -8.36 86.3 9.82 0.287 0.232 40.0 0.00830
# 10 1 27.2 323. 22.4 52.9 -25.7 -0.777 0.278 39.4 0.0774
# # … with 494 more rows
感谢Bob Muenchen's Blog的启发。关于r - 用dplyr拟合多个回归模型,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/22713325/