我想使用dplyr每小时拟合一个模型(因子变量),但出现错误,并且我不太确定出什么问题了。

df.h <- data.frame(
  hour     = factor(rep(1:24, each = 21)),
  price    = runif(504, min = -10, max = 125),
  wind     = runif(504, min = 0, max = 2500),
  temp     = runif(504, min = - 10, max = 25)
)

df.h <- tbl_df(df.h)
df.h <- group_by(df.h, hour)

group_size(df.h) # checks out, 21 obs. for each factor variable

# different attempts:
reg.models <- do(df.h, formula = price ~ wind + temp)

reg.models <- do(df.h, .f = lm(price ~ wind + temp, data = df.h))

我尝试了各种变体,但无法使用。

最佳答案

从2020年中期开始,tchakravarty's answer将失败。为了规避broomdpylr似乎相互作用的新方法,可以使用以下broom::tidybroom::augmentbroom::glance的用法。我们只需要在do()中使用它们,然后在unnest()中使用小标题即可。

library(dplyr)
library(broom)

df.h = data.frame(
  hour     = factor(rep(1:24, each = 21)),
  price    = runif(504, min = -10, max = 125),
  wind     = runif(504, min = 0, max = 2500),
  temp     = runif(504, min = - 10, max = 25)
)

df.h %>% group_by(hour) %>%
  do(fitHour = tidy(lm(price ~ wind + temp, data = .))) %>%
  unnest(fitHour)
# # A tibble: 72 x 6
#    hour  term        estimate std.error statistic   p.value
#    <fct> <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#  1 1     (Intercept)   82.4     18.1         4.55  0.000248
#  2 1     wind         -0.0212   0.0108      -1.96  0.0655
#  3 1     temp         -1.01     0.792       -1.28  0.218
#  4 2     (Intercept)   25.9     19.7         1.31  0.206
#  5 2     wind          0.0204   0.0131       1.57  0.135
#  6 2     temp          0.680    1.01         0.670 0.511
#  7 3     (Intercept)   88.3     15.5         5.69  0.0000214
#  8 3     wind         -0.0188   0.00998     -1.89  0.0754
#  9 3     temp         -0.669    0.653       -1.02  0.319
# 10 4     (Intercept)   73.4     14.2         5.17  0.0000639

df.h %>% group_by(hour) %>%
  do(fitHour = augment(lm(price ~ wind + temp, data = .))) %>%
  unnest(fitHour)
# # A tibble: 24 x 13
#    hour  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance
#    <fct>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>
#  1 1        0.246        0.162    39.0     2.93   0.0790     2  -105.  218.  222.   27334.
#  2 2        0.161        0.0674   43.5     1.72   0.207      2  -107.  223.  227.   34029.
#  3 3        0.192        0.102    33.9     2.14   0.147      2  -102.  212.  217.   20739.
#  4 4        0.0960      -0.00445  34.3     0.956  0.403      2  -102.  213.  217.   21169.
#  5 5        0.230        0.144    31.7     2.68   0.0955     2  -101.  210.  214.   18088.
#  6 6        0.0190      -0.0900   39.8     0.174  0.842      2  -106.  219.  223.   28507.
#  7 7        0.0129      -0.0967   37.1     0.118  0.889      2  -104.  216.  220.   24801.
#  8 8        0.197        0.108    35.3     2.21   0.139      2  -103.  214.  218.   22438.
#  9 9        0.0429      -0.0634   39.4     0.403  0.674      2  -105.  219.  223.   27918.
# 10 10       0.0943      -0.00633  35.6     0.937  0.410      2  -103.  214.  219.   22854.
# # … with 14 more rows, and 2 more variables: df.residual <int>, nobs <int>

df.h %>% group_by(hour) %>%
  do(fitHour = glance(lm(price ~ wind + temp, data = .))) %>%
  unnest(fitHour)
# # A tibble: 504 x 10
#    hour   price  wind   temp .fitted .resid .std.resid   .hat .sigma  .cooksd
#    <fct>  <dbl> <dbl>  <dbl>   <dbl>  <dbl>      <dbl>  <dbl>  <dbl>    <dbl>
#  1 1      94.2   883. -6.64     70.4  23.7       0.652 0.129    39.6 0.0209
#  2 1      19.3  2107.  2.40     35.4 -16.0      -0.431 0.0864   39.9 0.00584
#  3 1      60.5  2161. 18.3      18.1  42.5       1.18  0.146    38.5 0.0795
#  4 1     116.   1244. 12.0      44.0  71.9       1.91  0.0690   35.8 0.0902
#  5 1     117.   1624. -8.05     56.1  60.6       1.67  0.128    36.9 0.136
#  6 1      75.0   220. -0.838    78.6  -3.58     -0.101 0.175    40.1 0.000724
#  7 1     106.    765.  6.15     60.0  45.7       1.22  0.0845   38.4 0.0461
#  8 1      -9.89 2055. 12.3      26.5 -36.4      -0.979 0.0909   39.0 0.0319
#  9 1      96.1   215. -8.36     86.3   9.82      0.287 0.232    40.0 0.00830
# 10 1      27.2   323. 22.4      52.9 -25.7      -0.777 0.278    39.4 0.0774
# # … with 494 more rows
感谢Bob Muenchen's Blog的启发。

关于r - 用dplyr拟合多个回归模型,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/22713325/

10-09 17:08