r - ggplot2: How to get robust confidence intervals for predictions in geom_smooth?

Consider this simple example:

library(tidyverse)

## tibble() replaces the deprecated data_frame()
dataframe <- tibble(x = c(1,2,3,4,5,6),
                    y = c(12,24,24,34,12,15))
> dataframe
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15

dataframe %>% ggplot(., aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = 'lm', formula = y~x)

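Here, the standard errors are computed with the default options. However, I would like to use the robust variance-covariance matrix available in the sandwich and lmtest packages.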

That is, using vcovHC(mymodel, "HC3").
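To be concrete, here is a minimal sketch of the kind of robust inference meant here, applied to the coefficient table. It assumes the sandwich and lmtest packages are installed; `mymodel` and `robust_tab` are illustrative names for an ordinary lm fit on the data above and its robust coefficient table.

```r
library(sandwich)
library(lmtest)

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))

## Ordinary least squares fit (illustrative name)
mymodel <- lm(y ~ x, data = dataframe)

## Coefficient table with HC3 heteroskedasticity-robust standard errors
robust_tab <- coeftest(mymodel, vcov. = vcovHC(mymodel, type = "HC3"))
robust_tab
```

This gives robust standard errors for the coefficients only; it does not by itself change the band drawn by geom_smooth().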

Is there a simple way to get this with the geom_smooth() function?


Best answer

HC robust SEs (simple)

This is now easy to do thanks to the estimatr package and its lm_robust family of functions. For example:

library(tidyverse)
library(estimatr)

dataframe <- data.frame(x = c(1,2,3,4,5,6),
                        y = c(12,24,24,34,12,15))

dataframe %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm_robust', formula = y~x, fill="#E41A1C") + ## Robust (HC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HC robust SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison"
    ) +
  theme_minimal()

Created on 2020-03-08 by the reprex package (v0.3.0)
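The example above relies on lm_robust's default standard errors, which are HC2. Since the question asks for HC3, here is a minimal sketch of how that might be requested. It assumes geom_smooth() forwards `method.args` to lm_robust(), as it does for other modelling functions; `p_hc3` is just an illustrative name.

```r
library(ggplot2)
library(estimatr)

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))

p_hc3 <- ggplot(dataframe, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(
    method = "lm_robust",
    formula = y ~ x,
    method.args = list(se_type = "HC3"), ## ask lm_robust for HC3 instead of its default HC2
    fill = "#E41A1C"
  ) +
  labs(title = "HC3 robust SEs via method.args (sketch)") +
  theme_minimal()

p_hc3
```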

HAC robust SEs (more legwork)

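One caveat is that estimatr does not yet offer support for HAC (i.e. heteroskedasticity and autocorrelation consistent) SEs à la Newey-West. However, it is possible to obtain these manually with the sandwich package, which is more or less what the original question was asking about anyway. You can then plot them using geom_ribbon().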

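For the record, HAC SEs don't make much sense for this particular dataset. But here is an example of how you could do it, riffing off an excellent SO answer on a related topic.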

library(tidyverse)
library(sandwich)

dataframe <- data.frame(x = c(1,2,3,4,5,6),
                        y = c(12,24,24,34,12,15))

reg1 <- lm(y~x, data = dataframe)

## Generate a prediction DF
pred_df <- data.frame(fit = predict(reg1))

## Get the design matrix
X_mat <- model.matrix(reg1)

## Get HAC VCOV matrix and calculate SEs
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
#> Warning in meatHAC(x, order.by = order.by, prewhite = prewhite, weights =
#> weights, : more weights than observations, only first n used
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat)  ## Point-wise variance for predicted mean
se_fit_hac <- sqrt(var_fit_hac) ## SEs

## Add these to pred_df and calculate the 95% CI
pred_df <-
  pred_df %>%
  mutate(se_fit_hac = se_fit_hac) %>%
  mutate(
    lwr_hac = fit - qt(0.975, df=reg1$df.residual)*se_fit_hac,
    upr_hac = fit + qt(0.975, df=reg1$df.residual)*se_fit_hac
    )

pred_df
#>        fit se_fit_hac   lwr_hac  upr_hac
#> 1 20.95238   4.250961  9.149822 32.75494
#> 2 20.63810   2.945392 12.460377 28.81581
#> 3 20.32381   1.986900 14.807291 25.84033
#> 4 20.00952   1.971797 14.534936 25.48411
#> 5 19.69524   2.914785 11.602497 27.78798
#> 6 19.38095   4.215654  7.676421 31.08548

## Plot it
bind_cols(
  dataframe,
  pred_df
  ) %>%
  ggplot(aes(x = x, y = y, ymin=lwr_hac, ymax=upr_hac)) +
  geom_point() +
  geom_ribbon(fill="#E41A1C", alpha=0.3, col=NA) + ## Robust (HAC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HAC SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison",
    caption = "Note: Do HAC SEs make sense for this dataset? Definitely not!"
    ) +
  theme_minimal()

Created on 2020-03-08 by the reprex package (v0.3.0)
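The rowSums((X_mat %*% v_hac) * X_mat) line in the code above computes the point-wise variance of each fitted value, x_i' V x_i, without building the full n-by-n matrix. The short check below is a sketch of why that works: it compares the shortcut against the diagonal of X V X'. It uses an HC2 matrix rather than NeweyWest() purely to avoid the small-sample warning; `v_hc`, `var_fast` and `var_full` are illustrative names.

```r
library(sandwich)

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))

reg1  <- lm(y ~ x, data = dataframe)
X_mat <- model.matrix(reg1)           ## n x 2 design matrix (intercept, x)
v_hc  <- vcovHC(reg1, type = "HC2")   ## any 2 x 2 coefficient VCOV matrix would do here

## Shortcut: row i gives x_i' V x_i
var_fast <- unname(rowSums((X_mat %*% v_hc) * X_mat))

## Reference: diagonal of the full n x n matrix X V X'
var_full <- unname(diag(X_mat %*% v_hc %*% t(X_mat)))

all.equal(var_fast, var_full)
```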

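Note that you can also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2, etc.) if you wish. All you need to do is use the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) will give you the same HC-robust CI as the first example, which uses the estimatr package.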
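Since the question asked for HC3 specifically, here is a minimal sketch of the same manual approach with vcovHC(reg1, type = "HC3"). The names `v_hc3`, `se_fit_hc3`, `t_crit` and `pred_hc3` are illustrative, and the interval uses the same t-based 95% construction as the HAC example.

```r
library(ggplot2)
library(sandwich)

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))

reg1  <- lm(y ~ x, data = dataframe)
X_mat <- model.matrix(reg1)

## HC3 VCOV matrix and point-wise SEs of the fitted values
v_hc3      <- vcovHC(reg1, type = "HC3")
se_fit_hc3 <- sqrt(rowSums((X_mat %*% v_hc3) * X_mat))

## 95% CI based on the t distribution with the residual degrees of freedom
t_crit   <- qt(0.975, df = reg1$df.residual)
pred_hc3 <- data.frame(
  x   = dataframe$x,
  y   = dataframe$y,
  fit = predict(reg1)
)
pred_hc3$lwr_hc3 <- pred_hc3$fit - t_crit * se_fit_hc3
pred_hc3$upr_hc3 <- pred_hc3$fit + t_crit * se_fit_hc3

## Plot the fitted line with the HC3 band
ggplot(pred_hc3, aes(x = x, y = y)) +
  geom_point() +
  geom_ribbon(aes(ymin = lwr_hc3, ymax = upr_hc3), fill = "#E41A1C", alpha = 0.3) +
  geom_line(aes(y = fit), col = "grey50") +
  labs(title = "HC3 robust CI computed manually (sketch)") +
  theme_minimal()
```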

A similar question about getting robust confidence intervals for predictions in geom_smooth can be found on Stack Overflow: https://stackoverflow.com/questions/45313482/
