ggplot2 geom_smooth, extended model for method = lm

Question

I would like to use geom_smooth to draw the fitted line from a specific linear regression model.

As far as I can tell, the formula argument can only take x and y, not any additional variable.

To show more clearly what I want:

library(dplyr)
library(ggplot2)
set.seed(35413)
df <- data.frame(pred = runif(100,10,100),
           factor = sample(c("A","B"), 100, replace = TRUE)) %>%
  mutate(
    outcome = 100 + 10*pred +
    ifelse(factor=="B", 200, 0) +
    ifelse(factor=="B", 4, 0)*pred +
    rnorm(100,0,60))

Using

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  theme_bw()

I get fitted lines that, because of the color=factor mapping, are basically the output of the linear model lm(outcome ~ pred*factor, df).
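
The equivalence claimed here can be checked numerically. The following is a minimal sketch, assuming the data from the question is rebuilt in base R so the block runs on its own; the names by_group, inter and from_inter are illustrative and not part of the original post.

```r
# Rebuild the example data (base R only, to keep the sketch self-contained)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# One separate regression per level of factor, which is what
# geom_smooth(method = "lm") fits for each colour group
by_group <- sapply(split(df, df$factor),
                   function(d) coef(lm(outcome ~ pred, data = d)))

# A single model with an interaction term
inter <- coef(lm(outcome ~ pred * factor, data = df))

# Recover per-group intercepts and slopes from the interaction model
from_inter <- cbind(
  A = c(inter["(Intercept)"], inter["pred"]),
  B = c(inter["(Intercept)"] + inter["factorB"],
        inter["pred"] + inter["pred:factorB"]))

# Compare the two sets of coefficients
all.equal(unname(by_group), unname(from_inter))
```

The fitted lines coincide, which is why "basically" is the right word: the confidence bands need not match, because each per-group fit estimates its own residual variance while the interaction model pools it.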

In some cases, however, I would prefer the lines to come from a different model fit, such as lm(outcome ~ pred + factor, df), for which I can use something like:

fit <- lm(outcome ~ pred+factor, df)
predval <- expand.grid(
  pred = seq(
    min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor)) %>%
  mutate(outcome = predict(fit, newdata = .))

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point() +
  geom_line(data = predval) +
  theme_bw()
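
One thing the manual approach gives up is the confidence band that geom_smooth draws. A possible way to add it back is sketched below, assuming the same simulated data; predict() with interval = "confidence" returns the columns fit, lwr and upr, and the object name grid and the choice of 200 grid points are illustrative.

```r
library(ggplot2)

# Rebuild the example data (base R only, to keep the sketch self-contained)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

fit <- lm(outcome ~ pred + factor, data = df)

# Prediction grid: every pred value crossed with every level of factor
grid <- expand.grid(
  pred = seq(min(df$pred), max(df$pred), length.out = 200),
  factor = unique(df$factor),
  stringsAsFactors = FALSE)

# Fitted values plus pointwise 95% confidence limits
ci <- predict(fit, newdata = grid, interval = "confidence")
grid$outcome <- ci[, "fit"]  # named outcome so the inherited y aesthetic resolves
grid$lwr <- ci[, "lwr"]
grid$upr <- ci[, "upr"]

p <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  geom_ribbon(data = grid, aes(ymin = lwr, ymax = upr, fill = factor),
              alpha = 0.2, color = NA) +
  geom_line(data = grid) +
  theme_bw()
```

Printing p draws the plot. The fitted column is called outcome on purpose: the layers inherit y = outcome from the main ggplot() call, so the grid needs a column with that name.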

This produces the plot I am after, with one parallel fitted line per level of factor.

My question: is there a way to produce the latter plot with geom_smooth instead? I know geom_smooth has a formula argument, but I can't get something like formula = y ~ x + factor or formula = y ~ x + color (since I mapped color = factor) to work.

Recommended answer

This is a very interesting question. Probably the main reason geom_smooth is so "resistant" to custom models with multiple variables is that it is limited to producing 2-D curves; consequently, its arguments are designed for two-dimensional data (i.e. formula = response variable ~ independent variable).

The trick to getting what you asked for is to use the mapping argument of geom_smooth instead of formula. As you've probably seen in the documentation, formula only lets you specify the mathematical structure of the model (e.g. linear, quadratic, etc.). The mapping argument, by contrast, lets you directly supply new y-values, such as the output of a custom linear model obtained with predict().

Note that inherit.aes is TRUE by default, so the plotted regressions will be coloured appropriately by your categorical variable. Here's the code:

# original plot
plot1 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  ggtitle("outcome ~ pred") +
  theme_bw()

# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data=df)

# plot with lm for outcome ~ pred + factor
plot2 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm", mapping=aes(y=predict(plm,df))) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()
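
Why this works can be checked without plotting. In the sketch below, which rebuilds the data in base R so that it runs on its own, the per-group regressions that geom_smooth performs are run on the predicted values instead of the raw outcome; the column name fitted and the object refit are illustrative.

```r
# Rebuild the example data (base R only, to keep the sketch self-contained)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# The additive model from the answer
plm <- lm(outcome ~ pred + factor, data = df)
df$fitted <- predict(plm, df)

# What geom_smooth(method = "lm") then does per colour group:
# regress the mapped y (here the predictions) on x
refit <- sapply(split(df, df$factor),
                function(d) coef(lm(fitted ~ pred, data = d)))

# Both groups recover the common slope of the additive model
same_slope <- all.equal(unname(refit["pred", ]),
                        rep(unname(coef(plm)["pred"]), 2))

# The gap between the intercepts is the factorB coefficient
same_shift <- all.equal(
  unname(refit["(Intercept)", "B"] - refit["(Intercept)", "A"]),
  unname(coef(plm)["factorB"]))
```

Because the predictions within each group lie exactly on a straight line, the per-group fit reproduces the additive model's lines. For the same reason, the band geom_smooth draws around them has essentially zero width and says nothing about the uncertainty of plm, so passing se = FALSE to geom_smooth may be the more honest display.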

