Problem description
I would like to use geom_smooth to get a fitted line from a particular linear regression model.
It seems to me that the formula can only take x and y, and no additional variables.
To show more clearly what I want:
library(dplyr)
library(ggplot2)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE)) %>%
  mutate(
    outcome = 100 + 10 * pred +
      ifelse(factor == "B", 200, 0) +
      ifelse(factor == "B", 4, 0) * pred +
      rnorm(100, 0, 60))
Using
ggplot(df, aes(x=pred, y=outcome, color=factor)) +
geom_point(aes(color=factor)) +
geom_smooth(method = "lm") +
theme_bw()
I produce fitted lines that, because of the color=factor option, are essentially the output of the linear model lm(outcome ~ pred*factor, df): one intercept and one slope per factor level.
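The equivalence stated above can be checked numerically. The following is a minimal sketch in base R that rebuilds the simulated data without dplyr. The names fit_int, fit_A, fit_B, line_A and line_B are illustrative and not part of the original post. It compares the interaction model with one separate fit per factor level, which is what geom_smooth(method = "lm") computes for each colour group.

```r
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# One model with an interaction term
fit_int <- lm(outcome ~ pred * factor, data = df)

# One separate simple regression per factor level
fit_A <- lm(outcome ~ pred, data = subset(df, factor == "A"))
fit_B <- lm(outcome ~ pred, data = subset(df, factor == "B"))

b <- coef(fit_int)
# Intercept and slope implied by the interaction model for each level
line_A <- c(b[["(Intercept)"]], b[["pred"]])
line_B <- c(b[["(Intercept)"]] + b[["factorB"]],
            b[["pred"]] + b[["pred:factorB"]])

# Both comparisons should report TRUE (equal up to numerical tolerance)
print(all.equal(line_A, unname(coef(fit_A))))
print(all.equal(line_B, unname(coef(fit_B))))
```

Only the fitted lines coincide. The confidence bands can differ, because the per-group fits estimate the residual variance separately while the interaction model pools it.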
In some cases, however, I would prefer the lines to come from a different model fit, such as lm(outcome ~ pred + factor, df), for which I can use something like:
fit <- lm(outcome ~ pred + factor, df)
predval <- expand.grid(
  pred = seq(min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor)) %>%
  mutate(outcome = predict(fit, newdata = .))
ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  geom_line(data = predval) +
  theme_bw()
This results in a plot with two parallel fitted lines, one per factor level.
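The manual approach can also carry a confidence band that comes from the additive model itself. The sketch below is one possible way to do this, not something from the original post: it uses predict() with interval = "confidence" and passes the bounds to geom_ribbon. The columns lwr and upr, the grid size of 200, and the alpha value are illustrative choices. The plotting part runs only if ggplot2 is installed.

```r
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

fit <- lm(outcome ~ pred + factor, data = df)

# Prediction grid: every pred value crossed with every factor level
predval <- expand.grid(
  pred = seq(min(df$pred), max(df$pred), length.out = 200),
  factor = unique(df$factor),
  stringsAsFactors = FALSE)

# Matrix with columns "fit", "lwr" and "upr" (95% confidence limits)
ci <- predict(fit, newdata = predval, interval = "confidence")
predval$outcome <- ci[, "fit"]
predval$lwr <- ci[, "lwr"]
predval$upr <- ci[, "upr"]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
    geom_point() +
    # Band from the additive model; color = NA suppresses the ribbon outline
    geom_ribbon(data = predval,
                aes(ymin = lwr, ymax = upr, fill = factor),
                alpha = 0.2, color = NA) +
    geom_line(data = predval) +
    theme_bw()
  print(p)
}
```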
My question: is there a way to produce the latter graph using geom_smooth instead? I know that geom_smooth has a formula = option, but I can't make something like formula = y ~ x + factor or formula = y ~ x + color (given that I defined color = factor) work.
Recommended answer
This is a very interesting question. Probably the main reason why geom_smooth is so "resistant" to custom models with multiple variables is that it is limited to producing 2-D curves; consequently, its arguments are designed for two-dimensional data (i.e. formula = response variable ~ independent variable).
The trick to getting what you requested is to use the mapping argument of geom_smooth instead of formula. As you've probably seen in the documentation, formula only lets you specify the mathematical structure of the model in terms of x and y (e.g. linear, quadratic). The mapping argument, by contrast, lets you supply new y-values directly, such as the output of a custom linear model obtained with predict().
Note that inherit.aes is TRUE by default, so the plotted regression lines will be coloured appropriately by your categorical variable. Here's the code:
# original plot
plot1 <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point(aes(color = factor)) +
  geom_smooth(method = "lm") +
  ggtitle("outcome ~ pred") +
  theme_bw()
# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data = df)
# plot with lm for outcome ~ pred + factor
plot2 <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point(aes(color = factor)) +
  geom_smooth(method = "lm", mapping = aes(y = predict(plm, df))) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()
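It may help to see why this works. geom_smooth still fits lm(y ~ x) within each colour group, but now to the predicted values instead of the raw outcomes. Within one factor level, the predictions from the additive model lie exactly on a straight line, so the per-group refit reproduces them. The base R sketch below checks this without ggplot2. The names yhat, refit_A and refit_B are illustrative, and the data are rebuilt without dplyr.

```r
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# Additive model: common slope, level-specific intercept
plm <- lm(outcome ~ pred + factor, data = df)
df$yhat <- predict(plm, df)

# What the smoother effectively does: refit yhat ~ pred inside each group
refit_A <- lm(yhat ~ pred, data = subset(df, factor == "A"))
refit_B <- lm(yhat ~ pred, data = subset(df, factor == "B"))

b <- coef(plm)
# The refits should recover the additive model's parallel lines
print(all.equal(unname(coef(refit_A)),
                c(b[["(Intercept)"]], b[["pred"]])))
print(all.equal(unname(coef(refit_B)),
                c(b[["(Intercept)"]] + b[["factorB"]], b[["pred"]])))

# Residuals of the refits are zero up to floating-point error
print(max(abs(resid(refit_A)), abs(resid(refit_B))))
```

One consequence: since the refit has essentially no residual variation, the ribbon drawn by geom_smooth in plot2 collapses onto the line and does not describe the uncertainty of plm. Passing se = FALSE to geom_smooth avoids suggesting otherwise.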