Problem description
I need to colour data points that fall outside the confidence bands on the plot below differently from those within the bands. Should I add a separate column to my dataset to record whether each data point is within the confidence bands? Can you provide an example, please?
## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
## Disease severity as a function of temperature
# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)
# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)
## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))
## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)
# Take a look at the data
plot(
diseasesev~temperature,
data=severity,
xlab="Temperature",
ylab="% Disease Severity",
pch=16,
pty="s",
xlim=c(0,30),
ylim=c(0,30)
)
title(main="Graph of % Disease Severity vs Temperature")
par(new=TRUE) # don't start a new plot
## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange=data.frame(temperature=seq(min(temperature),max(temperature),0.01))
pred4plot <- predict(
lm(diseasesev~temperature),
xRange,
level=0.95,
interval="confidence"
)
## Plot lines derived from best fit line and confidence band datapoints
matplot(
xRange,
pred4plot,
lty=c(1,2,2), #vector of line types: solid fit, dashed lower and upper bounds
type="l", #type of plot for each column of y
xlim=c(0,30),
ylim=c(0,30),
xlab="",
ylab=""
)
Recommended answer
The easiest way is probably to calculate a vector of TRUE/FALSE values indicating whether each data point lies inside the confidence interval. I'm going to reshuffle your example a little so that all of the calculations are completed before the plotting commands are executed. This provides a clean separation in the program logic that could be exploited if you were to package some of this into a function.
The first part is pretty much the same, except that I replaced the additional call to lm() inside predict() with the severity.lm variable. There is no need to spend computing resources refitting the linear model when we already have it stored:
## Dataset from
# apsnet.org/education/advancedplantpath/topics/
# RModules/doc1/04_Linear_regression.html
## Disease severity as a function of temperature
# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)
# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)
## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))
## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)
## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange=data.frame(temperature=seq(min(temperature),max(temperature),0.01))
pred4plot <- predict(
severity.lm,
xRange,
level=0.95,
interval="confidence"
)
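Before using the prediction matrix it can help to look at its layout. The sketch below repeats the data and model from above so that it runs on its own, then inspects what predict() returns when interval="confidence" is requested: a matrix with the columns fit, lwr and upr, one row per value in xRange. Building the data frame with data.frame() directly is a small simplification I am assuming here in place of as.data.frame(cbind(...)).

```r
# Same data and model as above, repeated so the snippet is self-contained
diseasesev  <- c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4)
temperature <- c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
severity    <- data.frame(diseasesev, temperature)
severity.lm <- lm(diseasesev ~ temperature, data = severity)

# Grid of temperatures and the fitted values with confidence bounds
xRange    <- data.frame(temperature = seq(min(temperature), max(temperature), 0.01))
pred4plot <- predict(severity.lm, xRange, level = 0.95, interval = "confidence")

colnames(pred4plot)   # "fit" "lwr" "upr"
dim(pred4plot)        # one row per grid value, three columns
head(pred4plot, 3)    # first few fitted values and their bounds
```

These column names are what the later indexing with 'lwr' and 'upr' relies on.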
Now we'll calculate the confidence intervals at the original data points and test whether each point is inside its interval:
modelConfInt <- predict(
severity.lm,
level = 0.95,
interval = "confidence"
)
insideInterval <- modelConfInt[,'lwr'] < severity[['diseasesev']] &
severity[['diseasesev']] < modelConfInt[,'upr']
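To address the question of a separate column directly: the logical vector can also be stored in the data frame and turned into one colour per row, so that a single plot() call colours every point. This is only a sketch of that variant, not the approach used in the rest of this answer. The column name inside and the helper variable pointCol are names I made up for illustration.

```r
# Same data and model as above, repeated so the snippet is self-contained
diseasesev  <- c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4)
temperature <- c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
severity    <- data.frame(diseasesev, temperature)
severity.lm <- lm(diseasesev ~ temperature, data = severity)

# Confidence bounds evaluated at the observed temperatures
modelConfInt <- predict(severity.lm, level = 0.95, interval = "confidence")

# Hypothetical column: TRUE when the observation lies strictly inside the band
severity$inside <- modelConfInt[, "lwr"] < severity$diseasesev &
  severity$diseasesev < modelConfInt[, "upr"]

# One colour per row: black inside the band, red outside
pointCol <- ifelse(severity$inside, "black", "red")

plot(
  severity$temperature, severity$diseasesev,
  col = pointCol, pch = 16,
  xlab = "Temperature", ylab = "% Disease Severity",
  xlim = c(0, 30), ylim = c(0, 30)
)
```

Keeping the flag in the data frame is convenient if the same classification is needed later, for example to tabulate or subset the outlying points.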
Then we'll do the plot. First comes the high-level plotting function plot(), used as in your example, except that we only plot the points inside the interval. We then follow up with the low-level function points(), which plots all the points outside the interval in a different color. Finally, matplot() draws the regression line and confidence bands as you used it. However, instead of calling par(new=TRUE), I prefer to pass the argument add=TRUE to high-level functions that support it, so that they act like low-level functions.
Using par(new=TRUE) is like playing a dirty trick on a plotting function, and it can have unforeseen consequences. Many functions provide the add argument to make them add information to an existing plot rather than redraw it. I would recommend exploiting this argument whenever possible and falling back on par() manipulations only as a last resort.
# Take a look at the data- those points inside the interval
plot(
diseasesev~temperature,
data=severity[ insideInterval,],
xlab="Temperature",
ylab="% Disease Severity",
pch=16,
pty="s",
xlim=c(0,30),
ylim=c(0,30)
)
title(main="Graph of % Disease Severity vs Temperature")
# Add points outside the interval, color differently
points(
diseasesev~temperature,
pch = 16,
col = 'red',
data = severity[ !insideInterval,]
)
# Add regression line and confidence intervals
matplot(
xRange,
pred4plot,
lty=c(1,2,2), #vector of line types: solid fit, dashed lower and upper bounds
type="l", #type of plot for each column of y
add = TRUE
)
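The same overlay can also be drawn with purely low-level functions, which avoids both par(new=TRUE) and the add argument. The sketch below is an assumed alternative to the matplot() call above, not a replacement the answer depends on. It uses lines() once per column of the prediction matrix, and repeats the data and model so that it runs on its own.

```r
# Same data and model as above, repeated so the snippet is self-contained
diseasesev  <- c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4)
temperature <- c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
severity    <- data.frame(diseasesev, temperature)
severity.lm <- lm(diseasesev ~ temperature, data = severity)

xRange    <- data.frame(temperature = seq(min(temperature), max(temperature), 0.01))
pred4plot <- predict(severity.lm, xRange, level = 0.95, interval = "confidence")

# High-level call sets up the axes and draws the points
plot(
  diseasesev ~ temperature, data = severity, pch = 16,
  xlab = "Temperature", ylab = "% Disease Severity",
  xlim = c(0, 30), ylim = c(0, 30)
)

# Low-level calls only add to the existing plot
lines(xRange$temperature, pred4plot[, "fit"], lty = 1)  # regression line
lines(xRange$temperature, pred4plot[, "lwr"], lty = 2)  # lower confidence bound
lines(xRange$temperature, pred4plot[, "upr"], lty = 2)  # upper confidence bound
```

Because lines() never opens a new plot, the axis limits set by plot() stay in force for every curve.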