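I need to color the data points that fall outside the confidence bands in the plot below differently from the points inside the bands. Should I add a separate column to my dataset to record whether each data point is inside the confidence bands? Can you give an example?

Example dataset: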

## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html

## Disease severity as a function of temperature

# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)

# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)

## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))

## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)

# Take a look at the data
plot(
  diseasesev~temperature,
  data=severity,
  xlab="Temperature",
  ylab="% Disease Severity",
  pch=16,
  pty="s",
  xlim=c(0,30),
  ylim=c(0,30)
)
title(main="Graph of % Disease Severity vs Temperature")
par(new=TRUE) # don't start a new plot

## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange=data.frame(temperature=seq(min(temperature),max(temperature),0.01))
pred4plot <- predict(
                        lm(diseasesev~temperature),
                        xRange,
                        level=0.95,
                        interval="confidence"
                    )

## Plot lines derived from best fit line and confidence band datapoints
matplot(
  xRange,
  pred4plot,
  lty=c(1,2,2),   #line types: solid fit, dashed lower and upper bands
  type="l",       #type of plot for each column of y
  xlim=c(0,30),
  ylim=c(0,30),
  xlab="",
  ylab=""
)

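Best answer

The simplest approach is probably to compute a vector of TRUE/FALSE values indicating whether each data point lies inside the confidence interval. I'm going to rearrange your example a little so that all calculations are finished before any plotting command runs. This gives a clean separation in the program logic, which you could exploit if you packaged some of this into a function.

The first part is almost the same, except that I replaced the extra call to lm() inside predict() with the severity.lm variable. There is no need to spend computing resources refitting the linear model when it is already stored: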

## Dataset from
#  apsnet.org/education/advancedplantpath/topics/
#    RModules/doc1/04_Linear_regression.html

## Disease severity as a function of temperature

# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)

# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)

## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))

## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)

## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange=data.frame(temperature=seq(min(temperature),max(temperature),0.01))
pred4plot <- predict(
  severity.lm,
  xRange,
  level=0.95,
  interval="confidence"
)

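Now we compute the confidence interval at each of the original data points (calling predict() without new data returns values at the observed temperatures) and test whether each point lies inside its interval: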
modelConfInt <- predict(
  severity.lm,
  level = 0.95,
  interval = "confidence"
)

insideInterval <- modelConfInt[,'lwr'] < severity[['diseasesev']] &
  severity[['diseasesev']] < modelConfInt[,'upr']
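
To address the question about a separate column directly: the logical vector can also be stored in the data frame itself. The sketch below repeats the data and model so that it runs on its own. The column name `inside` is an assumption chosen for illustration and does not appear in the original post.

```r
# Self-contained sketch: repeat the data and model from above
diseasesev  <- c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4)
temperature <- c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
severity    <- data.frame(diseasesev, temperature)
severity.lm <- lm(diseasesev ~ temperature, data = severity)

# Confidence limits at the observed temperatures (no newdata argument)
modelConfInt <- predict(severity.lm, level = 0.95, interval = "confidence")

# Hypothetical column name 'inside': TRUE when the observation lies
# strictly between the lower and upper confidence limits
severity$inside <- modelConfInt[, "lwr"] < severity$diseasesev &
  severity$diseasesev < modelConfInt[, "upr"]

# Show each observation next to its limits, then count each group
print(cbind(severity, modelConfInt[, c("lwr", "upr")]))
print(table(severity$inside))
```

Keeping the flag as a column means it travels with the data when rows are subset or reordered, whereas a free-standing vector has to be kept aligned by hand.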

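Then we do the plotting. First comes the high-level plotting function plot(), used as in your example, but plotting only the points inside the interval. Next, the low-level function points() adds all the points outside the interval in a different color. Finally, matplot() draws the regression line and confidence bands, as in your example. However, instead of calling par(new=TRUE), I pass the argument add=TRUE to the high-level function so that it behaves like a low-level function.

Using par(new=TRUE) is like playing a dirty trick on the plotting functions, and it can have unforeseen consequences. Many functions provide an add argument that makes them add to the existing plot rather than redraw it. I recommend using this argument whenever possible and falling back on par() manipulation only as a last resort.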
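
One concrete consequence can be seen in the plot's user coordinates. The following is a minimal sketch on made-up data (`x` and `y` are not from the original post): `add = TRUE` draws into the coordinate system that plot() set up, while a high-level call after par(new=TRUE) sets up its own, so the two layers only line up if the axis limits are repeated by hand.

```r
# Throwaway data for illustration only
x <- 1:10
y <- x^2

plot(x, y, pch = 16)
usr_before <- par("usr")        # user coordinates set up by plot()

# add = TRUE: matplot() draws into the existing coordinate system
matplot(x, cbind(y, y + 5), type = "l", lty = c(1, 2), add = TRUE)
usr_after_add <- par("usr")

# par(new = TRUE): the next high-level call computes its own coordinates,
# so a different data range no longer matches the axes already drawn
par(new = TRUE)
plot(x, 2 * y, type = "l", axes = FALSE, xlab = "", ylab = "")
usr_after_new <- par("usr")

print(identical(usr_before, usr_after_add))
print(identical(usr_before, usr_after_new))
```

In the question's code the overlay happened to line up because xlim and ylim were repeated in the matplot() call; with add=TRUE that repetition is unnecessary.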
# Take a look at the data- those points inside the interval
plot(
  diseasesev~temperature,
  data=severity[ insideInterval,],
  xlab="Temperature",
  ylab="% Disease Severity",
  pch=16,
  pty="s",
  xlim=c(0,30),
  ylim=c(0,30)
)
title(main="Graph of % Disease Severity vs Temperature")

# Add points outside the interval, color differently
points(
  diseasesev~temperature,
  pch = 16,
  col = 'red',
  data = severity[ !insideInterval,]
)

# Add regression line and confidence intervals
matplot(
  xRange,
  pred4plot,
  lty=c(1,2,2),   #line types: solid fit, dashed lower and upper bands
  type="l",       #type of plot for each column of y
  add = TRUE
)
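
The answer mentions that separating calculation from plotting makes it easy to package the steps into a function. A minimal sketch of that idea follows. The function name `plot_conf_band` and its arguments are hypothetical, and instead of the plot() plus points() pair it passes a per-point color vector to a single plot() call, which is a variation on the approach above rather than the answer's own code.

```r
# Hypothetical helper (not from the original answer): fit, flag, and plot
plot_conf_band <- function(data, level = 0.95,
                           col_inside = "black", col_outside = "red") {
  fit <- lm(diseasesev ~ temperature, data = data)

  # Flag observations lying strictly inside the confidence band
  ci <- predict(fit, level = level, interval = "confidence")
  inside <- ci[, "lwr"] < data$diseasesev & data$diseasesev < ci[, "upr"]

  # Evaluate the band on a fine grid so the lines look smooth
  grid <- data.frame(
    temperature = seq(min(data$temperature), max(data$temperature), by = 0.01)
  )
  band <- predict(fit, grid, level = level, interval = "confidence")

  # One plot() call: 'col' accepts one color per point
  plot(data$temperature, data$diseasesev, pch = 16,
       col = ifelse(inside, col_inside, col_outside),
       xlab = "Temperature", ylab = "% Disease Severity",
       xlim = c(0, 30), ylim = c(0, 30),
       main = "Graph of % Disease Severity vs Temperature")

  # Regression line (solid) and confidence limits (dashed)
  matplot(grid$temperature, band, type = "l", lty = c(1, 2, 2),
          col = "black", add = TRUE)

  invisible(inside)  # return the flag so callers can reuse it
}

# Usage with the dataset from the question
severity <- data.frame(
  diseasesev  = c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4),
  temperature = c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
)
flagged <- plot_conf_band(severity)
```

Returning the flag invisibly lets the caller attach it to the data frame or count the outlying points without cluttering the console.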

Source: the Stack Overflow question on conditionally coloring data points outside the confidence bands in R: https://stackoverflow.com/questions/2687212/
