本文介绍了显示LOESS回归线和置信区间的问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在尝试将LOESS回归与数据集竞争时遇到一些问题.我已经能够正确地创建线条,但是我无法正确绘制它.

I am having some issues trying to compete a LOESS regression with a data set. I have been able to properly create the line, but I am unable to get it to plot correctly.

我浏览了这样的数据.

I ran through the data like this.

animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)
animals.lo
summary(animals.lo)
plot(X15p5~ Period, animals)
lines(animals$X15p5, animals.lo, col="red")

这时我收到了一个错误

我四处搜寻,并发现此问题可能是由于需要订购积分而引起的,所以我继续进行.

I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.

a <- order(animals$Period)
lines(animals$X15p5[a], animals.lo$Period[a], col="red", lwd=3)

这时没有错误,但是LOESS线仍未显示在图中.点显示正确,但线显示不正确.

There were no errors at this point, but the LOESS line was still not showing up in the plot. The points were displayed correctly, but not the line.

这类似于我正在使用的数据集...

This is similar to the data set I am using...

structure(list(Site = c("Cat", "Dog", "Bear", "Chicken", "Cow",
"Bird", "Tiger", "Lion", "Leopard", "Wolf", "Puppy", "Kitten",
"Emu", "Ostrich", "Elephant", "Sheep", "Goat", "Fish", "Iguana",
"Monkey", "Gorilla", "Baboon", "Lemming", "Mouse", "Rat", "Hamster",
"Eagle", "Parrot", "Crow", "Dove", "Falcon", "Hawk", "Sparrow",
"Kite", "Chimpanzee", "Giraffe", "Bear", "Donkey", "Mule", "Horse",
"Zebra", "Ox", "Snake", "Cobra", "Iguana", "Lizard", "Fly", "Mosquito",
"Llama", "Butterfly", "Moth", "Worm", "Centipede", "Unicorn",
"Pegasus", "Griffin", "Ogre", "Monster", "Demon", "Witch", "Vampire",
"Mummy", "Ghoul", "Zombie"), Region = c(6L, 4L, 4L, 5L, 7L, 6L,
2L, 4L, 6L, 7L, 7L, 4L, 6L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 4L, 6L, 6L,
4L, 2L, 7L, 4L, 2L, 2L, 7L, 3L, 4L, 7L, 4L, 4L, 4L, 7L, 7L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 8L), Period = c(-2715, -3500,
-3500, -4933.333333, -2715, -2715, -2715, -3500, -2715, -4350,
-3500, -3500, -2950, -4350, -3650, -3500, -3500, -2715, -3650,
-4350, -3500, -3500, -3400, -4350, -3500, -3500, -4350, -3900,
-3808.333333, -4233.333333, -3500, -3900, -3958.333333, -3900,
-3500, -3500, -3500, -2715, -3650, -2715, -2715, -2715, -2715,
-3500, -2715, -2715, -3500, -4350, -3650, -3650, -4350, -5400,
-3500, -3958.333333, -3400, -3400, -4350, -3600, -4350, -3650,
-3500, -2715, -5400, -3500), Value = c(0.132625995, 0.163120567,
0.228840125, 0.154931973, 0.110047847, 0.054347826, 0.188679245,
0.245014245, 0.128378378, 0.021428571, 0.226277372, 0.176923077,
0.104938272, 0.17659805, 0.143798024, 0.086956522, 0.0625, 0.160714286,
0, 0.235588972, 0, 0, 0.208333333, 0.202247191, 0.364705882,
0.174757282, 0, 0.4, 0.1, 0.184027778, 0.232876712, 0.160493827,
0.74702381, 0.126984127, 0.080645161, 0.06557377, 0, 0.057692308,
0.285714286, 0.489361702, 0.108695652, 0.377777778, 0, 0.522727273,
0.024390244, 0.097560976, 0.275, 0, 0.0625, 0.255319149, 0.135135135,
0.216216216, 0.222222222, 0.296296296, 0.222222222, 0.146341463,
0.09375, 0.125, 0.041666667, 0.078947368, 0.2, 0.137931034, 0.571428571,
0.142857143), Sample_size = c(188.5, 105.75, 79.75, 70, 52.25,
46, 39.75, 39, 37, 35, 34.25, 32.5, 32.4, 30.76666667, 30.36666667,
28.75, 28, 28, 28, 26.6, 25, 25, 24, 22.25, 21.25, 20.6, 20,
20, 20, 19.2, 18.25, 18, 18, 16.8, 15.5, 15.25, 15, 13, 12.6,
11.75, 11.5, 11.25, 11, 11, 10.25, 10.25, 10, 10, 9.6, 9.4, 9.25,
9.25, 9, 9, 9, 8.2, 8, 8, 8, 7.6, 7.5, 7.25, 7, 7), Sample_sub = c(25,
17.25, 18.25, 10.8452381, 5.75, 2.5, 7.5, 9.555555556, 4.75,
0.75, 7.75, 5.75, 3.4, 5.433333333, 4.366666667, 2.5, 1.75, 4.5,
0, 6.266666667, 0, 0, 5, 4.5, 7.75, 3.6, 0, 8, 2, 3.533333333,
4.25, 2.888888889, 13.44642857, 2.133333333, 1.25, 1, 0, 0.75,
3.6, 5.75, 1.25, 4.25, 0, 5.75, 0.25, 1, 2.75, 0, 0.6, 2.4, 1.25,
2, 2, 2.666666667, 2, 1.2, 0.75, 1, 0.333333333, 0.6, 1.5, 1,
4, 1)), .Names = c("Site", "Region", "Period", "Value", "Sample_size",
"Sample_sub"), class = "data.frame", row.names = c(NA, -64L))

我已经为此工作了一段时间,并尝试尽可能多地阅读,但是我没有取得任何其他进展.任何建议或指导将不胜感激.

I have been working for this a while and trying to read up as much as I can, but I haven't been able to make any additional headway. Any advice or guidance would be greatly appreciated.

在为绘图添加置信区间的后续操作

我一直试图按照此页面上网站上的另一个示例添加置信区间.

I have been trying to add in confidence intervals following another example found on the site on this page How to get the confidence intervals for LOWESS fit using R? .

该页面上的示例为:

plot(cars)
plx<-predict(loess(cars$dist ~ cars$speed), se=T)

lines(cars$speed,plx$fit)
lines(cars$speed,plx$fit - qt(0.975,plx$df)*plx$se, lty=2)
lines(cars$speed,plx$fit + qt(0.975,plx$df)*plx$se, lty=2)

我对此进行了修改:

plot(X15p5 ~ Period, animals)
animals.lo2<-predict(loess(animals$X15p5 ~ animals$Period), se=T)
a <- order(animals$Period)
lines(animals$Period[a],animals.lo2$fit, col="red", lwd=3)
lines(animals$Period[a],animals.lo2$fit - qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
lines(animals$Period[a],animals.lo2$fit + qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)

尽管这确实提供了置信区间,但是回归线是完全错误的.我不确定这是predict函数的问题还是其他问题.再次感谢!

Although this does provide confidence intervals, the regression line is all wrong. I'm not sure if it is an issue with the predict function, or another issue. Thanks again!

推荐答案

正确的代码

不,不.订购问题与您看到的错误无关.要克服该错误,您需要更换

No, no. The ordering issue is not related to the error you see. To overcome the error, You need to replace

lines(animals$X15p5, animals.lo, col="red")

使用

lines(animals$Period, animals.lo$fitted, col="red")

原因如下:

  1. loess返回对象列表,而不是单个向量.请参见str(animals.lo)names(animals.lo).
  2. 为什么将animals$X15p5用作x轴?您适合您的模型:X15p5 ~ Period,因此x轴应为Period.
  1. loess returns a list of objects, not a single vector. See str(animals.lo) or names(animals.lo).
  2. why do you use animals$X15p5 as x-axis? You fit your model: X15p5 ~ Period, so x-axis should be Period.

关于重新排序

您需要进行排序,因为默认情况下,R按顺序排列点.以此为例:

You need to do ordering, because by default, R lines up points in order. Take this as an example:

set.seed(0); x <- runif(100, 0, 10)  ## x is not in order
set.seed(1); y <- sqrt(x)  ## plot curve y = sqrt(x)
par(mfrow = c(1,2))
plot(x, y, type = "l")  ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l")  ## this is nice

类似地,做:

a <- order(animals$Period)
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)


关注置信区间

尝试一下:

plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)

您忘了重新订购.您需要重新调整拟合值和标准误差.

You forgot about reordering again. You need to reorder both fitted values, as well as standard errors.

现在,cars数据的dist ~ speed模型无需重新排序.因为:

Now, the dist ~ speed model for cars data has no need for reordering. Because:

is.unsorted(cars$speed)  ## FALSE

是的,数据已经在那里排序了.

Yes, data are already sorted there.

请注意,我对您的代码进行了另外两项更改:

Note I have made two other changes to your code:

  1. 我将loess呼叫和predict呼叫分开了;也许您不需要这样做,但是将模型拟合和模型预测分离开来并保留两个对象的副本通常是一个好习惯.
  2. 我已将loess(animals$X15p5 ~ animals$Period)更改为loess(X15p5 ~ Period, animals).在指定模型公式时使用$符号是一个坏习惯.我在 https://stackoverflow.com/a/37307270/4891738 中有另一个答案,显示了这种样式的缺点.您可以在那儿的更新"部分中阅读.我以glm为例,但是对于lmglmloess,情况是相同的.
  1. I have separated loess call and predict call; Maybe you don't need to do this, but it is generally a good habit to separate model fitting and model prediction, and keeps a copy of both objects.
  2. I have changed loess(animals$X15p5 ~ animals$Period) to loess(X15p5 ~ Period, animals). It is a bad habit to use $ sign in specifying model formula. I have another answer at https://stackoverflow.com/a/37307270/4891738 showing the draw back of such style. You can read on the "update" section over there. I have used the glm as an example, but for lm, glm, loess, things are the same.

这篇关于显示LOESS回归线和置信区间的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-11 17:17