问题描述
我在尝试将LOESS回归与数据集竞争时遇到一些问题.我已经能够正确地创建线条,但是我无法正确绘制它.
I am having some issues trying to compete a LOESS regression with a data set. I have been able to properly create the line, but I am unable to get it to plot correctly.
我浏览了这样的数据.
I ran through the data like this.
animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)
animals.lo
summary(animals.lo)
plot(X15p5~ Period, animals)
lines(animals$X15p5, animals.lo, col="red")
这时我收到了一个错误
我四处搜寻,并发现此问题可能是由于需要订购积分而引起的,所以我继续进行.
I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.
a <- order(animals$Period)
lines(animals$X15p5[a], animals.lo$Period[a], col="red", lwd=3)
这时没有错误,但是LOESS线仍未显示在图中.点显示正确,但线显示不正确.
There were no errors at this point, but the LOESS line was still not showing up in the plot. The points were displayed correctly, but not the line.
这类似于我正在使用的数据集...
This is similar to the data set I am using...
structure(list(Site = c("Cat", "Dog", "Bear", "Chicken", "Cow",
"Bird", "Tiger", "Lion", "Leopard", "Wolf", "Puppy", "Kitten",
"Emu", "Ostrich", "Elephant", "Sheep", "Goat", "Fish", "Iguana",
"Monkey", "Gorilla", "Baboon", "Lemming", "Mouse", "Rat", "Hamster",
"Eagle", "Parrot", "Crow", "Dove", "Falcon", "Hawk", "Sparrow",
"Kite", "Chimpanzee", "Giraffe", "Bear", "Donkey", "Mule", "Horse",
"Zebra", "Ox", "Snake", "Cobra", "Iguana", "Lizard", "Fly", "Mosquito",
"Llama", "Butterfly", "Moth", "Worm", "Centipede", "Unicorn",
"Pegasus", "Griffin", "Ogre", "Monster", "Demon", "Witch", "Vampire",
"Mummy", "Ghoul", "Zombie"), Region = c(6L, 4L, 4L, 5L, 7L, 6L,
2L, 4L, 6L, 7L, 7L, 4L, 6L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 4L, 6L, 6L,
4L, 2L, 7L, 4L, 2L, 2L, 7L, 3L, 4L, 7L, 4L, 4L, 4L, 7L, 7L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 8L), Period = c(-2715, -3500,
-3500, -4933.333333, -2715, -2715, -2715, -3500, -2715, -4350,
-3500, -3500, -2950, -4350, -3650, -3500, -3500, -2715, -3650,
-4350, -3500, -3500, -3400, -4350, -3500, -3500, -4350, -3900,
-3808.333333, -4233.333333, -3500, -3900, -3958.333333, -3900,
-3500, -3500, -3500, -2715, -3650, -2715, -2715, -2715, -2715,
-3500, -2715, -2715, -3500, -4350, -3650, -3650, -4350, -5400,
-3500, -3958.333333, -3400, -3400, -4350, -3600, -4350, -3650,
-3500, -2715, -5400, -3500), Value = c(0.132625995, 0.163120567,
0.228840125, 0.154931973, 0.110047847, 0.054347826, 0.188679245,
0.245014245, 0.128378378, 0.021428571, 0.226277372, 0.176923077,
0.104938272, 0.17659805, 0.143798024, 0.086956522, 0.0625, 0.160714286,
0, 0.235588972, 0, 0, 0.208333333, 0.202247191, 0.364705882,
0.174757282, 0, 0.4, 0.1, 0.184027778, 0.232876712, 0.160493827,
0.74702381, 0.126984127, 0.080645161, 0.06557377, 0, 0.057692308,
0.285714286, 0.489361702, 0.108695652, 0.377777778, 0, 0.522727273,
0.024390244, 0.097560976, 0.275, 0, 0.0625, 0.255319149, 0.135135135,
0.216216216, 0.222222222, 0.296296296, 0.222222222, 0.146341463,
0.09375, 0.125, 0.041666667, 0.078947368, 0.2, 0.137931034, 0.571428571,
0.142857143), Sample_size = c(188.5, 105.75, 79.75, 70, 52.25,
46, 39.75, 39, 37, 35, 34.25, 32.5, 32.4, 30.76666667, 30.36666667,
28.75, 28, 28, 28, 26.6, 25, 25, 24, 22.25, 21.25, 20.6, 20,
20, 20, 19.2, 18.25, 18, 18, 16.8, 15.5, 15.25, 15, 13, 12.6,
11.75, 11.5, 11.25, 11, 11, 10.25, 10.25, 10, 10, 9.6, 9.4, 9.25,
9.25, 9, 9, 9, 8.2, 8, 8, 8, 7.6, 7.5, 7.25, 7, 7), Sample_sub = c(25,
17.25, 18.25, 10.8452381, 5.75, 2.5, 7.5, 9.555555556, 4.75,
0.75, 7.75, 5.75, 3.4, 5.433333333, 4.366666667, 2.5, 1.75, 4.5,
0, 6.266666667, 0, 0, 5, 4.5, 7.75, 3.6, 0, 8, 2, 3.533333333,
4.25, 2.888888889, 13.44642857, 2.133333333, 1.25, 1, 0, 0.75,
3.6, 5.75, 1.25, 4.25, 0, 5.75, 0.25, 1, 2.75, 0, 0.6, 2.4, 1.25,
2, 2, 2.666666667, 2, 1.2, 0.75, 1, 0.333333333, 0.6, 1.5, 1,
4, 1)), .Names = c("Site", "Region", "Period", "Value", "Sample_size",
"Sample_sub"), class = "data.frame", row.names = c(NA, -64L))
我已经为此工作了一段时间,并尝试尽可能多地阅读,但是我没有取得任何其他进展.任何建议或指导将不胜感激.
I have been working for this a while and trying to read up as much as I can, but I haven't been able to make any additional headway. Any advice or guidance would be greatly appreciated.
在为绘图添加置信区间的后续操作
我一直试图按照此页面上网站上的另一个示例添加置信区间.
I have been trying to add in confidence intervals following another example found on the site on this page How to get the confidence intervals for LOWESS fit using R? .
该页面上的示例为:
plot(cars)
plx<-predict(loess(cars$dist ~ cars$speed), se=T)
lines(cars$speed,plx$fit)
lines(cars$speed,plx$fit - qt(0.975,plx$df)*plx$se, lty=2)
lines(cars$speed,plx$fit + qt(0.975,plx$df)*plx$se, lty=2)
我对此进行了修改:
plot(X15p5 ~ Period, animals)
animals.lo2<-predict(loess(animals$X15p5 ~ animals$Period), se=T)
a <- order(animals$Period)
lines(animals$Period[a],animals.lo2$fit, col="red", lwd=3)
lines(animals$Period[a],animals.lo2$fit - qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
lines(animals$Period[a],animals.lo2$fit + qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
尽管这确实提供了置信区间,但是回归线是完全错误的.我不确定这是predict
函数的问题还是其他问题.再次感谢!
Although this does provide confidence intervals, the regression line is all wrong. I'm not sure if it is an issue with the predict
function, or another issue. Thanks again!
推荐答案
正确的代码
不,不.订购问题与您看到的错误无关.要克服该错误,您需要更换
No, no. The ordering issue is not related to the error you see. To overcome the error, You need to replace
lines(animals$X15p5, animals.lo, col="red")
使用
lines(animals$Period, animals.lo$fitted, col="red")
原因如下:
-
loess
返回对象列表,而不是单个向量.请参见str(animals.lo)
或names(animals.lo)
. - 为什么将
animals$X15p5
用作x轴?您适合您的模型:X15p5 ~ Period
,因此x轴应为Period
.
loess
returns a list of objects, not a single vector. Seestr(animals.lo)
ornames(animals.lo)
.- why do you use
animals$X15p5
as x-axis? You fit your model:X15p5 ~ Period
, so x-axis should bePeriod
.
关于重新排序
您需要进行排序,因为默认情况下,R按顺序排列点.以此为例:
You need to do ordering, because by default, R lines up points in order. Take this as an example:
set.seed(0); x <- runif(100, 0, 10) ## x is not in order
set.seed(1); y <- sqrt(x) ## plot curve y = sqrt(x)
par(mfrow = c(1,2))
plot(x, y, type = "l") ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l") ## this is nice
类似地,做:
a <- order(animals$Period)
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)
关注置信区间
尝试一下:
plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
您忘了重新订购.您需要重新调整拟合值和标准误差.
You forgot about reordering again. You need to reorder both fitted values, as well as standard errors.
现在,cars
数据的dist ~ speed
模型无需重新排序.因为:
Now, the dist ~ speed
model for cars
data has no need for reordering. Because:
is.unsorted(cars$speed) ## FALSE
是的,数据已经在那里排序了.
Yes, data are already sorted there.
请注意,我对您的代码进行了另外两项更改:
Note I have made two other changes to your code:
- 我将
loess
呼叫和predict
呼叫分开了;也许您不需要这样做,但是将模型拟合和模型预测分离开来并保留两个对象的副本通常是一个好习惯. - 我已将
loess(animals$X15p5 ~ animals$Period)
更改为loess(X15p5 ~ Period, animals)
.在指定模型公式时使用$
符号是一个坏习惯.我在 https://stackoverflow.com/a/37307270/4891738 中有另一个答案,显示了这种样式的缺点.您可以在那儿的更新"部分中阅读.我以glm
为例,但是对于lm
,glm
,loess
,情况是相同的.
- I have separated
loess
call andpredict
call; Maybe you don't need to do this, but it is generally a good habit to separate model fitting and model prediction, and keeps a copy of both objects. - I have changed
loess(animals$X15p5 ~ animals$Period)
toloess(X15p5 ~ Period, animals)
. It is a bad habit to use$
sign in specifying model formula. I have another answer at https://stackoverflow.com/a/37307270/4891738 showing the draw back of such style. You can read on the "update" section over there. I have used theglm
as an example, but forlm
,glm
,loess
, things are the same.
这篇关于显示LOESS回归线和置信区间的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!