Problem description
I am trying to understand how the predict.loess function is able to compute new predicted values (y_hat) at points x that do not exist in the original data. For example (this is a simple example, and I realize loess is obviously not needed for something like this, but it illustrates the point):
x <- 1:10
y <- x^2
mdl <- loess(y ~ x)
predict(mdl, 1.5)
[1] 2.25
loess regression works by fitting a polynomial at each x, and thus it creates a predicted y_hat at each x. However, because no coefficients are stored, the "model" in this case is simply the details of what was used to predict each y_hat, for example the span or degree. When I run predict(mdl, 1.5), how is predict able to produce a value at this new x? Is it interpolating between the two nearest existing x values and their associated y_hat? If so, what are the details of how it does this?
I have read the cloess documentation online but cannot find where it discusses this.
Recommended answer
You may have used print(mdl), or simply typed mdl, to see what the model mdl contains, but what gets printed is not all there is. The model object is quite complicated and stores a large number of values.
To get an idea of what is inside, you can run unlist(mdl) and look at the long list of values it contains.
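A more compact way to look inside, sketched here with the toy model from the question, is to list the top-level components and then pull out individual ones. The component names used below (pars, x, y, fitted) are what the stats package's loess stores in current R versions as far as I know, so treat them as version-dependent rather than guaranteed.

```r
# Refit the toy model from the question.
x <- 1:10
y <- x^2
mdl <- loess(y ~ x)

# Top-level components of the fitted object, one line per component.
str(mdl, max.level = 1)

# The settings mentioned in the question live in the `pars` sub-list.
mdl$pars$span    # smoothing span used for the fit
mdl$pars$degree  # degree of the local polynomial

# The original data and the fitted values at the observed x are stored as well.
head(cbind(x = mdl$x[, 1], y = mdl$y, fitted = mdl$fitted))
```

So the object carries the data itself along with the fitting settings, which is what lets predict work at new points later.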
This is the part of the manual for the command that describes how the fitting actually works:
For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.
What I believe is that it fits a polynomial model in the neighborhood of every point (not just a single polynomial for the whole set). But the neighborhood does not mean only one point before and one point after. If I were implementing such a function, I would put large weights on the points nearest to x and smaller weights on more distant points, and then fit the polynomial that minimizes the weighted sum of squared errors.
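The weighting idea above can be sketched in a few lines of R. This is the textbook locally weighted regression recipe with tricube weights, written as a hypothetical helper called local_fit; it is an illustration under those assumptions, not a copy of what R's loess does internally, and details such as how the neighborhood size is rounded are my own choices.

```r
# Hypothetical helper: fit a weighted polynomial around x0 and return its value at x0.
local_fit <- function(x0, x, y, span = 0.75, degree = 2) {
  n <- length(x)
  q <- floor(n * span + 1e-5)               # number of points in the neighborhood (assumed rounding)
  d <- abs(x - x0)                          # distance of every data point to x0
  h <- sort(d)[q]                           # bandwidth: distance to the q-th nearest point
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)  # tricube weights, zero outside the window
  X <- outer(x - x0, 0:degree, `^`)         # design matrix centered at x0: 1, (x - x0), (x - x0)^2, ...
  # Weighted least squares: solve (X' W X) beta = X' W y.
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))
  beta[1]                                   # intercept = value of the local polynomial at x0
}

x <- 1:10
y <- x^2
local_fit(1.5, x, y)  # the data are exactly quadratic, so this is x0^2 up to rounding error
```

Because the polynomial is centered at x0, its intercept is the prediction at x0, so no data point needs to sit at x0 for the fit to produce a value there.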
Then, if the given x' for which a value should be predicted is closest to the point x, I would take the polynomial fitted on the neighborhood of x, call it P, and evaluate it at x'; P(x') would be the prediction.
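One caveat, going by R's documentation for loess.control: it has a surface argument whose default, "interpolate", computes the fitted surface by interpolation from a k-d tree, while "direct" computes the local fit exactly at each requested point. So the default prediction at a new x' is probably not literally the nearest point's polynomial evaluated at x'. A small sketch for comparing the two settings follows; the noisy sine data, the seed, and the new x values are made up purely for illustration.

```r
# Made-up noisy data, for illustration only.
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.2)

# Default: surface = "interpolate" (interpolation from a k-d tree).
fit_interp <- loess(y ~ x)

# Alternative: evaluate the local fit exactly at every requested point.
fit_direct <- loess(y ~ x, control = loess.control(surface = "direct"))

# New x values that are not in the original data (but inside its range).
new_x <- data.frame(x = c(1.5, 4.2, 7.7))
p_interp <- predict(fit_interp, new_x)
p_direct <- predict(fit_direct, new_x)

# Side-by-side comparison of the two prediction modes.
data.frame(new_x, interpolate = p_interp, direct = p_direct,
           difference = p_interp - p_direct)
```

If the two columns differ slightly, that difference is the interpolation error of the default mode; with surface = "direct" the prediction comes from a fresh local fit at each new point.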
Let me know if you are looking for anything more specific.