Problem description
I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential or logarithmic).
I use Python and NumPy, and for polynomial fitting there is a function polyfit(). But I found no such functions for exponential and logarithmic fitting.
Are there any? Or how to solve it otherwise?
Recommended answer
For fitting y = A + B log x, just fit y against (log x).
>>> import numpy
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607, 6.61867463])
# y ≈ 8.46 log(x) + 6.62
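The same transformation can be wrapped in a small reusable helper. This is a minimal sketch; the name fit_log is hypothetical. Note that numpy.polyfit returns coefficients highest degree first, so the slope B comes before the intercept A, and numpy.log is the natural logarithm.

```python
import numpy as np

def fit_log(x, y):
    """Fit y = A + B*log(x) by a linear least-squares fit of y against log(x).

    Returns (A, B). All x must be positive, since log(x) is taken.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # polyfit returns [slope, intercept] = [B, A]
    B, A = np.polyfit(np.log(x), y, 1)
    return A, B

x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])
A, B = fit_log(x, y)
print(f"y ≈ {A:.2f} + {B:.2f} log(x)")  # prints: y ≈ 6.62 + 8.46 log(x)
```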
For fitting $y = Ae^{Bx}$, taking the logarithm of both sides gives $\log y = \log A + Bx$. So fit (log y) against x.
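A minimal sketch of this log-linear approach, assuming all y values are positive (the helper name fit_exp_loglinear is hypothetical). The fitted intercept is log A, so it has to be exponentiated to recover A.

```python
import numpy as np

def fit_exp_loglinear(x, y):
    """Fit y = A*exp(B*x) via the linear fit log(y) = log(A) + B*x.

    Returns (A, B). All y must be positive, since log(y) is taken.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # polyfit returns [slope, intercept] = [B, log(A)]
    B, logA = np.polyfit(x, np.log(y), 1)
    return np.exp(logA), B

x = np.array([10, 19, 30, 35, 51])
y = np.array([1, 7, 20, 50, 79])
A, B = fit_exp_loglinear(x, y)
print(f"y ≈ {A:.3f} * exp({B:.3f} * x)")  # prints: y ≈ 0.670 * exp(0.105 * x)
```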
Note that fitting (log y) as if it were linear will emphasize small values of y, causing large deviations for large y. This is because polyfit (linear regression) works by minimizing $\sum (\Delta Y)^2 = \sum (Y - \hat{Y})^2$. When $Y = \log y$, the residuals are $\Delta Y = \Delta(\log y) \approx \Delta y / |y|$. So even if polyfit makes a very bad decision for large y, the "divide-by-|y|" factor compensates for it, causing polyfit to favor small values.
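A quick numeric illustration of the $\Delta y / |y|$ effect: the same absolute error of 1.0 is applied at a small and at a large y, and the resulting residuals are compared in log space. The approximation is first-order, so it is only accurate when $|\Delta y| \ll |y|$.

```python
import numpy as np

# The same absolute error of 1.0 at a small and at a large value of y
y_true = np.array([2.0, 100.0])
y_fit = y_true + 1.0

# Residual as polyfit sees it after the log transform: Δ(log y)
log_residual = np.log(y_fit) - np.log(y_true)   # ≈ [0.405, 0.00995]
# First-order approximation from the text: Δy / |y|
approx = (y_fit - y_true) / np.abs(y_true)      # = [0.5, 0.01]

# The error at y = 2 counts roughly 40 times more than the one at y = 100
print(log_residual, approx)
```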
This can be alleviated by giving each entry a "weight" proportional to y. polyfit supports weighted least squares via the w keyword argument. polyfit applies w to the unsquared residual, so w=numpy.sqrt(y) weights each squared residual by y.
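To see what w does, the weighted fit can be reproduced by hand: scaling each row of the least-squares problem by w[i] is the same as multiplying each unsquared residual by w[i]. This is a sketch for checking that equivalence, not how polyfit is implemented internally (polyfit also rescales columns for numerical conditioning).

```python
import numpy as np

x = np.array([10, 19, 30, 35, 51], dtype=float)
y = np.array([1, 7, 20, 50, 79], dtype=float)
w = np.sqrt(y)

# Minimize sum((w * (log(y) - (B*x + C)))**2) directly with lstsq
X = np.column_stack([x, np.ones_like(x)])  # design matrix with columns [x, 1]
coef_manual, *_ = np.linalg.lstsq(X * w[:, None], np.log(y) * w, rcond=None)

# Same problem through polyfit's w keyword
coef_polyfit = np.polyfit(x, np.log(y), 1, w=w)
print(np.allclose(coef_manual, coef_polyfit))  # prints: True
```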
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> numpy.polyfit(x, numpy.log(y), 1)
array([ 0.10502711, -0.40116352])
# y ≈ exp(-0.401) * exp(0.105 * x) = 0.670 * exp(0.105 * x)
# (^ biased towards small values)
>>> numpy.polyfit(x, numpy.log(y), 1, w=numpy.sqrt(y))
array([ 0.06009446, 1.41648096])
# y ≈ exp(1.42) * exp(0.0601 * x) = 4.12 * exp(0.0601 * x)
# (^ not so biased)
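One way to check the "not so biased" claim is to measure each fit's error in the original y units instead of in log space. The helper sse_in_y below is hypothetical, introduced only for this comparison. On this data the weighted fit should have a much smaller sum of squared errors, mostly because it tracks the largest point (y = 79) far better.

```python
import numpy as np

x = np.array([10, 19, 30, 35, 51], dtype=float)
y = np.array([1, 7, 20, 50, 79], dtype=float)

def sse_in_y(coef):
    """Sum of squared errors in y units for y = exp(C) * exp(B*x),
    where coef = [B, C] as returned by numpy.polyfit."""
    B, C = coef
    return np.sum((y - np.exp(C) * np.exp(B * x)) ** 2)

unweighted = np.polyfit(x, np.log(y), 1)
weighted = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))
print(sse_in_y(unweighted), sse_in_y(weighted))
```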
Note that Excel, LibreOffice and most scientific calculators typically use the unweighted (biased) formula for exponential regression / trend lines. If you want your results to be compatible with these platforms, do not include the weights, even though they give a better fit.
Now, if you can use SciPy, you can use scipy.optimize.curve_fit to fit any model without transformations.
For y = A + B log x the result is the same as the transformation method:
>>> import numpy
>>> import scipy.optimize
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> scipy.optimize.curve_fit(lambda t,a,b: a+b*numpy.log(t), x, y)
(array([ 6.61867467, 8.46295606]),
array([[ 28.15948002, -7.89609542],
[ -7.89609542, 2.9857172 ]]))
# y ≈ 6.62 + 8.46 log(x)
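curve_fit returns two things: the optimal parameters and their estimated covariance matrix (the second array above). A common way to read the covariance is to take the square root of its diagonal as one-standard-deviation uncertainties for a and b. With the default absolute_sigma=False, SciPy scales this covariance by the residual variance of the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])

popt, pcov = curve_fit(lambda t, a, b: a + b * np.log(t), x, y)
# The diagonal of pcov holds the estimated variances of a and b
perr = np.sqrt(np.diag(pcov))
a, b = popt
print(f"a = {a:.2f} ± {perr[0]:.2f}, b = {b:.2f} ± {perr[1]:.2f}")
```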
For $y = Ae^{Bx}$, however, we can get a better fit, since curve_fit computes the residual Δy directly rather than Δ(log y). But we need to provide an initial guess so that curve_fit can reach the desired local minimum.
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y)
(array([ 5.60728326e-21, 9.99993501e-01]),
array([[ 4.14809412e-27, -1.45078961e-08],
[ -1.45078961e-08, 5.07411462e+10]]))
# oops, definitely wrong.
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y, p0=(4, 0.1))
(array([ 4.88003249, 0.05531256]),
array([[ 1.01261314e+01, -4.31940132e-02],
[ -4.31940132e-02, 1.91188656e-04]]))
# y ≈ 4.88 exp(0.0553 x). much better.
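Instead of picking p0 by hand, one option (an assumption, not part of the original answer) is to seed curve_fit with the log-linear polyfit result, which is usually in the right neighborhood. This is a sketch; fit_exp_curve_fit and exp_model are hypothetical names, and it still requires all y to be positive for the seeding step.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(t, a, b):
    """Model y = a * exp(b * t)."""
    return a * np.exp(b * t)

def fit_exp_curve_fit(x, y):
    """Fit y = a*exp(b*x) with curve_fit, seeded from a log-linear polyfit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Initial guess from fitting log(y) = log(a) + b*x (needs y > 0)
    b0, log_a0 = np.polyfit(x, np.log(y), 1)
    popt, pcov = curve_fit(exp_model, x, y, p0=(np.exp(log_a0), b0))
    return popt, pcov

x = np.array([10, 19, 30, 35, 51])
y = np.array([1, 7, 20, 50, 79])
(a, b), _ = fit_exp_curve_fit(x, y)
print(f"y ≈ {a:.2f} exp({b:.4f} x)")
```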