Question
I'm currently using the curve_fit function of the scipy.optimize package in Python, and I know that if you take the square root of the diagonal entries of the covariance matrix that you get from curve_fit, you get the standard deviations of the parameters that curve_fit calculated. What I'm not sure about is what exactly this standard deviation means. It's an approximation using a Hessian matrix as far as I understand, but what would the exact calculation be? The standard deviation of a Gaussian bell curve tells you what percentage of the area lies within a certain range of the curve, so I assumed that for curve_fit it tells you how many data points lie between certain parameter values, but apparently that isn't right...
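For concreteness, this is roughly the setup being described; a minimal sketch, assuming a hypothetical Gaussian model and synthetic data (neither is from the original question):

import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, cen, wid):
    # hypothetical model function, used only for illustration
    return amp * np.exp(-(x - cen)**2 / (2 * wid**2))

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 200)
y = gaussian(x, 3.0, 0.5, 1.2) + rng.normal(0, 0.2, x.size)  # synthetic noisy data

popt, pcov = curve_fit(gaussian, x, y, p0=[1, 0, 1])
perr = np.sqrt(np.diag(pcov))   # the "standard deviations" the question asks about
print(popt, perr)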
I'm sorry if this should be basic knowledge for curve fitting, but I really can't figure out what the standard deviations mean. They express an error on the parameters, but those parameters are calculated as the best possible fit for the function; it's not as if there were a whole collection of optimal parameters whose average we take and from which we consequently also get a standard deviation. There's only one optimal value, so what is there to compare it with? I guess my question really comes down to this: how can I manually and accurately calculate these standard deviations, instead of just getting an approximation via a Hessian matrix?
Answer
The variance in the fitted parameters represents the uncertainty in the best-fit value based on the quality of the fit of the model to the data. That is, it describes by how much the value could change away from the best-fit value and still have a fit that is almost as good as the best-fit value.
With the standard definition of chi-square,
chi_square = ( ( (data - model)/epsilon )**2 ).sum()
and reduced_chi_square = chi_square / (ndata - nvarys) (where data is the array of data values, model is the array of calculated model values, epsilon is the uncertainty in the data, ndata is the number of data points, and nvarys is the number of variables), a good fit should have reduced_chi_square around 1, or chi_square around ndata - nvarys. (Note: not 0 -- the fit will not be perfect, as there is noise in the data.)
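As a minimal sketch of this bookkeeping, reusing the gaussian, x, y, and popt names from the snippet above, and assuming a constant per-point uncertainty epsilon equal to the noise level used for the synthetic data:

import numpy as np

epsilon = np.full_like(y, 0.2)                     # assumed per-point data uncertainties
model = gaussian(x, *popt)                         # model evaluated at the best-fit parameters
chi_square = (((y - model) / epsilon)**2).sum()    # standard chi-square definition
ndata, nvarys = y.size, len(popt)
reduced_chi_square = chi_square / (ndata - nvarys)
print(chi_square, reduced_chi_square)              # reduced chi-square should be around 1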
The standard deviation of the best-fit value for a variable (the square root of its variance) gives the amount by which you can change the value (and re-optimize all other values) and increase chi-square by 1. That gives the so-called '1-sigma' value of the uncertainty.
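To do that by hand for one parameter, you can scan it over a grid, re-optimize the remaining parameters at each point, and look for where chi-square has risen by 1 above its minimum. A sketch, again assuming the gaussian, x, y, epsilon, popt, and perr names from the earlier snippets (the choice of the amplitude parameter and the scan range are arbitrary):

import numpy as np
from scipy.optimize import curve_fit

def chi2(params):
    return (((y - gaussian(x, *params)) / epsilon)**2).sum()

chi2_best = chi2(popt)

def chi2_with_fixed_amp(amp_fixed):
    # re-fit centre and width with the amplitude held fixed
    def partial_model(x, cen, wid):
        return gaussian(x, amp_fixed, cen, wid)
    p, _ = curve_fit(partial_model, x, y, p0=popt[1:], sigma=epsilon)
    return chi2([amp_fixed, *p])

amps = np.linspace(popt[0] - 3 * perr[0], popt[0] + 3 * perr[0], 61)
delta = np.array([chi2_with_fixed_amp(a) - chi2_best for a in amps])
inside = amps[delta <= 1.0]            # values where chi-square is within 1 of the minimum
print(inside.min(), inside.max())      # should bracket roughly popt[0] +/- perr[0]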
As you say, these values are expressed in the diagonal terms of the covariance matrix returned by scipy.optimize.curve_fit (the off-diagonal terms give the correlations between variables: if the value of one variable is changed away from its optimal value, how would the others respond to make the fit better). This covariance matrix is built using the trial values and derivatives near the solution as the fit is being done -- it calculates the "curvature" of the parameter space (i.e., how much chi-square changes when a variable's value changes).
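The curvature picture can be reproduced by hand. The sketch below, which again assumes the gaussian, x, y, epsilon, popt, and perr names from the earlier snippets, estimates the Jacobian of the model with respect to the parameters by finite differences and forms inv(J^T W J); this mirrors the idea rather than scipy's exact internal implementation:

import numpy as np

def numerical_jacobian(params, step=1e-6):
    # finite-difference derivatives of the model with respect to each parameter
    base = gaussian(x, *params)
    J = np.empty((x.size, len(params)))
    for i in range(len(params)):
        dp = step * max(abs(params[i]), 1.0)
        shifted = np.array(params, dtype=float)
        shifted[i] += dp
        J[:, i] = (gaussian(x, *shifted) - base) / dp
    return J

J = numerical_jacobian(popt)
W = np.diag(1.0 / epsilon**2)          # weights from the data uncertainties
cov = np.linalg.inv(J.T @ W @ J)       # curvature-based covariance estimate
print(np.sqrt(np.diag(cov)))           # should be close to perr from curve_fit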
You can calculate these uncertainties by hand. The lmfit library (https://lmfit.github.io/lmfit-py/) has routines to more explicitly explore the confidence intervals of variables from least-squares minimization or curve-fitting. These are described in more detail at https://lmfit.github.io/lmfit-py/confidence.html. It's probably easiest to use lmfit for the curve-fitting rather than trying to re-implement the confidence interval code for curve_fit.
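A minimal sketch of that route, using lmfit's built-in GaussianModel on the synthetic data from the earlier snippets (exact options for the confidence-interval routines are in the lmfit documentation linked above):

from lmfit.models import GaussianModel

gmodel = GaussianModel()
params = gmodel.guess(y, x=x)                       # rough starting values from the data
result = gmodel.fit(y, params, x=x, weights=1.0 / epsilon)

print(result.fit_report())                          # best-fit values with 1-sigma uncertainties
ci = result.conf_interval()                         # explicit confidence intervals
for name, levels in ci.items():
    print(name, levels)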