Python and lmfit: How to fit multiple data sets with shared parameters?


Question

I would like to use the lmfit module to fit a function to a variable number of data sets, with some shared and some individual parameters.

Here is an example that generates Gaussian data and fits each data set individually:

import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, report_fit

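def func_gauss(params, x, data=None):
    """Gaussian model; returns the residual when data is given."""
    A = params['A'].value
    mu = params['mu'].value
    sigma = params['sigma'].value
    model = A*np.exp(-(x-mu)**2/(2.*sigma**2))

    # a mutable default ([]) and `data == []` misbehave for NumPy arrays
    if data is None:
        return model
    return data-model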

x  = np.linspace( -1, 2, 100 )
data = []
for i in np.arange(5):
    params = Parameters()
    params.add( 'A'    , value=np.random.rand() )
    params.add( 'mu'   , value=np.random.rand()+0.1 )
    params.add( 'sigma', value=0.2+np.random.rand()*0.1 )
    data.append(func_gauss(params,x))

plt.figure()
for y in data:
    fit_params = Parameters()
    fit_params.add( 'A'    , value=0.5, min=0, max=1)
    fit_params.add( 'mu'   , value=0.4, min=0, max=1)
    fit_params.add( 'sigma', value=0.4, min=0, max=1)
    # current lmfit returns the best-fit values in result.params
    # and leaves fit_params unchanged
    result = minimize(func_gauss, fit_params, args=(x, y))
    report_fit(result)

    y_fit = func_gauss(result.params, x)
    plt.plot(x,y,'o',x,y_fit,'-')
plt.show()


# ideally I would like to write:
#
# fit_params = Parameters()
# fit_params.add( 'A'    , value=0.5, min=0, max=1)
# fit_params.add( 'mu'   , value=0.4, min=0, max=1)
# fit_params.add( 'sigma', value=0.4, min=0, max=1, shared=True)
# minimize(func_gauss, fit_params, args=(x, data))
#
# or:
#
# fit_params = Parameters()
# fit_params.add( 'A'    , value=0.5, min=0, max=1)
# fit_params.add( 'mu'   , value=0.4, min=0, max=1)
#
# fit_params_shared = Parameters()
# fit_params_shared.add( 'sigma', value=0.4, min=0, max=1)
# call_function(func_gauss, fit_params, fit_params_shared, args=(x, data))

Recommended answer

I think you're most of the way there. You need to put the data sets into an array or structure that can be used in a single, global objective function that you pass to minimize(), and fit all the data sets with a single set of Parameters. You can share parameters in this set among data sets as you like. Expanding on your example a bit, the code below does a single fit to the 5 different Gaussian functions. As an example of tying parameters across data sets, I generated the 5 data sets with nearly identical values of sigma and constrained the fit to use one common value. I created 5 different sigma Parameters ('sig_1', 'sig_2', ..., 'sig_5'), but then forced these to have the same value using a mathematical constraint. Thus there are 11 variables in the problem (5 amplitudes, 5 centers and 1 shared sigma), not 15.

import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, report_fit

def gauss(x, amp, cen, sigma):
    "basic gaussian"
    return amp*np.exp(-(x-cen)**2/(2.*sigma**2))

def gauss_dataset(params, i, x):
    """calc gaussian from params for data set i
    using simple, hardwired naming convention"""
    amp = params['amp_%i' % (i+1)].value
    cen = params['cen_%i' % (i+1)].value
    sig = params['sig_%i' % (i+1)].value
    return gauss(x, amp, cen, sig)

def objective(params, x, data):
    """ calculate total residual for fits to several data sets held
    in a 2-D array, and modeled by Gaussian functions"""
    ndata, nx = data.shape
    resid = np.zeros_like(data, dtype=float)
    # make residual per data set
    for i in range(ndata):
        resid[i, :] = data[i, :] - gauss_dataset(params, i, x)
    # now flatten this to a 1D array, as minimize() needs
    return resid.flatten()

# create 5 datasets
x  = np.linspace( -1, 2, 151)
data = []
for i in range(5):
    # random amplitude and center; sigma is nearly the same for every set
    amp   =  0.60 + 9.50*np.random.rand()
    cen   = -0.20 + 1.20*np.random.rand()
    sig   =  0.25 + 0.03*np.random.rand()
    dat   = gauss(x, amp, cen, sig) + np.random.normal(size=len(x), scale=0.1)
    data.append(dat)

# data has shape (5, 151)
data = np.array(data)
assert data.shape == (5, 151)

# create 5 sets of parameters, one per data set
fit_params = Parameters()
for iy, y in enumerate(data):
    fit_params.add( 'amp_%i' % (iy+1), value=0.5, min=0.0,  max=200)
    fit_params.add( 'cen_%i' % (iy+1), value=0.4, min=-2.0,  max=2.0)
    fit_params.add( 'sig_%i' % (iy+1), value=0.3, min=0.01, max=3.0)

# but now constrain all values of sigma to have the same value
# by assigning sig_2, sig_3, .. sig_5 to be equal to sig_1
for iy in (2, 3, 4, 5):
    fit_params['sig_%i' % iy].expr='sig_1'

# run the global fit to all the data sets
result = minimize(objective, fit_params, args=(x, data))
report_fit(result)

# plot the data sets and fits
plt.figure()
for i in range(5):
    # use the fitted values; current lmfit leaves fit_params at its initial guess
    y_fit = gauss_dataset(result.params, i, x)
    plt.plot(x, data[i, :], 'o', x, y_fit, '-')

plt.show()

For what it's worth, I would consider holding the multiple data sets in a dictionary or a list of DataSet objects instead of a multi-dimensional array. Anyway, I hope this helps you get going on what you really need to do.


08-22 21:13