如何在Gekko中使用缺失数据实现动态参数估计?

本文介绍了如何在Gekko中使用缺失数据实现动态参数估计?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

来回浏览

从gekko导入

 导入GEKKO将numpy导入为np导入matplotlib.pyplot作为pltxm = np.array([0,1,2,3,4,5])ym = np.array([2.0,1.5，np.nan，2.2,3.0,5.0])m = GEKKO(远程=假)m.时间= xma = m.FV(磅= 0.1，ub = 2.0)a.STATUS = 1y = m.CV(值= ym，名称='y'，fixed_initial =假)y.FSTATUS = 1m.方程式(y.dt()== a * y)m.options.IMODE = 5m.options.SOLVER = 1m.solve(disp = True)print('Optimized，a ='+ str(a.value [0]))plt.figure(figsize =(6,2))plt.plot(xm，ym，'bo'，label ='Meas')plt.plot(xm，y.value，'r-'，label ='Pred')plt.ylabel('y')plt.ylim([0,6])plt.legend()plt.show()

如您所见， m.time 的长度必须与测量值的长度相同.如果缺少值，则可以在数据范围的开头添加一个 np.nan .默认情况下，Gekko使用在 value 属性中指定的第一个值来设置初始条件.如果您不希望Gekko使用该值，请为您的 CV 设置 fixed_initial = False .

具有免费初始条件的动态回归

  y = m.CV(value = ym，name ='y'，fixed_initial = False)

Going back and forth through the documentation, I was able to set-up a dynamic parameter estimation in Gekko.

Here's the code, with measurement values shown below (the file is named MeasuredAlgebrProductionRate_30min_18h.csv on my system, and uses ;as separator):

import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO

#%% Read measurement data from CSV file
t_x_q_obs = np.genfromtxt('MeasuredAlgebrProductionRate_30min_18h.csv', delimiter=';')
#t_obs, x_obs, q_obs = t_xq_obs[:,0:3]

#%% Initialize Model
m = GEKKO(remote=False)
m.time = t_x_q_obs[:,0] #np.arange(0, 18/24+1e-6, 1/2*1/24)

# Declare parameter
V_liq   = m.Param(value = 159.0)

# Declare FVs
k_1     = m.FV(value = 0.80)
k_1.STATUS = 1
f_1     = m.FV(value = 10.0)
f_1.STATUS = 1

# Diff. Variables
X       = m.Var(value = 80.0) # at t=0
Y       = m.Var(value = 80.0*0.2)

rho_1   = m.Intermediate(k_1*X)
#q_prod  = m.Intermediate(0.52*f_1*X/24)
#X       = m.CV(value = t_x_q_obs[:,1])
q_prod  = m.CV(value = t_x_q_obs[:,2])

#%% Equations
m.Equations([X.dt() == -rho_1, Y.dt() == 0, q_prod == 0.52*f_1*X/24])

m.options.IMODE = 5
m.solve(disp=False)

#%% Plot some results
plt.plot(m.time, np.array(X.value)/10, label='X')
plt.plot(t_x_q_obs[:,0], t_x_q_obs[:,2], label='q_prod Meas.')
plt.plot(m.time, q_prod.value, label='q_prod Sim.')
plt.xlabel('time')
plt.ylabel('X / q_prod')
plt.grid()
plt.legend(loc='best')
plt.show()

So far, so good. Suppose I also have measurements of X (second column) at some time points (first column), the rest is not available (therefore NaN).I would like to adjust k_1 and f_1, so that simulated and observed variables X and q_prod match as closely as possible.

Is this feasible with Gekko? If so, how?

Another question: Gekko throws an error if m.time has more elements than there are time points of observed variables. However, my initial values of X and Y refer to t=0, not t=0.0208333333. Hence, the commented out part after m.time =, see above. (Measurements at t=0 are not available.) Do initial conditions in Gekko refer to the first element of m.time, as they do in Matlab, or to t=0?

解决方案

If you have a missing measurement then you can include a non-numeric value such as NaN and Gekko ignores that entry in the objective function. Here is a test case with one NaN value in ym:

Nonlinear Regression with NaN Data Value

from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
xm = np.array([0,1,2,3,4,5])
ym = np.array([0.1,0.2,np.nan,0.5,0.8,2.0])
m = GEKKO(remote=False)
x = m.Param(value=xm,name='x')
a = m.FV()
a.STATUS=1
y = m.CV(value=ym,name='y')
y.FSTATUS=1
m.Equation(y==0.1*m.exp(a*x))
m.options.IMODE = 2
m.options.SOLVER = 1
m.solve(disp=True)
print('Optimized, a = ' + str(a.value[0]))
plt.plot(xm,ym,'bo')
plt.plot(xm,y.value,'r-')
m.open_folder()
plt.show()

When you open the run folder with m.open_folder() and look at the data file gk_model0.csv, there is the NaN in the y value column.

y,x
0.1,0
0.2,1
nan,2
0.5,3
0.8,4
2.0,5

This is IMODE=2 so it is a steady state regression problem but shows the same thing that happens with dynamic estimation problems. There is more information on the objective function with m.options.EV_TYPE=1 (default) or m.options.EV_TYPE=2 for estimation and how bad values are handled in a data file. When the measurement is a non-numeric value, that bad value is dropped from the objective function summation. Here is a version with a dynamic model:

Dynamic Regression with Fixed Initial Condition

from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
xm = np.array([0,1,2,3,4,5])
ym = np.array([2.0,1.5,np.nan,2.2,3.0,5.0])
m = GEKKO(remote=False)
m.time = xm
a = m.FV(lb=0.1,ub=2.0)
a.STATUS=1
y = m.CV(value=ym,name='y',fixed_initial=False)
y.FSTATUS=1
m.Equation(y.dt()==a*y)
m.options.IMODE = 5
m.options.SOLVER = 1
m.solve(disp=True)
print('Optimized, a = ' + str(a.value[0]))
plt.figure(figsize=(6,2))
plt.plot(xm,ym,'bo',label='Meas')
plt.plot(xm,y.value,'r-',label='Pred')
plt.ylabel('y')
plt.ylim([0,6])
plt.legend()
plt.show()

As you observed, you need to have the same length for m.time as for your measurement values. If you are missing values then you can include append a np.nan to the beginning of the data horizon. By default, Gekko uses the first value specified in the value property to set the initial condition. If you don't want Gekko to use that value then set fixed_initial=False for your CV.

Dynamic Regression with Free Initial Condition

y = m.CV(value=ym,name='y',fixed_initial=False)

这篇关于如何在Gekko中使用缺失数据实现动态参数估计?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！