I'm trying to model time-varying covariance with an RNN in Keras, where I decompose the covariance of a signal Y into a weighted sum that changes over time: C_Y^t = SUM_i^npriors (alpha_i^t * beta_i), where the beta_i are a fixed basis set and the alpha_i^t are the terms I'm trying to infer.
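
For a single time point this is just a weighted sum of fixed basis matrices; as a minimal NumPy sketch (the alpha values and basis matrices below are made up purely for illustration):

import numpy as np

npriors, nchans = 2, 5

# Fixed basis set beta_i: two arbitrary example matrices
beta = np.zeros((npriors, nchans, nchans))
beta[0, 0, 0] = 1.0
beta[1, 1, 1] = 1.0

# Hypothetical inferred weights alpha_i^t for one time point t
alpha_t = np.array([0.3, 1.7])

# C_Y^t = SUM_i alpha_i^t * beta_i
C_Y_t = np.einsum('i,ijk->jk', alpha_t, beta)  # shape (nchans, nchans)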

As the cost function I'm (currently) using the negative log-likelihood, where the likelihood is a zero-mean MVN with the inferred covariance C_Y^t from above: likelihood = MVN(Y; 0, C_Y^t). Once this is implemented correctly I'll use the reparameterization trick together with a KL divergence.

I don't want to explicitly reconstruct the data in a classic autoencoder setup; I only want to infer the alpha terms that best describe the time-varying covariance dynamics. So when the model is called, the outputs should only be alpha_mu and alpha_sigma:

alpha_model_net = tf.keras.Model(inputs=[inputs_layer],
                                  outputs= [alpha_mu,alpha_sigma],
                                  name='Alpha_MODEL')


But I don't know these alpha terms a priori, so when calling alpha_model_net.fit(Y_observed, [alpha_mu_predict, alpha_sigma_predict]) it's hard to know what the [alpha_mu_predict, alpha_sigma_predict] terms should be in an unsupervised setting.

So my question comes in two parts:


1. What should I use as alpha_predict if I don't know these terms in advance?
2. Am I right to use samples from the alpha distribution, i.e. alpha_ast, in my custom cost function, as in my attempted implementation shown below?


I've had a go at implementing this myself. The key parts of my code are shown below, and a complete example with data simulation can be found on a Google Colab doc here.

Model

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import tensorflow_probability as tfp
tfd = tfp.distributions

mini_batch_length = 10 # sequence length (time points per segment)
nchans = 5 # number of features/channels of observed data, Y
nunits = 10 # number of GRU units
npriors = 2 # i.e. how many basis functions we have

inputs_layer = layers.Input(shape=(mini_batch_length,nchans), name='Y_input')
output,state = tf.compat.v1.keras.layers.CuDNNGRU(nunits, # number of units
                                          return_state=True,
                                          return_sequences=True,
                                          name='uni_INF_GRU')(inputs_layer)

alpha_mu = tf.keras.layers.Dense(npriors,activation='linear',name='alpha_mu')(output)
alpha_sigma = tf.keras.layers.Dense(npriors,activation='linear',name='alpha_sigma')(output)

# use reparameterization trick to push the sampling out as input
alpha_ast = layers.Lambda(sampling,
                          name='alpha_ast')([alpha_mu, alpha_sigma])

# instantiate alpha MODEL network:
alpha_model_net = tf.keras.Model(inputs=[inputs_layer],
                                  outputs= [alpha_ast],
                                  name='Alpha_MODEL')

tf.keras.utils.plot_model(alpha_model_net, to_file='vae_mlp_encoder.png', show_shapes=True)
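
The sampling function referenced above is the reparameterization trick mentioned earlier and is not reproduced here. A minimal sketch of such a helper (assuming alpha_sigma is treated as a log-variance, which is my convention for this sketch):

def sampling(args):
  # Reparameterization trick: alpha_ast = mu + exp(0.5 * log_var) * eps
  alpha_mu, alpha_sigma = args
  eps = tf.random.normal(shape=tf.shape(alpha_mu))
  return alpha_mu + tf.exp(0.5 * alpha_sigma) * eps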


Cost function

def vae_loss(Y_portioned, alpha_ast):
  """
  Our cost function is just the NLL

  The likelihood is a multivariate normal with zero mean and time-varying
  covariance:
                  P(Y|alpha^t) = MVN(Y; 0, C_Y^t)
  where
                      C_Y^t  = SUM_i^npriors (alpha_ast_i^t beta_i)

  Y is our observed data
  alpha_ast_i^t are our samples from the inferred parameters (mu,sigma)
  beta_i are the basis functions (corresponding to covariance_matrix below)
  and (perhaps obviously) are not trainable.
  """
  # Alphas need to end up being of dimension (?,mini_batch_length,npriors,1,1),
  # and need to undergo softplus transformation:
  alpha_ext = tf.keras.backend.expand_dims(tf.keras.backend.expand_dims(
    tf.keras.activations.softplus(alpha_ast),
    axis=-1),axis=-1)

  # Covariance basis set
  # This needs to be of dim [npriors, sensors, sensors]:
  covariance_basis = np.tile(np.zeros((nchans,nchans)),(npriors,1,1)).astype('float32')
  covariance_basis[0,0,0] = 1
  covariance_basis[1,1,1] = 1

  # Covariance basis functions need to be of dimension [1,1, npriors, sensors, sensors]
  covariance_ext = tf.reshape(covariance_basis,(1,1,npriors,nchans,nchans))

  # Do the multiplicative sum over the npriors dimension:
  cov_arg = tf.reduce_sum(tf.multiply(alpha_ext,covariance_ext),2)
  safety_add = 1e-6*np.eye(nchans, nchans)
  cov_arg = cov_arg + safety_add

  mvn = tfd.MultivariateNormalFullCovariance(
      loc=np.zeros((mini_batch_length, nchans)).astype('float32'),
      covariance_matrix=cov_arg,
      allow_nan_stats=False)

  # Evaluate the -log(MVN) at the current batch of data. We add a tiny constant
  # to avoid any NaN or inf troubles
  loss = tf.reduce_sum(-tf.math.log(mvn.prob(Y_portioned)+1e-9))

  return loss
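
As a quick sanity check of the broadcasting (my own throwaway test, assuming the hyperparameters defined above are in scope), the loss can be evaluated eagerly on random inputs:

# Random stand-in data with shapes (batch, mini_batch_length, ...)
Y_test = tf.constant(np.random.randn(3, mini_batch_length, nchans).astype('float32'))
alpha_test = tf.constant(np.random.randn(3, mini_batch_length, npriors).astype('float32'))

print(vae_loss(Y_test, alpha_test))  # expect a finite scalar tensor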


Fitting the model

opt = tf.keras.optimizers.Adam(lr=0.001)
alpha_model_net.compile(optimizer=opt, loss=vae_loss)

history=alpha_model_net.fit(Y_portioned, # Observed data.
                            Y_portioned, # ???
                    verbose=1,
                    shuffle=True,
                    epochs=100,
                    batch_size=400)


Many thanks in advance, and please let me know if I've left out any key details.

Using the TensorFlow 2.1.0 backend.

Update 1:
I've now computed the NLL from tensors directly using the add_loss function. This seems to work, and I no longer need to specify the pesky y in model.fit(x, y). I'll update again if this turns out not to be correct.

Example model

inputs_layer = layers.Input(shape=(mini_batch_length,nchans), name='Y_portioned_in')
output,state = tf.compat.v1.keras.layers.CuDNNGRU(nunits, # number of units
                                          return_state=True,
                                          return_sequences=True,
                                          name='uni_INF_GRU')(inputs_layer)

dense_layer_mu = tf.keras.layers.Dense(npriors,activation='linear')(output)
dense_layer_sigma = tf.keras.layers.Dense(npriors,activation='linear')(output)

alpha_ast = layers.Lambda(sampling,
                          name='alpha_ast')([dense_layer_mu, dense_layer_sigma])

model = tf.keras.Model(inputs=[inputs_layer], outputs=[dense_layer_mu])

# Construct your custom loss as a tensor
loss = my_beautiful_custom_loss(alpha_ast,inputs_layer,npriors,nchans)

# Add loss to model
model.add_loss(loss)

# Compile without specifying a loss
opt = tf.keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=opt)

history=model.fit(Y_portioned, # Input or "Y_true"
                    verbose=1,
                    shuffle=True,
                    epochs=400,
                    batch_size=200)


where

def my_beautiful_custom_loss(alpha_ast,Y_portioned,npriors,nchans):
  # <Do something with input tensors here>

  return loss
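
For this toy setup the body of the stub above can simply reuse the NLL construction from the vae_loss shown earlier, e.g. return vae_loss(Y_portioned, alpha_ast) (in that case the npriors and nchans arguments are unused, since vae_loss reads them from the enclosing scope).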

Best Answer

I'm not sure this is the most sensible way to do it, but I solved this using the add_loss function.

I'll update my original question with the complete implementation.
