Strange behavior of the loss function in a Keras model with a pretrained convolutional base





I'm trying to create a model in Keras that makes numerical predictions from pictures. My model has a DenseNet121 convolutional base with a couple of additional layers on top. All layers except the last two are set to layer.trainable = False. My loss is mean squared error, since it's a regression task. During training I get loss: ~3, while evaluation on the very same batch of data gives loss: ~30:

Epoch 1/1
32/32 [==============================] - 0s 11ms/step - loss: 2.5571

32/32 [==============================] - 2s 59ms/step
29.276123046875

I feed exactly the same 32 pictures during training and evaluation. I also calculated the loss from the predicted values, y_pred=model.predict(dat[0]), by computing the mean squared error with numpy. The result was the same as what I got from evaluation (i.e. 29.276123...).
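The manual check described above can be sketched roughly as follows. The arrays are made-up stand-ins for dat[1] and for the output of model.predict(dat[0]), and the helper name mean_squared_error is my own, not something from the post.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences for a single-output regression."""
    # Flatten both arrays first: predict() typically returns shape (n, 1),
    # and subtracting that from a shape (n,) target would broadcast to (n, n).
    y_true = np.asarray(y_true, dtype=np.float64).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=np.float64).reshape(-1)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of values")
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical values; in the post these would be dat[1] and model.predict(dat[0]).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([[1.5], [2.0], [2.0], [6.0]])

manual_loss = mean_squared_error(y_true, y_pred)
print(manual_loss)
```

If a value computed this way agrees with model.evaluate() but not with the loss printed by model.fit(), the difference has to come from the forward pass behaving differently in training mode, not from the loss computation.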

There was a suggestion that this behavior might be due to the BatchNormalization layers in the convolutional base (discussion on GitHub). Of course, all BatchNormalization layers in my model have been set to layer.trainable=False as well. Maybe somebody has encountered this problem and figured out a solution?


Looks like I found the solution. As suggested above, the problem is with the BatchNormalization layers. They do three things:

  1. subtract mean and normalize by std
  2. collect statistics on mean and std using running average
  3. train two additional parameters (two per node).
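To make the three jobs concrete, here is a minimal numpy sketch of a batch normalization layer. The class name BatchNorm1D and its attributes are hypothetical; the momentum and epsilon defaults are chosen to match the documented Keras defaults (0.99 and 1e-3). This is an illustration of the idea, not the Keras implementation, and the gradient updates for the scale and shift are left out.

```python
import numpy as np


class BatchNorm1D:
    """Illustrative batch normalization over the batch axis of a 2-D input."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.gamma = np.ones(num_features)        # (3) trainable scale, one per node
        self.beta = np.zeros(num_features)        # (3) trainable shift, one per node
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def forward(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # (2) collect statistics with a running (exponential) average
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # at prediction time the stored statistics are used instead
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)   # (1) subtract mean, divide by std
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(32, 4))   # made-up activations
bn = BatchNorm1D(4)
out_train = bn.forward(x, training=True)    # normalized with the batch statistics
out_infer = bn.forward(x, training=False)   # normalized with the running statistics
```

After a single update the running statistics are still close to their initial values, so out_train and out_infer differ noticeably for the same input.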

When one sets trainable to False, these two parameters are frozen and the layer also stops collecting statistics on the mean and std. But it looks like the layer still normalizes with the statistics of the training batch during training. Most likely it's a bug in Keras, or maybe they did it on purpose for some reason. As a result, the forward-propagation calculations during training differ from those at prediction time, even though the trainable attribute is set to False.
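The mismatch described above can be reproduced with plain numpy. This is a sketch under assumptions: the stored statistics, the batch, and the fixed weight vector standing in for the rest of the network are all made up. Note also that the TensorFlow 2.x Keras documentation states that setting trainable = False on a BatchNormalization layer makes it run in inference mode, so newer versions may not show this behavior at all.

```python
import numpy as np

EPS = 1e-3


def normalize(x, mean, var):
    """Standardize x with the given per-feature mean and variance."""
    return (x - mean) / np.sqrt(var + EPS)


# Hypothetical "pretrained" statistics stored in the frozen layer.
moving_mean = np.zeros(3)
moving_var = np.ones(3)

# A made-up batch from a custom dataset whose statistics differ from the stored ones.
rng = np.random.default_rng(42)
batch = rng.normal(loc=4.0, scale=3.0, size=(32, 3))

# Training-time forward pass as described in the post: batch statistics.
train_out = normalize(batch, batch.mean(axis=0), batch.var(axis=0))
# Prediction-time forward pass: stored moving statistics.
predict_out = normalize(batch, moving_mean, moving_var)

# A fixed weight vector stands in for the layers on top. It sees different
# inputs in the two modes, so the loss on the very same batch differs too.
weights = np.array([0.5, -1.0, 2.0])
targets = np.zeros(32)
train_loss = float(np.mean((train_out @ weights - targets) ** 2))
predict_loss = float(np.mean((predict_out @ weights - targets) ** 2))
print(train_loss, predict_loss)
```

The larger the gap between the stored statistics and the statistics of the new data, the larger the gap between the two losses.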

There are two possible solutions I can think of:

  1. Set all BatchNormalization layers to trainable. In this case the layers will collect statistics from your dataset instead of using the pretrained ones (which can be significantly different!), so all the BatchNorm layers are adjusted to your custom dataset during training.
  2. Split the model into two parts, model=model_base+model_top. Then use model_base to extract features with model_base.predict(), feed these features into model_top, and train only model_top.
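A rough sketch of the second option is below. It assumes TensorFlow's bundled Keras (from tensorflow import keras) and uses a tiny stand-in base so that it runs quickly; in the post the base would be the pretrained DenseNet121. The layer sizes, the random data, and the variable names other than model_base and model_top are my own assumptions.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in for the pretrained convolutional base (assumption: the real
# base would be DenseNet121 with its pretrained weights).
model_base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.GlobalAveragePooling2D(),
])

# Trainable top: no BatchNormalization or Dropout, so its forward pass is
# the same in training and at prediction time.
model_top = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model_top.compile(optimizer="adam", loss="mse")

# Hypothetical data: 32 random "pictures" with numeric targets.
rng = np.random.default_rng(0)
images = rng.random((32, 32, 32, 3)).astype("float32")
targets = rng.random((32, 1)).astype("float32")

# predict() runs the base in inference mode, so BatchNormalization uses its
# stored statistics here, exactly as it will at prediction time.
features = model_base.predict(images, verbose=0)

# Train only the top on the extracted features.
model_top.fit(features, targets, batch_size=32, epochs=1, verbose=0)
eval_loss = float(model_top.evaluate(features, targets, verbose=0))
print(eval_loss)
```

Because the base is only ever called through predict(), its weights and statistics never change, and the features can be computed once and reused across epochs. The trade-off is that image augmentation would require recomputing the features.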

I've just tried the first solution and it looks like it's working:

model.fit(x=dat[0],y=dat[1],batch_size=32)
Epoch 1/1
32/32 [==============================] - 1s 28ms/step - loss: 3.1053

model.evaluate(x=dat[0],y=dat[1])
32/32 [==============================] - 0s 10ms/step
2.487905502319336

This was after some training: one needs to wait until enough statistics on the mean and std have been collected.
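Why some waiting is needed can be seen from the running-average update itself. The sketch below assumes the documented Keras default momentum of 0.99 and uses made-up numbers for the stored and the dataset statistic; it idealizes every batch as having exactly the dataset mean.

```python
momentum = 0.99          # documented Keras default for BatchNormalization
pretrained_mean = 0.0    # hypothetical statistic that came with the pretrained weights
dataset_mean = 5.0       # hypothetical mean of the same activation on the new data

moving_mean = pretrained_mean
history = []
for step in range(500):
    batch_mean = dataset_mean  # idealized: no batch-to-batch noise
    # Same exponential moving average a batch normalization layer applies per update.
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    history.append(moving_mean)

for n in (10, 100, 500):
    print(f"after {n:3d} updates: moving mean = {history[n - 1]:.3f}")
```

With this momentum the stored mean has covered only about 63% of the distance to the dataset mean after 100 updates, so evaluation results taken very early in training can still look off.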

I haven't tried the second solution yet, but I'm pretty sure it will work, since forward propagation during training and prediction will be the same.

Update: I found a great blog post that discusses this issue in full detail. Check it out here


09-15 02:56