Problem description
I am observing some strange behavior from Keras. I am training a small model whose training loss becomes NaN only at the end of the first epoch.
So if I have 100 batches and I terminate training at batch 99, then resume for another 99 batches, it trains fine. Otherwise, once it reaches the end of an epoch, the loss always comes back as NaN.
I am using a custom loss function:
from keras import backend as K

def corr(x, y):
    # Pearson-style correlation: covariance of the centered tensors
    # divided by the product of their standard deviations
    xc = x - K.mean(x)
    yc = y - K.mean(y)
    r_num = K.mean(xc * yc)
    r_den = K.std(x) * K.std(y)
    return r_num / r_den
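As a quick sanity check, the loss can be evaluated on toy tensors via the Keras backend. A minimal sketch (the numbers below are made up; they only show that perfectly correlated inputs give 1.0):

import numpy as np
from keras import backend as K

# Toy check: perfectly correlated inputs should give a correlation of ~1.0
x = K.constant(np.array([1.0, 2.0, 3.0, 4.0]))
y = K.constant(np.array([2.0, 4.0, 6.0, 8.0]))
print(K.eval(corr(x, y)))  # ~1.0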
I have tried all of the standard tricks, like dropping my learning rate, clipping the norm and value of my gradients, and increasing the batch size. Only when I increase the batch size to something unrealistic like 100,000 (I have 1 million data points) does training actually continue past an epoch, but I would like to understand what happens at the end of an epoch that causes this strange behavior. I also tried different optimizers (currently Adam), and ran this on different systems to make sure it was not a problem with one particular machine.
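For reference, the gradient clipping mentioned above is usually done through the clipnorm / clipvalue arguments of the Keras optimizer. A minimal sketch, assuming Adam as in the question (the learning rate and clipping thresholds are placeholders, not the actual values used):

from keras.optimizers import Adam

# Clip gradients both by L2 norm and by absolute value; the thresholds
# and learning rate below are illustrative only.
opt = Adam(lr=1e-4, clipnorm=1.0, clipvalue=0.5)
# This optimizer would then be passed to model.compile(optimizer=opt, ...)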
My input and output are one-dimensional, and my model is summarized below.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_7 (InputLayer)         (None, 1)                 0
_________________________________________________________________
dense_7 (Dense)              (None, 100)               200
_________________________________________________________________
dense_8 (Dense)              (None, 100)               10100
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 101
=================================================================
Total params: 10,401
Trainable params: 10,401
Non-trainable params: 0
_________________________________________________________________
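The summary above is consistent with a plain two-hidden-layer MLP. A minimal sketch of what the architecture presumably looks like; only the layer sizes and parameter counts come from the summary, while the activations and other details are assumptions:

from keras.layers import Input, Dense
from keras.models import Model

# 1-D input -> two Dense(100) hidden layers -> 1-D output,
# matching the parameter counts in the summary (200, 10,100, 101).
inp = Input(shape=(1,))
h = Dense(100, activation='relu')(inp)  # activation is an assumption
h = Dense(100, activation='relu')(h)
out = Dense(1)(h)
model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss=corr)  # `corr` is the custom loss above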
Does Keras do something special at the end of an epoch? I couldn't find anything other than the standard logger callback. I also wrote a custom callback that evaluates the model after each batch and stores the output, and when I plot it over time it does not blow up or do anything strange. It just looks like it is slowly improving, and then training dies.
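A per-batch monitoring callback like the one described can be built on keras.callbacks.Callback. A minimal sketch, assuming some held-out array x_val to predict on (the exact logic used in the question is not shown, so this is illustrative only):

from keras.callbacks import Callback

class BatchMonitor(Callback):
    """Stores a model prediction after every batch (illustrative only)."""
    def __init__(self, x_val):
        super(BatchMonitor, self).__init__()
        self.x_val = x_val
        self.history = []

    def on_batch_end(self, batch, logs=None):
        # self.model is attached by Keras before training starts
        self.history.append(self.model.predict(self.x_val))

# Used as: model.fit(x, y, callbacks=[BatchMonitor(x_val)])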
Answer
This is probably caused by a division by zero in the loss function. Make sure the denominator is always positive by adding a small constant to it; you can use K.epsilon() for this purpose:
return r_num / (r_den + K.epsilon())
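To see why this helps: whenever a batch makes K.std(x) or K.std(y) exactly zero (for example, if the model briefly outputs a constant), the original expression evaluates to 0/0 = NaN, while the epsilon keeps it finite. A small sketch demonstrating both behaviours (the numbers are made up):

import numpy as np
from keras import backend as K

def corr_fixed(x, y):
    xc = x - K.mean(x)
    yc = y - K.mean(y)
    r_num = K.mean(xc * yc)
    r_den = K.std(x) * K.std(y)
    # epsilon keeps the denominator away from zero
    return r_num / (r_den + K.epsilon())

x = K.constant(np.array([1.0, 2.0, 3.0]))
y_const = K.constant(np.array([5.0, 5.0, 5.0]))  # constant batch -> std == 0
print(K.eval(corr(x, y_const)))        # nan with the original loss
print(K.eval(corr_fixed(x, y_const)))  # 0.0 with the epsilon in place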