How does the epsilon hyperparameter affect tf.train.AdamOptimizer?

Question
When I set epsilon=10e-8, AdamOptimizer doesn't work. When I set it to 1, it works just fine.
Answer
lr_t<-学习率* sqrt(1-beta2 ^ t)/(1-beta1 ^ t)
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
m_t<-beta1 * m_ {t-1} +(1-beta1)* g
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t<-beta2 * v_ {t-1} +(1-beta2)* g * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
其中g是梯度
变量<-变量-lr_t * m_t/(sqrt(v_t)+ epsilon)
variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
The epsilon is there to avoid a divide-by-zero error in the equation above when the variable is updated while the gradient is almost zero. So, ideally, epsilon should be a small value. But a small epsilon in the denominator makes for larger weight updates, and with subsequent normalization larger weights will always be normalized to 1.
So, I guess when you train with a small epsilon the optimizer will become unstable.
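To make that concrete, here is a back-of-the-envelope check (my own illustration, not from the question) at t=1 with TensorFlow's default beta1 and beta2. With epsilon=1e-8 the normalized step stays near full size even as the gradient shrinks by several orders of magnitude, whereas with epsilon=1.0 the step shrinks in proportion to the gradient:

import numpy as np

beta1, beta2, t = 0.9, 0.999, 1
for g in (1e-1, 1e-3, 1e-5):  # progressively "almost zero" gradients
    m = (1 - beta1) * g
    v = (1 - beta2) * g * g
    lr_t = np.sqrt(1 - beta2**t) / (1 - beta1**t)  # learning_rate factored out
    for eps in (1e-8, 1.0):
        step = lr_t * m / (np.sqrt(v) + eps)
        print(f"g={g:.0e}  eps={eps:g}  step/learning_rate={step:.3g}")

With eps=1e-8 the printed ratio stays close to 1.0 for all three gradients, i.e. the optimizer keeps taking full-size steps even on what is essentially noise; with eps=1.0 the step scales down with the gradient.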
The trade-off is that the bigger you make epsilon (and the denominator), the smaller the weight updates become, and thus the slower training progresses. Most of the time you want the denominator to be able to get small. Usually, an epsilon value greater than 10e-4 performs better.
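In practice this just means passing a larger epsilon when constructing the optimizer (TF 1.x API, as used in the question); the value 1e-4 below is only an example, not a universal recommendation:

import tensorflow as tf

# Default epsilon is 1e-08; raising it damps the per-parameter step when
# sqrt(v_t) is tiny, at the cost of slower overall progress.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, epsilon=1e-4)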