Problem Description
In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change both to another activation function (e.g. ReLU)?
Recommended Answer
Sigmoid, specifically, is used as the gating function for the 3 gates (input, output, forget) in an LSTM, since it outputs a value between 0 and 1: it can either block the flow of information through a gate entirely or let it pass in full. On the other hand, to overcome the vanishing gradient problem, we need a function whose second derivative can sustain a long range before going to zero. Tanh is a good function with that property.
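To make the roles of the two nonlinearities concrete, here is a minimal NumPy sketch of a single LSTM step (the stacked parameter layout and the names W, U, b are illustrative assumptions, not any particular library's API):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b hold the stacked parameters for the 4 transformations:
    # input gate i, forget gate f, output gate o, candidate g.
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gates: values in (0, 1)
    g = np.tanh(g)                                 # candidate: values in (-1, 1)
    c = f * c_prev + i * g                         # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

The gates multiply the information flow element-wise, which is why they need the (0, 1) range, while tanh keeps the candidate and the exposed cell state bounded in (-1, 1).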
A good neuron unit should be bounded, easily differentiable, monotonic (good for convex optimization), and easy to handle. If you consider these qualities, then I believe you can use ReLU in place of the tanh function, since they are very good alternatives to each other. But before choosing an activation function, you must know what the advantages and disadvantages of your choice are compared to the others. I will briefly describe some activation functions and their advantages below.
Sigmoid
Mathematical expression: sigmoid(z) = 1 / (1 + exp(-z))
First-order derivative: sigmoid'(z) = exp(-z) / (1 + exp(-z))^2 = sigmoid(z) * (1 - sigmoid(z))
Advantages:
(1) Sigmoid function has all the fundamental properties of a good activation function.
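As a quick check on the derivative above, a small illustrative NumPy snippet comparing it against a finite difference:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # equals exp(-z) / (1 + exp(-z))^2

eps = 1e-6
print(sigmoid_grad(0.5))
print((sigmoid(0.5 + eps) - sigmoid(0.5 - eps)) / (2 * eps))   # should match closely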
Tanh
Mathematical expression: tanh(z) = [exp(z) - exp(-z)] / [exp(z) + exp(-z)]
First-order derivative: tanh'(z) = 1 - ([exp(z) - exp(-z)] / [exp(z) + exp(-z)])^2 = 1 - tanh^2(z)
Advantages:
(1) Often found to converge faster in practice
(2) Gradient computation is less expensive
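The identity tanh'(z) = 1 - tanh^2(z) is also cheap to express in code (illustrative snippet):

import numpy as np

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2   # 1 - tanh^2(z)

# near zero the gradient stays close to 1, then decays toward 0 for large |z|
print(tanh_grad(np.array([0.0, 1.0, 3.0])))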
Hard Tanh
Mathematical expression: hardtanh(z) = -1 if z < -1; z if -1 <= z <= 1; 1 if z > 1
First-order derivative: hardtanh'(z) = 1 if -1 <= z <= 1; 0 otherwise
Advantages:
(1) Computationally cheaper than Tanh
(2) Saturates for magnitudes of z greater than 1
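The piecewise definition maps directly onto np.clip (illustrative snippet):

import numpy as np

def hardtanh(z):
    return np.clip(z, -1.0, 1.0)   # -1 if z < -1, z in between, 1 if z > 1

def hardtanh_grad(z):
    return ((z >= -1.0) & (z <= 1.0)).astype(float)   # 1 inside [-1, 1], 0 outside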
ReLU
Mathematical expression: relu(z) = max(z, 0)
First-order derivative: relu'(z) = 1 if z > 0; 0 otherwise
Advantages:
(1) Does not saturate even for large values of z
(2) Found much success in computer vision applications
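In code (illustrative; the derivative at exactly z = 0 is conventionally taken as 0 here):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    return (z > 0).astype(float)   # 1 for z > 0, 0 otherwise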
Leaky ReLU
Mathematical expression: leaky(z) = max(z, k·z) where 0 < k < 1
First-order derivative: leaky'(z) = 1 if z > 0; k otherwise
Advantages:
(1) Allows propagation of error for non-positive z, which ReLU doesn't
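In code, with the slope k as a parameter (illustrative snippet):

import numpy as np

def leaky_relu(z, k=0.01):
    return np.maximum(z, k * z)      # max(z, k*z) for 0 < k < 1

def leaky_relu_grad(z, k=0.01):
    return np.where(z > 0, 1.0, k)   # 1 for z > 0, k otherwise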
This paper explains some other interesting activation functions; you may consider reading it.