What is the intuition of using tanh in LSTM?

Problem description

In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh?

What is the intuition behind this?

Is it just a nonlinear transformation? If so, can I change both of them to another activation function (e.g., ReLU)?

Recommended answer

Sigmoid, specifically, is used as the gating function for the three gates (input, output, and forget) in the LSTM, since it outputs a value between 0 and 1: it can either let no information flow or let information flow completely through the gates.

On the other hand, to overcome the vanishing gradient problem, we need a function whose second derivative can sustain over a long range before going to zero. Tanh is a good function with that property.
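
To make the two roles concrete, here is a minimal NumPy sketch of a single LSTM step (the weight layout W, U, b and the helper names are my own illustration, not part of the original answer): sigmoid produces the three gate values in [0, 1], while tanh is applied to the candidate values and to the cell state before the output gate.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # Gates use sigmoid so each entry lies in [0, 1]: 0 blocks flow, 1 passes it.
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # output gate
    # Candidate values and the exposed cell state use tanh, keeping them in [-1, 1].
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

# Tiny usage with random weights: input size 3, hidden size 2.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((2, 3)) for k in 'ifog'}
U = {k: rng.standard_normal((2, 2)) for k in 'ifog'}
b = {k: np.zeros(2) for k in 'ifog'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(2), np.zeros(2), W, U, b)

Because the gates multiply element-wise, a sigmoid output of 0 blocks information completely and 1 lets it pass unchanged, which is exactly the "no flow or complete flow" behaviour described above.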

A good neuron unit should be bounded, easily differentiable, monotonic (good for convex optimization) and easy to handle. If you consider these qualities, then I believe you can use ReLU in place of the tanh function, since the two are very good alternatives to each other.
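
If you want to experiment with this substitution, some frameworks expose the cell activation as a parameter; a hedged sketch using Keras's LSTM layer is shown below (the layer width of 64 is arbitrary, and an unbounded activation such as ReLU can make the recurrent state grow without limit, so treat this as an experiment rather than a drop-in improvement).

import tensorflow as tf

# Standard LSTM: tanh for the candidate/cell activation, sigmoid for the gates.
standard = tf.keras.layers.LSTM(64)

# Same layer with ReLU substituted for tanh; the gates keep their sigmoid.
relu_variant = tf.keras.layers.LSTM(64, activation='relu', recurrent_activation='sigmoid')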

But before choosing an activation function, you must know what the advantages and disadvantages of your choice are compared with the others. I will briefly describe some of the activation functions and their advantages.

Sigmoid

Mathematical expression: sigmoid(z) = 1 / (1 + exp(-z))

First-order derivative: sigmoid'(z) = exp(-z) / (1 + exp(-z))^2 = sigmoid(z) * (1 - sigmoid(z))

Advantages:

(1) The sigmoid function has all the fundamental properties of a good activation function.
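
A minimal NumPy check of the formula and its derivative (the function names are mine, not from the answer); the derivative can equivalently be written as sigmoid(z) * (1 - sigmoid(z)).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # equals exp(-z) / (1 + exp(-z))**2

print(sigmoid(0.0), sigmoid_grad(0.0))   # 0.5 0.25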

Tanh

Mathematical expression: tanh(z) = [exp(z) - exp(-z)] / [exp(z) + exp(-z)]

First-order derivative: tanh'(z) = 1 - ([exp(z) - exp(-z)] / [exp(z) + exp(-z)])^2 = 1 - tanh^2(z)

Advantages:

(1) Often found to converge faster in practice
(2) Gradient computation is less expensive
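
A small NumPy sketch (np.tanh stands in for the explicit exponential form above; the identity tanh'(z) = 1 - tanh^2(z) gives the cheap gradient mentioned in point (2)).

import numpy as np

def tanh_grad(z):
    t = np.tanh(z)
    return 1.0 - t ** 2           # 1 - tanh^2(z)

print(np.tanh(0.0), tanh_grad(0.0))   # 0.0 1.0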

Hard Tanh

Mathematical expression: hardtanh(z) = -1 if z < -1; z if -1 <= z <= 1; 1 if z > 1

First-order derivative: hardtanh'(z) = 1 if -1 < z < 1; 0 otherwise

Advantages:

(1) Computationally cheaper than Tanh
(2) Saturates for magnitudes of z greater than 1
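
A NumPy sketch of the piecewise definition (using np.clip is simply a convenient way to express it; it is not part of the original answer).

import numpy as np

def hardtanh(z):
    return np.clip(z, -1.0, 1.0)          # -1 below -1, z in between, 1 above 1

def hardtanh_grad(z):
    return np.where((z > -1.0) & (z < 1.0), 1.0, 0.0)

print(hardtanh(np.array([-2.0, 0.5, 3.0])))       # [-1.   0.5  1. ]
print(hardtanh_grad(np.array([-2.0, 0.5, 3.0])))  # [0. 1. 0.]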

ReLU

Mathematical expression: relu(z) = max(z, 0)

First-order derivative: relu'(z) = 1 if z > 0; 0 otherwise

Advantages:

(1) Does not saturate even for large values of z
(2) Found much success in computer vision applications
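
The same in NumPy (a trivial sketch, using the common convention of taking the derivative at z = 0 to be 0).

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    return np.where(z > 0.0, 1.0, 0.0)    # 1 for z > 0, else 0

print(relu(np.array([-1.0, 2.0])), relu_grad(np.array([-1.0, 2.0])))  # [0. 2.] [0. 1.]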

Leaky ReLU

Mathematical expression: leaky(z) = max(z, k * z) where 0 < k < 1

First-order derivative: leaky'(z) = 1 if z > 0; k otherwise

Advantages:

(1) Allows propagation of error for non-positive z which ReLU doesn't
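
A NumPy sketch with an illustrative slope k = 0.01 (the value is my choice, not from the answer); unlike ReLU, the gradient is non-zero for negative z, which is the propagation property in point (1).

import numpy as np

def leaky_relu(z, k=0.01):
    return np.maximum(z, k * z)           # max(z, k*z) with 0 < k < 1

def leaky_relu_grad(z, k=0.01):
    return np.where(z > 0.0, 1.0, k)

print(leaky_relu(np.array([-2.0, 3.0])))       # [-0.02  3.  ]
print(leaky_relu_grad(np.array([-2.0, 3.0])))  # [0.01 1.  ]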

This paper explains some interesting activation functions; you may consider reading it.

