This article covers the question "What is the intuition of using tanh in an LSTM?" and a recommended answer; it may be a useful reference for readers facing the same question.

Problem description

In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh?

What is the intuition behind that?

Is it just a nonlinear transformation? If it is, can I change both to another activation function (e.g., ReLU)?

Recommended answer

Sigmoid, specifically, is used as the gating function for the three gates (input, output, and forget) in an LSTM: since it outputs a value between 0 and 1, it can let either no information or the complete information flow through a gate.

On the other hand, to overcome the vanishing gradient problem, we need a function whose second derivative can sustain over a long range before going to zero. Tanh is a good function with this property.
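To make the split concrete, here is a minimal illustrative sketch of one step of a standard LSTM cell in NumPy; the stacked parameter layout (W, U, b) and the gate ordering are assumptions made for the example. The three gates pass through sigmoid, while the candidate update and the output transformation pass through tanh:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4H, D), U: (4H, H), b: (4H,) hold the stacked parameters for the
    # input (i), forget (f), output (o) gates and the candidate (g).
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed to (0, 1): "how much flows"
    g = np.tanh(g)                                # candidate squashed to (-1, 1): "what flows"
    c = f * c_prev + i * g                        # new cell state
    h = o * np.tanh(c)                            # new hidden state
    return h, c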

A good neuron unit should be bounded, easily differentiable, monotonic (good for convex optimization), and easy to handle. If you consider these qualities, then I believe you can use ReLU in place of the tanh function, since the two are very good alternatives to each other.
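As an illustrative sketch of what such a swap could look like (it simply parameterizes the candidate/output non-linearity of the cell above while keeping the sigmoid gates, which is an assumption about how the swap is done), note that with ReLU the cell state is no longer bounded to (-1, 1):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def lstm_step_custom(x, h_prev, c_prev, W, U, b, act=np.tanh):
    # Same layout as the sketch above; pass act=relu to replace tanh.
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * act(g)
    h = sigmoid(o) * act(c)
    return h, c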

But before choosing an activation function, you must know what the advantages and disadvantages of your choice are over the others. I will briefly describe some of the activation functions and their advantages.

Sigmoid

Mathematical expression: sigmoid(z) = 1 / (1 + exp(-z))

First-order derivative: sigmoid'(z) = exp(-z) / (1 + exp(-z))^2

Advantages:

(1) The sigmoid function has all the fundamental properties of a good activation function.
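A quick numerical check of the two expressions above, as an illustrative NumPy sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # exp(-z) / (1 + exp(-z))^2, which equals sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))        # values in (0, 1)
print(sigmoid_prime(z))  # peaks at 0.25 when z = 0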

Tanh

Mathematical expression: tanh(z) = [exp(z) - exp(-z)] / [exp(z) + exp(-z)]

First-order derivative: tanh'(z) = 1 - ([exp(z) - exp(-z)] / [exp(z) + exp(-z)])^2 = 1 - tanh^2(z)

Advantages:

(1) Often found to converge faster in practice
(2) Gradient computation is less expensive
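The same kind of illustrative check for tanh and its derivative:

import numpy as np

def tanh_prime(z):
    # 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

z = np.array([-2.0, 0.0, 2.0])
print(np.tanh(z))     # values in (-1, 1), zero-centred
print(tanh_prime(z))  # equals 1 at z = 0, decays towards 0 for large |z|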

Hard Tanh

Mathematical expression: hardtanh(z) = -1 if z < -1; z if -1 <= z <= 1; 1 if z > 1

First-order derivative: hardtanh'(z) = 1 if -1 <= z <= 1; 0 otherwise

Advantages:

(1) Computationally cheaper than Tanh
(2) Saturates for magnitudes of z greater than 1
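An illustrative sketch of hard tanh and its derivative, following the piecewise definition above:

import numpy as np

def hardtanh(z):
    return np.clip(z, -1.0, 1.0)

def hardtanh_prime(z):
    return np.where((z >= -1.0) & (z <= 1.0), 1.0, 0.0)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(hardtanh(z))        # clipped to [-1, 1]
print(hardtanh_prime(z))  # 1 inside the linear region, 0 in the saturated region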

ReLU

Mathematical expression: relu(z) = max(z, 0)

一阶导数:如果z>则relu'(z)= 1.0;0否则

First-order derivative: relu'(z) = 1 if z > 0; 0 otherwise

Advantages:

(1) Does not saturate even for large values of z
(2) Found much success in computer vision applications
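An illustrative sketch of ReLU and its derivative, matching the definition above:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_prime(z):
    return np.where(z > 0.0, 1.0, 0.0)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0. 0. 3.], unbounded above, so it does not saturate for large z
print(relu_prime(z))  # [0. 0. 1.]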

Leaky ReLU

Mathematical expression: leaky(z) = max(z, k·z), where 0 < k < 1

一阶导数:如果z>则relu'(z)= 1.0;k否则

First-order derivative: relu'(z) = 1 if z > 0; k otherwise

Advantages:

(1) Allows propagation of error for non-positive z which ReLU doesn't
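An illustrative sketch of leaky ReLU; the slope k = 0.01 is just an assumed example value, and any 0 < k < 1 fits the definition above:

import numpy as np

def leaky_relu(z, k=0.01):
    return np.maximum(z, k * z)

def leaky_relu_prime(z, k=0.01):
    return np.where(z > 0.0, 1.0, k)

z = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(z))        # [-0.02  0.  3.], a small gradient-carrying slope for negative z
print(leaky_relu_prime(z))  # [0.01 0.01 1.]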

This paper explains some interesting activation functions; you may consider reading it.

That concludes this article on the intuition of using tanh in an LSTM; we hope the recommended answer above is helpful.
