Is it normal to use batch normalization in RNNs / LSTM RNNs?

This article covers the question "Is it normal to use batch normalization in RNNs / LSTM RNNs?" together with a recommended answer; we hope it is a useful reference for readers facing the same problem.

Problem description

I am a beginner in deep learning. I know that in regular neural nets people use batch norm before the activation, and that it reduces the reliance on good weight initialization. I wonder whether it would do the same for an RNN / LSTM RNN when I use it there. Does anyone have experience with it? Thank you.

Recommended answer

Batch normalization applied to RNNs is similar to batch normalization applied to CNNs: you compute the statistics in such a way that the recurrent/convolutional properties of the layer still hold after BN is applied.

For CNNs, this means computing the relevant statistics not just over the mini-batch, but also over the two spatial dimensions; in other words, the normalization is applied over the channels dimension.
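As a minimal sketch (NumPy, with hypothetical shapes), this is what "statistics over the mini-batch and both spatial dimensions, normalized over the channel dimension" amounts to for a CNN activation of shape (N, C, H, W):

```python
import numpy as np

# Hypothetical activation: (batch, channels, height, width)
x = np.random.randn(8, 16, 32, 32)

# One mean/variance per channel, averaged over batch AND both spatial dims
mean = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
var = x.var(axis=(0, 2, 3), keepdims=True)
eps = 1e-5
x_hat = (x - mean) / np.sqrt(var + eps)        # normalized per channel

# Learnable scale/shift, one pair per channel
gamma = np.ones((1, 16, 1, 1))
beta = np.zeros((1, 16, 1, 1))
y = gamma * x_hat + beta
```

Because the statistics are shared across spatial positions, the convolutional property (the same normalization at every location) is preserved.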

For RNNs, this means computing the relevant statistics over the mini-batch and the time/step dimension, so the normalization is applied only over the vector depths. This also means that you only batch normalize the transformed input (so in the vertical directions, e.g. BN(W_x * x)) since the horizontal (across time) connections are time-dependent and shouldn't just be plainly averaged.
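Here is a minimal sketch of that idea for a plain tanh RNN (NumPy, hypothetical shapes; the weight names W_x and W_h and the loop structure are illustrative, not a reference implementation): only the transformed input BN(W_x * x) is normalized, with statistics taken over the batch and time dimensions, while the recurrent term is left untouched.

```python
import numpy as np

N, T, D_in, D_h = 8, 20, 32, 64           # batch, time steps, input dim, hidden dim
x = np.random.randn(N, T, D_in)
W_x = np.random.randn(D_in, D_h) * 0.1    # input-to-hidden weights ("vertical" path)
W_h = np.random.randn(D_h, D_h) * 0.1     # hidden-to-hidden (recurrent) weights
gamma, beta, eps = np.ones(D_h), np.zeros(D_h), 1e-5

wx = x @ W_x                              # (N, T, D_h): transformed input
mean = wx.mean(axis=(0, 1))               # one mean/var per feature (vector depth),
var = wx.var(axis=(0, 1))                 #   averaged over batch AND time
wx_bn = gamma * (wx - mean) / np.sqrt(var + eps) + beta   # BN(W_x * x)

h = np.zeros((N, D_h))
for t in range(T):
    # The recurrent (across-time) connection is NOT batch-normalized
    h = np.tanh(wx_bn[:, t, :] + h @ W_h)
```

Sharing the statistics across time steps keeps the normalization consistent with the recurrence, while the hidden-to-hidden path is left alone because its activations are time-dependent.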

This concludes the discussion of "Is it normal to use batch normalization in RNNs / LSTM RNNs?". We hope the recommended answer above is helpful.
