Problem Description
I am a beginner in deep learning. I know that in regular neural nets people apply batch norm before the activation, which reduces the reliance on good weight initialization. I wonder whether it would do the same for an RNN/LSTM RNN if I used it there. Does anyone have experience with this? Thank you.
Recommended Answer
Batch normalization applied to RNNs is similar to batch normalization applied to CNNs: you compute the statistics in such a way that the recurrent/convolutional properties of the layer still hold after BN is applied.
For CNNs, this means computing the relevant statistics not just over the mini-batch, but also over the two spatial dimensions; in other words, the normalization is applied over the channels dimension.
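A minimal sketch of this, assuming PyTorch and NCHW layout (the source names no framework, and `conv_batch_norm` is an illustrative name, not a library API): the statistics are pooled over the batch dimension N and both spatial dimensions H and W, leaving one mean/variance per channel.

```python
import torch

def conv_batch_norm(x, eps=1e-5):
    # x: (N, C, H, W). Pool statistics over N, H, W -> one scalar per channel.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(32, 64, 28, 28)   # N=32, C=64, H=W=28
y = conv_batch_norm(x)            # same normalization as torch.nn.BatchNorm2d(64, affine=False) in training mode
```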
For RNNs, this means computing the relevant statistics over the mini-batch and the time/step dimension, so the normalization is applied only over the vector depth. This also means that you only batch normalize the transformed input (i.e. in the vertical direction, e.g. BN(W_x * x)), since the horizontal (across-time) connections are time-dependent and shouldn't simply be averaged.
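A sketch of that vertical-only normalization, again assuming PyTorch (`bn_over_batch_and_time`, the shapes, and the weight name `W_x` are illustrative): the statistics for BN(W_x * x) are pooled over both the batch and time dimensions, and the recurrent hidden-to-hidden path is left untouched.

```python
import torch

def bn_over_batch_and_time(z, eps=1e-5):
    # z: (B, T, D) pre-activations W_x * x for every time step.
    # Pool statistics over batch AND time -> one scalar per feature.
    mean = z.mean(dim=(0, 1), keepdim=True)
    var = z.var(dim=(0, 1), keepdim=True, unbiased=False)
    return (z - mean) / torch.sqrt(var + eps)

B, T, D_in, D_h = 16, 50, 100, 128
x = torch.randn(B, T, D_in)
W_x = torch.randn(D_in, D_h)
z = bn_over_batch_and_time(x @ W_x)  # normalized vertical input, then fed step by step into the recurrent cell
```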