Question
Occasionally I see that some models use SpatialDropout1D instead of Dropout. For example, in the part-of-speech tagging neural network, they use:
from keras.models import Sequential
from keras.layers import (Embedding, SpatialDropout1D, GRU, RepeatVector,
                          TimeDistributed, Dense, Activation)

model = Sequential()
model.add(Embedding(s_vocabsize, EMBED_SIZE,
                    input_length=MAX_SEQLEN))
model.add(SpatialDropout1D(0.2))  # <-- this one
model.add(GRU(HIDDEN_SIZE, dropout=0.2, recurrent_dropout=0.2))
model.add(RepeatVector(MAX_SEQLEN))
model.add(GRU(HIDDEN_SIZE, return_sequences=True))
model.add(TimeDistributed(Dense(t_vocabsize)))
model.add(Activation("softmax"))
According to the Keras documentation:
However, I am unable to understand what dropping "entire 1D feature" maps means. More specifically, I am unable to visualize what SpatialDropout1D does in terms of the model explained on Quora. Can someone explain this concept using the same model as in the Quora answer?
Also, in what situations would we use SpatialDropout1D instead of Dropout?
Answer
The noise shape
In order to understand SpatialDropout1D, you should get used to the notion of the noise shape. In plain vanilla dropout, each element is kept or dropped independently. For example, if the tensor has shape [2, 2, 2], each of the 8 elements can be zeroed out depending on a random coin flip (with a certain "heads" probability); in total, there will be 8 independent coin flips, and any number of values may become zero, from 0 to 8.
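As an illustration (not part of the original answer), here is a minimal sketch of plain dropout on such a tensor, using tf.keras with toy values:

import tensorflow as tf

# Plain dropout on a [2, 2, 2] tensor: all 8 elements get their own
# independent coin flip (kept values are scaled by 1/(1 - rate)).
x = tf.ones([2, 2, 2])
plain = tf.keras.layers.Dropout(rate=0.5)
print(plain(x, training=True))  # any subset of the 8 values may be zero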
Sometimes there is a need to do more than that. For example, one may need to drop the whole slice along the 0 axis. The noise_shape in this case is [1, 2, 2], and the dropout involves only 4 independent random coin flips. Along the first dimension, elements will either be kept together or dropped together. The number of zeroed elements can be 0, 2, 4, 6 or 8. It cannot be 1 or 5.
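For illustration only (assuming TensorFlow's tf.nn.dropout, which exposes the noise_shape argument directly), a minimal sketch of that [1, 2, 2] noise shape:

import tensorflow as tf

# Only 4 coin flips are made; the [1, 2, 2] mask is broadcast along axis 0,
# so both slices along that axis are zeroed at exactly the same positions.
x = tf.ones([2, 2, 2])
y = tf.nn.dropout(x, rate=0.5, noise_shape=[1, 2, 2])
print(y)  # y[0] and y[1] always share the same zero pattern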
Another way to view this is to imagine that the input tensor is in fact [2, 2], but each value is double-precision (or multi-precision). Instead of dropping bytes in the middle, the layer drops the full multi-byte value.
The example above is just for illustration and isn't common in real applications. A more realistic example is this: shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n]. In this case, each batch and channel component is kept or dropped independently, but each row and column is kept or dropped together. In other words, the whole [l, m] feature map is either kept or dropped.
You may want to do this to account for the correlation of adjacent pixels, especially in the early convolutional layers. Effectively, you want to prevent co-adaptation of pixels with their neighbors across the feature maps, and make them learn as if no other feature maps existed. This is exactly what SpatialDropout2D does: it promotes independence between feature maps.
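As a small sketch (toy sizes and channels_last data format assumed; not part of the original answer), SpatialDropout2D behaves like a plain Dropout configured with the noise shape described above:

import tensorflow as tf

# shape(x) = [k, l, m, n] = [4, 8, 8, 16]
x = tf.ones([4, 8, 8, 16])
spatial = tf.keras.layers.SpatialDropout2D(0.5)
equivalent = tf.keras.layers.Dropout(0.5, noise_shape=(4, 1, 1, 16))
# Both zero out entire 8x8 [l, m] feature maps per (sample, channel) pair.
print(spatial(x, training=True).shape, equivalent(x, training=True).shape)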
SpatialDropout1D is very similar: given shape(x) = [k, l, m], it uses noise_shape = [k, 1, m] and drops entire 1-D feature maps.
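To connect this back to the model in the question, here is an illustrative sketch (toy sizes assumed, written with tf.keras) of SpatialDropout1D acting on an embedding-like output:

import tensorflow as tf

# shape(x) = [k, l, m]: 2 samples, 5 timesteps, 4 embedding dimensions.
x = tf.ones([2, 5, 4])
sd = tf.keras.layers.SpatialDropout1D(0.5)
y = sd(x, training=True)
# The mask has shape [2, 1, 4] and is broadcast over the time axis, so a
# dropped embedding dimension is zeroed for all 5 timesteps of that sample.
print(y[0])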
Reference: Efficient Object Localization Using Convolutional Networks by Jonathan Tompson et al.