I have a problem applying a Masking layer to the CNN part of a CNN + RNN/LSTM model.
My data is not original images; I converted it into frames of shape (16, 34, 4) (channels_first). The data is sequential, and the longest sequence has 22 steps, so to get a fixed length I set the number of timesteps to 22. Sequences shorter than 22 steps are filled up with np.zeros. However, the zero padding makes up about half of the whole dataset, and with so much useless data the training cannot reach a very good result. So I want to add a mask to cancel out the zero padding.
import numpy as np
from keras.models import Sequential
from keras.layers import (Masking, TimeDistributed, Conv2D, BatchNormalization,
                          Dropout, Flatten, GRU, Dense)

# One all-zero frame, used as the mask value for padded timesteps
mask = np.zeros((16, 34, 4), dtype=np.int8)
# (timesteps, channels, height, width)
input_shape = (22, 16, 34, 4)
model = Sequential()
model.add(TimeDistributed(Masking(mask_value=mask), input_shape=input_shape, name='mask'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first', activation='relu'), name='conv1'))
model.add(TimeDistributed(BatchNormalization(), name='bn1'))
model.add(Dropout(0.5, name='drop1'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first', activation='relu'), name='conv2'))
model.add(TimeDistributed(BatchNormalization(), name='bn2'))
model.add(Dropout(0.5, name='drop2'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first', activation='relu'), name='conv3'))
model.add(TimeDistributed(BatchNormalization(), name='bn3'))
model.add(Dropout(0.5, name='drop3'))
model.add(TimeDistributed(Flatten(), name='flatten'))
# The recurrent layer runs over the 22 timesteps of flattened CNN features
model.add(GRU(256, activation='tanh', return_sequences=True, name='gru'))
model.add(Dropout(0.4, name='drop_gru'))
model.add(Dense(35, activation='softmax', name='softmax'))
Here's the model structure.
Layer (type)                 Output Shape               Param #
mask (TimeDistributed)       (None, 22, 16, 34, 4)      0
conv1 (TimeDistributed)      (None, 22, 100, 30, 3)     16100
bn1 (TimeDistributed)        (None, 22, 100, 30, 3)     12
drop1 (Dropout)              (None, 22, 100, 30, 3)     0
conv2 (TimeDistributed)      (None, 22, 100, 26, 2)     100100
bn2 (TimeDistributed)        (None, 22, 100, 26, 2)     8
drop2 (Dropout)              (None, 22, 100, 26, 2)     0
conv3 (TimeDistributed)      (None, 22, 100, 22, 1)     100100
bn3 (TimeDistributed)        (None, 22, 100, 22, 1)     4
drop3 (Dropout)              (None, 22, 100, 22, 1)     0
flatten (TimeDistributed)    (None, 22, 2200)           0
gru (GRU)                    (None, 22, 256)            1886976
drop_gru (Dropout)           (None, 22, 256)            0
softmax (Dense)              (None, 22, 35)             8995
Total params: 2,112,295
Trainable params: 2,112,283
Non-trainable params: 12
For mask_value, I tried both 0 and the mask array above, but neither works: training still runs over all the data, about half of which is zero padding.
Can anyone help me?
By the way, I used TimeDistributed here to connect the CNN to the RNN, and I know there is another option called ConvLSTM2D. Does anyone know the difference? ConvLSTM2D needs many more model parameters and trains much more slowly than TimeDistributed.
Unfortunately, masking is not yet supported by the Keras Conv layers. Several issues about this have been posted on the Keras GitHub page; here is the one with the most substantial conversation on the topic. It appears that the discussion got hung up on implementation details, and the issue was never resolved.
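To make the mismatch concrete, here is a small NumPy sketch of what the Masking layer effectively computes on one padded sequence. It assumes the usual Masking behavior of reducing "any value != mask_value" over the last axis only; the helper name pad_sequence and the random data are made up for illustration.

```python
import numpy as np

MAX_STEPS = 22             # longest sequence length in the question
FRAME_SHAPE = (16, 34, 4)  # channels_first frame shape from the question


def pad_sequence(frames, max_steps=MAX_STEPS):
    """Zero-pad a (steps, 16, 34, 4) array to (max_steps, 16, 34, 4)."""
    padded = np.zeros((max_steps,) + frames.shape[1:], dtype=frames.dtype)
    padded[: len(frames)] = frames
    return padded


# Hypothetical sequence with 9 real steps; values are strictly non-zero.
rng = np.random.default_rng(0)
real = rng.uniform(0.1, 1.0, size=(9,) + FRAME_SHAPE).astype("float32")
x = pad_sequence(real)

# Masking-style reduction: only the LAST axis is collapsed, so the result
# is a mask over (timestep, channel, row), not one flag per timestep.
masking_style = np.any(x != 0.0, axis=-1)
print(masking_style.shape)  # (22, 16, 34)

# What the question actually needs: one flag per timestep, which is True
# when anything in that frame is non-zero.
step_mask = np.any(x != 0.0, axis=(1, 2, 3))
print(step_mask.shape, int(step_mask.sum()))  # (22,) 9
```

Even if the mask had the right shape, the Conv layers would not propagate it, which is why the workarounds below carry the mask separately.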
The workaround proposed in that discussion is to use an explicit embedding for the padding character in sequences and then apply global pooling. Here is another workaround I found (not helpful for my use case, but maybe helpful for yours): keep a mask array alongside the data and merge it in through multiplication.
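A rough sketch of the multiplication idea, under my own assumptions rather than the linked code: the per-timestep mask is fed in as a second input of shape (22, 1), with 1.0 for real steps and 0.0 for padding, and multiplied into the flattened CNN features before the GRU. The function name build_model, the input names, and the single conv layer are illustrative only.

```python
from tensorflow.keras import layers, Model


def build_model(steps=22, frame_shape=(16, 34, 4), n_classes=35):
    """CNN + GRU model that zeroes the CNN features of padded timesteps."""
    frames = layers.Input(shape=(steps,) + frame_shape, name="frames")
    # Hypothetical second input: 1.0 for a real step, 0.0 for a padded one.
    step_mask = layers.Input(shape=(steps, 1), name="step_mask")

    x = layers.TimeDistributed(
        layers.Conv2D(100, (5, 2), data_format="channels_first",
                      activation="relu"),
        name="conv1",
    )(frames)
    x = layers.TimeDistributed(layers.Flatten(), name="flatten")(x)

    # Conv biases (and batch norm, if used) turn all-zero frames into
    # non-zero features; the broadcast multiply sets them back to zero.
    x = layers.Multiply(name="apply_mask")([x, step_mask])

    x = layers.GRU(256, return_sequences=True, name="gru")(x)
    out = layers.Dense(n_classes, activation="softmax", name="softmax")(x)
    return Model([frames, step_mask], out)


model = build_model()
print(model.output_shape)  # (None, 22, 35)
```

Note that this only zeroes the features: the GRU still steps over the padded positions, and those positions still count toward the loss unless they are down-weighted separately.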
You can also check out the conversation around this question, which is similar to yours.