Problem Description
I am trying to train a 2D convolutional LSTM to make categorical predictions based on video data. However, my output layer seems to be running into a problem:
"ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)"
My current model is based on the ConvLSTM2D example provided by the Keras team. I believe that the above error is the result of my misunderstanding the example and its basic principles.
Data
I have an arbitrary number of videos, where each video contains an arbitrary number of frames. Each frame is 135x240x1 (color channels last). This results in an input shape of (None, None, 135, 240, 1), where the two "None" values are batch size and timesteps, in that order. If I train on a single video with 1052 frames, then my input shape becomes (1, 1052, 135, 240, 1).
For each frame, the model should predict values between 0 and 1 across 9 classes. This means that my output shape is (None, None, 9). If I train on a single video with 1052 frames, then this shape becomes (1, 1052, 9).
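To make the shape contract concrete, here is a small NumPy sketch of the input and target arrays described above (the frame count of 1052 is just the example value from this post):

```python
import numpy as np

frames = 1052            # arbitrary number of frames in one video
height, width = 135, 240
channels = 1             # grayscale, channels last
classes = 9

# One video as a single batch element: (batch, time, H, W, C)
x = np.zeros((1, frames, height, width, channels), dtype=np.float32)

# One class-probability vector per frame: (batch, time, classes)
y = np.zeros((1, frames, classes), dtype=np.float32)

print(x.shape)  # (1, 1052, 135, 240, 1)
print(y.shape)  # (1, 1052, 9)
```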
Model
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
dense_1 (Dense) (None, None, 135, 240, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Source Code
from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, Dense

model = Sequential()
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    input_shape=(None, 135, 240, 1),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(Dense(
    units=classes,  # classes = 9
    activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta')
model.fit_generator(generator=training_sequence)
Traceback
Epoch 1/1
Traceback (most recent call last):
  File ".\lstm.py", line 128, in <module>
    main()
  File ".\lstm.py", line 108, in main
    model.fit_generator(generator=training_sequence)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
    exception_prefix='target')
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)
A sample input shape printed with batch size set to 1 is (1, 1389, 135, 240, 1). This shape matches the requirements I described above, so I think my Keras Sequence subclass (in the source code as "training_sequence") is correct.
I suspect that the problem is caused by my going directly from BatchNormalization() to Dense(). After all, the traceback indicates that the problem is occurring in dense_1 (the final layer). However, I wouldn't want to lead anyone astray with my beginner-level knowledge, so please take my assessment with a grain of salt.
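That suspicion is on the right track: a Keras Dense layer operates only on the last axis of its input, so applied to the 5-D ConvLSTM2D output it keeps the spatial axes and produces a 5-D output, which can never match a 3-D target. This per-last-axis behaviour can be sketched in NumPy (the small dimensions here are made up for illustration):

```python
import numpy as np

# Dense computes y = x @ W + b on the last axis only.
# Toy dimensions standing in for (batch, time, H, W, filters):
batch, time, h, w, filters, classes = 1, 4, 5, 6, 40, 9

x = np.random.rand(batch, time, h, w, filters)
W = np.random.rand(filters, classes)
b = np.zeros(classes)

# matmul broadcasts over all leading axes, leaving H and W intact
out = x @ W + b

print(out.shape)  # (1, 4, 5, 6, 9) -- still 5-D; the target is (1, 4, 9)
```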
Edit 3/27/2018
After reading this thread, which involves a similar model, I changed my final ConvLSTM2D layer so that the return_sequences parameter is set to False instead of True. I also added a GlobalAveragePooling2D layer before my Dense layer. The updated model is as follows:
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, 135, 240, 40) 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, 135, 240, 40) 160
_________________________________________________________________
global_average_pooling2d_1 ( (None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Here is the new traceback:
Traceback (most recent call last):
  File ".\lstm.py", line 131, in <module>
    main()
  File ".\lstm.py", line 111, in main
    model.fit_generator(generator=training_sequence)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
    exception_prefix='target')
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1, 1034, 9)
I printed the x and y shapes on this run. x was (1, 1034, 135, 240, 1) and y was (1, 1034, 9). This may narrow the problem down. It looks like the problem is related to the y data rather than the x data. Specifically, the Dense layer doesn't like the temporal dim. However, I am not sure how to rectify this issue.
Edit 3/28/2018
Yu-Yang's solution worked. For anyone with a similar problem who wants to see what the final model looked like, here is the summary:
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
average_pooling3d_1 (Average (None, None, 1, 1, 40) 0
_________________________________________________________________
reshape_1 (Reshape) (None, None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, None, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
And here is the source code:
from keras.models import Sequential
from keras.layers import (ConvLSTM2D, BatchNormalization,
                          AveragePooling3D, Reshape, Dense)

model = Sequential()
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    input_shape=(None, 135, 240, 1),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(AveragePooling3D((1, 135, 240)))
model.add(Reshape((-1, 40)))
model.add(Dense(
    units=9,
    activation='sigmoid'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta')
Answer
If you want a prediction per frame, then you should definitely set return_sequences=True in your last ConvLSTM2D layer.
For the ValueError on the target shape, replace the GlobalAveragePooling2D() layer with AveragePooling3D((1, 135, 240)) plus Reshape((-1, 40)) to make the output shape compatible with your target array.
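To see why the pooling-plus-reshape combination fixes the mismatch, here is a NumPy sketch of the data flow after the last ConvLSTM2D layer (a toy time length of 4 stands in for the real frame count):

```python
import numpy as np

batch, time, h, w, filters, classes = 1, 4, 135, 240, 40, 9

# Output of the last ConvLSTM2D layer: (batch, time, H, W, filters)
x = np.random.rand(batch, time, h, w, filters)

# AveragePooling3D((1, 135, 240)): average over both spatial axes,
# leaving the time axis untouched
pooled = x.mean(axis=(2, 3), keepdims=True)    # (1, 4, 1, 1, 40)

# Reshape((-1, 40)): collapse the singleton spatial axes
reshaped = pooled.reshape(batch, -1, filters)  # (1, 4, 40)

# Dense(9) then maps the last axis 40 -> 9
W = np.random.rand(filters, classes)
out = reshaped @ W                             # (1, 4, 9)

print(out.shape)  # matches the (batch, time, classes) target
```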