Problem Description
I am trying to feed the KTH action dataset to a CNN, and I am having difficulty reshaping the data. I have created an array of shape (99, 75, 120, 160) with dtype uint8, i.e. 99 videos belonging to one class, each video having 75 frames of 120x160 pixels.
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
                          input_shape=()))
###need to reshape data in input_shape
Should I specify a Dense layer first?
Here is my code:
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Flatten, LSTM, Reshape,
                          TimeDistributed, UpSampling2D)

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
                          input_shape=(75, 120, 160)))
###need to reshape data in input_shape
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True))
model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(1, (3,3), padding='same')))
model.compile(optimizer='adam', loss='mse')
data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
print (data.shape)
(x_train,x_test) = train_test_split(data)
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print (x_train.shape)
print (x_test.shape)
model.fit(x_train, x_train,
          epochs=100,
          batch_size=1,
          shuffle=False,
          validation_data=(x_test, x_test))
The variables are: x_test (25, 75, 120, 160), type float32; x_train (74, 75, 120, 160), type float32.
The complete error for the one mentioned in the comment is:
文件",第1行,在 运行文件('C:/Users/shj_k/Desktop/Project/cnn_lstm.py',wdir ='C:/Users/shj_k/Desktop/Project')
File "", line 1, in runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
文件 "C:\ Users \ shj_k \ Anaconda3 \ lib \ site-packages \ spyder_kernels \ customize \ spydercustomize.py", 运行文件中的第668行 execfile(文件名,命名空间)
File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile execfile(filename, namespace)
文件 "C:\ Users \ shj_k \ Anaconda3 \ lib \ site-packages \ spyder_kernels \ customize \ spydercustomize.py", execfile中的第108行 exec(compile(f.read(),文件名,'exec'),命名空间)
File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
文件"C:/Users/shj_k/Desktop/Project/cnn_lstm.py",第63行,在 validation_data =(x_test,x_test))
File "C:/Users/shj_k/Desktop/Project/cnn_lstm.py", line 63, in validation_data=(x_test, x_test))
文件 "C:\ Users \ shj_k \ Anaconda3 \ lib \ site-packages \ keras \ engine \ training.py", 符合规定的952号线 batch_size = batch_size)
File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit batch_size=batch_size)
文件 "C:\ Users \ shj_k \ Anaconda3 \ lib \ site-packages \ keras \ engine \ training.py", _standardize_user_data中的第751行 exception_prefix ='input')
File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data exception_prefix='input')
文件 "C:\ Users \ shj_k \ Anaconda3 \ lib \ site-packages \ keras \ engine \ training_utils.py", 第128行,位于standardize_input_data中 'with shape'+ str(data_shape))
File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data 'with shape ' + str(data_shape))
ValueError:检查输入时出错:预期 time_distributed_403_input具有5个维度,但具有 形状(74、75、120、160)
ValueError: Error when checking input: expected time_distributed_403_input to have 5 dimensions, but got array with shape (74, 75, 120, 160)
Thank you for your reply.
Recommended Answer
A few points:
The TimeDistributed layer in Keras needs a time dimension, so for video image processing this could be 75 here (the frames).
It also expects the images to be sent in shape (120, 160, 3), so the input_shape of the TimeDistributed layer should be (75, 120, 160, 3). The 3 stands for the RGB channels; if you have greyscale images, 1 as the last dimension should work.
The input_shape always leaves out the "row" dimension of your examples, in your case 99.
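Since your frames are greyscale, one way to get the 5D input the model expects is simply to add a channel axis to the existing (99, 75, 120, 160) array; a minimal sketch with NumPy, reusing the .npy file from your code:

import numpy as np

data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")  # (99, 75, 120, 160)
data = np.expand_dims(data, axis=-1)   # (99, 75, 120, 160, 1)
print(data.shape)
# the first layer would then use input_shape=(75, 120, 160, 1)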
To check the output shapes created by each layer of the model, put model.summary() after compiling it.
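For example, with the model from the question:

model.compile(optimizer='adam', loss='mse')
model.summary()  # prints each layer's output shape and parameter count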
See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed
You can convert images into numpy arrays with shape (X, Y, 3) using keras.preprocessing.image.
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
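For greyscale frames, a similar sketch stacks individual frames into the 5D array the model expects; the frame file names here are hypothetical, purely for illustration:

import numpy as np
from keras.preprocessing import image

frames = []
for i in range(75):
    # hypothetical naming scheme; adjust to your own frame files
    img = image.load_img('frames/frame_%03d.png' % i,
                         color_mode='grayscale', target_size=(120, 160))
    frames.append(image.img_to_array(img))   # each frame: (120, 160, 1)

video = np.stack(frames)                # (75, 120, 160, 1)
videos = np.expand_dims(video, axis=0)  # (1, 75, 120, 160, 1)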
Update: It seems the reason you had to make all the images square (128, 128, 1) is that in model.fit(), the training examples (x_train) and the labels (normally y_train) are the same set. If you look at the model summary below, after the Flatten layer everything becomes a square, so the labels are expected to be square as well. This makes sense: using this model for prediction would transform a (120, 160, 1) image into something of shape (128, 128, 1). Changing the model training to the code below should therefore work:
import numpy as np

x_train = np.random.random((90, 5, 120, 160, 1))  # dummy training data
y_train = np.random.random((90, 5, 128, 128, 1))  # dummy labels (squares)
model.fit(x_train, y_train)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64) 320
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32) 18464
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32) 0
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16) 4624
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16) 0
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 5, 64) 1245440
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1) 0
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1) 0
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16) 160
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16) 0
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32) 4640
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32) 0
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64) 18496
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64) 0
_________________________________________________________________
time_distributed_16 (TimeDis (None, 5, 128, 128, 1) 577
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0
Update 2: To make it work with non-square images without changing y, set LSTM(300), Reshape((15, 20, 1)), and remove one of the Conv2D + UpSampling2D pairs afterwards. Then you can use pictures of shape (120, 160) even in an autoencoder. The trick is to look at the model summary and make sure that, after the LSTM, you start with the right shape, so that once all the other layers are added the end result has shape (120, 160).
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Flatten, LSTM, Reshape,
                          TimeDistributed, UpSampling2D)
import numpy as np

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"),
                          input_shape=(5, 120, 160, 1)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))
model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))
model.compile(optimizer='adam', loss='mse')
model.summary()
x_train = np.random.random((90, 5, 120, 160, 1))  # dummy training data
y_train = np.random.random((90, 5, 120, 160, 1))  # dummy labels, same shape as the inputs
model.fit(x_train, y_train)