I am trying to feed the KTH action dataset to a CNN and am having difficulty reshaping the data. I have created an array of shape (99, 75, 120, 160) with dtype uint8, i.e. 99 videos belonging to one class, each video having 75 frames of 120x160 pixels.
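import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, UpSampling2D,
                          Flatten, Reshape, LSTM)

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
                          # need to reshape data in input_shape
                          # should I specify a dense layer first?
                          ))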
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
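model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True))
model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))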
model.compile(optimizer='adam', loss='mse')
data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
print(data.shape)
x_train, x_test = train_test_split(data)
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print(x_train.shape)
print(x_test.shape)
model.fit(x_train, x_train,
          validation_data=(x_test, x_test))
The variables are:
x_test (25, 75, 120, 160), type=float32
x_train (74, 75, 120, 160), type=float32
The complete error for the case noted in the comment is:
  File "", line 1, in <module>
    runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/shj_k/Desktop/Project/cnn_lstm.py", line 63, in <module>
    validation_data=(x_test, x_test))
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected time_distributed_403_input to have 5 dimensions, but got array with shape (74, 75, 120, 160)
The TimeDistributed layer in Keras needs a time dimension, so for video image processing this could be 75 here (the frames).
It also expects each image to have shape (120, 160, 3), so the input_shape of the TimeDistributed layer should be (75, 120, 160, 3), where 3 stands for the RGB channels. If you have greyscale images, use 1 as the last dimension.
The input_shape never includes the batch ("row") dimension of your examples, which is 99 in your case.
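For the array in the question, the missing fifth dimension is just a channel axis of size 1. Here is a minimal sketch of adding it with NumPy, assuming greyscale frames; the zero-filled array is only a small stand-in for the real (99, 75, 120, 160) data, and the names data_5d and input_shape are mine:

```python
import numpy as np

# Small stand-in for the real array, which has shape (99, 75, 120, 160).
data = np.zeros((2, 75, 120, 160), dtype=np.uint8)

# Append a trailing channel axis: (videos, frames, height, width, 1).
data_5d = np.expand_dims(data, axis=-1)

# Scale to [0, 1] as in the question.
data_5d = data_5d.astype("float32") / 255.0

# The input_shape of the first layer is everything except the batch axis.
input_shape = data_5d.shape[1:]

print(data_5d.shape)  # (2, 75, 120, 160, 1)
print(input_shape)    # (75, 120, 160, 1)
```

Doing this before train_test_split means both x_train and x_test come out 5-dimensional.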
To check the output shapes created by each layer of the model, call model.summary() after compiling it.
See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed
You can convert images into numpy arrays with shape (X, Y, 3) using keras.preprocessing.image.
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
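If you build the dataset frame by frame this way, the per-frame arrays still have to be stacked into one array per video and then into a batch. A rough sketch with NumPy, assuming greyscale frames with a single channel; the zero-filled frames are hypothetical stand-ins for real img_to_array outputs:

```python
import numpy as np

frames_per_video = 75  # frame count taken from the question

# Hypothetical stand-ins for per-frame arrays of shape (height, width, channels).
frames = [np.zeros((120, 160, 1), dtype="float32") for _ in range(frames_per_video)]

# Stack frames along a new leading axis: (frames, height, width, channels).
video = np.stack(frames, axis=0)

# Stack videos the same way to get the 5-D batch the model expects.
videos = np.stack([video, video], axis=0)

print(video.shape)   # (75, 120, 160, 1)
print(videos.shape)  # (2, 75, 120, 160, 1)
```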
Update: It seems the reason you had to make all images square (128, 128, 1) is that in model.fit() the training examples (x_train) and the labels (normally y_train) are the same set. If you look at the model summary below, everything after the Flatten layer becomes square, so the model expects square labels. This makes sense: using this model for prediction would transform a (120, 160, 1) image into one of shape (128, 128, 1). Changing the model training to the code below should therefore work:
import numpy as np
x_train = np.random.random((90, 5, 120, 160, 1))  # training data
y_train = np.random.random((90, 5, 128, 128, 1))  # labels
model.fit(x_train, y_train)
Layer (type)                  Output Shape              Param #
=================================================================
time_distributed_1 (TimeDist  (None, 5, 120, 160, 64)   320
time_distributed_2 (TimeDist  (None, 5, 60, 80, 64)     0
time_distributed_3 (TimeDist  (None, 5, 60, 80, 32)     18464
time_distributed_4 (TimeDist  (None, 5, 30, 40, 32)     0
time_distributed_5 (TimeDist  (None, 5, 30, 40, 16)     4624
time_distributed_6 (TimeDist  (None, 5, 15, 20, 16)     0
time_distributed_7 (TimeDist  (None, 5, 4800)           0
lstm_1 (LSTM)                 (None, 5, 64)             1245440
time_distributed_8 (TimeDist  (None, 5, 8, 8, 1)        0
time_distributed_9 (TimeDist  (None, 5, 16, 16, 1)      0
time_distributed_10 (TimeDis  (None, 5, 16, 16, 16)     160
time_distributed_11 (TimeDis  (None, 5, 32, 32, 16)     0
time_distributed_12 (TimeDis  (None, 5, 32, 32, 32)     4640
time_distributed_13 (TimeDis  (None, 5, 64, 64, 32)     0
time_distributed_14 (TimeDis  (None, 5, 64, 64, 64)     18496
time_distributed_15 (TimeDis  (None, 5, 128, 128, 64)   0
time_distributed_16 (TimeDis  (None, 5, 128, 128, 1)    577
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0
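The 128 in the last rows of the summary follows from simple arithmetic on the decoder half. As a rough illustration (decoder_output_size is a hypothetical helper of mine, not part of Keras), assuming every Conv2D uses padding='same' with stride 1 so that only the upsampling layers change height and width:

```python
def decoder_output_size(start_hw, num_upsamplings, factor=2):
    """Height and width after repeated UpSampling2D((factor, factor)) layers.

    Conv2D with padding='same' and stride 1 keeps height and width,
    so only the upsampling layers are counted here.
    """
    h, w = start_hw
    scale = factor ** num_upsamplings
    return h * scale, w * scale


# Reshape((8, 8, 1)) followed by four upsampling layers, as in the summary.
print(decoder_output_size((8, 8), 4))  # (128, 128)
```

Since the LSTM output is reshaped to a square 8x8 map, the decoder can only ever produce square frames, whatever the input size was.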
Update 2: To make it work with non-square images without changing y, set LSTM(300) and Reshape((15, 20, 1)), and remove one of the Conv2D + UpSampling2D pairs that follow. You can then use pictures of shape (120, 160) even in an autoencoder. The trick is to look at the model summary and make sure that the shape right after the LSTM is chosen so that, after all the remaining layers, the final output has shape (120, 160).
import numpy as np
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, UpSampling2D,
                          Flatten, Reshape, LSTM)

model = Sequential()
# Encoder: each sample is (frames, height, width, channels) = (5, 120, 160, 1)
model.add(TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"),
                          input_shape=(5, 120, 160, 1)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
# Flatten each frame so the LSTM receives (frames, features)
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))
# Decoder: 300 = 15 * 20, and three 2x upsamplings give back 120 x 160
model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))
model.compile(optimizer='adam', loss='mse')

# Random stand-in data; inputs and targets now share the same shape
x_train = np.random.random((90, 5, 120, 160, 1))
y_train = np.random.random((90, 5, 120, 160, 1))
model.fit(x_train, y_train)
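If you want to adapt this to another frame size, the LSTM units and the Reshape target can be worked out backwards from the output size you need. A small sketch of that calculation (decoder_start is a hypothetical helper, not a Keras function), assuming 2x upsampling layers and a single channel after the Reshape:

```python
def decoder_start(target_hw, num_upsamplings, factor=2):
    """Return the Reshape height/width and LSTM units for a target frame size."""
    h, w = target_hw
    scale = factor ** num_upsamplings
    if h % scale or w % scale:
        raise ValueError("target size must be divisible by factor ** num_upsamplings")
    start = (h // scale, w // scale)
    units = start[0] * start[1]  # one LSTM unit per pixel of the reshaped map
    return start, units


# Three upsampling layers and a (120, 160) target, as in the model above.
print(decoder_start((120, 160), 3))  # ((15, 20), 300)
```

With four upsampling layers the same target would not work, because 120 is not divisible by 16; that is why one Conv2D + UpSampling2D pair had to go.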