python - Keras GRU/LSTM layer input dimension error

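I'm new to deep learning, and I've been trying to build a simple sentiment analyzer with deep learning methods for natural language processing, using the Reuters dataset. Here is my code: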

import numpy as np
from keras.datasets import reuters
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense, Dropout, GRU
from keras.utils import np_utils
max_length=3000
vocab_size=100000
epochs=10
batch_size=32
validation_split=0.2
(x_train, y_train), (x_test, y_test) = reuters.load_data(path="reuters.npz",
                                                         num_words=vocab_size,
                                                         skip_top=5,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=113,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

tokenizer = Tokenizer(num_words=max_length)

x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
y_train = np_utils.to_categorical(y_train, 50)
y_test = np_utils.to_categorical(y_test, 50)


model = Sequential()
model.add(GRU(50, input_shape = (49,1), return_sequences = True))
model.add(Dropout(0.2))
model.add(Dense(256, input_shape=(max_length,), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(50, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model.summary()

history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=validation_split)

score = model.evaluate(x_test, y_test)
print('Test Accuracy:', round(score[1]*100,2))

What I don't understand is why I get this error every time I try to use GRU or LSTM units instead of Dense ones:

ValueError: Error when checking input: expected gru_1_input to have 3
dimensions, but got array with shape (8982, 3000)

I read online that adding return_sequences = True would solve the problem, but as you can see, the problem is still there.

What should I do in this case?

Best answer

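The problem is that x_train has shape (8982, 3000). Given the preprocessing step, this means there are 8982 sentences, each encoded as a binary vector of length 3000 with one entry per word index. A GRU (or LSTM) layer, on the other hand, accepts sequences as input, so its input shape should be (batch_size, num_timesteps or sequence_length, feature_size). Right now, the only feature you have is the presence (1) or absence (0) of a particular word in a sentence. So, to make this work with a GRU, you need to add a third dimension to x_train and x_test: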

x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

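Then remove return_sequences=True and change the GRU's input shape to input_shape=(3000,1). This tells the GRU layer that you are processing sequences of length 3000 in which each element consists of a single feature. (As a side note, I think you should pass vocab_size as the num_words argument of Tokenizer, since that argument is the number of words in the vocabulary. Pass max_length as the maxlen argument of load_data instead, which limits the length of a sentence.)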
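To make the shape change concrete, here is a minimal NumPy-only sketch. The small zero-filled array is a made-up stand-in for the real output of sequences_to_matrix (4 rows instead of 8982), so only the shapes are meaningful.

```python
import numpy as np

# Hypothetical stand-in for the binary matrix from sequences_to_matrix:
# 4 "sentences", each a 3000-dimensional 0/1 vector.
x = np.zeros((4, 3000), dtype="float32")
x[0, [5, 17, 42]] = 1.0  # pretend word indices 5, 17 and 42 occur in sentence 0

# A recurrent layer expects (batch_size, timesteps, features),
# so add a trailing feature axis of size 1.
x3d = np.expand_dims(x, axis=-1)

print(x.shape)    # (4, 3000)
print(x3d.shape)  # (4, 3000, 1)

# The per-sample shape (everything after the batch axis) is what
# would be passed as input_shape to the GRU layer.
input_shape = x3d.shape[1:]
print(input_shape)  # (3000, 1)
```

The batch axis is left out of input_shape, which is why the GRU is given (3000, 1) rather than (8982, 3000, 1).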

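However, I think you would probably get better results by using an Embedding layer as the first layer, placed before the GRU layer. The way the sentences are currently encoded does not take the order of the words into account (it only cares whether they are present). Feeding that representation to a GRU or LSTM layer, which depends on the order of the elements, therefore makes little sense.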
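The loss of word order can be shown with a small pure-Python sketch. The helper to_binary_vector is a hand-written illustration of a presence/absence encoding, not the Keras implementation, and the word indices are invented for the example.

```python
def to_binary_vector(sequence, num_words):
    """Mark which word indices occur in the sequence (1) or not (0)."""
    vector = [0] * num_words
    for index in sequence:
        if 0 <= index < num_words:
            vector[index] = 1
    return vector


# Two hypothetical sentences containing the same words in opposite order.
sentence_a = [4, 7, 2, 9]
sentence_b = [9, 2, 7, 4]

vec_a = to_binary_vector(sentence_a, num_words=10)
vec_b = to_binary_vector(sentence_b, num_words=10)

# Both sentences collapse to the same vector, so a model fed this
# encoding cannot tell them apart.
print(vec_a == vec_b)  # True
```

An Embedding layer, by contrast, takes the integer index sequences themselves (typically padded to a common length), so the order in which the words appear is preserved for the recurrent layer.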