I am training a 2-layer character LSTM with Keras to generate sequences of characters similar to the corpus I am training on. However, when I train the LSTM, the output of the trained LSTM is the same sequence over and over again.
I've seen suggestions for similar problems to increase the length of the LSTM input sequence, increase the batch size, add dropout layers, and increase the dropout rate. I've tried all of these, but none of them seem to have fixed the issue. The one thing that has had some success is adding a random noise vector to each vector the LSTM outputs during generation. This makes sense, since the LSTM uses the output of the previous step to generate the next output. However, generally, if I add enough noise to keep the LSTM from repeating itself, the quality of the output drops considerably.
My LSTM training code is as follows:
import numpy
import sys
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# [load data from file]
raw_text = collected_statements.lower()

# create mapping of unique chars to integers
chars = sorted(list(set(raw_text + '\b')))
char_to_int = dict((c, i) for i, c in enumerate(chars))
n_chars = len(raw_text)
n_vocab = len(chars)

# cut the text into overlapping input/output pairs
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]),
               return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fix random seed for reproducibility
seed = 8
numpy.random.seed(seed)

# split into 80% for train and 20% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=seed)

# train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=18,
          batch_size=256, callbacks=callbacks_list)
My generation code is as follows:
filename = "weights-improvement-18-1.5283.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))

# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = unpadded_patterns[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    # normalize, then add a small random noise vector to discourage repetition
    x = (x / float(n_vocab)) + (numpy.random.rand(1, len(pattern), 1) * 0.01)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window forward by one character
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")
When I run the generation code, I get the same sequence over and over again:
we have the best economy in the history of our country." "we have the best
economy in the history of our country." "we have the best economy in the
history of our country." "we have the best economy in the history of our
country." "we have the best economy in the history of our country." "we
have the best economy in the history of our country." "we have the best
economy in the history of our country." "we have the best economy in the
history of our country." "we have the best economy in the history of our
country."
Is there anything else I could try that would help generate something other than the same repeated sequence?
Best Answer
For your character generation, I would suggest sampling from the probabilities your model outputs instead of taking the argmax directly. This is what the Keras char-rnn example does to get diversity.
This is the code they use for sampling in their example:
import numpy as np

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
In your code you have
index = numpy.argmax(prediction)
and I would suggest simply replacing that with
index = sample(prediction[0])  # predict() returns shape (1, n_vocab); pass the 1-D probability row
and experimenting with a temperature of your choice. Keep in mind that higher temperatures make your output more random, while lower temperatures make it more predictable.
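For concreteness, here is a minimal sketch of what the generation loop could look like with sampling in place of argmax; the temperature value of 0.5 is only an illustrative starting point, not something taken from the original answer:

# generation loop using temperature sampling instead of argmax
temperature = 0.5  # illustrative value; raise it for more variety, lower it for more conservative text
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)  # the random-noise hack should no longer be needed once sampling adds diversity
    prediction = model.predict(x, verbose=0)
    index = sample(prediction[0], temperature)  # sample from the 1-D probability row
    result = int_to_char[index]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

If the output still looks repetitive at a low temperature, nudging the temperature upward is usually a better lever than adding noise to the inputs.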