I am currently using this code, which I picked up from one discussion on GitHub.
This is the code for the attention mechanism:
# max_length, vocab_size, embedding_size and units are hyperparameters defined elsewhere
from keras.layers import Input, Dense, Embedding, LSTM, Flatten, Activation, RepeatVector, Permute, Lambda, merge
from keras import backend as K

_input = Input(shape=[max_length], dtype='int32')
# get the embedding layer
embedded = Embedding(
    input_dim=vocab_size,
    output_dim=embedding_size,
    input_length=max_length,
    trainable=False,
    mask_zero=False
)(_input)
activations = LSTM(units, return_sequences=True)(embedded)  # (batch, max_length, units)
# compute importance for each step
attention = Dense(1, activation='tanh')(activations)  # one score per timestep
attention = Flatten()(attention)                       # (batch, max_length)
attention = Activation('softmax')(attention)           # attention weights over the timesteps
attention = RepeatVector(units)(attention)             # (batch, units, max_length)
attention = Permute([2, 1])(attention)                 # (batch, max_length, units)
# element-wise product of LSTM states and attention weights (legacy Keras 1 merge API)
sent_representation = merge([activations, attention], mode='mul')
# weighted sum over the time axis -> (batch, units)
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)
probabilities = Dense(3, activation='softmax')(sent_representation)
Is this the correct way to do it? I was somewhat expecting a TimeDistributed layer, since the attention mechanism is distributed over every time step of the RNN. I need someone to confirm that this implementation (the code above) is a correct implementation of an attention mechanism. Thank you.
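As a quick sanity check on the TimeDistributed point, here is a minimal sketch (assuming tf.keras 2.x, where a Dense layer applied to a 3D tensor acts on the last axis only) showing that Dense(1) and TimeDistributed(Dense(1)) give the same per-timestep output shape; the 4/10/32 sizes are just illustrative:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, TimeDistributed

x = tf.constant(np.random.rand(4, 10, 32), dtype=tf.float32)  # (batch, time, units)
print(Dense(1, activation='tanh')(x).shape)                    # (4, 10, 1): one score per timestep
print(TimeDistributed(Dense(1, activation='tanh'))(x).shape)   # (4, 10, 1): same shape, same per-timestep behaviour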
Best answer
If you want to have attention along the time dimension, then this part of your code seems correct to me:
activations = LSTM(units, return_sequences=True)(embedded)
# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)
sent_representation = merge([activations, attention], mode='mul')
At this point you have computed an attention vector of shape (batch_size, max_length):
attention = Activation('softmax')(attention)
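To make the shapes explicit, here is a minimal sketch of just the attention branch (assuming tf.keras 2.x; batch_size=4, max_length=10, units=32 are illustrative values):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Activation, RepeatVector, Permute

activations = tf.constant(np.random.rand(4, 10, 32), dtype=tf.float32)  # LSTM output: (batch, time, units)

attention = Dense(1, activation='tanh')(activations)  # (4, 10, 1)  one score per timestep
attention = Flatten()(attention)                      # (4, 10)
attention = Activation('softmax')(attention)          # (4, 10)    weights sum to 1 over time
attention = RepeatVector(32)(attention)               # (4, 32, 10) copy the weights for each unit
attention = Permute([2, 1])(attention)                # (4, 10, 32) aligned with activations
print(attention.shape)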
I have never seen this part of the code before, so I can't say whether it is correct:
K.sum(xin, axis=-2)
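Mechanically, though, the Lambda just collapses the time axis: with an input of shape (batch_size, max_length, units), K.sum(xin, axis=-2) returns a (batch_size, units) tensor, i.e. the attention-weighted sum of the LSTM states. A minimal check (again assuming tf.keras 2.x, with illustrative sizes):
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# sent_representation before the Lambda: (batch_size, max_length, units)
weighted = tf.constant(np.random.rand(4, 10, 32), dtype=tf.float32)
print(K.sum(weighted, axis=-2).shape)  # (4, 32): one units-sized vector per sample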
Further reading (you might want to have a look):
A similar question on Stack Overflow: "How to add an attention mechanism in keras?": https://stackoverflow.com/questions/42918446/