如何使用 keras 构建注意力模型?

本文介绍了如何使用 keras 构建注意力模型?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试理解注意力模型并自己构建一个.经过多次搜索，我发现了这个网站用 keras 编码的模型，看起来也很简单.但是当我试图在我的机器上构建相同的模型时，它给出了多个参数错误.该错误是由于在类 Attention 中传递的参数不匹配.在网站的注意力类中，它要求一个参数，但它用两个参数启动注意力对象.

I am trying to understand attention model and also build one myself. After many searches I came across this website which had an atteniton model coded in keras and also looks simple. But when I tried to build that same model in my machine its giving multiple argument error. The error was due to the mismatched argument passing in class Attention. In the website's attention class it's asking for one argument but it initiates the attention object with two arguments.

import tensorflow as tf

max_len = 200
rnn_cell_size = 128
vocab_size=250

class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    def call(self, features, hidden):
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights

sequence_input = tf.keras.layers.Input(shape=(max_len,), dtype='int32')

embedded_sequences = tf.keras.layers.Embedding(vocab_size, 128, input_length=max_len)(sequence_input)

lstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM
                                     (rnn_cell_size,
                                      dropout=0.3,
                                      return_sequences=True,
                                      return_state=True,
                                      recurrent_activation='relu',
                                      recurrent_initializer='glorot_uniform'), name="bi_lstm_0")(embedded_sequences)

lstm, forward_h, forward_c, backward_h, backward_c = tf.keras.layers.Bidirectional
    (tf.keras.layers.LSTM
     (rnn_cell_size,
      dropout=0.2,
      return_sequences=True,
      return_state=True,
      recurrent_activation='relu',
      recurrent_initializer='glorot_uniform'))(lstm)

state_h = tf.keras.layers.Concatenate()([forward_h, backward_h])
state_c = tf.keras.layers.Concatenate()([forward_c, backward_c])

#  PROBLEM IN THIS LINE
context_vector, attention_weights = Attention(lstm, state_h)

output = keras.layers.Dense(1, activation='sigmoid')(context_vector)

model = keras.Model(inputs=sequence_input, outputs=output)

# summarize layers
print(model.summary())

我怎样才能使这个模型工作?

How can I make this model work?

推荐答案

初始化注意力层和传递参数的方式有问题.你应该在这个地方指定attention layer单元的数量并修改传入参数的方式:

There is a problem with the way you initialize attention layer and pass parameters. You should specify the number of attention layer units in this place and modify the way of passing in parameters：

context_vector, attention_weights = Attention(32)(lstm, state_h)

结果:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 200)          0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 200, 128)     32000       input_1[0][0]
__________________________________________________________________________________________________
bi_lstm_0 (Bidirectional)       [(None, 200, 256), ( 263168      embedding[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional)   [(None, 200, 256), ( 394240      bi_lstm_0[0][0]
                                                                 bi_lstm_0[0][1]
                                                                 bi_lstm_0[0][2]
                                                                 bi_lstm_0[0][3]
                                                                 bi_lstm_0[0][4]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 256)          0           bidirectional[0][1]
                                                                 bidirectional[0][3]
__________________________________________________________________________________________________
attention (Attention)           [(None, 256), (None, 16481       bidirectional[0][0]
                                                                 concatenate[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 1)            257         attention[0][0]
==================================================================================================
Total params: 706,146
Trainable params: 706,146
Non-trainable params: 0
__________________________________________________________________________________________________
None

这篇关于如何使用 keras 构建注意力模型?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！