Problem description
I am working on language modelling with a large vocabulary, so I want to use sampled_softmax_loss from TensorFlow. The problem is that the weights and biases passed as arguments to sampled_softmax_loss do not seem to be trainable (their values don't change after training).
So I guess I should add them to the computation graph that the Keras Model builds automatically, but I have spent a lot of time and still haven't found a proper way to do so.
So, once again: I want to add external trainable tf.Variables to the Keras computation graph. Does anyone know how to do this?
My model (head and tail):
input_sentence = Input(shape=(INPUT_LENGTH,), dtype='int32')
words = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],
                  weights=[embedding_matrix], trainable=True)(input_sentence)
...
context = Dense(256, activation='tanh')(context)
model = Model(inputs=input_sentence, outputs=context, name=name)
Loss:
def softmax_fine_loss(labels, logits, transposed_W=None, b=None):
    # Map over the first (batch) dimension and apply sampled softmax
    # to each element's labels and logits.
    # The lambda takes one tuple argument because Python 3 does not
    # allow tuple unpacking in lambda parameters.
    res = tf.map_fn(
        lambda pair: tf.nn.sampled_softmax_loss(
            transposed_W, b, pair[0], pair[1],
            num_sampled=1000, num_classes=OUTPUT_COUNT+1),
        (labels, logits), dtype=tf.float32)
    return res

loss = lambda labels, logits: softmax_fine_loss(labels, logits, transposed_W=transposed_W, b=b)
model_truncated.compile(optimizer=optimizer, loss=loss, sample_weight_mode='temporal')
Recommended answer
I finally found a workaround.
Let's say we need to train weights W and biases b together with our model.
The workaround is simply to add them to one of the trainable layers of the model:
model.layers[-1].trainable_weights.extend([W, b])
After that, the model can be compiled:
model.compile(...)
It is extremely important to add the variables to a trainable layer. For example, I experimented with a Sequential model, and adding [W, b] to the Activation layer did not make them actually trainable.
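The trick above depends on trainable_weights being a plain, mutable list, which may not hold in newer TensorFlow/Keras releases where it is typically a computed property. As a hedged alternative (not the method from this answer), here is a minimal sketch that assumes TensorFlow 2.x and uses a custom training loop: the external W and b are simply listed next to the model's own variables when gradients are computed. The tiny model, the sizes, and the random data are hypothetical stand-ins for the ones in the question.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 50   # hypothetical vocabulary size
HIDDEN = 16        # width of the model's final Dense layer
NUM_SAMPLED = 8    # negative classes sampled per batch

# Stand-in for the model in the question: token ids -> context vector.
inputs = tf.keras.Input(shape=(5,), dtype="int32")
x = tf.keras.layers.Embedding(NUM_CLASSES, 8)(inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
context = tf.keras.layers.Dense(HIDDEN, activation="tanh")(x)
model = tf.keras.Model(inputs, context)

# External output-projection variables that no Keras layer owns.
W = tf.Variable(tf.random.normal([NUM_CLASSES, HIDDEN], stddev=0.1), name="W")
b = tf.Variable(tf.zeros([NUM_CLASSES]), name="b")

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
# Everything that should be trained, including the external variables.
variables = list(model.trainable_variables) + [W, b]


def train_step(token_ids, labels):
    """Run one gradient step on the model variables plus W and b."""
    with tf.GradientTape() as tape:
        hidden = model(token_ids, training=True)
        per_example = tf.nn.sampled_softmax_loss(
            weights=W, biases=b, labels=labels, inputs=hidden,
            num_sampled=NUM_SAMPLED, num_classes=NUM_CLASSES)
        loss = tf.reduce_mean(per_example)
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss


# Random toy data; labels must be int64 with shape [batch, num_true].
rng = np.random.default_rng(0)
token_ids = rng.integers(0, NUM_CLASSES, size=(32, 5)).astype("int32")
labels = rng.integers(0, NUM_CLASSES, size=(32, 1)).astype("int64")

W_before, b_before = W.numpy().copy(), b.numpy().copy()
for _ in range(3):
    train_step(token_ids, labels)

# Check that the external variables were actually updated.
w_changed = not np.allclose(W_before, W.numpy())
b_changed = not np.allclose(b_before, b.numpy())
print("W changed:", w_changed, "| b changed:", b_changed)
```

Comparing W and b before and after a few steps is the same check the question describes (values not changing after training), so it is a quick way to confirm that the variables really are part of the optimization.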