This article looks at padding for variable-length RNNs and masking the gradients from the padded steps; it should be a useful reference if you are running into the same problem.

Problem description


I'm building an RNN and using the sequence_length parameter to supply a list of lengths for the sequences in a batch, and all sequences in a batch are padded to the same length.

However, when doing backprop, is it possible to mask out the gradients corresponding to the padded steps, so that these steps contribute 0 to the weight updates? I'm already masking out their corresponding costs like this (where batch_weights is a vector of 0's and 1's, with the elements corresponding to the padded steps set to 0):

loss = tf.mul(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.reshape(self._targets, [-1])), batch_weights)

self._cost = cost = tf.reduce_sum(loss) / tf.to_float(tf.reduce_sum(batch_weights))

The problem is, I'm not sure whether doing the above actually zeroes out the gradients from the padded steps.

Recommended answer

For all framewise / feed-forward (non-recurrent) operations, masking the loss/cost is enough.
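
As a quick sanity check (a minimal sketch, assuming TensorFlow 1.x; the toy logits, labels and weights below are made up for illustration), you can take the gradient of the masked cost with respect to the logits and confirm that the row belonging to the zero-weighted step comes out as all zeros:

import tensorflow as tf

# two steps, two classes; the second step plays the role of padding
logits = tf.constant([[2.0, 1.0], [0.5, 0.3]])
labels = tf.constant([0, 1])
weights = tf.constant([1.0, 0.0])  # 0 weight for the "padded" step

step_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
cost = tf.reduce_sum(step_loss * weights) / tf.reduce_sum(weights)

grad = tf.gradients(cost, [logits])[0]
with tf.Session() as sess:
    print(sess.run(grad))  # the second row comes out as [0., 0.]

Since the softmax cross-entropy here is a purely framewise operation, multiplying its per-step loss by 0 is enough to zero out the corresponding gradient.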

For all sequence / recurrent operations (e.g. dynamic_rnn), there is always a sequence_length parameter which you need to set to the corresponding sequence lengths. Then there won't be a gradient for the zero-padded steps; in other words, they will contribute 0.
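
Below is a minimal TF 1.x-style sketch of that recommendation; the placeholder names, shapes, BasicLSTMCell and dense projection are assumptions for illustration rather than the original poster's model. sequence_length stops the recurrence at each sequence's true length, and the same batch_weights-style mask from the question zeroes the framewise loss:

import tensorflow as tf

batch_size, max_steps, num_features = 32, 20, 50   # made-up sizes
num_units, num_classes = 128, 10

inputs = tf.placeholder(tf.float32, [batch_size, max_steps, num_features])
targets = tf.placeholder(tf.int32, [batch_size, max_steps])
seq_lengths = tf.placeholder(tf.int32, [batch_size])  # true (unpadded) lengths

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# sequence_length makes dynamic_rnn stop stepping at each true length,
# so the zero-padded steps get no recurrent gradient at all
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, sequence_length=seq_lengths, dtype=tf.float32)

logits = tf.layers.dense(tf.reshape(outputs, [-1, num_units]), num_classes)
step_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.reshape(targets, [-1]), logits=logits)

# 1.0 for real steps, 0.0 for padded steps -- equivalent to the question's batch_weights
batch_weights = tf.reshape(tf.sequence_mask(seq_lengths, max_steps, dtype=tf.float32), [-1])
cost = tf.reduce_sum(step_loss * batch_weights) / tf.reduce_sum(batch_weights)

With this combination, the padded positions contribute neither through the recurrent path (handled by sequence_length) nor through the framewise loss (handled by the mask).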

That wraps up this question on variable-length RNN padding and masking the gradients of padded steps; hopefully the recommended answer helps.
