I'm trying to use the dynamic_rnn function in Tensorflow to speed up training. After doing some reading, my understanding is that one way to speed up training is to explicitly pass a value to the sequence_length parameter in this function. After a bit more reading, and finding this SO explanation, it seems like what I need to pass is a vector (maybe defined by a tf.placeholder) that contains the length of each sequence within a batch.
Here's where I'm confused: in order to take advantage of this, should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set? How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences? Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training? Any help/context would be appreciated.
The sequences within a batch must be aligned, i.e., they have to have the same length. So the general answer to your question is "yes": pad each batch to the longest sequence in that batch. Different batches don't have to be of the same length, though, so you can stratify the input sequences into groups of roughly the same size and pad each group accordingly. This technique is called bucketing and you can read about it in this tutorial.
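As a rough illustration of per-batch padding combined with bucketing, here is a minimal NumPy sketch. It assumes a simplified form of bucketing (sort by length, then cut into batches) rather than fixed length boundaries, and the helper names `pad_batch` and `bucket_batches` as well as the token ids are hypothetical, not part of TensorFlow.

```python
import numpy as np

def pad_batch(sequences, pad_value=0):
    """Pad a list of 1-D sequences to the longest length in this batch only."""
    lengths = np.array([len(s) for s in sequences], dtype=np.int32)
    padded = np.full((len(sequences), lengths.max()), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded, lengths

def bucket_batches(sequences, batch_size):
    """Sort by length, then cut into batches so each batch holds similar lengths."""
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    for start in range(0, len(order), batch_size):
        chunk = [sequences[i] for i in order[start:start + batch_size]]
        yield pad_batch(chunk)

# Hypothetical token-id sequences of varying length.
data = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10, 11, 12, 13, 14], [15, 16]]
batches = list(bucket_batches(data, batch_size=2))
for padded, lengths in batches:
    # `lengths` is the vector one would feed to sequence_length for this batch.
    print(padded.shape, lengths.tolist())
```

Each batch is only as wide as its own longest sequence, so short sequences are not padded up to the longest sequence in the whole training set.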
It's fairly intuitive. tf.nn.dynamic_rnn returns two tensors: outputs and states. Suppose the actual sequence length is t and the padded sequence length is T. Then outputs will contain zeros at every time step i > t, and states will contain the t-th cell state, ignoring the states of the trailing (padded) cells.
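The masking rule described above can be sketched in plain NumPy. This is not TensorFlow's implementation, only a minimal tanh RNN loop under the assumption that padded steps copy the previous state through and emit zeros; the function name `masked_rnn` and the random weights are hypothetical.

```python
import numpy as np

def masked_rnn(x, lengths, w_x, w_h, b):
    """Plain tanh RNN over x of shape [batch, T, n_inputs] with length masking."""
    batch, max_steps, _ = x.shape
    n_units = w_h.shape[0]
    state = np.zeros((batch, n_units))
    outputs = np.zeros((batch, max_steps, n_units))
    for step in range(max_steps):
        new_state = np.tanh(x[:, step] @ w_x + state @ w_h + b)
        # True for sequences that have not ended yet at this (0-indexed) step.
        active = (step < lengths)[:, None]
        # Past the end of a sequence: keep the old state, emit a zero output.
        state = np.where(active, new_state, state)
        outputs[:, step] = np.where(active, new_state, 0.0)
    return outputs, state

rng = np.random.default_rng(0)  # arbitrary weights, for illustration only
x = np.array([[[0, 1, 2], [9, 8, 7]],
              [[3, 4, 5], [0, 0, 0]],   # second step is padding
              [[6, 7, 8], [6, 5, 4]]], dtype=np.float64)
lengths = np.array([2, 1, 2])
w_x = rng.normal(size=(3, 5))
w_h = rng.normal(size=(5, 5))
b = np.zeros(5)
outputs, final_state = masked_rnn(x, lengths, w_x, w_h, b)
print(outputs[1, 1])                                 # all zeros: padded step
print(np.allclose(final_state[1], outputs[1, 0]))    # state frozen at last real step
```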
A simple example:

# Uses the TensorFlow 1.x API (tf.placeholder, tf.Session).
import numpy as np
import tensorflow as tf

n_steps = 2
n_inputs = 3
n_neurons = 5

X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
                                    sequence_length=seq_length, dtype=tf.float32)

X_batch = np.array([
    # t = 0      t = 1
    [[0, 1, 2], [9, 8, 7]],  # instance 0
    [[3, 4, 5], [0, 0, 0]],  # instance 1 (second step is padding)
    [[6, 7, 8], [6, 5, 4]],  # instance 2
])
seq_length_batch = np.array([2, 1, 2])

with tf.Session() as sess:
    # The cell's weights must be initialized before the graph is run.
    sess.run(tf.global_variables_initializer())
    outputs_val, states_val = sess.run([outputs, states], feed_dict={
        X: X_batch,
        seq_length: seq_length_batch
    })
    print(outputs_val)
    print()
    print(states_val)
Note that instance 1 is padded, so outputs_val[1,1] is a zero vector and states_val[1] == outputs_val[1,0]. The values of outputs_val, followed by states_val:
[[[ 0.76686853  0.8707901  -0.79509073  0.7430128   0.63775384]
  [ 1.          0.7427926  -0.9452815  -0.93113345 -0.94975543]]
 [[ 0.9998851   0.98436266 -0.9620067   0.61259484  0.43135557]
  [ 0.          0.          0.          0.          0.        ]]
 [[ 0.99999994  0.9982034  -0.9934515   0.43735617  0.1671598 ]
  [ 0.99999785 -0.5612586  -0.57177305 -0.9255771  -0.83750355]]]

[[ 1.          0.7427926  -0.9452815  -0.93113345 -0.94975543]
 [ 0.9998851   0.98436266 -0.9620067   0.61259484  0.43135557]
 [ 0.99999785 -0.5612586  -0.57177305 -0.9255771  -0.83750355]]
Of course, batch processing is more efficient than feeding the sequences one by one. But the main advantage of specifying the length is that you get a sensible state out of the RNN, i.e., padded items don't affect the result tensor. You will get exactly the same result (and the same speed) if you don't set the length but select the right states manually.
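Selecting the right states manually could look roughly like the NumPy sketch below, applied to outputs that were computed without sequence_length. It assumes a cell whose state equals its output (as with BasicRNNCell above); for an LSTM cell the output only matches the h part of the state. The helper names and the `raw` array are hypothetical.

```python
import numpy as np

def last_valid_output(outputs, lengths):
    """Pick outputs[i, lengths[i] - 1] for every sequence i in the batch."""
    rows = np.arange(outputs.shape[0])
    return outputs[rows, lengths - 1]

def zero_padded_steps(outputs, lengths):
    """Zero every time step at or beyond each sequence's length (0-indexed)."""
    steps = np.arange(outputs.shape[1])
    mask = steps[None, :] < lengths[:, None]   # shape [batch, T]
    return outputs * mask[:, :, None]

# Hypothetical unmasked outputs, shape [batch=2, T=3, units=2].
raw = np.arange(12, dtype=np.float64).reshape(2, 3, 2) + 1.0
lengths = np.array([3, 1])

print(last_valid_output(raw, lengths))   # rows [5, 6] and [7, 8]
print(zero_padded_steps(raw, lengths))   # second sequence keeps only its first step
```

The same two helpers can be applied to an `outputs_val` array fetched from a session, with the batch's length vector as `lengths`.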