Question
I have two questions about the TensorFlow PTB RNN tutorial code ptb_word_lm.py. The code blocks below are taken from that file.
Is it okay to reset the state for every batch?
self._initial_state = cell.zero_state(batch_size, data_type())

with tf.device("/cpu:0"):
  embedding = tf.get_variable(
      "embedding", [vocab_size, size], dtype=data_type())
  inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

if is_training and config.keep_prob < 1:
  inputs = tf.nn.dropout(inputs, config.keep_prob)

outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
  for time_step in range(num_steps):
    if time_step > 0: tf.get_variable_scope().reuse_variables()
    (cell_output, state) = cell(inputs[:, time_step, :], state)
    outputs.append(cell_output)
At line 133, we set the initial state to zero. Then, at line 153, we use that zero state as the starting state of the RNN steps. This means that every batch starts from a zero state. I believe that if we want to apply BPTT (backpropagation through time), we should feed in an external (non-zero) state at the step where the previous data ends, like a stateful RNN (in Keras).
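For reference, this is the kind of behaviour I mean by a stateful RNN in Keras: with stateful=True the layer keeps its hidden state between batches instead of resetting it to zero, and you reset it explicitly at a sequence boundary. A minimal sketch, with made-up sizes (not taken from the tutorial):

import numpy as np
import tensorflow as tf

# Illustrative sizes only
batch_size, num_steps, vocab_size, hidden = 20, 35, 10000, 200

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, hidden,
                              batch_input_shape=(batch_size, num_steps)),
    tf.keras.layers.LSTM(hidden, stateful=True, return_sequences=True),
    tf.keras.layers.Dense(vocab_size),
])

# Consecutive batches hold consecutive chunks of the same sequences, so the
# final state after x1 is used as the starting state for x2.
x1 = np.random.randint(vocab_size, size=(batch_size, num_steps))
x2 = np.random.randint(vocab_size, size=(batch_size, num_steps))
model.predict(x1, batch_size=batch_size)
model.predict(x2, batch_size=batch_size)  # continues from where x1 left off
model.reset_states()                      # explicit reset when a new sequence starts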
I found that resetting the starting state to zero works in practice. But is there any theoretical background (or paper) explaining why it works?
Is it okay to measure test perplexity like this?
eval_config = get_config()
eval_config.batch_size = 1
eval_config.num_steps = 1
Related to the previous question... the model fixes the initial state to zero for every batch. However, at lines 337-338 we set the batch size to 1 and the number of steps to 1 for the test configuration. Then, for the test data, we would feed in a single word each time and predict the next one without any context(!), because the state is zero at every batch (which contains only one time step).
Is this a correct measure for the test data? Do other language-model papers measure test perplexity as predicting the next word without context?
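For clarity, by perplexity I mean the usual exponential of the average per-word cross-entropy, so the measure itself does not depend on batch_size or num_steps; what changes is only how much context each prediction is conditioned on. A tiny sketch with made-up cost values:

import numpy as np

# Hypothetical per-word cross-entropy losses (one word per step, since num_steps = 1)
per_step_costs = [6.2, 5.9, 5.4, 5.1]
perplexity = np.exp(np.sum(per_step_costs) / len(per_step_costs))
print(perplexity)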
I ran this code and got results similar to what the code claims and what the original paper reports. If this code is wrong, which I hope it is not, do you have any idea how to replicate the paper's results? Maybe I can make a pull request once I fix the problems.
Answer
Re (1), the code does (cell_output, state) = cell(inputs[:, time_step, :], state). This assigns the state of the next time step to be the output state of this time step.
When you start a new batch you should do so independently from the computation you have done so far (note the distinction between batches, which are completely different examples, and time steps within the same sequence).
Re (2), most of the time context is used.
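Even with batch_size = 1 and num_steps = 1, the evaluation loop can feed the final state returned by one session.run call back in as the initial state of the next call, so each prediction is still conditioned on all the words seen so far. A rough sketch of that pattern, assuming LSTM layers (so each layer's state is a (c, h) pair) and attribute names such as model.final_state and model.cost that are not shown in the excerpt above:

import numpy as np

def eval_epoch(session, model, epoch_size):
  costs, iters = 0.0, 0
  state = session.run(model.initial_state)  # zero state only once, at the very start
  for step in range(epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c  # thread the previous final state back in
      feed_dict[h] = state[i].h  # as this step's initial state
    cost, state = session.run([model.cost, model.final_state], feed_dict)
    costs += cost
    iters += 1  # one word per step when num_steps = 1
  return np.exp(costs / iters)  # test perplexity, computed with context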