python - Training and testing with queues

I am using the setup described here to load some training images in batches, i.e. basically this:

def read_my_file_format(filename_queue):
  # ... use a reader + a decoder

def input_pipeline(filenames, batch_size, num_epochs=None):
  filename_queue = tf.train.string_input_producer(...)
  example, label = read_my_file_format(filename_queue)
  example_batch, label_batch = tf.train.shuffle_batch(
      [example, label], batch_size=batch_size, ...)
  return example_batch, label_batch

def build_net():
  batch, label = input_pipeline(...)
  y = encoder(batch)  # <- build network using the batch

def train():
  with tf.Session() as sess:
    # ... init vars

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    try:
      while not coord.should_stop():
        # ... training step

    except tf.errors.OutOfRangeError:
      print('Done training -- epoch limit reached')
    finally:
      coord.request_stop()

    coord.join(threads)
    sess.close()

This works fine for training. However, I can't see how to test the resulting network! What confuses me is:

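The tensor returned by input_pipeline is part of the network. To test, would I have to replace it?
I was thinking I could create another input_pipeline for testing, i.e. with a different filename queue. I could then use tf.cond to switch between the different input batches. But then: how do I make sure that only one queue is drained at a time? I don't see how to access the different queues, or how to specify how they are dequeued.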

Basically, the question boils down to: what is the canonical way of testing a network built with the tf.train.shuffle_batch approach?

Best answer

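Your idea of creating an additional input pipeline for evaluation is definitely on the right track. One of the recommended approaches is to use multiple input pipelines, which consists of two processes: one that trains and one that evaluates. The training process writes checkpoints, and every thousand steps or so the code can try to eval the model against both the training and test datasets.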

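Quoting the documentation:

  The training process reads training input data and periodically writes checkpoint files with all the trained variables.
  The evaluation process restores the checkpoint files into an inference model that reads validation input data.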


Evaluation can also be run after training has finished or exited. (see this example)
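The checkpoint hand-off between the two processes can be sketched roughly as follows. This is a minimal illustration, not the code from the linked example: it assumes a TensorFlow installation that provides `tensorflow.compat.v1`, the one-weight `model`, the toy target y = 3x, and the file name `model.ckpt` are all hypothetical, and constant tensors stand in for the queue-based `input_pipeline` to keep it short. In a real setup `train` and `evaluate` would run as separate processes, each building its own input pipeline.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # use TF1-style graphs and sessions


def model(x):
    # Hypothetical toy network: a single trainable weight.
    w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
    return w * x


def train(ckpt_dir, steps=200, save_every=50):
    """Training side: builds its own graph and periodically writes checkpoints."""
    with tf.Graph().as_default():
        x = tf.constant(np.linspace(0.1, 1.0, 32, dtype=np.float32))
        loss = tf.reduce_mean(tf.square(model(x) - 3.0 * x))
        train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(1, steps + 1):
                sess.run(train_op)
                if step % save_every == 0:
                    saver.save(sess, os.path.join(ckpt_dir, "model.ckpt"),
                               global_step=step)


def evaluate(ckpt_dir):
    """Evaluation side: rebuilds the model and restores the latest checkpoint."""
    with tf.Graph().as_default():
        x = tf.constant(np.linspace(0.0, 2.0, 16, dtype=np.float32))
        loss = tf.reduce_mean(tf.square(model(x) - 3.0 * x))
        saver = tf.train.Saver()
        with tf.Session() as sess:
            # No initializer is run: the variable values come from the checkpoint.
            saver.restore(sess, tf.train.latest_checkpoint(ckpt_dir))
            return sess.run(loss)


if __name__ == "__main__":
    ckpt_dir = tempfile.mkdtemp()
    train(ckpt_dir)
    print("validation loss:", evaluate(ckpt_dir))
```

Because the two sides only communicate through the checkpoint directory, the evaluation side never touches the training queue, which sidesteps the question of which queue gets drained.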

Another thing to consider: by sharing variables, train and eval can run in the same graph, in the same process, while sharing their trained variables!
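A rough sketch of the shared-variable variant, again assuming `tensorflow.compat.v1` is available. The in-memory `input_pipeline` built on `tf.train.slice_input_producer` and `tf.train.batch`, the one-weight `model`, and the toy data are hypothetical stand-ins for the file-based pipeline and `encoder` in the question. The model is built twice, once per pipeline, inside a variable scope with `reuse=tf.AUTO_REUSE`, so both copies use the same weight.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # queue runners require graph mode


def input_pipeline(features, labels, batch_size, name):
    # Hypothetical in-memory stand-in for the file-based pipeline.
    with tf.name_scope(name):
        x, y = tf.train.slice_input_producer([features, labels], shuffle=True)
        return tf.train.batch([x, y], batch_size=batch_size)


def model(x):
    # AUTO_REUSE: the second call reuses the variable created by the first.
    with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
    return w * x


def main():
    tf.reset_default_graph()
    rng = np.random.RandomState(0)
    train_x = rng.rand(64).astype(np.float32)
    test_x = rng.rand(16).astype(np.float32)
    train_y, test_y = 3.0 * train_x, 3.0 * test_x  # toy target: y = 3x

    xb, yb = input_pipeline(train_x, train_y, 8, "train_input")
    xt, yt = input_pipeline(test_x, test_y, 8, "test_input")

    train_loss = tf.reduce_mean(tf.square(model(xb) - yb))
    test_loss = tf.reduce_mean(tf.square(model(xt) - yt))
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(train_loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            for _ in range(200):
                sess.run(train_op)  # dequeues only from the training queue
            final_test_loss = sess.run(test_loss)  # dequeues only from the test queue
        finally:
            coord.request_stop()
            coord.join(threads)
    return final_test_loss, len(tf.trainable_variables())


if __name__ == "__main__":
    print(main())
```

No tf.cond is needed here: a `sess.run` call only executes the ops that its fetches depend on, so fetching `train_op` consumes a batch from the training queue only, and fetching `test_loss` consumes one from the test queue only.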

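Regarding the queue exhaustion concern: if you use tf.train.shuffle_batch* with num_threads set greater than 1, it will read from a single file concurrently (which is also faster than using 1 thread), rather than reading N files at once (see batching).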

This question originally appeared on Stack Overflow: https://stackoverflow.com/questions/40802457/