TensorFlow, missing checkpoint files: does the Saver only allow keeping 5 checkpoints?

Problem description

I am working with TensorFlow and have been training some models, saving them after each epoch with tf.train.Saver. I am able to save and load models just fine, and I am doing this in the usual way:

import tensorflow as tf

# config, a2p, model_dir, epochs and runepoch are defined elsewhere
with tf.Graph().as_default(), tf.Session() as session:
    initialiser = tf.random_normal_initializer(config.mean, config.std)

    with tf.variable_scope("model", reuse=None, initializer=initialiser):
        m = a2p(session, config, training=True)

    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(model_dir)
    # Restore the most recent checkpoint if one exists
    if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
        saver.restore(session, ckpt.model_checkpoint_path)
    ...
    for i in range(epochs):
        runepoch()
        # Save one checkpoint per epoch, named after the epoch index
        save_path = saver.save(session, '%s.ckpt' % i)

My code is set up to save a model after each epoch, labelled with the epoch number. However, I have noticed that after fifteen epochs of training I only have checkpoint files for the last five epochs (10, 11, 12, 13, 14). I could not find anything about this in the documentation, so I am at a loss as to why it is happening.

Does the saver only allow for keeping five checkpoints, or have I done something wrong?

Is there a way to make sure that all of the checkpoints are kept?

Recommended answer

You can choose how many checkpoints are kept when you create your Saver object by setting the max_to_keep argument, which defaults to 5. With the default, only the five most recent checkpoints survive, which is why only epochs 10 to 14 remain after fifteen epochs.

saver = tf.train.Saver(max_to_keep=10000)
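
To make the effect of max_to_keep concrete, here is a minimal pure-Python sketch of a rolling retention window. It is only a simplified model of the behaviour described above, not TensorFlow's actual implementation; the function name simulate_saver is hypothetical, and treating None or 0 as "never delete" is an assumption based on how the tf.train.Saver documentation describes those values.

```python
from collections import deque


def simulate_saver(num_epochs, max_to_keep=5):
    """Simplified model of a rolling checkpoint window (not TensorFlow code).

    Returns the checkpoint names that would remain after num_epochs saves.
    max_to_keep=None or 0 is modelled as "never delete" (an assumption).
    """
    kept = deque()
    for epoch in range(num_epochs):
        kept.append('%s.ckpt' % epoch)  # same naming scheme as in the question
        # Once the window is over capacity, drop the oldest checkpoint.
        if max_to_keep and len(kept) > max_to_keep:
            kept.popleft()
    return list(kept)


# Fifteen epochs with the default window of 5 leave only epochs 10..14.
print(simulate_saver(15))
# Disabling the limit keeps all fifteen checkpoints.
print(len(simulate_saver(15, max_to_keep=None)))
```

If the goal is to keep every checkpoint, a very large max_to_keep as in the answer works; according to the tf.train.Saver documentation, passing None or 0 should also stop old checkpoint files from being deleted.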

06-06 12:54