Problem Description
I am working with TensorFlow and have been training some models, saving them after each epoch with tf.train.Saver. I am able to save and load models just fine, and I am doing this in the usual way:
import tensorflow as tf

with tf.Graph().as_default(), tf.Session() as session:
    initialiser = tf.random_normal_initializer(config.mean, config.std)
    with tf.variable_scope("model", reuse=None, initializer=initialiser):
        m = a2p(session, config, training=True)

    saver = tf.train.Saver()
    # Restore the latest checkpoint if one already exists
    ckpt = tf.train.get_checkpoint_state(model_dir)
    if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
        saver.restore(session, ckpt.model_checkpoint_path)
    ...

    # Save a separately named checkpoint after every epoch
    for i in range(epochs):
        runepoch()
        save_path = saver.save(session, '%s.ckpt' % i)
My code is set up to save a model for each epoch, which should be labelled accordingly. However, I have noticed that after fifteen epochs of training I only have checkpoint files for the last five epochs (10, 11, 12, 13, 14). The documentation doesn't say anything about this, so I am at a loss as to why it is happening.
Does the saver only allow for keeping five checkpoints or have I done something wrong?
Is there a way to make sure that all of the checkpoints are kept?
Recommended Answer
You can choose how many checkpoints to keep when you create your Saver object by setting the max_to_keep argument, which defaults to 5.
saver = tf.train.Saver(max_to_keep=10000)
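If you want to keep every checkpoint rather than just raising the limit, the tf.train.Saver constructor also accepts max_to_keep=None (or 0), which disables deletion of old checkpoint files entirely. Below is a minimal sketch of the per-epoch loop with that setting; it reuses the session, epochs, and runepoch names from the question, so treat those as placeholders for your own code.

import tensorflow as tf

# Sketch: keep every checkpoint on disk by disabling pruning.
# max_to_keep=None (or 0) tells the Saver not to delete old checkpoint files.
saver = tf.train.Saver(max_to_keep=None)

for i in range(epochs):          # `epochs` and `runepoch()` as in the question
    runepoch()
    # Name each file after its epoch so every checkpoint stays identifiable
    save_path = saver.save(session, 'model-epoch-%d.ckpt' % i)

Alternatively, you can pass global_step=i to saver.save(session, 'model.ckpt', global_step=i) and the Saver will append the epoch number to the filename for you; combined with a suitable max_to_keep this gives the same labelled-per-epoch behaviour.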