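I am trying to restore the pretrained models provided here and continue training on a different dataset. The pretrained models provided there were all trained with tensorflow_gpu-1.1.0, but I have a different version installed. When I try to restore a model, I get the following error.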

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.

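Is it possible to convert the model to the current TensorFlow version? I tried a provided script, but it didn't work!
If conversion isn't possible, I'm also fine with using the old TensorFlow version. But I can't get the old version installed correctly either.
The command given here is: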
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl

But after installing TensorFlow with the command above, I get the error below:
Python 2.7.16 |Anaconda, Inc.| (default, Aug 22 2019, 16:00:36)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

If I install tensorflow_gpu-1.13.1 instead of tensorflow-1.1.0, the import works, but restoring the model fails again with the same error:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.

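Please help!

Best answer

I'm really not sure whether a model can be ported directly, but I'll try to share how I solved this problem.
First, you should be able to build the whole graph regardless of the TensorFlow version; any errors that come up should be minor. Then you can simply copy all variables from the old model into the new one like this: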

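import tensorflow as tf

# Checkpoint prefix of the old model (placeholder; set this to your own path)
RESTORE_VARS_CKPT = 'path/to/old/model.ckpt'
# Checkpoint variable names that should not be copied
RESTORE_VARS_BLACKLIST = ['dont', 'load', 'this']

# Map each variable name stored in the checkpoint to its shape
ckpt_vars = dict(tf.train.list_variables(RESTORE_VARS_CKPT))
assign_ops = []
for dst_var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
    name = dst_var.name.split(":")[0]
    # Copy only variables whose name and shape match and that are not blacklisted
    if name in ckpt_vars and dst_var.shape == ckpt_vars[name] and name not in RESTORE_VARS_BLACKLIST:
        value = tf.train.load_variable(RESTORE_VARS_CKPT, name)
        assign_ops.append(tf.assign(dst_var, value))
# Run the assign ops in your existing session (sess is a tf.Session on this graph)
sess.run(assign_ops)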
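
The loop above silently skips any variable whose name or shape does not match, so it can help to list what would be skipped before copying. The following is only a sketch in plain Python, not part of the original answer: `plan_restore` and the example variable names are hypothetical. In practice `graph_vars` would be built from the graph's global variables and `ckpt_vars` would be the result of `tf.train.list_variables`.

```python
def plan_restore(graph_vars, ckpt_vars, blacklist=()):
    """Classify graph variables by whether they can be copied from a checkpoint.

    graph_vars: dict mapping variable name (without the ":0" suffix) to a shape list.
    ckpt_vars:  list of (name, shape) pairs, as tf.train.list_variables returns.
    """
    ckpt_shapes = {name: list(shape) for name, shape in ckpt_vars}
    plan = {"copy": [], "missing": [], "shape_mismatch": [], "skipped": []}
    for name, shape in sorted(graph_vars.items()):
        if name in blacklist:
            plan["skipped"].append(name)          # excluded on purpose
        elif name not in ckpt_shapes:
            plan["missing"].append(name)          # no such name in the checkpoint
        elif list(shape) != ckpt_shapes[name]:
            plan["shape_mismatch"].append(name)   # same name, different shape
        else:
            plan["copy"].append(name)             # safe to assign
    return plan


# Invented example data, for illustration only.
graph_vars = {"conv1/weights": [3, 3, 1, 8], "conv1/biases": [8], "fc/kernel": [128, 10]}
ckpt_vars = [("conv1/weights", [3, 3, 1, 8]), ("conv1/biases", [16]), ("fc/weights", [128, 10])]
plan = plan_restore(graph_vars, ckpt_vars)
print(plan)
```

Names that end up under "missing" are the likely candidates behind the NotFoundError, since the error message points to a variable name absent from the checkpoint.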

Finally, save your new model.
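
One possible way to do the save, assuming the TF 1.x `tf.train.Saver` API and a session in which the assign ops have already run. The helper name `save_converted_model` and the output path in the usage line are hypothetical.

```python
def save_converted_model(sess, save_path):
    """Write every saveable variable in the current graph to a new checkpoint."""
    # Imported here so the helper can be defined before TensorFlow is loaded.
    import tensorflow as tf

    saver = tf.train.Saver()            # TF 1.x API; defaults to all saveable variables
    return saver.save(sess, save_path)  # returns the path prefix of the new checkpoint
```

A call such as `save_converted_model(sess, 'converted/model.ckpt')` would then produce a checkpoint written by the installed TensorFlow version, which can be restored normally from then on.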

Regarding "python - How to restore a model saved with TensorFlow v1.1.0 in v1.13.1", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57816305/
