Problem description
I'm trying to restore the pretrained model provided here and continue training it on a different dataset. The pretrained models available there were trained with tensorflow_gpu-1.1.0, but I have tensorflow_gpu-1.13.1. When I try to restore the model, I get the error below.
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.
Is it possible to convert the model to the current TensorFlow version? I tried a script provided here, but no luck!
If conversion is not possible, I'm fine with using the older TensorFlow version, but I can't get it installed properly either. The command given on the GitHub page is:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl
But when I install TensorFlow with the above command, importing it fails with the error below:
Python 2.7.16 |Anaconda, Inc.| (default, Aug 22 2019, 16:00:36)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 51, in <module>
from tensorflow.python import pywrap_tensorflow
File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
If I install tensorflow-1.1.0 using conda, the import works, but restoring the model fails again with the same error:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.
Please help!
Recommended answer
I'm really not sure whether a model can be ported, but I'll share how I solved this issue.
First, you should be able to build the whole graph regardless of the TensorFlow version; if any errors occur at that stage, they should be minor. Then you can simply copy all variables from the old model into the new one:
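import tensorflow as tf

RESTORE_VARS_CKPT = '/path/to/old/checkpoint'  # placeholder: path of the old checkpoint
RESTORE_VARS_BLACKLIST = ['dont', 'load', 'this']  # variable names to skip

# (name, shape) pairs stored in the old checkpoint
ckpt_vars = tf.train.list_variables(RESTORE_VARS_CKPT)
assign_ops = []
for dst_var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
    dst_name = dst_var.name.split(":")[0]  # strip the ":0" output suffix
    for (ckpt_var, ckpt_shape) in ckpt_vars:
        # Copy only when name and shape both match and the name is not blacklisted
        if dst_name == ckpt_var and dst_var.shape == ckpt_shape and ckpt_var not in RESTORE_VARS_BLACKLIST:
            value = tf.train.load_variable(RESTORE_VARS_CKPT, ckpt_var)
            assign_ops.append(tf.assign(dst_var, value))

# Run the assignments in a session (sess is your existing tf.Session)
sess.run(assign_ops)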
Finally, just save your new model.
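Since the NotFoundError points at variable names missing from the checkpoint, it may help to check which graph variables would actually be matched before copying anything. The following is only a rough sketch in plain Python: plan_restore is a hypothetical helper (not part of TensorFlow), and the names and shapes in the example are made up. In a real run, graph_vars could presumably be built from tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) using v.name.split(":")[0] and v.shape.as_list(), and ckpt_vars would be the result of tf.train.list_variables.

```python
def plan_restore(graph_vars, ckpt_vars, blacklist=()):
    """Split graph variables into restorable names and skipped names.

    graph_vars: dict mapping variable name (without ":0") to its shape
    ckpt_vars:  iterable of (name, shape) pairs, as tf.train.list_variables returns
    blacklist:  names that must not be restored
    """
    ckpt_shapes = {name: list(shape) for name, shape in ckpt_vars}
    restorable, skipped = [], {}
    for name, shape in sorted(graph_vars.items()):
        if name in blacklist:
            skipped[name] = "blacklisted"
        elif name not in ckpt_shapes:
            skipped[name] = "missing from checkpoint"
        elif ckpt_shapes[name] != list(shape):
            skipped[name] = "shape mismatch: checkpoint has %s" % ckpt_shapes[name]
        else:
            restorable.append(name)
    return restorable, skipped


if __name__ == "__main__":
    # Made-up names and shapes, for illustration only.
    graph_vars = {"conv1/w": [3, 3, 1, 8], "conv1/b": [8], "fc/w": [128, 10]}
    ckpt_vars = [("conv1/w", [3, 3, 1, 8]), ("conv1/biases", [8]), ("fc/w", [128, 5])]
    restorable, skipped = plan_restore(graph_vars, ckpt_vars)
    print("restorable: %s" % restorable)
    for name, reason in sorted(skipped.items()):
        print("skipped %s: %s" % (name, reason))
```

Anything reported as missing or mismatched would be left at its initial value by the copy loop, so a long skipped list usually means the variable scopes or layer names differ between the two graphs.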