Problem description
I noticed this when my grid search for selecting hyper-parameters of a Tensorflow (version 1.12.0) model crashed due to an explosion in memory consumption.
Note that, unlike the similar-looking question here, I do close the graph and the session (using context managers), and I am not adding nodes to the graph in the loop.
I suspected that maybe tensorflow maintains global variables that do not get cleared between iterations, so I called globals() before and after an iteration, but did not observe any difference in the set of global variables before and after each iteration.
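For reference, the check was essentially a before/after comparison along these lines (a sketch; the body of the grid-search iteration is elided):

    # Sketch of the globals() comparison described above.
    globals_before = set(globals().keys())
    # ... run one full grid-search iteration here (build graph, train, close session) ...
    globals_after = set(globals().keys())
    # Apart from 'globals_before' itself, no new module-level names appeared.
    print(globals_after - globals_before)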
I made a small example that reproduces the problem. I train a simple MNIST classifier in a loop and plot the memory consumed by the process:
import matplotlib.pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import psutil
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

process = psutil.Process(os.getpid())
N_REPS = 100
N_ITER = 10
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_test, y_test = mnist.test.images, mnist.test.labels

# Runs the experiment several times.
mem = []
for i in range(N_REPS):
    with tf.Graph().as_default():
        net = tf.contrib.layers.fully_connected(x_test, 200)
        logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_test, logits=logits))
        train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
        init = tf.global_variables_initializer()
        with tf.Session() as sess:
            # Training loop.
            sess.run(init)
            for _ in range(N_ITER):
                sess.run(train_op)
    mem.append(process.memory_info().rss)
plt.plot(range(N_REPS), mem)
The resulting plot (process memory vs. repetition index):
In my actual project, process memory starts at a couple of hundred MB (depending on dataset size) and grows to 64 GB, until my system runs out of memory. I have tried a few things that slow down the increase, such as using placeholders and feed_dicts instead of relying on convert_to_tensor, but the steady growth is still there, only slower.
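For illustration, the placeholder variant of the reproduction script looks roughly like this (a sketch, not my project code; it reuses the imports and the x_test / y_test arrays from the script above, and x_ph / y_ph are names I chose here):

    # Placeholder variant (illustrative): feed the test arrays through
    # placeholders instead of baking them into each new graph as constants.
    for i in range(N_REPS):
        with tf.Graph().as_default():
            x_ph = tf.placeholder(tf.float32, shape=[None, 784])
            y_ph = tf.placeholder(tf.float32, shape=[None, 10])
            net = tf.contrib.layers.fully_connected(x_ph, 200)
            logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
            loss = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_ph, logits=logits))
            train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
            init = tf.global_variables_initializer()
            with tf.Session() as sess:
                sess.run(init)
                for _ in range(N_ITER):
                    sess.run(train_op, feed_dict={x_ph: x_test, y_ph: y_test})
        mem.append(process.memory_info().rss)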
Recommended answer
Try moving the loop inside the session. Don't create the graph and the session for every iteration. Every time the graph is created and the variables are initialized, you are not redefining the old graph but creating a new one, which leads to a memory leak. I was facing a similar issue and was able to solve it by taking the loop inside the session.
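Applied to the reproduction script from the question, the restructuring looks roughly like this (a sketch of the idea, not the exact code I used; it reuses x_test, y_test, process, N_REPS and N_ITER from the script above):

    # Sketch: one graph and one session for all repetitions; only the
    # variables are re-initialized for each run, so no new graphs pile up.
    with tf.Graph().as_default():
        net = tf.contrib.layers.fully_connected(x_test, 200)
        logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_test, logits=logits))
        train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
        init = tf.global_variables_initializer()
        with tf.Session() as sess:
            mem = []
            for i in range(N_REPS):
                sess.run(init)  # reset the weights for this repetition
                for _ in range(N_ITER):
                    sess.run(train_op)
                mem.append(process.memory_info().rss)
    plt.plot(range(N_REPS), mem)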