Question
In my TensorFlow 2.0b program I get an error like this:
ResourceExhaustedError: OOM when allocating tensor with shape[727272703] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:TopKV2]
The error occurs after a number of GPU-based operations in this program have executed successfully.
I would like to release all GPU memory associated with these past operations in order to avoid the above error. How can I do this in TensorFlow 2.0b? And how can I check memory usage from within my program?
I was only able to find related information that uses tf.Session(), which is no longer available in TensorFlow 2.0.
Answer
You may be interested in the Python 3 bindings for the NVIDIA Management Library (NVML), available e.g. via the nvidia-ml-py3 package. I would try something like this:
import nvidia_smi

nvidia_smi.nvmlInit()
# Card id 0 is hardcoded here; nvmlDeviceGetCount() returns the number of
# available cards, so we could iterate over all of them instead.
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
# NVML reports memory sizes in bytes.
print("Total memory:", info.total)
print("Free memory:", info.free)
print("Used memory:", info.used)
nvidia_smi.nvmlShutdown()
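Note that the NVML counters above are reported in bytes and measure device-wide usage, so they include memory that TensorFlow's BFC allocator has cached for reuse. As far as TensorFlow 2.0's public API goes, there is no supported call that frees the memory of past operations inside a running process; the usual workaround is to enable memory growth so TensorFlow allocates GPU memory on demand instead of reserving it all up front. A minimal sketch using tf.config.experimental (this must run at startup, before any GPU operation executes):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    # Grow the GPU memory pool on demand rather than claiming it all at once.
    tf.config.experimental.set_memory_growth(gpu, True)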