Question
I work in an environment in which computational resources are shared, i.e., we have a few server machines equipped with a few Nvidia Titan X GPUs each.
For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.
The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up.
Is there a way to make TensorFlow only allocate, say, 4 GB of GPU memory, if one knows that this is enough for a given model?
Recommended Answer
You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:
import tensorflow as tf

# Assume that you have 12 GB of GPU memory and want to allocate ~4 GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
The per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that will be used by the process on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set this on a per-GPU basis.