Question
Does TensorFlow automatically use CUDA streams to execute the computation graph concurrently on a single GPU, or should streams be assigned manually to ops/tensors?
Recommended answer
For now, TensorFlow uses only one compute stream and multiple copy streams. Some kernels may choose to use multiple streams internally for computation, while maintaining single-stream semantics.
Our experiments showed that enabling multi-stream automatically does not bring much performance gain, since most of our kernels are large enough to utilize all of the GPU's processors. In addition, enabling multi-stream would disable our current design for recycling GPU memory aggressively.
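The utilization argument above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how TensorFlow or the CUDA scheduler actually works: the function `makespan`, the figure of 80 multiprocessors, and the kernel sizes and durations are all hypothetical, and the "multi-stream" mode simply starts a kernel as soon as enough multiprocessors are free.

```python
import heapq

def makespan(kernels, total_sms, multi_stream):
    """Toy estimate of total run time for a list of (sms_needed, duration) kernels.

    Hypothetical model, not TensorFlow or CUDA behavior:
    - single stream: kernels run strictly one after another;
    - multi-stream: a kernel starts as soon as enough multiprocessors are free.
    """
    if not multi_stream:
        return sum(duration for _, duration in kernels)

    running = []        # min-heap of (end_time, sms_held) for kernels in flight
    free = total_sms    # multiprocessors currently idle
    now = 0.0           # earliest time the next kernel may start
    finish = 0.0        # latest end time seen so far
    for sms_needed, duration in kernels:
        sms = min(sms_needed, total_sms)
        # Wait for running kernels to finish until enough multiprocessors are free.
        while free < sms:
            end_time, held = heapq.heappop(running)
            now = max(now, end_time)
            free += held
        heapq.heappush(running, (now + duration, sms))
        free -= sms
        finish = max(finish, now + duration)
    return finish

TOTAL_SMS = 80                      # assumed GPU size, for illustration only
large_kernels = [(80, 1.0)] * 4     # each kernel fills the whole GPU
small_kernels = [(20, 1.0)] * 4     # each kernel uses a quarter of the GPU

for name, kernels in [("large", large_kernels), ("small", small_kernels)]:
    single = makespan(kernels, TOTAL_SMS, multi_stream=False)
    multi = makespan(kernels, TOTAL_SMS, multi_stream=True)
    print(f"{name} kernels: single-stream={single}, multi-stream={multi}")
```

In this toy model, kernels that each fill the whole GPU take the same total time in both modes, while kernels that use only a quarter of it can overlap. That matches the reasoning in the answer: extra streams help only when individual kernels leave processors idle.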
This is a decision we might revisit in the future. If that happens, TensorFlow will most likely assign ops/kernels to different CUDA streams automatically, without exposing the streams to users.