This article looks at a PyTorch speed comparison in which the GPU appears slower than the CPU, and explains why.

Problem description

I was trying to find out whether GPU tensor operations are actually faster than CPU ones. So I wrote the code below, which performs a simple 2D addition on a CPU tensor and then on a GPU (CUDA) tensor, to see the speed difference:

import torch
import time

###CPU
start_time = time.time()
a = torch.ones(4,4)
for _ in range(1000000):
    a += a
elapsed_time = time.time() - start_time

print('CPU time = ',elapsed_time)

###GPU
start_time = time.time()
b = torch.ones(4,4).cuda()
for _ in range(1000000):
    b += b
elapsed_time = time.time() - start_time

print('GPU time = ',elapsed_time)

To my surprise, the CPU time was 0.93 sec while the GPU time was as high as 63 seconds. Am I doing the CUDA tensor operation properly, or do CUDA tensors only work faster for very complex operations, like those in neural networks?

Note: My GPU is an NVIDIA 940MX, and the torch.cuda.is_available() call returns True.
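One timing caveat worth keeping in mind (my addition, not part of the original question): CUDA kernel launches are asynchronous, so time.time() can stop the clock before the GPU has actually finished its queued work. A minimal sketch of a timing loop that synchronizes around the measurement, falling back to plain CPU timing when no GPU is present:

```python
import time

import torch

def timed_adds(device: str, iters: int = 10000) -> float:
    """Time `iters` in-place additions of a 4x4 tensor on `device`.

    torch.cuda.synchronize() waits for all queued kernels, so the
    measured interval covers the actual GPU work rather than just
    the kernel launches.
    """
    t = torch.ones(4, 4, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # finish setup before starting the clock
    start = time.time()
    for _ in range(iters):
        t += t
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before stopping
    return time.time() - start

print("CPU time =", timed_adds("cpu"))
if torch.cuda.is_available():
    print("GPU time =", timed_adds("cuda"))
```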

Answer

GPU acceleration works by heavy parallelization of computation. A GPU has a huge number of cores; each of them is not very powerful on its own, but the sheer number of cores is what matters here.

Frameworks like PyTorch do their best to compute as much as possible in parallel. In general, matrix operations are very well suited for parallelization, but it still isn't always possible to parallelize a computation!

In your example you have a loop:

b = torch.ones(4,4).cuda()
for _ in range(1000000):
    b += b

You have 1000000 operations, but due to the structure of the code it is impossible to parallelize much of this computation. If you think about it, to compute the next b you need to know the value of the previous (or current) b.
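To make that dependency concrete (my own illustration, not part of the original answer): each b += b doubles b, so after n iterations b equals the starting tensor times 2**n. The loop forces those doublings to run one after another, while the mathematically equivalent closed form is a single, parallel-friendly elementwise operation:

```python
import torch

n = 10
b = torch.ones(4, 4)

# Sequential version: iteration i needs the result of iteration i - 1,
# so the n doublings cannot run in parallel with each other.
for _ in range(n):
    b += b

# Closed form: one elementwise multiply over all 16 entries at once.
closed = torch.ones(4, 4) * (2.0 ** n)

assert torch.equal(b, closed)  # same values, very different parallelism
print(b[0, 0].item())  # 1024.0 after 10 doublings
```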

So you have 1000000 operations, but each of them has to be computed one after another. The possible parallelization is limited to the size of your tensor, and that size is not very large in your example:

torch.ones(4,4)

So you can only parallelize 16 operations (additions) per iteration. Since the CPU has few, but much more powerful, cores, it is simply much faster for the given example!

But things change if you increase the size of the tensor; then PyTorch is able to parallelize much more of the overall computation. I changed the iteration count to 1000 (because I did not want to wait so long :), but you can put in any value you like; the relation between CPU and GPU should stay the same.

Here are the results for different tensor sizes:

#torch.ones(4,4)       - the size you used
CPU time =  0.00926661491394043
GPU time =  0.0431208610534668

#torch.ones(40,40)     - CPU gets slower, but still faster than GPU
CPU time =  0.014729976654052734
GPU time =  0.04474186897277832

#torch.ones(400,400)   - CPU now much slower than GPU
CPU time =  0.9702610969543457
GPU time =  0.04415607452392578

#torch.ones(4000,4000) - GPU much faster than CPU
CPU time =  38.088677167892456
GPU time =  0.044649362564086914

So as you can see, where it is possible to parallelize the work (here, the addition of the tensor elements), the GPU becomes very powerful.
The GPU time barely changes across these runs; the GPU could handle much more!
(as long as it doesn't run out of memory :)
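The sweep above can be reproduced with a short script along these lines (a sketch: the tensor sizes match the answer, the iteration count is lowered from 1000 to 100 to keep the runtime modest, and the GPU measurement is skipped when CUDA is unavailable; absolute numbers will differ by hardware, but the trend should not):

```python
import time

import torch

def bench(device: str, size: int, iters: int = 100) -> float:
    """Time `iters` in-place additions of a (size, size) tensor."""
    t = torch.ones(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure setup work has finished
    start = time.time()
    for _ in range(iters):
        t += t
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels
    return time.time() - start

for size in (4, 40, 400, 4000):
    print(f"torch.ones({size},{size})")
    print("CPU time = ", bench("cpu", size))
    if torch.cuda.is_available():
        print("GPU time = ", bench("cuda", size))
```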
