Problem description
Here is what I mean - a is a vector of 1,000,000 np.int64 elements, b is a vector of 1,000,000 np.int16 elements:
In [19]: a = np.random.randint(100, size=(10**6), dtype="int64")
In [20]: b = np.random.randint(100, size=(10**6), dtype="int16")
Timings for different operations:
In [23]: %timeit a + 1
4.48 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [24]: %timeit b + 1
1.37 ms ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [25]: %timeit a / 10
5.77 ms ± 31.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [26]: %timeit b / 10
6.09 ms ± 70.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [27]: %timeit a * 10
4.52 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [28]: %timeit b * 10
1.52 ms ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
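A side note on the numbers above: the a / 10 and b / 10 timings are nearly equal, plausibly because true division promotes both integer dtypes to float64, so the result arrays are the same size either way and the int16 advantage disappears. A quick check (a sketch, not part of the original question):

```python
import numpy as np

# True division promotes both integer dtypes to float64, so the output
# arrays are identical in size regardless of the input dtype.
a = np.random.randint(100, size=10**6, dtype="int64")
b = np.random.randint(100, size=10**6, dtype="int16")

print((a / 10).dtype)  # float64
print((b / 10).dtype)  # float64
print((a / 10).nbytes, (b / 10).nbytes)  # both 8000000 bytes
```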
I can understand such a difference when NumPy has to create a new temporary result in memory - the underlying C code has to copy/fill much more data.
But I can't understand such a difference for in-place assignment like the following:
In [21]: %timeit a[::2] = 111
409 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [22]: %timeit b[::2] = 111
203 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Do you have an idea why it is slower even for those operations where NumPy doesn't have to create a copy/view?
Answer
Reading from memory costs something. Writing to memory costs something. With int64 you're reading four times as much data in and writing four times as much data out, and the arithmetic itself is so much faster than those reads/writes that the operation is effectively I/O bound. CPUs are simply faster than memory (and the speed ratio has been getting more extreme over time), so if you're doing memory-intensive work, smaller variables will go faster.
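The "four times as much data" point can be made concrete with NumPy's itemsize and nbytes attributes: the two arrays hold the same number of elements, but the int64 one occupies four times the memory, so any memory-bound operation has to move four times the bytes.

```python
import numpy as np

# Same element count, but int64 takes 8 bytes per element vs 2 for int16,
# so every memory-bound pass over `a` moves 4x the bytes of a pass over `b`.
a = np.random.randint(100, size=10**6, dtype="int64")
b = np.random.randint(100, size=10**6, dtype="int16")

print(a.itemsize, b.itemsize)  # 8 2
print(a.nbytes, b.nbytes)      # 8000000 2000000
print(a.nbytes // b.nbytes)    # 4
```

The same ratio applies to the in-place slice assignment: a[::2] = 111 still has to touch four times as many bytes as b[::2] = 111.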