问题描述
我刚开始使用Numpy,并注意到对Numpy数组中的每个元素进行迭代的速度比进行相同操作但具有列表列表的速度要慢约4倍.我现在知道这违背了Numpy的目的,如果可能的话,我应该对函数进行向量化.我的问题是,为什么它要慢4倍.这似乎是一个很大的数目.
I just started using Numpy and noticed that iterating through each element in a Numpy array is ~4x slower than doing the same but with a list of lists. I know now that this defeats the purpose of Numpy and I should vectorize the function if possible. My question is though why is it 4x slower. That seems like quite a large amount.
我使用%timeit
import numpy as np
b = np.eye(1000)
a = b.tolist()
%timeit b[100][100] #1000000 loops, best of 3: 692 ns per loop
%timeit a[100][100] #10000000 loops, best of 3: 70.7 ns per loop
%timeit b[100,100] #1000000 loops, best of 3: 343 ns per loop
%timeit b.item(100,100) #1000000 loops, best of 3: 297 ns per loop
我尝试使用dis.dis
来查看引擎盖下发生了什么,但是得到了:
I tried to use dis.dis
to see what was going on under the hood but got:
TypeError: don't know how to disassemble method-wrapper objects
然后,我尝试查看Numpy源代码,但找不到对应于数组元素访问的文件.我很好奇是什么导致了额外的开销,更重要的是,将来如何自己解决这个问题.看来python不能轻易编译为C代码,这样我才能看到其中的区别.但是,有没有办法查看每一行生成的字节码,以了解它们之间的差异?
Then I tried to look at the Numpy source code but couldn't figure out which file corresponded to array element access. I'm curious what accounts for the extra overhead, and more importantly how to figure this out for myself in the future. It seems like python can't be easily compiled to C code so that I can see the difference. But is there a way to see what byte code is generated for each line, to get a sense of the differences?
推荐答案
总而言之:从NumPy数组中获取项目需要创建新的Python对象,而列表并非如此.而且,对于NumPy数组,索引比列表要稍微复杂一些,这可能会增加一些额外的开销.
In summary: getting an item from a NumPy array requires new Python objects to be created, whereas this is not the case for lists. Also, indexing is more slightly more complicated for NumPy arrays than lists which may add some additional overhead.
回顾一下,您列出的NumPy操作执行以下操作:
To recap, the NumPy operations you have listed do the following:
-
b[100][100]
将b
的第100行作为数组返回,然后获取该行的索引100处的值,并将该值作为对象(例如np.int64
类型)返回. -
b[100,100]
直接返回第100行和第100列的值 (首先不返回中间数组). -
b.item(100,100)
与上面的b[100,100]
相同,除了将值转换为原生Python类型并返回.
b[100][100]
returns row 100 ofb
as an array, and then gets the value at index 100 of this row, returning the value as an object (e.g. anp.int64
type).b[100,100]
returns the value at row 100 and column 100 directly (no intermediate array is returned first).b.item(100,100)
does the same as aboveb[100,100]
except that the value is converted to a native Python type and returned.
现在这些操作中,(1)最慢,因为它需要两个连续的NumPy索引操作(我将解释为什么它比下面的列表索引要慢). (2)最快,因为仅执行了一个索引操作.操作(3)可能会更慢,因为它是方法调用(在Python中通常较慢).
Now of these operation, (1) is slowest because it requires two sequential NumPy indexing operations (I'll explain why this is slower than list indexing below). (2) is quickest because only a single indexing operation is performed. Operation (3) is possibly slower as it is a method call (these are generally slow in Python).
为什么 list 访问仍然比b[100,100]
快?
Python列表是指向内存中对象的指针的数组.例如,列表[1, 2, 3]
不直接包含这些整数,而是指向每个整数对象已存在的内存地址的指针.为了从列表中获得一项,Python只需返回对该对象的引用.
Python lists are arrays of pointers to objects in memory. For example, the list [1, 2, 3]
does not contain those integers directly, but rather pointers to the memory addresses were each integer object already exists. To get an item from the list, Python just returns a reference to the object.
NumPy数组不是对象的集合.数组np.array([1, 2, 3])
只是一个连续的内存块,其位设置为表示这些整数值.要从此数组获取整数,必须在与该数组分开的内存中构造一个新的Python对象.例如,索引操作可能会返回np.int64
对象:该对象以前不存在,必须创建.
NumPy arrays are not collections of objects. The array np.array([1, 2, 3])
is just a contiguous block of memory with bits set to represent those integer values. To get an integer from this array, a new Python object must be constructed in memory separate to the array. For instance, an object of np.int64
may be returned by the indexing operation: this object did not exist previously and had to be created.
a[100][100]
(从列表中获取)比b[100,100]
(从数组中获取)更快的另外两个原因是:
Two other reasons why a[100][100]
(getting from the list) is quicker than b[100,100]
(getting from the array) are that:
-
同时为列表和数组建立索引时,将执行字节码操作码
BINARY_SUBSCR
,但针对Python列表进行了优化.
The bytecode opcode
BINARY_SUBSCR
is executed when indexing both lists and arrays, but it is optimised for the case of Python lists.
处理Python列表的整数索引的内部C函数非常简短.另一方面,NumPy索引要复杂得多,需要执行大量代码才能确定所使用的索引类型,以便可以返回正确的值.
The internal C function handling integer indexing for Python lists is very short and simple. On the other hand, NumPy indexing is much more complicated and a significant amount of code is executed to determine the type of indexing being used so that the correct value can be returned.
下面,将更详细地描述使用a[100][100]
和b[100,100]
访问列表和数组中的元素的步骤.
Below, the steps for accessing elements in a list and array with a[100][100]
and b[100,100]
are described in more detail.
为列表和数组触发相同的四个字节码操作码:
The same four bytecode opcodes are triggered for both lists and arrays:
0 LOAD_NAME 0 (a) # the list or array
3 LOAD_CONST 0 (100) # index number (tuple for b[100,100])
6 BINARY_SUBSCR # find correct "getitem" function
7 RETURN_VALUE # value returned from list or array
注意:如果您开始对多维列表进行链式索引,例如a[100][100][100]
,您开始重复这些字节码指令.使用元组索引的NumPy数组不会发生这种情况:b[100,100,100]
仅使用四个指令.这就是为什么随着尺寸数量的增加,时间上的差距开始缩小的原因.
Note: if you start chain indexing for multi-dimensional lists, e.g. a[100][100][100]
, you start to repeat these bytecode instructions. This does not happen for NumPy arrays using the tuple indexing: b[100,100,100]
uses just the four instructions. This is why the gap in the timings begins to close as the number of dimensions increases.
访问列表和数组的功能不同,在每种情况下都需要找到正确的函数.此任务由 BINARY_SUBSCR
操作码处理:
The functions for accessing lists and arrays are different and the correct one needs to be found in each case. This task is handled by the BINARY_SUBSCR
opcode:
w = POP(); // our index
v = TOP(); // our list or NumPy array
if (PyList_CheckExact(v) && PyInt_CheckExact(w)) { // do we have a list and an int?
/* INLINE: list[int] */
Py_ssize_t i = PyInt_AsSsize_t(w);
if (i < 0)
i += PyList_GET_SIZE(v);
if (i >= 0 && i < PyList_GET_SIZE(v)) {
x = PyList_GET_ITEM(v, i); // call "getitem" for lists
Py_INCREF(x);
}
else
goto slow_get;
}
else
slow_get:
x = PyObject_GetItem(v, w); // else, call another function
// to work out what is needed
Py_DECREF(v);
Py_DECREF(w);
SET_TOP(x);
if (x != NULL) continue;
break;
此代码针对Python列表进行了优化.如果函数看到列表,它将快速调用函数PyList_GET_ITEM
.现在可以在所需的索引处访问此列表(请参阅下面的下一部分).
This code is optimised for Python lists. If the function sees a list, it will quickly call the function PyList_GET_ITEM
. This list can now be accessed at the required index (see next section below).
但是,如果没有看到列表(例如,我们有一个NumPy数组),它将采用"slow_get"路径.依次调用另一个函数 PyObject_GetItem
进行检查对象映射到哪个"getitem"函数:
However, if it doesn't see a list (e.g. we have a NumPy array), it takes the "slow_get" path. This in turn calls another function PyObject_GetItem
to check which "getitem" function the object is mapped to:
PyObject_GetItem(PyObject *o, PyObject *key)
{
PyMappingMethods *m;
if (o == NULL || key == NULL)
return null_error();
m = o->ob_type->tp_as_mapping;
if (m && m->mp_subscript)
return m->mp_subscript(o, key);
...
对于NumPy数组,正确的函数位于 PyMappingMethods
(位于PyMappingMethods
结构中).
In the case of NumPy arrays, the correct function is located in mp_subscript
in the PyMappingMethods
structure.
请注意,在可以调用此正确的"get"功能之前,必须先调用其他功能.这些调用增加了b[100]
的开销,尽管开销将取决于Python/NumPy的编译方式,系统体系结构等.
Notice the additional function calls before this correct "get" function can be called. These calls add to the overhead for b[100]
, although how much will depend on how Python/NumPy was compiled, the system architecture, and so on.
上面我们看到函数 PyList_GET_ITEM
一个>被称为.这是一个简短的函数,基本上看起来像这样*:
Above it was seen that the function PyList_GET_ITEM
is called. This is a short function that essentially looks like this*:
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
if (!PyList_Check(op)) { // check if list
PyErr_BadInternalCall();
return NULL;
}
if (i < 0 || i >= Py_SIZE(op)) { // check i is in range
if (indexerr == NULL) {
indexerr = PyUnicode_FromString(
"list index out of range");
if (indexerr == NULL)
return NULL;
}
PyErr_SetObject(PyExc_IndexError, indexerr);
return NULL;
}
return ((PyListObject *)op) -> ob_item[i]; // return reference to object
}
* PyList_GET_ITEM
实际上是此函数的宏形式,它执行相同的功能(减去错误检查).
这意味着将项目获取到Python列表的索引i
相对简单.在内部,Python检查项目的类型是否为列表,i
是否在列表的正确范围内,然后将对引用的引用返回到列表中的对象.
This means that getting the item at index i
of a Python list is relatively simple. Internally, Python checks whether the type of the item being is a list, whether i
is in the correct range for the list, and then returns the reference to the object in the list.
相反,NumPy必须做更多的工作才能返回请求的索引处的值.
In contrast, NumPy has to do much more work before the value at the requested index can be returned.
可以用各种不同的方式对数组建立索引,并且NumPy必须决定需要哪个索引例程.各种索引例程主要由 mapping.c
.
Arrays can be indexed in a variety of different ways and NumPy has to decide which index routine is needed. The various indexing routines are handled largely by code in mapping.c
.
用于索引NumPy数组的所有内容均通过函数 prepare_index
,它开始解析索引并存储有关广播,维度数等的信息.这是该函数的呼叫签名:
Anything used to index NumPy arrays passes through the function prepare_index
which begins the parsing of the index and stores the information about broadcasting, number of dimensions, and so on. Here is the call signature for the function:
NPY_NO_EXPORT int
prepare_index(PyArrayObject *self, PyObject *index,
npy_index_info *indices,
int *num, int *ndim, int *out_fancy_ndim, int allow_boolean)
/* @param the array being indexed
* @param the index object
* @param index info struct being filled (size of NPY_MAXDIMS * 2 + 1)
* @param number of indices found
* @param dimension of the indexing result
* @param dimension of the fancy/advanced indices part
* @param whether to allow the boolean special case
*/
该功能必须进行很多检查.即使对于相对简单的索引(例如b[100,100]
),也必须推断出很多信息,以便NumPy可以将引用(视图)返回到正确的值.
The function has to do a lot of checks. Even for a relatively simple index such as b[100,100]
, a lot of information has to be inferred so that NumPy can return a reference (view) to the correct value.
总而言之,找到NumPy的"getitem"函数需要更长的时间,处理数组索引的函数肯定比Python列表的单个函数复杂.
In conclusion, it takes longer for the "getitem" function for NumPy to be found and the functions handling the indexing of arrays are necessarily more complex than the single function for Python lists.
这篇关于Numpy单个元素的访问速度比列表慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!