Cython: size attribute of memoryviews

Question

I'm using a lot of 3D memoryviews in Cython, e.g.

import cython
import numpy as np
cython.declare(a='double[:, :, ::1]')
a = np.empty((10, 20, 30), dtype='double')

I often want to loop over all elements of a. I can do this using a triple loop like

for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            a[i, j, k] = ...

If I do not care about the indices i, j and k, it is more efficient to do a flat loop, like

cython.declare(a_ptr='double*')
a_ptr = cython.address(a[0, 0, 0])
for i in range(size):
    a_ptr[i] = ...

Here I need to know the number of elements (size) in the array. This is given by the product of the elements in the shape attribute, i.e. size = a.shape[0]*a.shape[1]*a.shape[2], or more generally size = np.prod(np.asarray(a).shape). I find both of these ugly to write, and the (albeit small) computational overhead bothers me. The nice way to do it is to use the builtin size attribute of memoryviews, size = a.size. However, for reasons I cannot fathom, this leads to unoptimized C code, as is evident from the annotated HTML file generated by Cython. Specifically, the C code generated by size = a.shape[0]*a.shape[1]*a.shape[2] is simply

__pyx_v_size = (((__pyx_v_a.shape[0]) * (__pyx_v_a.shape[1])) * (__pyx_v_a.shape[2]));

whereas the C code generated by size = a.size is

__pyx_t_10 = __pyx_memoryview_fromslice(__pyx_v_a, 3, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_10);
__pyx_t_14 = __Pyx_PyObject_GetAttrStr(__pyx_t_10, __pyx_n_s_size); if (unlikely(!__pyx_t_14)) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_14);
__Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
__pyx_t_7 = __Pyx_PyIndex_AsSsize_t(__pyx_t_14); if (unlikely((__pyx_t_7 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;
__pyx_v_size = __pyx_t_7;

To generate the above code, I have enabled all possible optimizations through compiler directives, meaning that the unwieldy C code generated by a.size cannot be optimized away. It looks to me as though the size "attribute" is not really a pre-computed attribute, but actually carries out a computation upon lookup. Furthermore, this computation is quite a bit more involved than simply taking the product over the shape attribute. I cannot find any hint of an explanation in the docs.

What is the explanation for this behavior, and do I have a better choice than writing out a.shape[0]*a.shape[1]*a.shape[2], if I really care about this micro-optimization?

Recommended answer

Just by looking at the generated C code, you can see that size is a property and not a simple C member. Here is the original Cython code for memoryviews:

@cname('__pyx_memoryview')
cdef class memoryview(object):
...
    cdef object _size
...
    @property
    def size(self):
        if self._size is None:
            result = 1

            for length in self.view.shape[:self.view.ndim]:
                result *= length

            self._size = result

        return self._size

It is easy to see that the product is calculated only once and then cached. Clearly this doesn't play a big role for 3-dimensional arrays, but for a higher number of dimensions caching could become more important (as we will see, there are at most 8 dimensions, so it is not clear-cut whether this caching is really worth it).

One can understand the decision to calculate the size lazily: after all, size is not always needed or used, and one doesn't want to pay for it up front. Clearly, there is a price to pay for this laziness if you use the size a lot; that is the trade-off Cython makes.
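The lazy-then-cache pattern described above can be sketched in plain Python. This is only an illustration and not Cython's implementation: the class name LazySizedView and the compute_count counter are hypothetical names made up for the example.

```python
class LazySizedView:
    """Toy stand-in for a memoryview object (hypothetical, for illustration)."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self._size = None        # nothing has been computed yet
        self.compute_count = 0   # how often the product was actually computed

    @property
    def size(self):
        # Compute the product of the shape on first access only, then cache it.
        if self._size is None:
            result = 1
            for length in self.shape:
                result *= length
            self._size = result
            self.compute_count += 1
        return self._size


view = LazySizedView((10, 20, 30))
first = view.size    # computes the product and caches it
second = view.size   # served from the cache
print(first, second, view.compute_count)  # prints: 6000 6000 1
```

Every access still goes through a property call and an attribute lookup, which is the part that costs more than reading a plain C array element.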

I would not dwell too long on the overhead of calling a.size: it is nothing compared to the overhead of calling a Cython function from Python.

For example, the measurements of @danny measure only this Python call overhead and not the actual performance of the different approaches. To show this, I throw a third function into the mix:

%%cython
...
def both():
    a.size+a.shape[0]*a.shape[1]*a.shape[2]

which does double the amount of work, but

>>> %timeit mv_size
22.5 ns ± 0.0864 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>> %timeit mv_product
20.7 ns ± 0.087 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>> %timeit both
21 ns ± 0.39 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

is just as fast. On the other hand:

%%cython
...
def nothing():
    pass

is not any faster:

>>> %timeit nothing
24.3 ns ± 0.854 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
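A rough way to see the fixed per-call cost separately from the work inside the function is to time an empty function next to one that does a little arithmetic. The sketch below uses plain Python functions and the standard timeit module; the function names nothing and product are stand-ins and not the Cython functions measured above, so the absolute numbers will differ from those shown.

```python
import timeit


def nothing():
    # Empty body: the measured time is essentially pure call overhead.
    pass


def product(shape=(10, 20, 30)):
    # Same arithmetic as a.shape[0]*a.shape[1]*a.shape[2], on a plain tuple.
    return shape[0] * shape[1] * shape[2]


n = 200_000  # number of calls per measurement
t_nothing = timeit.timeit(nothing, number=n) / n
t_product = timeit.timeit(product, number=n) / n

print(f"nothing: {t_nothing * 1e9:.1f} ns per call")
print(f"product: {t_product * 1e9:.1f} ns per call")
```

Comparing the two figures gives an estimate of how much of each measurement is the call itself rather than the body.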


In a nutshell: I would use a.size because of its readability, assuming that optimizing it would not speed up my application, unless profiling proves otherwise.

The whole story: the variable a is of type __Pyx_memviewslice and not of type __pyx_memoryview, as one might think. The struct __Pyx_memviewslice has the following definition:

struct __pyx_memoryview_obj;
typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  Py_ssize_t shape[8];
  Py_ssize_t strides[8];
  Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

That means shape can be accessed very efficiently by the Cython code, as it is a simple C array (by the way, I asked myself what happens if there are more than 8 dimensions; the answer is: you cannot have more than 8 dimensions).
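To make the layout concrete, here is a small ctypes mirror of the struct shown above. It is a sketch for illustration only: the names MemviewSliceSketch, MAX_DIMS and size_from_shape are hypothetical, the two pointer members are simplified to void pointers, and nothing here touches Cython's real data structures.

```python
import ctypes
import math

MAX_DIMS = 8  # matches the fixed array length in the struct shown above


class MemviewSliceSketch(ctypes.Structure):
    """ctypes mirror of the struct layout (illustration only)."""
    _fields_ = [
        ("memview", ctypes.c_void_p),   # pointer to the memoryview object
        ("data", ctypes.c_void_p),      # char *data, simplified to void *
        ("shape", ctypes.c_ssize_t * MAX_DIMS),
        ("strides", ctypes.c_ssize_t * MAX_DIMS),
        ("suboffsets", ctypes.c_ssize_t * MAX_DIMS),
    ]


def size_from_shape(slice_, ndim):
    """Product of the first ndim entries of the fixed-size shape array."""
    if not 0 <= ndim <= MAX_DIMS:
        raise ValueError(f"ndim must be between 0 and {MAX_DIMS}")
    return math.prod(slice_.shape[:ndim])


s = MemviewSliceSketch()
s.shape[0], s.shape[1], s.shape[2] = 10, 20, 30
size = size_from_shape(s, 3)
print(size)  # prints: 6000
```

Because shape is a fixed-length array inside the struct, reading its entries needs no Python object at all, which is why the shape product compiles to a single C expression.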

The member memview is where the memory is held, and __pyx_memoryview_obj is the C extension type produced from the Cython code we saw above. It looks as follows:

/* "View.MemoryView":328
 *
 * @cname('__pyx_memoryview')
 * cdef class memoryview(object):             # <<<<<<<<<<<<<<
 *
 *     cdef object obj
 */
struct __pyx_memoryview_obj {
  PyObject_HEAD
  struct __pyx_vtabstruct_memoryview *__pyx_vtab;
  PyObject *obj;
  PyObject *_size;
  PyObject *_array_interface;
  PyThread_type_lock lock;
  __pyx_atomic_int acquisition_count[2];
  __pyx_atomic_int *acquisition_count_aligned_p;
  Py_buffer view;
  int flags;
  int dtype_is_object;
  __Pyx_TypeInfo *typeinfo;
};

So, __Pyx_memviewslice is not really a Python object. It is a kind of convenience wrapper that caches important data, like shape and strides, so that this information can be accessed quickly and cheaply.

What happens when we access a.size? First, __pyx_memoryview_fromslice is called, which does some additional reference counting and some further work, and returns the member memview from the __Pyx_memviewslice object.

Then the property size is called on this returned memoryview, which accesses the cached value in _size, as shown in the Cython code above.

It looks as if the Cython developers introduced a shortcut for such important information as shape, strides and suboffsets, but not for size, which is probably less important. This is the reason for the cleaner C code in the case of shape.
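For readers who want to experiment with the idea without compiling anything, the relation between shape, size and a flat loop can be reproduced with Python's built-in memoryview. This is a plain Python analogue and not a Cython typed memoryview, so it says nothing about the generated C code; the variable names are chosen for the example.

```python
import math
import struct

shape = (10, 20, 30)
itemsize = struct.calcsize("d")  # bytes per C double

# One zero-filled byte buffer, viewed both as a 3D array and as a flat array.
raw = memoryview(bytearray(math.prod(shape) * itemsize))
cube = raw.cast("d", shape=list(shape))  # 3D view of doubles
flat = raw.cast("d")                     # 1D view of the same memory

# Element count as the product of the shape, computed once before the loop.
size = math.prod(cube.shape)

# Flat loop over all elements, without tracking i, j and k.
for i in range(size):
    flat[i] = float(i)

last = cube[9, 19, 29]
print(size, last)  # prints: 6000 5999.0
```

The product is computed once outside the loop, so its cost does not depend on how many elements are visited.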
