Question
I was trying to do a pure-Python (no external dependencies) element-wise comparison of two sequences. My first solution was:
list(map(operator.eq, seq1, seq2))
Then I found the starmap function in itertools, which seemed pretty similar to me. But it turned out to be about 37% faster on my computer in the worst case. As this was not obvious to me, I measured the time needed to retrieve one element from each generator (I don't know whether this approach is correct):
from operator import eq
from itertools import starmap
seq1 = [1,2,3]*10000
seq2 = [1,2,3]*10000
seq2[-1] = 5
gen1 = map(eq, seq1, seq2)
gen2 = starmap(eq, zip(seq1, seq2))
%timeit -n1000 -r10 next(gen1)
%timeit -n1000 -r10 next(gen2)
271 ns ± 1.26 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)
208 ns ± 1.72 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)
In retrieving elements, the second solution is 24% more performant. After that, they both produce the same results for list. But somewhere we gain an extra 13% in time:
%timeit list(map(eq, seq1, seq2))
%timeit list(starmap(eq, zip(seq1, seq2)))
5.24 ms ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.34 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I don't know how to dig deeper into profiling such nested code, so my question is: why is the starmap generator so much faster at retrieving elements, and where does the extra 13% in the list call come from?
My first intention was to perform an element-wise comparison rather than all, so the all function was replaced with list. This replacement does not affect the timing ratio.
CPython 3.6.2 on Windows 10 (64-bit)
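Outside IPython, the %timeit comparisons above can be approximated with the stdlib timeit module. This is a minimal sketch of the benchmark (absolute numbers will vary by machine and interpreter version):

```python
import timeit

# Same data as in the question: two long sequences differing in the last item.
setup = """
from operator import eq
from itertools import starmap
seq1 = [1, 2, 3] * 10000
seq2 = [1, 2, 3] * 10000
seq2[-1] = 5
"""

# Best-of-3 timing for consuming each variant into a list.
t_map = min(timeit.repeat("list(map(eq, seq1, seq2))",
                          setup=setup, number=20, repeat=3))
t_starmap = min(timeit.repeat("list(starmap(eq, zip(seq1, seq2)))",
                              setup=setup, number=20, repeat=3))
print(f"map:     {t_map:.4f} s")
print(f"starmap: {t_starmap:.4f} s")
```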
Answer
There are several factors that contribute (in conjunction) to the observed performance difference:
- zip re-uses the returned tuple if it has a reference count of 1 when the next __next__ call is made.
- map builds a new tuple that is passed to the "mapped function" every time a __next__ call is made. Actually, it probably won't create a new tuple from scratch, because Python maintains a free list for unused tuples; but then map has to find an unused tuple of the right size.
- starmap checks if the next item in the iterable is of type tuple and, if so, just passes it on.
- Calling a C function from within C code with PyObject_Call won't create a new tuple to pass to the callee.
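The first point can be observed from pure Python as well. Note this is a CPython implementation detail, not a language guarantee: as long as nothing else holds a reference to the returned tuple when the next __next__ call is made, zip hands back the same object:

```python
it = zip([1, 2, 3], [4, 5, 6])
first = id(next(it))   # the tuple is released again right after id() returns
second = id(next(it))  # refcount was back to 1, so zip reused the same object
print(first == second)  # True on CPython

# Keeping a reference defeats the reuse:
it2 = zip([1, 2, 3], [4, 5, 6])
a = next(it2)           # `a` keeps the first tuple alive
b = next(it2)           # so zip has to produce a fresh tuple
print(a is b)           # False
```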
So starmap with zip will use just one tuple over and over again, passing it to operator.eq and thereby reducing the function-call overhead immensely. map, on the other hand, creates a new tuple (or fills a C array, from CPython 3.6 on) every time operator.eq is called. So the speed difference is really just the tuple-creation overhead.
Instead of linking to the source code, I'll provide some Cython code that can be used to verify this:
In [1]: %load_ext cython
In [2]: %%cython
...:
...: from cpython.ref cimport Py_DECREF
...:
...: cpdef func(zipper):
...: a = next(zipper)
...: print('a', a)
...: Py_DECREF(a)
...: b = next(zipper)
...: print('a', a)
In [3]: func(zip([1, 2], [1, 2]))
a (1, 1)
a (2, 2)
Yes, tuples aren't really immutable here: a simple Py_DECREF was sufficient to "trick" zip into believing no one else holds a reference to the returned tuple!
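The reference count that zip inspects can also be glimpsed from pure Python with sys.getrefcount (again a CPython detail; the exact count is an implementation artifact):

```python
import sys

it = zip([1, 2], [3, 4])
t = next(it)
# At least three references are visible here: the name `t`, zip's internal
# result slot, and the temporary reference getrefcount holds on its argument.
rc = sys.getrefcount(t)
print(rc)

saved = id(t)
del t                   # drop our reference; zip's internal one is the last
u = next(it)
print(id(u) == saved)   # True on CPython: the tuple object was reused
```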
Regarding the tuple "pass-through":
In [4]: %%cython
...:
...: def func_inner(*args):
...: print(id(args))
...:
...: def func(*args):
...: print(id(args))
...: func_inner(*args)
In [5]: func(1, 2)
1404350461320
1404350461320
So the tuple is passed right through (just because these are defined as C functions!). This doesn't happen for pure Python functions:
In [6]: def func_inner(*args):
...: print(id(args))
...:
...: def func(*args):
...: print(id(args))
...: func_inner(*args)
...:
In [7]: func(1, 2)
1404350436488
1404352833800
Note that the pass-through also doesn't happen if the called function isn't a C function, even when it is called from a C function:
In [8]: %%cython
...:
...: def func_inner_c(*args):
...: print(id(args))
...:
...: def func(inner, *args):
...: print(id(args))
...: inner(*args)
...:
In [9]: def func_inner_py(*args):
...: print(id(args))
...:
...:
In [10]: func(func_inner_py, 1, 2)
1404350471944
1404353010184
In [11]: func(func_inner_c, 1, 2)
1404344354824
1404344354824
So there are a lot of "coincidences" leading up to the point that starmap with zip is faster than calling map with multiple arguments, when the called function is also a C function...
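As a corollary, one would expect the gap to shrink when the mapped function is a pure-Python function, because then the argument tuple cannot be passed through and the Python-level call overhead dominates. A sketch to check this (the relative numbers are an assumption and depend on interpreter version and machine):

```python
import timeit

setup = """
from itertools import starmap
seq1 = [1, 2, 3] * 10000
seq2 = [1, 2, 3] * 10000
def py_eq(a, b):       # pure-Python callee: no tuple pass-through possible
    return a == b
"""

t_map = min(timeit.repeat("list(map(py_eq, seq1, seq2))",
                          setup=setup, number=20, repeat=3))
t_starmap = min(timeit.repeat("list(starmap(py_eq, zip(seq1, seq2)))",
                              setup=setup, number=20, repeat=3))
print(f"map:     {t_map:.4f} s")
print(f"starmap: {t_starmap:.4f} s")
```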