Why is sum on lists (sometimes) faster than itertools.chain?

Question

I answered several questions here by using this to "flatten" a list of lists:

>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])

It works fine and yields:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

However, I was told that sum effectively does a = a + b at every step, which is not as performant as itertools.chain.
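To make the a = a + b remark concrete, here is a rough sketch of what sum(l, []) amounts to for lists. The helper name sum_like_flatten is hypothetical, and the built-in sum is implemented in C, so this only mirrors its observable behavior rather than its actual code.

```python
def sum_like_flatten(lists):
    """Hypothetical pure-Python stand-in for sum(lists, [])."""
    result = []
    for sub in lists:
        # list + list builds a brand-new list, copying both operands
        result = result + sub
    return result

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = sum_like_flatten(l)
print(flat)
```

Each pass through the loop copies everything accumulated so far, which is where the performance concern comes from.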

My planned question was "why is this allowed on lists when it is prevented on strings?", but I ran a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:
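For reference, the string restriction mentioned above can be reproduced as follows. This is a small sketch, assuming CPython, where sum rejects a str start value and points to str.join instead; the variable names are illustrative only.

```python
words = ["ab", "cd", "ef"]

error_message = None
try:
    sum(words, "")
except TypeError as exc:
    # CPython refuses to sum strings and suggests ''.join(seq)
    error_message = str(exc)
    print(error_message)

# The supported way to concatenate strings
joined = "".join(words)
print(joined)
```

No such guard exists for lists, so sum(l, []) is accepted even though it has the same repeated-copy behavior.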

import itertools, timeit

# Import itertools inside setup so the timed statement can always resolve it.
setup = 'import itertools; l = [[1,2,3],[4,5,6],[7,8,9]]'
print(timeit.timeit("sum(l,[])", setup=setup))
print(timeit.timeit("list(itertools.chain.from_iterable(l))", setup=setup))

I ran it several times and always got roughly the same figures:

0.7155522836070246
0.9883352857722025

To my surprise, chain, which everyone recommended over sum for lists in several comments on my answers, is much slower.

chain is still interesting when iterating in a for loop, because it never actually creates the list; but when the list has to be created, sum wins.
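The for-loop case can be sketched like this: chain.from_iterable hands out the items one by one, so nothing is concatenated or copied up front. The summing of items is only an illustrative workload.

```python
import itertools

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

total = 0
# chain.from_iterable yields items lazily; no flattened list is built
for item in itertools.chain.from_iterable(l):
    total += item

print(total)
```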

So should we drop itertools.chain and use sum when the expected result is a list?

Thanks to some comments, I ran another test with a larger number of lists:

s = 'import itertools; l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])", setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))", setup=s))

Now I get the opposite result:

6.479897810702537
3.793455760814343

Recommended answer

Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible. The timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.
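One way to see why the quadratic term hides at small sizes is to count element copies. The sketch below is a simplified model, not a measurement: it assumes every result + sub copies the whole new list, and it ignores list over-allocation and per-call overhead. Both function names are hypothetical.

```python
def copies_with_sum(n_lists, sub_len):
    """Elements copied by repeated result = result + sub (simplified model)."""
    copied = 0
    size = 0
    for _ in range(n_lists):
        size += sub_len   # the new list holds the old elements plus the sublist
        copied += size    # every element of the new list is copied once
    return copied

def copies_with_chain(n_lists, sub_len):
    """Elements written when each one is placed in the result exactly once."""
    return n_lists * sub_len

for n in (3, 20, 5000):
    print(n, copies_with_sum(n, 1), copies_with_chain(n, 1))
```

Under this model, 3 sublists cost a handful of copies either way, while 5000 single-element sublists cost about 12.5 million copies with sum against 5000 with a single pass.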

With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:

>>> import timeit
>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
...               'l = [[i] for i in range(5000)]; import itertools',
...               number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in range(5000)]', number=1000)
49.55303902059097
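To repeat this comparison on your own machine, something like the following sketch could be used. The list size, the repeat count, and the helper names flatten_chain and flatten_extend are assumptions chosen for illustration; the extend-based variant is not part of the answer above and is included only as another single-pass approach. Absolute timings will vary by machine and Python version.

```python
import itertools
import timeit

l = [[i] for i in range(2000)]

def flatten_chain(lists):
    return list(itertools.chain.from_iterable(lists))

def flatten_extend(lists):
    result = []
    for sub in lists:
        result.extend(sub)   # grows result in place instead of copying it
    return result

# All approaches should agree on the result
assert flatten_chain(l) == flatten_extend(l) == sum(l, [])

for name, func in [("chain", flatten_chain), ("extend", flatten_extend),
                   ("sum", lambda lists: sum(lists, []))]:
    seconds = timeit.timeit(lambda: func(l), number=20)
    print(name, seconds)
```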
