Question
I answered several questions here by using this to "flatten" a list of lists:
>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])
It works fine and yields:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
However, I was told that sum effectively does a = a + b at each step, which is not as performant as itertools.chain.
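As a rough sketch of what that comment means: sum(l, []) behaves roughly like the loop below. The helper name sum_like is hypothetical, and this is a pure-Python approximation of the behaviour, not CPython's actual implementation.

```python
import itertools

def sum_like(lists):
    # Hypothetical helper: approximates what sum(lists, []) does.
    result = []
    for sub in lists:
        # "+" builds a brand-new list on every step, copying everything
        # accumulated so far plus the new sublist.
        result = result + sub
    return result

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_sum = sum_like(l)
flat_chain = list(itertools.chain.from_iterable(l))
print(flat_sum == sum(l, []) == flat_chain)  # True
```

Both approaches give the same flattened list; the difference is only in how much copying happens along the way.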
My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:
import itertools,timeit
print(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
I did that several times and I always get about the same figures as below:
0.7155522836070246
0.9883352857722025
To my surprise, chain, which everyone recommended over sum for lists in several comments on my answers, is much slower.
It's still interesting when iterating in a for loop because it doesn't actually create the list, but when creating the list, sum wins.
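For reference, a minimal sketch of the for-loop case, where chain is consumed lazily and no flattened list is ever built (the variable names are just for illustration):

```python
import itertools

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# chain.from_iterable returns a lazy iterator: items are produced one
# at a time from the sublists, without allocating a flattened list.
total = 0
for item in itertools.chain.from_iterable(l):
    total += item

print(total)  # 45
```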
So should we drop itertools.chain and use sum when the expected result is a list?
Thanks to some comments, I made another test by increasing the number of lists:
s = 'l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))
Now I get the opposite:
6.479897810702537
3.793455760814343
Recommended answer
Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible. The timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.
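To make the O(n^2) claim concrete, here is a simplified cost model that only counts element copies. The function names are hypothetical, and the model ignores allocation overhead and list over-allocation, so treat it as an illustration rather than a measurement.

```python
def copies_with_sum(n_lists, size):
    # Each "result = result + sub" copies the whole accumulated list
    # plus the new sublist into a fresh list.
    copied = 0
    length = 0
    for _ in range(n_lists):
        length += size      # length of the new accumulated list
        copied += length    # every element of it is copied
    return copied

def copies_with_chain(n_lists, size):
    # list(chain.from_iterable(...)) visits each element exactly once.
    return n_lists * size

print(copies_with_sum(3, 3), copies_with_chain(3, 3))        # 18 9
print(copies_with_sum(20, 3), copies_with_chain(20, 3))      # 630 60
print(copies_with_sum(5000, 1), copies_with_chain(5000, 1))  # 12502500 5000
```

With 3 sublists the gap is a factor of 2, which constant factors can easily hide; with 5000 sublists it is a factor of about 2500.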
With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:
>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
... 'l = [[i] for i in xrange(5000)]; import itertools',
... number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097
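The session above uses xrange, so it ran on Python 2. A sketch of the same comparison for Python 3, assuming range in place of xrange and a smaller repeat count (number=10) so that it finishes quickly; absolute timings depend on the machine, so none are shown here:

```python
import itertools
import timeit

# Same shape of input as above: 5000 one-element lists.
l = [[i] for i in range(5000)]

# Passing callables avoids having to import itertools inside a setup string.
t_chain = timeit.timeit(lambda: list(itertools.chain.from_iterable(l)), number=10)
t_sum = timeit.timeit(lambda: sum(l, []), number=10)

print("chain:", t_chain)
print("sum:  ", t_sum)
```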