问题描述
我在这里回答了几个问题,使用它来展平"列表列表:
>>>l = [[1,2,3],[4,5,6],[7,8,9]]>>>总和(l,[])它工作正常并产生:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
虽然我被告知 sum
运算符执行 a = a + b
不如 itertools.chain
我计划的问题是为什么可以在列表中阻止它在字符串上出现",但我在我的机器上做了一个快速基准比较 sum
和 itertools.chain.from_iterable
在相同的数据上:
import itertools,timeit打印(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))打印(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
我做了好几次,我总是得到与下面相同的数字:
0.71555228360702460.9883352857722025
令我惊讶的是,chain
- 在对我的答案的几条评论中,每个人都推荐比 sum
用于列表 - 速度要慢得多.
在 for
循环中迭代时仍然很有趣,因为它实际上并没有创建列表,但是在创建列表时,sum
获胜.
那么当预期结果是一个 list
时,我们应该放弃 itertools.chain
并使用 sum
吗?
感谢一些评论,我通过增加列表数量进行了另一个测试
s = 'l = [[4,5,6] for _ in range(20)]'打印(timeit.timeit("sum(l,[])",setup=s))打印(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))
现在我得到了相反的结果:
6.4798978107025373.793455760814343
您的测试输入很小.在这些尺度上,sum
版本可怕的 O(n^2) 渐近运行时间是不可见的.时间由常数因子决定,并且 sum
具有更好的常数因子,因为它不必通过迭代器工作.
对于更大的列表,很明显 sum
根本不是为这种事情设计的:
I answered several questions here by using this to "flatten" a list of lists:
>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])
it works fine and yields:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
although I was told that the sum
operator does a = a + b
which is not as performant as itertools.chain
My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum
and itertools.chain.from_iterable
on the same data:
import itertools,timeit
print(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
I did that several times and I always get about the same figures as below:
0.7155522836070246
0.9883352857722025
To my surprise, chain
- recommended over sum
for lists by everyone in several comments on my answers - is much slower.
It's still interesting when iterating in a for
loop because it doesn't actually create the list, but when creating the list, sum
wins.
So should we drop itertools.chain
and use sum
when the expected result is a list
?
EDIT: thanks to some comments, I made another test by increasing the number of lists
s = 'l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))
now I get the opposite:
6.479897810702537
3.793455760814343
Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum
version isn't visible. The timings are dominated by constant factors, and sum
has a better constant factor, since it doesn't have to work through iterators.
With bigger lists, it becomes clear that sum
is not at all designed for this kind of thing:
>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
... 'l = [[i] for i in xrange(5000)]; import itertools',
... number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097
这篇关于为什么列表上的总和(有时)比 itertools.chain 快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!