Question
I was under the impression that using a sum construction was much faster than running a for loop. However, in the following code, the for loop actually runs faster:
import time

Score = [[3,4,5,6,7,8] for i in range(40)]
a=[0,1,2,3,4,5,4,5,2,1,3,0,5,1,0,3,4,2,2,4,4,5,1,2,5,4,3,2,0,1,1,0,2,0,0,0,1,3,2,1]

def ver1():
    for i in range(100000):
        total = 0
        for j in range(40):
            total+=Score[j][a[j]]
    print (total)

def ver2():
    for i in range(100000):
        total = sum(Score[j][a[j]] for j in range(40))
    print (total)

t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()
print("Version 1 time: ", t1-t0)
print("Version 2 time: ", t2-t1)
The output is:
208
208
Version 1 time: 0.9300529956817627
Version 2 time: 1.066061019897461
Am I doing something wrong? Is there a way to do this faster?
(Note that this is just a demo I set up; in my real application the scores will not be repeated in this manner.)
Some additional info: This is run on Python 3.4.4 64-bit, on Windows 7 64-bit, on an i7.
Recommended answer
This seems to depend on the system, and probably on the Python version. On my system, the difference is about 13%:
python sum.py
208
208
('Version 1 time: ', 0.6371259689331055)
('Version 2 time: ', 0.7342419624328613)
The two versions are not measuring sum versus manual looping, because the loop "bodies" are not identical. ver2 does more work because it creates the generator expression 100000 times, while ver1's loop body is almost trivial, although it does create a list with 40 elements on every iteration (presumably via range(40), which returns a list in Python 2). You can change the example so that both versions do the same work, and then you see the effect of sum:
def ver1():
    # Build the 40 values once, so that only the summation is timed.
    r = [Score[j][a[j]] for j in range(40)]
    for i in xrange(100000):  # xrange is Python 2; use range on Python 3
        total = 0
        for j in r:
            total+=j
    print (total)

def ver2():
    r = [Score[j][a[j]] for j in xrange(40)]
    for i in xrange(100000):
        total = sum(r)
    print (total)
I've moved everything out of the inner loop body and out of the sum call to make sure that we are measuring only the overhead of the hand-crafted loop. Using xrange instead of range further improves the overall runtime, but this applies to both versions and thus does not change the comparison. The results of the modified code on my system are:
python sum.py
208
208
('Version 1 time: ', 0.2034609317779541)
('Version 2 time: ', 0.04234910011291504)
ver2 is roughly five times faster than ver1 (about 0.04 s versus 0.20 s). This is the pure performance gain of using sum instead of a hand-crafted loop.
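For anyone who wants to repeat the comparison on Python 3, here is a minimal sketch using the standard timeit module instead of time.time. The function names manual_loop, builtin_sum and compare are my own, not from the code above, and the numbers you get will depend on your machine and Python version.

```python
import timeit

# Data copied from the question.
Score = [[3, 4, 5, 6, 7, 8] for _ in range(40)]
a = [0, 1, 2, 3, 4, 5, 4, 5, 2, 1, 3, 0, 5, 1, 0, 3, 4, 2, 2, 4,
     4, 5, 1, 2, 5, 4, 3, 2, 0, 1, 1, 0, 2, 0, 0, 0, 1, 3, 2, 1]

# Build the 40 values once so that only the summation itself is timed.
r = [Score[j][a[j]] for j in range(40)]


def manual_loop(values):
    """Add the values up with an explicit for loop."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Add the values up with the built-in sum."""
    return sum(values)


def compare(number=20000, repeat=3):
    """Return the best-of-`repeat` time in seconds for each variant."""
    return {
        "manual_loop": min(timeit.repeat(lambda: manual_loop(r),
                                         number=number, repeat=repeat)),
        "builtin_sum": min(timeit.repeat(lambda: builtin_sum(r),
                                         number=number, repeat=repeat)),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(name, seconds)
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; both variants pay the same lambda call overhead, so the comparison stays fair.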
Inspired by ShadowRanger's comment on the question about lookups, I have modified the example to compare it against the original code and check whether the lookup of bound symbols makes a difference:
def gen(s,b):
    # s and b are local names here, so Score and a are not looked up as globals.
    for j in xrange(40):
        yield s[j][b[j]]

def ver2():
    for i in range(100000):
        total = sum(gen(Score, a))
    print (total)
I created a small custom generator which binds Score and a locally, to prevent expensive lookups in parent scopes. Executing this gives:
python sum.py
208
208
('Version 1 time: ', 0.6167840957641602)
('Version 2 time: ', 0.6198039054870605)
The symbol lookups alone account for ~12% of the runtime.
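If you want to see where those lookups come from, the following Python 3 sketch uses the standard dis module to list which names each variant loads as globals. The helper global_names and the function total_with_globals are names I made up for illustration. This only shows which names are resolved globally; it does not measure how much time that costs.

```python
import dis
import types

# Data copied from the question.
Score = [[3, 4, 5, 6, 7, 8] for _ in range(40)]
a = [0, 1, 2, 3, 4, 5, 4, 5, 2, 1, 3, 0, 5, 1, 0, 3, 4, 2, 2, 4,
     4, 5, 1, 2, 5, 4, 3, 2, 0, 1, 1, 0, 2, 0, 0, 0, 1, 3, 2, 1]


def total_with_globals():
    # Score and a are module globals, looked up again for every element.
    return sum(Score[j][a[j]] for j in range(40))


def gen(s, b):
    # s and b are parameters, so they are local names inside the generator.
    for j in range(40):
        yield s[j][b[j]]


def global_names(func):
    """Names loaded with LOAD_GLOBAL by func, including nested code objects."""
    names = set()
    pending = [func.__code__]
    while pending:
        code = pending.pop()
        for ins in dis.get_instructions(code):
            if ins.opname == "LOAD_GLOBAL":
                names.add(ins.argval)
        # Generator expressions are compiled to separate code objects
        # stored in co_consts, so walk into those as well.
        pending.extend(c for c in code.co_consts
                       if isinstance(c, types.CodeType))
    return names


if __name__ == "__main__":
    print("generator expression:", sorted(global_names(total_with_globals)))
    print("custom generator:    ", sorted(global_names(gen)))
```

In the generator expression, Score and a show up among the global loads, while in gen they do not, because they are passed in as arguments.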