Problem description
It appears that using [] around a generator expression (test1) performs substantially better than putting it inside list() (test2). The slowdown isn't there when I simply pass a list into list() for a shallow copy (test3). Why is this?
Evidence:
from timeit import Timer

# Each Timer imports its test function from __main__ when timeit() runs.
t1 = Timer("test1()", "from __main__ import test1")
t2 = Timer("test2()", "from __main__ import test2")
t3 = Timer("test3()", "from __main__ import test3")

x = [34534534, 23423523, 77645645, 345346]

def test1():
    [e for e in x]          # list comprehension

print(t1.timeit())
#0.552290201187

def test2():
    list(e for e in x)      # generator expression passed to list()

print(t2.timeit())
#2.38739395142

def test3():
    list(x)                 # shallow copy of an existing list

print(t3.timeit())
#0.515818119049
Machine: 64 bit AMD, Ubuntu 8.04, Python 2.7 (r27:82500)
Well, my first step was to set the two tests up independently to ensure that this is not a result of e.g. the order in which the functions are defined.
>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "[e for e in x]"
1000000 loops, best of 3: 0.638 usec per loop
>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "list(e for e in x)"
1000000 loops, best of 3: 1.72 usec per loop
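The same comparison can be scripted rather than typed at the shell. This is only a minimal sketch: the helper name best_usec and the constant SETUP are made up for illustration, and the loop count is smaller than the command-line default so that it finishes quickly. Absolute numbers will differ by machine and Python version.

```python
import timeit

# Same setup statement as in the command-line runs above.
SETUP = "x = [34534534, 23423523, 77645645, 345346]"

def best_usec(stmt, number=100000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs.

    Hypothetical helper: it mimics the "best of 3" figure that
    `python -mtimeit` reports, using timeit.repeat().
    """
    timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(timings) / number * 1e6

if __name__ == "__main__":
    for stmt in ("[e for e in x]", "list(e for e in x)", "list(x)"):
        print("%-22s %.3f usec per loop" % (stmt, best_usec(stmt)))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; it does not change which form is faster.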
Sure enough, I can replicate this. OK, next step is to have a look at the bytecode to see what's actually going on:
>>> import dis
>>> x=[34534534, 23423523, 77645645, 345346]
>>> dis.dis(lambda: [e for e in x])
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x0000000001F8B330, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (x)
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis.dis(lambda: list(e for e in x))
  1           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               0 (<code object <genexpr> at 0x0000000001F8B9B0, file "<stdin>", line 1>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (x)
             12 GET_ITER
             13 CALL_FUNCTION            1
             16 CALL_FUNCTION            1
             19 RETURN_VALUE
Notice that the first method creates the list directly, whereas the second method creates a genexpr object and passes that to the global list. This is probably where the overhead lies.
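One way to check the point about the global list without reading the full disassembly is to look at which names each function's bytecode refers to. The sketch below is illustrative only: comp and gen are hypothetical names for the two lambdas, and the exact opcodes behind them vary between Python versions.

```python
x = [34534534, 23423523, 77645645, 345346]

comp = lambda: [e for e in x]      # list comprehension
gen = lambda: list(e for e in x)   # generator expression handed to list()

# co_names holds the global/attribute names used by each function's bytecode.
comp_names = comp.__code__.co_names
gen_names = gen.__code__.co_names
print(comp_names)   # no reference to the name "list"
print(gen_names)    # includes "list", which is looked up at call time

# The argument to list() really is a separate generator object.
gen_type = type(e for e in x).__name__
print(gen_type)

# Both forms build equal lists; only the route differs.
same_result = comp() == gen() == list(x)
print(same_result)
```

Because list is resolved by name on every call, the second form pays for that lookup and an extra function call, and list() then has to pull each item out of the generator.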
Note also that the difference for this four-element list is approximately a microsecond, i.e. utterly trivial.
Other interesting data
This still holds for non-trivial lists:
>python -mtimeit "x=range(100000)" "[e for e in x]"
100 loops, best of 3: 8.51 msec per loop
>python -mtimeit "x=range(100000)" "list(e for e in x)"
100 loops, best of 3: 11.8 msec per loop
and for less trivial map functions:
>python -mtimeit "x=range(100000)" "[2*e for e in x]"
100 loops, best of 3: 12.8 msec per loop
>python -mtimeit "x=range(100000)" "list(2*e for e in x)"
100 loops, best of 3: 16.8 msec per loop
and (though less strongly) if we filter the list:
>python -mtimeit "x=range(100000)" "[e for e in x if e%2]"
100 loops, best of 3: 14 msec per loop
>python -mtimeit "x=range(100000)" "list(e for e in x if e%2)"
100 loops, best of 3: 16.5 msec per loop
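To compare these measurements on a common scale, the ratio of list(genexpr) time to list-comprehension time can be worked out from the figures quoted above. The snippet only does that arithmetic on the reported numbers; the labels and variable names are made up for illustration, and nothing is re-measured.

```python
# (label, list-comprehension time, list(genexpr) time), as reported above.
timings = [
    ("question: test1 vs test2", 0.552290201187, 2.38739395142),
    ("4-element copy (usec)", 0.638, 1.72),
    ("range(100000) copy (msec)", 8.51, 11.8),
    ("2*e map (msec)", 12.8, 16.8),
    ("e%2 filter (msec)", 14.0, 16.5),
]

ratios = {}
for label, comp_time, gen_time in timings:
    # How many times slower list(genexpr) was than the comprehension.
    ratios[label] = gen_time / comp_time
    print("%-28s %.2fx" % (label, ratios[label]))
```

On these numbers the gap is roughly 2.7x to 4.3x for the four-element list but only about 1.2x to 1.4x for 100000 elements, which suggests that much of the cost is a fixed per-call overhead, with a smaller per-item cost remaining.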