问题描述
在python中,可以通过这种方式 :
In python, it is possible to chain operators in this manner:
a op b op c
经评估为
a op b and b op c
唯一的区别是 b
仅被评估一次(因此,像 t = eval(b); op t和t op c
)。
With the only difference being that b
is evaluated only once (so, something more like t = eval(b); a op t and t op c
).
从它的可读性来看,并且比具有显式连词(使用和
的等效版本)更为简洁。
This is advantageous from the view point that it is very readable and more concise than the equivalent version with explicit conjunction (using and
).
但是...我注意到,链式表达式和等效表达式之间在性能上存在细微差别,无论是3个操作数还是20个。当您使用
However... I've noticed that there is a minor performance difference between chained expressions and the equivalent, be it for 3 operands or 20. This becomes apparent when you time these operations.
import timeit
timeit.timeit("a <= b <= c", setup="a,b,c=1,2,3")
0.1086414959972899
timeit.timeit("a <= b and b <= c", setup="a,b,c=1,2,3")
0.09434155100097996
并且
timeit.timeit("a <= b <= c <= d <= e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.2151330839988077
timeit.timeit("a <= b and b <= c and c <= d and d <= e and e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.19196406500122976
注意:全部测试是使用Python-3.4 完成的。
检查两个表达式的字节码,我注意到一个表达式的性能明显更高(实际上,还有四个)
Examining the byte code for both expressions, I noticed that one performs significantly more (actually, 4 more) operations than the other.
import dis
dis.dis("a <= b <= c")
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 DUP_TOP
7 ROT_THREE
8 COMPARE_OP 1 (<=)
11 JUMP_IF_FALSE_OR_POP 21
14 LOAD_NAME 2 (c)
17 COMPARE_OP 1 (<=)
20 RETURN_VALUE
>> 21 ROT_TWO
22 POP_TOP
23 RETURN_VALUE
与此相反,
dis.dis("a <= b and b <= c")
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 COMPARE_OP 1 (<=)
9 JUMP_IF_FALSE_OR_POP 21
12 LOAD_NAME 1 (b)
15 LOAD_NAME 2 (c)
18 COMPARE_OP 1 (<=)
>> 21 RETURN_VALUE
我没有阅读字节码的经验,但是第一个代码段一定是在字节码级别比第二个级别执行更多的操作。
I am not experienced with reading byte code, but the first code snippet definitely performs more operations at the byte code level than the second.
这就是我对此的解释。在第一种情况下,变量被压入某种堆栈,并依次弹出以进行比较。所有变量仅弹出一次。在第二种情况下,没有堆栈,但是至少有(N-2)个操作数必须两次加载到内存中才能进行比较。看来堆栈弹出操作比两次加载(N-2)个变量进行比较要昂贵得多,这要考虑速度差异。
Here's how I've interpreted this. In the first case, variables are pushed onto some sort of stack, and popped successively for comparison. All variables are popped only once. In the second case, there is no stack, but at least (N - 2) of the operands have to be loaded into memory twice for comparison. It appears the stack popping operation is more expensive than loading (N - 2) variables twice for comparison, accounting for the speed difference.
简而言之,我正在尝试了解为什么一个操作总是比另一个操作要慢一个常数倍。我的假设正确吗?还是我想念的python内部还有更多东西?
In a nutshell, I'm trying to understand why one operation is always slower than the other by a constant factor. Is my hypothesis correct? Or is there something more to the python internals I'm missing?
更多基准:
| System | a <= b <= c | a <= b and b <= c | a <= b <= ... <= e <= f | a <= b and ... and e <= f | Credit |
|--------|---------------------|---------------------|-------------------------|---------------------------|----------------|
| 3.4 | 0.1086414959972899 | 0.09434155100097996 | 0.2151330839988077 | 0.19196406500122976 | @cᴏʟᴅsᴘᴇᴇᴅ |
| 3.6.2 | 0.06788300536572933 | 0.059271858073771 | 0.1505890181288123 | 0.12044331897050142 | @Bailey Parker |
| 2.7.10 | 0.05009198188781738 | 0.04472208023071289 | 0.11113405227661133 | 0.09062719345092773 | @Bailey Parker |
推荐答案
在CPython的,节省了对 b
用于链式比较并非免费。它处于严重的,不用担心的便宜水平,但是它并不是真正的免费,您正在将其与装载局部变量的便宜一些的操作进行比较。
In CPython's stack-based bytecode execution engine, saving an extra reference to b
for the chained comparison isn't free. It's at the "seriously, don't worry about it" level of cheap, but it's not literally free, and you're comparing it to the slightly cheaper operation of loading a local variable.
COMPARE_OP
操作码将要比较的对象从堆栈中删除,因此对于链式比较,Python必须创建另一个对 b的引用
( DUP_TOP
)并将其推入堆栈中的两个位置( ROT_THREE
)以获得
The COMPARE_OP
opcode removes the objects it's comparing from the stack, so for the chained comparison, Python has to create another reference to b
(DUP_TOP
) and shove it two places down in the stack (ROT_THREE
) to get it out of the way.
在 a< = b和b< = c
中,而不是在引用改组之上,Python只是从堆栈框架的 fastlocals
数组中复制了另一个对 b
的引用。这涉及较少的指针改组和围绕字节码评估循环的行程,因此便宜一些。
In a <= b and b <= c
, instead of the above reference shuffling, Python just copies another reference to b
out of the stack frame's fastlocals
array. This involves less pointer shuffling and one less trip around the bytecode evaluation loop, so it's slightly cheaper.
这篇关于为什么链接的运算符表达式要比其扩展的等效表达式慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!