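I have a simple Monte Carlo Pi computation program. I tried running it on 2 different boxes (same hardware, slightly different kernel versions). In one case I see a significant performance drop (about a factor of two). Without threads, performance is almost the same. Profiling the program shows that the slower run spends more time per futex call (8 usecs/call versus 5 usecs/call in the strace summaries below).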

Linux 3.10.0-123.20.1 (Red Hat 4.4.7-16), Python 2.6.6

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.69   53.229549           5  10792796   5385605 futex

Profile Output
==============
         256 function calls in 26.189 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       39   26.186    0.671   26.186    0.671 :0(acquire)

Linux 3.10.0-514.26.2 (Red Hat 4.8.5-11), Python 2.7.5

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.69   94.281979           8  11620358   5646413 futex

Profile Output
==============
         259 function calls in 53.448 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       38   53.445    1.406   53.445    1.406 :0(acquire)
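
The two strace summaries above can be compared directly by dividing the total futex time by the number of calls. The sketch below only redoes that arithmetic with the figures quoted above; the variable and function names are illustrative and not part of the original program.

```python
from __future__ import print_function, division

# (total seconds spent in futex, number of futex calls),
# copied from the two strace summaries above.
futex_stats = [
    ("3.10.0-123.20.1 / Python 2.6.6", 53.229549, 10792796),
    ("3.10.0-514.26.2 / Python 2.7.5", 94.281979, 11620358),
]


def usecs_per_call(seconds, calls):
    """Average time per syscall in microseconds."""
    return seconds / calls * 1e6


per_call = []
for label, seconds, calls in futex_stats:
    value = usecs_per_call(seconds, calls)
    per_call.append(value)
    print("%s: %.2f usecs per futex call" % (label, value))

# How much of the slowdown comes from call count versus cost per call.
calls_ratio = futex_stats[1][2] / futex_stats[0][2]
per_call_ratio = per_call[1] / per_call[0]
print("calls ratio: %.2f" % calls_ratio)
print("per-call ratio: %.2f" % per_call_ratio)
```

With these figures, the number of futex calls grows by roughly 8% while the average cost per call grows by roughly 60%, so most of the extra futex time comes from each call taking longer rather than from more calls being made.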

Test program

# Python 2 syntax (print statement), as run on both machines.
import random
import math
import time
import threading
import sys
import profile


def find_pi(tid, n):
    # Estimate Pi from n random points in the unit square.
    t0 = time.time()
    in_circle = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        dist = math.sqrt(pow(x, 2) + pow(y, 2))
        if dist < 1:
            in_circle += 1
    pi = 4.0 * (float(in_circle) / float(n))
    print 'Pi=%s - thread(%s) time=%.3f sec' % (pi, tid, time.time() - t0)
    return pi


def main():
    if len(sys.argv) > 1:
        n = int(sys.argv[1])
    else:
        n = 6000000
    t0 = time.time()
    threads = []
    num_threads = 5
    print 'n =', n
    # Start all worker threads, then wait for each of them to finish.
    for tid in range(num_threads):
        t = threading.Thread(target=find_pi, args=(tid, n,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()


#main()
profile.run('main()')
#profile.run('find_pi(1, 6000000)')

Best answer

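Most likely this is due to changes in the kernel code between the two versions. The kernel had a bug in the futex code that caused some processes to deadlock. Fixing that bug may have resulted in a performance drop. The changelog for 3.10.0-514 (for CentOS) mentions a number of changes to [kernel] futex.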
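
One way to check on each machine whether the slowdown is tied to threading rather than to the computation itself is to time the same workload once sequentially and once with threads. The following is a minimal sketch under that assumption; the function names (`count_in_circle`, `run_sequential`, `run_threaded`, `estimate_pi`) are hypothetical, and the sample count is kept small so it finishes quickly.

```python
from __future__ import print_function, division

import random
import threading
import time


def count_in_circle(n, results, slot):
    """Count random points of the unit square that fall inside the unit circle."""
    hits = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x * x + y * y < 1.0:
            hits += 1
    results[slot] = hits


def run_sequential(num_workers, n):
    """Run the workload num_workers times in the calling thread."""
    results = [0] * num_workers
    t0 = time.time()
    for slot in range(num_workers):
        count_in_circle(n, results, slot)
    return time.time() - t0, results


def run_threaded(num_workers, n):
    """Run the same workload in num_workers concurrent threads."""
    results = [0] * num_workers
    threads = [
        threading.Thread(target=count_in_circle, args=(n, results, slot))
        for slot in range(num_workers)
    ]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0, results


def estimate_pi(results, n):
    """Combine per-worker hit counts into one estimate of Pi."""
    return 4.0 * sum(results) / (len(results) * n)


if __name__ == "__main__":
    workers, samples = 5, 200000  # assumed values, smaller than in the question
    seq_time, seq_hits = run_sequential(workers, samples)
    thr_time, thr_hits = run_threaded(workers, samples)
    print("sequential: %.3f sec, Pi=%.4f" % (seq_time, estimate_pi(seq_hits, samples)))
    print("threaded:   %.3f sec, Pi=%.4f" % (thr_time, estimate_pi(thr_hits, samples)))
```

If the sequential timings match across the two kernels while the threaded timings diverge, that would point at thread synchronization (and hence the futex path) rather than at the computation, which is consistent with the observation in the question that performance without threads is almost the same.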

Regarding "python - Python 2.6 vs 2.7 multithreading performance issue (futex)", the original question is on Stack Overflow: https://stackoverflow.com/questions/47051545/