Problem description
I am trying to use the itertools.product function to make a segment of my code (in an isotopic pattern simulator) easier to read and hopefully faster as well (the documentation states that no intermediate results are created). However, I have tested both versions of the code against each other using the cProfile module and noticed that itertools.product was significantly slower than my nested for loops.
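For reference, itertools.product returns an iterator that yields one tuple per combination, with one element taken from each input. Its documentation also notes that it first consumes each input iterable into an in-memory pool. So "no intermediate results" means the full list of combinations is not built up front; it does not mean the per-combination tuple disappears. The loop body still receives nested (mass, probability) pairs that it has to index into. The two lists below are hypothetical stand-ins, not the question's data:

```python
import itertools

# Hypothetical two-element isotope lists: (mass shift, probability) pairs.
light = [(0.0, 0.9), (1.0, 0.1)]
heavy = [(0.0, 0.8), (2.0, 0.2)]

# product() returns an iterator; combinations are produced on demand.
combos = itertools.product(light, heavy)
first = next(combos)
print(first)  # ((0.0, 0.9), (0.0, 0.8))

# The remaining combinations come out in "odometer" order:
# the rightmost input advances fastest.
rest = list(combos)
print(rest)
```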
Example values used for the testing:
```python
carbons = [(0.0, 0.004613223957020534), (1.00335, 0.02494768843632857), (2.0067, 0.0673219412049374), (3.0100499999999997, 0.12087054681917497), (4.0134, 0.16243239687902825), (5.01675, 0.17427700732161705), (6.020099999999999, 0.15550695260604208), (7.0234499999999995, 0.11869556397525197), (8.0268, 0.07911287899598853), (9.030149999999999, 0.04677626606764402)]
hydrogens = [(0.0, 0.9417611429667746), (1.00628, 0.05651245007201512)]
nitrogens = [(0.0, 0.16148864310897554), (0.99703, 0.2949830688288726), (1.99406, 0.26887643366755537), (2.99109, 0.16305943261399866), (3.98812, 0.0740163089529218), (4.98515, 0.026824040474519875), (5.98218, 0.008084687617425748)]
oxygens17 = [(0.0, 0.8269292736927519), (1.00422, 0.15717628899143962), (2.00844, 0.014907548827832968)]
oxygens18 = [(0.0, 0.3584191873916266), (2.00425, 0.36813434247849824), (4.0085, 0.18867830334103902), (6.01275, 0.06433912182670033), (8.017, 0.016421642936302827)]
sulfurs33 = [(0.0, 0.02204843659673093), (0.99939, 0.08442569434459646), (1.99878, 0.16131398792444965), (2.99817, 0.2050722764666321), (3.99756, 0.1951327596407101), (4.99695, 0.14824112268069747), (5.99634, 0.09365899226198841), (6.99573, 0.050618028523695714), (7.99512, 0.023888506307006133), (8.99451, 0.010000884811585533)]
sulfurs34 = [(0.0, 3.0106350597190195e-10), (1.9958, 6.747270089956428e-09), (3.9916, 7.54568412614702e-08), (5.9874, 5.614443102700176e-07), (7.9832, 3.1268212758750728e-06), (9.979, 1.3903197959791067e-05), (11.9748, 5.141248916434075e-05), (13.970600000000001, 0.0001626288218672788), (15.9664, 0.00044921518047309414), (17.9622, 0.0011007203440032396)]
sulfurs36 = [(0.0, 0.904828368500412), (3.99501, 0.0905009370374487)]
```
Snippet demonstrating nested for loops:
```python
totals = []
for i in carbons:
    for j in hydrogens:
        for k in nitrogens:
            for l in oxygens17:
                for m in oxygens18:
                    for n in sulfurs33:
                        for o in sulfurs34:
                            for p in sulfurs36:
                                # sum the mass shifts, multiply the probabilities
                                totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0],
                                               i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))
```
Snippet demonstrating the use of itertools.product:
```python
import itertools

totals = []
for i in itertools.product(carbons, hydrogens, nitrogens, oxygens17, oxygens18, sulfurs33, sulfurs34, sulfurs36):
    # i is an 8-tuple of (mass shift, probability) pairs
    massDiff = i[0][0]
    chance = i[0][1]
    for j in i[1:]:
        massDiff += j[0]
        chance = chance * j[1]
    totals.append((massDiff, chance))
```
The profiling results (based on 10 runs per method) averaged ~0.8 seconds for the nested for loop approach and ~1.3 seconds for the itertools.product approach. So my question is: am I using the itertools.product function wrongly, or should I just stick to the nested for loops?
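The comparison can be reproduced with the standard timeit module, which avoids cProfile's instrumentation. Below is a minimal sketch. The three small lists and the function names are hypothetical stand-ins; to measure the real case, substitute the eight lists and loops. No timings are claimed here, since they depend on the interpreter and the machine.

```python
import itertools
import timeit

# Hypothetical small inputs: (mass shift, probability) pairs.
a = [(0.0, 0.5), (1.0, 0.3), (2.0, 0.2)]
b = [(0.0, 0.9), (1.0, 0.1)]
c = [(0.0, 0.7), (2.0, 0.3)]


def nested_loops():
    totals = []
    for i in a:
        for j in b:
            for k in c:
                totals.append((i[0] + j[0] + k[0], i[1] * j[1] * k[1]))
    return totals


def product_inner_loop():
    # Mirrors the itertools version above: an extra Python-level
    # loop over each combination tuple.
    totals = []
    for combo in itertools.product(a, b, c):
        mass_diff = combo[0][0]
        chance = combo[0][1]
        for pair in combo[1:]:
            mass_diff += pair[0]
            chance *= pair[1]
        totals.append((mass_diff, chance))
    return totals


# Both approaches must produce identical results before timing means anything.
assert nested_loops() == product_inner_loop()

for func in (nested_loops, product_inner_loop):
    seconds = timeit.timeit(func, number=10_000)
    print(f"{func.__name__}: {seconds:.3f} s for 10,000 runs")
```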
-- UPDATE --
I have included two of my cProfile results:
```
# ITERTOOLS.PRODUCT APPROACH
         420003 function calls in 1.306 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.018    0.018    1.306    1.306 <string>:1(<module>)
        1    1.246    1.246    1.289    1.289 IsotopeBas.py:64(option1)
   420000    0.042    0.000    0.042    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
and:
```
# NESTED FOR LOOP APPROACH
         420003 function calls in 0.830 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.019    0.019    0.830    0.830 <string>:1(<module>)
        1    0.769    0.769    0.811    0.811 IsotopeBas.py:78(option2)
   420000    0.042    0.000    0.042    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
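Both profiles report exactly 420000 calls to list.append. That is the number of combinations: a Cartesian product has as many elements as the product of its input lengths. Nothing else in either loop body is a profiled function call, so almost all the time appears as the tottime of option1 and option2 (1.246 s vs 0.769 s). The check below uses list lengths counted by hand from the example data above; math.prod requires Python 3.8 or later.

```python
import math

# Number of (mass shift, probability) pairs in each example list.
lengths = {
    "carbons": 10, "hydrogens": 2, "nitrogens": 7, "oxygens17": 3,
    "oxygens18": 5, "sulfurs33": 10, "sulfurs34": 10, "sulfurs36": 2,
}

# One combination per choice of one pair from each list.
combinations = math.prod(lengths.values())
print(combinations)  # 420000
```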
Recommended answer
Your original itertools code spent a lot of extra time in the needless lambda and in building lists of intermediate values by hand; much of this can be replaced with built-in functionality.
That said, the inner for loop does add a lot of extra overhead. Just try the following, and the performance is very much on par with your original nested-loop code:
```python
import itertools
totals = []
for a in itertools.product(carbons, hydrogens, nitrogens, oxygens17,
                           oxygens18, sulfurs33, sulfurs34, sulfurs36):
    i, j, k, l, m, n, o, p = a  # unpack once instead of looping over the tuple
    totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0],
                   i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))
```
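A small variation, not taken from the original answer, is to unpack each combination directly in the for target, including the nested (mass, probability) pairs. This removes both the separate assignment and the [0]/[1] indexing. The sketch uses three hypothetical lists so it runs on its own; with the question's data you would list all eight. Whether this saves measurable time on a given interpreter is something to check with a profiler rather than assume.

```python
import itertools

# Hypothetical inputs: (mass shift, probability) pairs.
carbons = [(0.0, 0.6), (1.00335, 0.4)]
hydrogens = [(0.0, 0.9), (1.00628, 0.1)]
nitrogens = [(0.0, 0.7), (0.99703, 0.3)]

totals = []
# Each combination is unpacked into masses m1..m3 and probabilities p1..p3
# directly in the loop header.
for (m1, p1), (m2, p2), (m3, p3) in itertools.product(carbons, hydrogens, nitrogens):
    totals.append((m1 + m2 + m3, p1 * p2 * p3))

print(len(totals))  # 8
```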
The following code runs as much as possible on the CPython built-in side, and I tested it to be equivalent to your code. Notably, the code uses zip(*iterable) to unzip each of the product results. It then uses reduce with operator.mul to compute the product and sum to compute the total mass difference, with a generator expression feeding a list comprehension to go through the combinations. The nested for loop still wins slightly, but since it is hardcoded, it is probably not something you can use in the long run.
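```python
import itertools
from functools import partial, reduce  # reduce is not a builtin in Python 3
from operator import mul

# prod(seq) multiplies all items of seq together
prod = partial(reduce, mul)

elems = carbons, hydrogens, nitrogens, oxygens17, oxygens18, sulfurs33, sulfurs34, sulfurs36
p = itertools.product(*elems)

# zip(*i) turns one combination ((m1, p1), (m2, p2), ...) into
# (m1, m2, ...) and (p1, p2, ...): sum the masses, multiply the chances
totals = [
    (sum(massdiffs), prod(chances))
    for massdiffs, chances in
    (zip(*i) for i in p)
]
```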
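On Python 3.8 and later, math.prod can stand in for partial(reduce, mul). The sketch below shows that variant and what zip(*combination) does to a single combination tuple. It uses two hypothetical lists so it runs on its own. One caveat when checking "equivalence": since Python 3.12, sum() uses compensated summation for floats. Its result can therefore differ in the last bits from chained + additions, so compare with a tolerance rather than ==.

```python
import itertools
import math

# Hypothetical inputs: (mass shift, probability) pairs.
hydrogens = [(0.0, 0.9), (1.00628, 0.1)]
sulfurs36 = [(0.0, 0.9), (3.99501, 0.1)]

combo = next(itertools.product(hydrogens, sulfurs36))
# zip(*combo) regroups ((m1, p1), (m2, p2)) into (m1, m2) and (p1, p2).
masses, chances = zip(*combo)
print(masses, chances)  # (0.0, 0.0) (0.9, 0.9)

# Same shape as the answer's comprehension, with math.prod for the product.
totals = [
    (sum(massdiffs), math.prod(probs))
    for massdiffs, probs in
    (zip(*combination) for combination in itertools.product(hydrogens, sulfurs36))
]
print(len(totals))  # 4
```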