itertools.product slower than nested for loops

Problem description

I am trying to use the itertools.product function to make a segment of my code (in an isotopic pattern simulator) easier to read and, hopefully, faster as well (the documentation states that no intermediate results are created). However, I have tested both versions of the code against each other using the cProfile module and noticed that the itertools.product version was significantly slower than my nested for loops.

Example values used for the testing:

carbons = [(0.0, 0.004613223957020534), (1.00335, 0.02494768843632857), (2.0067, 0.0673219412049374), (3.0100499999999997, 0.12087054681917497), (4.0134, 0.16243239687902825), (5.01675, 0.17427700732161705), (6.020099999999999, 0.15550695260604208), (7.0234499999999995, 0.11869556397525197), (8.0268, 0.07911287899598853), (9.030149999999999, 0.04677626606764402)]
hydrogens = [(0.0, 0.9417611429667746), (1.00628, 0.05651245007201512)]
nitrogens = [(0.0, 0.16148864310897554), (0.99703, 0.2949830688288726), (1.99406, 0.26887643366755537), (2.99109, 0.16305943261399866), (3.98812, 0.0740163089529218), (4.98515, 0.026824040474519875), (5.98218, 0.008084687617425748)]
oxygens17 = [(0.0, 0.8269292736927519), (1.00422, 0.15717628899143962), (2.00844, 0.014907548827832968)]
oxygens18 = [(0.0, 0.3584191873916266), (2.00425, 0.36813434247849824), (4.0085, 0.18867830334103902), (6.01275, 0.06433912182670033), (8.017, 0.016421642936302827)]
sulfurs33 = [(0.0, 0.02204843659673093), (0.99939, 0.08442569434459646), (1.99878, 0.16131398792444965), (2.99817, 0.2050722764666321), (3.99756, 0.1951327596407101), (4.99695, 0.14824112268069747), (5.99634, 0.09365899226198841), (6.99573, 0.050618028523695714), (7.99512, 0.023888506307006133), (8.99451, 0.010000884811585533)]
sulfurs34 = [(0.0, 3.0106350597190195e-10), (1.9958, 6.747270089956428e-09), (3.9916, 7.54568412614702e-08), (5.9874, 5.614443102700176e-07), (7.9832, 3.1268212758750728e-06), (9.979, 1.3903197959791067e-05), (11.9748, 5.141248916434075e-05), (13.970600000000001, 0.0001626288218672788), (15.9664, 0.00044921518047309414), (17.9622, 0.0011007203440032396)]
sulfurs36 = [(0.0, 0.904828368500412), (3.99501, 0.0905009370374487)]

Snippet demonstrating nested for loops:

totals = []
for i in carbons:
    for j in hydrogens:
        for k in nitrogens:
            for l in oxygens17:
                for m in oxygens18:
                    for n in sulfurs33:
                        for o in sulfurs34:
                            for p in sulfurs36:
                                totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0], i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))

Snippet demonstrating the use of itertools.product:

import itertools

totals = []
for i in itertools.product(carbons,hydrogens,nitrogens,oxygens17,oxygens18,sulfurs33,sulfurs34,sulfurs36):
    massDiff = i[0][0]
    chance = i[0][1]
    for j in i[1:]:
        massDiff += j[0]
        chance = chance * j[1]
    totals.append((massDiff,chance))

The profiling results (based on 10 runs per method) averaged ~0.8 seconds for the nested for loop approach and ~1.3 seconds for the itertools.product approach. My question is thus: am I using the itertools.product function incorrectly, or should I just stick to the nested for loops?

-- UPDATE --

I have included two of my cProfile results:

# ITERTOOLS.PRODUCT APPROACH
420003 function calls in 1.306 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.018    0.018    1.306    1.306 <string>:1(<module>)
        1    1.246    1.246    1.289    1.289 IsotopeBas.py:64(option1)
   420000    0.042    0.000    0.042    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

and:

# NESTED FOR LOOP APPROACH
420003 function calls in 0.830 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.019    0.019    0.830    0.830 <string>:1(<module>)
        1    0.769    0.769    0.811    0.811 IsotopeBas.py:78(option2)
   420000    0.042    0.000    0.042    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
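
A comparison of this kind can also be reproduced with the standard timeit module. The following is only a minimal sketch: the lists a, b and c are small hypothetical stand-ins for the isotope data above, and the function names nested and with_product are made up for illustration. Absolute timings will differ by machine and Python version.

```python
import itertools
import timeit

# Small hypothetical inputs (not the isotope data above), as (mass, chance) pairs.
a = [(0.0, 0.9), (1.0, 0.1)]
b = [(0.0, 0.5), (2.0, 0.3), (4.0, 0.2)]
c = [(0.0, 0.7), (1.5, 0.3)]


def nested():
    """Nested for loops, one loop per input list."""
    totals = []
    for i in a:
        for j in b:
            for k in c:
                totals.append((i[0] + j[0] + k[0], i[1] * j[1] * k[1]))
    return totals


def with_product():
    """itertools.product with an inner Python loop over each combination."""
    totals = []
    for combo in itertools.product(a, b, c):
        mass_diff = combo[0][0]
        chance = combo[0][1]
        for item in combo[1:]:
            mass_diff += item[0]
            chance = chance * item[1]
        totals.append((mass_diff, chance))
    return totals


# Take the best of several repeats to reduce noise from other processes.
nested_time = min(timeit.repeat(nested, number=1000, repeat=5))
product_time = min(timeit.repeat(with_product, number=1000, repeat=5))
print("nested loops:      %.4f s" % nested_time)
print("itertools.product: %.4f s" % product_time)
```

Taking the minimum over several repeats is a common convention with timeit, since slower runs usually reflect interference from other processes rather than the code itself.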

Solution

Your original itertools code spent a lot of extra time in the needless lambda and in building lists of intermediate values by hand; much of this can be replaced with builtin functionality.

Now, the inner for loop does add quite a lot of extra overhead. Just try the following, and the performance is very much on par with your original nested-loop code:

import itertools

totals = []
for a in itertools.product(carbons,hydrogens,nitrogens,oxygens17,
                           oxygens18,sulfurs33,sulfurs34,sulfurs36):
    # unpack the combination once instead of looping over it in Python
    i, j, k, l, m, n, o, p = a
    totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0],
                   i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))
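
As a small illustration of why this is a drop-in replacement: itertools.product yields its tuples in the same order as the equivalent nested loops (the rightmost input advances fastest), and each tuple can be unpacked directly in the for statement. The sketch below uses two short hypothetical lists, xs and ys, rather than the isotope data.

```python
import itertools

# Hypothetical (mass, chance) pairs, for illustration only.
xs = [(0.0, 0.9), (1.0, 0.1)]
ys = [(0.0, 0.5), (2.0, 0.5)]

# Reference result from explicit nested loops.
nested = []
for i in xs:
    for j in ys:
        nested.append((i[0] + j[0], i[1] * j[1]))

# Same computation, unpacking each product tuple in the loop header.
flat = []
for i, j in itertools.product(xs, ys):
    flat.append((i[0] + j[0], i[1] * j[1]))

print(flat == nested)  # True: same values in the same order
```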

The following code runs as much as possible on the CPython builtin side, and I tested it to be equivalent to your code. Notably, it uses zip(*iterable) to unzip each of the product results, then uses reduce with operator.mul for the product and sum for the summing, with a list comprehension over a generator expression to go through the combinations. The for loop is still slightly faster, but being hardcoded it is probably not what you can use in the long run.

import itertools
from operator import mul
from functools import partial, reduce  # reduce is not a builtin in Python 3

prod = partial(reduce, mul)
elems = carbons, hydrogens, nitrogens, oxygens17, oxygens18, sulfurs33, sulfurs34, sulfurs36
p = itertools.product(*elems)

totals = [
    ( sum(massdiffs), prod(chances) )
    for massdiffs, chances in
    ( zip(*i) for i in p )
]
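
On Python 3.8 or later, math.prod could stand in for the partial(reduce, mul) helper. The following is a sketch under that assumption, not part of the tested code above: the function name combine and the lists xs and ys are hypothetical, and the inputs were chosen so that the floating-point results are exact. With other data, sum and math.prod may round slightly differently from chained + and *.

```python
import itertools
import math  # math.prod requires Python 3.8+


def combine(*elems):
    """Return (summed mass difference, multiplied chance) for every combination."""
    return [
        (sum(mass_diffs), math.prod(chances))
        for mass_diffs, chances in (
            zip(*combo) for combo in itertools.product(*elems)
        )
    ]


# Hypothetical (mass, chance) pairs, for illustration only.
xs = [(0.0, 0.5), (1.0, 0.5)]
ys = [(0.0, 0.25), (2.0, 0.75)]

# zip(*combo) transposes one combination into a masses tuple and a chances tuple.
print(list(zip(*((1.0, 0.5), (2.0, 0.75)))))
# [(1.0, 2.0), (0.5, 0.75)]

print(combine(xs, ys))
# [(0.0, 0.125), (2.0, 0.375), (1.0, 0.125), (3.0, 0.375)]
```

Because combine accepts any number of input lists, it is not tied to a fixed set of elements the way the hardcoded nested loops are.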
