Python:如何用字典并行化循环

本文介绍了Python:如何用字典并行化循环的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个看起来像这样的代码:

EDITED: I have a code which looks like:

__author__ = 'feynman'

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def MC_Surface(volume, mc_vol):

    Perm_area = {
        "00000000": 0.000000,
        "11111111": 0.000000,
         ...
         ...
        "11100010": 1.515500,
        "00011101": 1.515500
    }

    cdef int j, i, k
    for k in range(volume.shape[2] - 1):
        for j in range(volume.shape[1] - 1):
            for i in range(volume.shape[0] - 1):

                pattern = '%i%i%i%i%i%i%i%i' % (
                    volume[i, j, k],
                    volume[i, j + 1, k],
                    volume[i + 1, j, k],
                    volume[i + 1, j + 1, k],
                    volume[i, j, k + 1],
                    volume[i, j + 1, k + 1],
                    volume[i + 1, j, k + 1],
                    volume[i + 1, j + 1, k + 1])

                mc_vol[i, j, k] = Perm_area[pattern]

    return mc_vol

为了加快速度，将其修改为:

In the hope to speed up, it was modified to:

    {
      ...
      ...
      "11100010": 1.515500,
      "00011101": 1.515500
    }
    keys = np.array(Perm_area.keys())
    values = np.array(Perm_area.values())

    starttime = time.time()
    tmp_vol = GetPattern(volume)
    print 'time to populate the key array: ', time.time() - starttime

    cdef int i
    starttime=time.time()
    for i, this_key in enumerate(keys):
        mc_vol[tmp_vol == this_key] = values[i]

    print 'time for the loop: ', time.time() -starttime
    return mc_vol

def GetPattern(volume):
    a = (volume.astype(np.int)).astype(np.str)
    output = a.copy()  # Central voxel
    output[:, :-1, :] = np.char.add(output[:, :-1, :], a[:, 1:, :])  # East
    output[:-1, :, :] = np.char.add(output[:-1, :, :], a[1:, :, :])  # South
    output[:-1, :-1, :] = np.char.add(output[:-1, :-1, :], a[1:, 1:, :])  # SouthEast
    output[:, :, :-1] = np.char.add(output[:, :, :-1], a[:, :, 1:])  # Down
    output[:, :-1, :-1] = np.char.add(output[:, :-1, :-1], a[:, 1:, 1:])  # DownEast
    output[:-1, :, :-1] = np.char.add(output[:-1, :, :-1], a[1:, :, 1:])  # DownSouth
    output[:-1, :-1, :-1] = np.char.add(output[:-1, :-1, :-1], a[1:, 1:, 1:])  # DownSouthEast
    output = output[:-1, :-1, :-1]
    del a
    return output

对于尺寸为500 ^ 3的3D数组，此过程花费的时间更长.在这里，tmp_vol 3D字符串数组.例如:如果说tmp_vol [0,0,0] ="00000000"，则mc_vol [0,0,0] = 0.00000.或者，我可以摆脱mc_vol并编写是否tmp_vol [0,0,0] ="00000000"，然后tmp_vol [0,0,0] = 0.00000.

This takes longer for a 3D array of size say 500^3. Here, tmp_vol 3D array of strings. For example: if say tmp_vol[0,0,0] = "00000000" then mc_vol[0,0,0] = 0.00000. Alternatively, I can get rid of mc_vol and write if tmp_vol[0,0,0] = "00000000" then tmp_vol[0,0,0] = 0.00000.

在这里，for循环需要很多时间，我看到只使用了一个CPU.我试图使用map和lambda并行映射它们，但是遇到了错误.我真的是python新手，所以任何提示都会很棒.

Here, The for-loop takes a lot of time, I see that only one CPU is used. I tried to map them in parallel using map and lambda but ran into errors. I am really new to python so any hints will be great.

推荐答案

由于我不太了解您的代码，并且您说任何提示都会很棒"，因此，我将向您提供一些一般性建议.基本上，您想加速for循环

Since I don't quite understand your code and you said "any hints will be great", I'll give you some general suggestions. Basically you want to speed up a for loop

for i, this_key in enumerate(keys):

您可以做的是将 keys 数组分成几部分，如下所示:

What you could do is split the keys array into several parts, something like this:

length = len(keys)
part1 = keys[:length/3]
part2 = keys[length/3: 2*length/3]
part3 = keys[2*length/3:]

然后处理子流程中的每个部分:

Then deal with each part in a subprocess:

from concurrent.futures import ProcessPoolExecutor

def do_work(keys):
    for i, this_key in enumerate(keys):
        mc_vol[tmp_vol == this_key] = values[i]

with ProcessPoolExecutor(max_workers=3) as e:
    e.submit(do_work, part1)
    e.submit(do_work, part2)
    e.submit(do_work, part3)

return mc_vol

就是这样.

这篇关于Python:如何用字典并行化循环的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！

Any