This article describes how to parallelize a "for" loop on a compute cluster using mpi4py; the question and answer below may serve as a useful reference.

Problem Description

I haven't worked with distributed computing before, but I'm trying to integrate mpi4py into a program in order to parallelize a for loop on a compute cluster.

This is pseudocode of what I want to do:

for file in directory:
    initialize class
    run class methods
aggregate results

I've looked all over Stack Overflow and I can't find a solution to this. Is there any way to do this simply with mpi4py, or is there another tool that can do it with easy installation and setup?

Recommended Answer

To parallelize a for loop with mpi4py, check the code example below. It is just a for loop that adds some numbers. The loop executes on every process, and each process gets a different chunk of the data to work with (the range in the for loop). At the end, the process with rank zero adds up the partial results from all processes.

#!/usr/bin/python

import numpy
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

a = 1
b = 1000000

perrank = b//size       # size of each rank's contiguous chunk
summ = numpy.zeros(1)   # buffer for this rank's partial sum

comm.Barrier()
start_time = time.time()

temp = 0
for i in range(a + rank*perrank, a + (rank+1)*perrank):
    temp = temp + i

summ[0] = temp

if rank == 0:
    total = numpy.zeros(1)
else:
    total = None

comm.Barrier()
#collect the partial results and add to the total sum
comm.Reduce(summ, total, op=MPI.SUM, root=0)

stop_time = time.time()

if rank == 0:
    # rank 0 also adds the leftover numbers when size does not divide b evenly
    for i in range(a + size*perrank, b + 1):
        total[0] = total[0] + i
    print("The sum of numbers from 1 to 1 000 000:", int(total[0]))
    print("time spent with", size, "processes in milliseconds")
    print("-----", int((stop_time - start_time)*1000), "-----")
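The splitting logic above can be sanity-checked serially, without any MPI runtime: summing each rank's chunk plus the leftover handled by rank 0 must reproduce the closed-form value n*(n+1)/2. A small standalone check (the parameters mirror the example; no mpi4py needed):

```python
# Serial sanity check of the chunked summation (no MPI required).
a, b, size = 1, 1000000, 64        # same parameters as the MPI example
perrank = b // size                # each rank's chunk size

# partial sum each rank would compute
partials = [sum(range(a + r*perrank, a + (r+1)*perrank)) for r in range(size)]
# leftover numbers that rank 0 adds at the end
leftover = sum(range(a + size*perrank, b + 1))

total = sum(partials) + leftover
print(total)                        # 500000500000
assert total == b * (b + 1) // 2    # Gauss closed-form sum
```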

To execute the code above, launch it through an MPI launcher such as mpiexec or mpirun, e.g. mpiexec -n 64 python your_script.py (on an HPC cluster this is typically wrapped in the scheduler's job script).

In this example, we ran the mpi4py-enabled code on 4 nodes with 16 cores per node (64 processes in total), with each Python process bound to a different core.
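The per-file pattern from the original question can be parallelized the same way: partition the file list deterministically by rank, let each rank process its share, then gather the results on rank 0. Below is a minimal sketch of the partitioning step with hypothetical filenames; the mpi4py calls are shown in comments, and all ranks are simulated in one process so the logic can be followed without an MPI runtime:

```python
# Deterministic partition of a file list across MPI ranks.
# Under mpi4py each process would obtain its identity with:
#   comm = MPI.COMM_WORLD; rank = comm.Get_rank(); size = comm.Get_size()
# process its slice files[rank::size], and combine results with
#   comm.gather(local_results, root=0)
# Here we simulate all ranks in a single process to show the split.

files = ["data_%02d.txt" % i for i in range(10)]   # hypothetical filenames
size = 4                                           # simulated number of ranks

assignments = {rank: files[rank::size] for rank in range(size)}

for rank, chunk in sorted(assignments.items()):
    print(rank, chunk)

# every file is handled exactly once across all ranks
flat = [f for chunk in assignments.values() for f in chunk]
assert sorted(flat) == sorted(files)
```

The round-robin slice files[rank::size] needs no communication to set up: every rank computes the same partition independently, which keeps the code simple when files have similar sizes.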


Sources that may help you:
Submit job with python code (mpi4py) on HPC cluster
https://github.com/JordiCorbilla/mpi4py-examples/tree/master/src/examples/matrix%20multiplication

