I am trying to use Scatter to send the columns of a matrix to the other processes. The code below works fine for rows, so to send columns with minimal modification I used NumPy's transpose. However, that appears to do nothing unless I make a full copy of the matrix (which, as you can imagine, defeats the purpose).
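The reason, as far as I can tell, is that NumPy's .T only returns a strided view of the same row-major buffer, and Scatter splits that raw buffer, so it still sends the original rows. A minimal check of the buffer layout:

import numpy as np

A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])

# .T is just a view with swapped strides; no bytes are moved
print(np.shares_memory(A, A.T))           # True
print(A.T.flags['C_CONTIGUOUS'])          # False
print(A.T.flags['F_CONTIGUOUS'])          # True: same bytes, only reinterpreted
print(A.T.copy().flags['C_CONTIGUOUS'])   # True: .copy() actually rearranges the data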

Below are 3 minimal examples illustrating the problem (they must be run with 3 processes!).
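For example, each snippet can be launched with mpiexec (the file name here is just a placeholder for wherever you saved the snippet):

mpiexec -n 3 python scatter_cols.py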


Scattering rows (works as expected):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

A = np.zeros((3,3))
if rank==0:
    A = np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])

local_a = np.zeros(3)

comm.Scatter(A, local_a, root=0)
print "process", rank, "has", local_a


This gives the output:

process 0 has [ 1.  2.  3.]
process 1 has [ 4.  5.  6.]
process 2 has [ 7.  8.  9.]

Scattering columns (does not work, still scatters rows...):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

A = np.zeros((3,3))
if rank==0:
    A = np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T

local_a = np.zeros(3)

comm.Scatter(A, local_a, root=0)
print "process", rank, "has", local_a


This gives the output:

process 0 has [ 1.  2.  3.]
process 1 has [ 4.  5.  6.]
process 2 has [ 7.  8.  9.]

Scattering columns (works, but seems to defeat the purpose):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

A = np.zeros((3,3))
if rank==0:
    A = np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()

local_a = np.zeros(3)

comm.Scatter(A, local_a, root=0)
print "process", rank, "has", local_a


This finally gives the desired output:

process 0 has [ 1.  4.  7.]
process 2 has [ 3.  6.  9.]
process 1 has [ 2.  5.  8.]



Is there a simple way to send the columns without copying the whole matrix?



For context, I am doing exercise 5 from the mpi4py tutorial. In case you are wondering, my full solution (which wastes memory as in point 3 above) is this:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

A = np.zeros((3,3))
v = np.zeros(3)
result = np.zeros(3)
if rank==0:
    A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()
    v = np.array([0.1,0.01,0.001])

# Scatter the columns of the matrix
local_a = np.zeros(3)
comm.Scatter(A, local_a, root=0)

# Scatter the elements of the vector
local_v = np.array([0.])
comm.Scatter(v, local_v, root=0)

print "process", rank, "has A_ij =", local_a, "and v_i", local_v

# Multiplication
local_result = local_a * local_v

# Add together
comm.Reduce(local_result, result, op=MPI.SUM)
print "process", rank, "finds", result, "(", local_result, ")"

if (rank==0):
    print "The resulting vector is"
    print "   ", result, "computed in parallel"
    print "and", np.dot(A.T,v), "computed serially."




Here is the memory profile test requested by @Sajid:

My solution 3 (gives the correct answer):
0.027 MiB  A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()
0.066 MiB  comm.Scatter(A, local_a, root=0)
Total = 0.093 MiB

Another, similar solution (gives the correct answer):
0.004 MiB  A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])
0.090 MiB  comm.Scatter(A.T.copy(), local_a, root=0)
Total = 0.094 MiB

@Sajid's solution (gives the correct answer):
0.039 MiB  A[:,:] = np.transpose(np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]))
0.062 MiB  comm.Scatter(A, local_a, root=0)
Total = 0.101 MiB

My solution 2 (gives the wrong answer):
0.004 MiB  A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])
0.066 MiB  comm.Scatter(A, local_a, root=0)
Total = 0.070 MiB

(I only copied the memory increments from the lines where the increments differ between the code versions; clearly it is all coming from the root node.)
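In case anyone wants to reproduce these numbers: the increments are per-line memory deltas; a minimal setup along the lines of the memory_profiler package's @profile decorator (shown here with solution 3; the exact profiling tool is an assumption on my part) would be:

# Assumption: memory_profiler (pip install memory_profiler) produces the
# per-line MiB increments; run as e.g.  mpiexec -n 3 python profile_scatter.py
import numpy as np
from mpi4py import MPI
from memory_profiler import profile

@profile
def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    A = np.zeros((3,3))
    if rank==0:
        A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()

    local_a = np.zeros(3)
    comm.Scatter(A, local_a, root=0)   # the MiB increment is reported per line

main()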

Apparently every correct solution has to copy the array in memory. This is suboptimal, since all I want is to scatter the columns instead of the rows.

Best answer

It may be that the data is not being copied into A correctly; try the following:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

A = np.zeros((3,3))
if rank==0:
    A[:,:] = np.transpose(np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]))

local_a = np.zeros(3)

comm.Scatter(A, local_a, root=0)
print("process", rank, "has", local_a)


Of course, if you are using python2, change the print statements accordingly.
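For completeness: the code above still builds a transposed copy of the data on the root. If the goal is specifically to hand out columns without forming that copy in user code, a standard MPI technique is to describe a column with a derived datatype and scatter it with Scatterv. The following is only a sketch of that idea (it is not from the original answer; check the exact calls against your mpi4py version, and note that MPI may still pack data internally):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()   # run with 3 processes

A = np.zeros((3,3))
if rank==0:
    A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])

# One column of a 3x3 row-major double array: 3 blocks of 1 double, stride 3 doubles.
# Resizing the extent to a single double makes consecutive columns start 8 bytes apart.
col_t = MPI.DOUBLE.Create_vector(3, 1, 3).Create_resized(0, MPI.DOUBLE.Get_size())
col_t.Commit()

local_a = np.zeros(3)
# Root sends one column datatype to each rank, at displacements 0, 1, 2 (in units
# of the resized extent); each rank receives it as 3 plain doubles.
comm.Scatterv([A, (1,1,1), (0,1,2), col_t], local_a, root=0)
col_t.Free()

print("process", rank, "has", local_a)

With 3 processes this should print columns 0, 1 and 2 of A on ranks 0, 1 and 2 respectively, without the root ever forming A.T.copy().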

Original question (python - MPI - sending and receiving columns of a matrix) on Stack Overflow: https://stackoverflow.com/questions/47400098/
