Question
I have a 2D array where I'm running some computation on each process. Afterwards, I need to gather all the computed columns back to the root process. I'm currently partitioning in a first-come, first-served manner. In pseudocode, the main loop looks like:
DO i = mpi_rank + 1, num_columns, mpi_size
   array(:,i) = do work here
END DO
After this is completed, I need to gather these columns into the correct indices back in the root process. What is the best way to do this? It looks like MPI_GATHERV could do what I want if the partitioning scheme was different. However, I'm not sure what the best way to partition would be, since num_columns and mpi_size are not necessarily evenly divisible.
Answer
- Cut the 2D array into chunks of "almost equal" size, i.e. with the local number of columns close to num_columns / mpi_size.
- Gather the chunks with mpi_gatherv, which operates with chunks of different sizes.
To get "almost equal" number of columns, set local number of columns to integer value of num_columns
/ mpi_size
and increment by one only for first mod(num_columns,mpi_size)
mpi tasks.
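In the question's notation (num_columns, mpi_size and mpi_rank are assumed to hold the global column count, the communicator size and the rank of the calling process), the counting rule boils down to a minimal two-line sketch:

      nloc = num_columns / mpi_size                    ! integer division
      if (mod(num_columns, mpi_size) > mpi_rank) nloc = nloc + 1

For num_columns = 12 and mpi_size = 5 this gives nloc = 3 on ranks 0 and 1, and nloc = 2 on ranks 2 to 4, which is exactly the layout shown in the table below.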
The following table demonstrates the partitioning of a (10,12) matrix on 5 MPI processes:
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
01 02 03 11 12 13 21 22 31 32 41 42
Here the first digit is the id of the process and the second digit is the local column number. As you can see, processes 0 and 1 got 3 columns each, while all other processes got only 2 columns each.
Below you can find working example code that I wrote. The trickiest part is the generation of the rcounts and displs arrays for MPI_Gatherv. The discussed table is the output of the code.
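For the (10,12) matrix on 5 processes discussed above, the code ends up passing the following values to MPI_Gatherv (first counted in columns, then converted to numbers of integers by multiplying with m = 10):

rank              0    1    2    3    4
rcounts (cols)    3    3    2    2    2    -> (ints)   30   30   20   20   20
displs  (cols)    0    3    6    8   10    -> (ints)    0   30   60   80  100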
      program mpi2d
      implicit none
      include 'mpif.h'
      integer myid, nprocs, ierr
      integer,parameter:: m = 10 ! global number of rows
      integer,parameter:: n = 12 ! global number of columns
      integer nloc               ! local number of columns
      integer array(m,n)         ! global m-by-n, i.e. m rows and n columns
      integer,allocatable:: loc(:,:)   ! local piece of global 2d array
      integer,allocatable:: rcounts(:) ! nloc's of all ranks (mpi_gatherv)
      integer,allocatable:: displs(:)  ! displacements (mpi_gatherv)
      integer i,j

! Initialize
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

! Partition, i.e. get the local number of columns
      nloc = n / nprocs
      if (mod(n,nprocs) > myid) nloc = nloc + 1

! Compute the partitioned array: every local column holds
! 10*rank + local column number, as in the table above
      allocate(loc(m,nloc))
      do j=1,nloc
        loc(:,j) = myid*10 + j
      enddo

! Build arrays for mpi_gatherv:
!   rcounts contains all nloc's
!   displs contains displacements of the partitions, in columns
      allocate(rcounts(nprocs),displs(nprocs))
      displs(1) = 0
      do j=1,nprocs
        rcounts(j) = n / nprocs
        if (mod(n,nprocs) > (j-1)) rcounts(j) = rcounts(j) + 1
        if ((j-1) /= 0) displs(j) = displs(j-1) + rcounts(j-1)
      enddo

! Convert from numbers of columns to numbers of integers
      nloc = m * nloc
      rcounts = m * rcounts
      displs = m * displs

! Gather the array on the root
      call mpi_gatherv(loc,nloc,MPI_INTEGER,array,
     &    rcounts,displs,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

! Print the array on the root
      if (myid == 0) then
        do i=1,m
          do j=1,n
            write(*,'(I3.2)',advance='no') array(i,j)
          enddo
          write(*,*)
        enddo
      endif

! Finish
      call mpi_finalize(ierr)
      end
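For reference, with a typical MPI installation (compiler wrapper and launcher names vary between distributions) the example can be built and run with something like mpif90 mpi2d.f -o mpi2d followed by mpirun -np 5 ./mpi2d, which prints the partitioning table shown above on rank 0.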