This article describes how to send 2D arrays in Fortran using MPI_Gather, and should be a useful reference for anyone facing the same problem.

Problem Description

I want to send 2D chunks of data using MPI_GATHER. For example, I have a 2x3 array on each node, and I want an 8x3 array on the root if I have 4 nodes. For 1D arrays, MPI_GATHER sorts the data according to MPI rank, but for 2D data it makes a mess!

What is a clean way to put the chunks in order?

I expected the output of this code:

program testmpi
  use mpi
  implicit none
  integer :: send (2,3)
  integer :: rec (4,3)
  integer :: ierror,my_rank,i,j

  call MPI_Init(ierror)
  ! find out process rank
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
  if (my_rank==0) then
    send=1
    do i=1,2
      print*,(send(i,j),j=1,3)
    enddo
  endif
  if (my_rank==1) then
    send=5
    ! do 1,2
    !   print*,(send(i,j),j=1,3)
    ! enddo
  endif
  call MPI_GATHER(send,6,MPI_INTEGER,rec,6,MPI_INTEGER,0,MPI_COMM_WORLD,ierror)
  if (my_rank==0) then
    print*,'<><><><><>rec'
    do i=1,4
      print*,(rec(i,j),j=1,3)
    enddo
  endif
  call MPI_Finalize(ierror)
end program testmpi

to be this:

   1           1           1
   1           1           1
   5           5           5
   5           5           5

but it looks like this:

   1           1           5
   1           1           5
   1           5           5
   1           5           5

Recommended Answer

The following is a literal Fortran translation of this answer. I had thought this was unnecessary, but the multiple differences in array indexing and memory layout mean it is probably worth doing a Fortran version.

Let me start by saying that you generally don't really want to do this - scatter and gather huge chunks of data from some "master" process. Normally you want each task to be chugging away at its own piece of the puzzle, and you should aim never to have one processor need a "global view" of the whole data; as soon as you require that, you limit scalability and the problem size. If you're doing this for I/O - one process reads the data, then scatters it, then gathers it back for writing - you'll eventually want to look into MPI-IO.
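
If the I/O route is where you end up, a minimal MPI-IO sketch might look like the following; the file name, data, and layout here are illustrative assumptions only, not something prescribed above. Each rank writes its own contiguous chunk at a rank-dependent offset, so no single process ever needs a global copy of the data:

program mpiio_sketch
    use mpi
    implicit none
    integer, dimension(2) :: local
    integer :: fh, rank, intsize, ierr
    integer, dimension(MPI_STATUS_SIZE) :: wstatus
    integer(kind=MPI_OFFSET_KIND) :: offset

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    local = rank                            ! stand-in for this rank's piece of the data

    call MPI_Type_size(MPI_INTEGER, intsize, ierr)
    offset = rank * size(local) * intsize   ! this rank's slot in the shared file

    call MPI_File_open(MPI_COMM_WORLD, 'data.out',        &
                       MPI_MODE_WRONLY + MPI_MODE_CREATE, &
                       MPI_INFO_NULL, fh, ierr)
    call MPI_File_write_at_all(fh, offset, local, size(local), &
                               MPI_INTEGER, wstatus, ierr)
    call MPI_File_close(fh, ierr)
    call MPI_Finalize(ierr)
end program mpiio_sketch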

Getting to your question, though, MPI has very nice ways of pulling arbitrary data out of memory and scattering/gathering it to and from a set of processors. Unfortunately that requires a fair number of MPI concepts - MPI types, extents, and collective operations. A lot of the basic ideas are discussed in the answer to this question - MPI_Type_create_subarray and MPI_Gather.

Consider a 1D integer global array that task 0 has, which you want to distribute to a number of MPI tasks so that they each get a piece in their local array. Say you have 4 tasks, and the global array is [1,2,3,4,5,6,7,8]. You could have task 0 send four messages (including one to itself) to distribute this and, when it's time to re-assemble, receive four messages to bundle it back together; but that obviously gets very time consuming at large numbers of processes. There are optimized routines for these sorts of operations - scatter/gather operations. So in this 1D case you'd do something like this:

integer, dimension(8) :: global      ! only root has this
integer, dimension(2) :: local       ! everyone has this
integer, parameter    :: root = 0
integer :: rank, comsize
integer :: i, ierr

call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

if (rank == root) then
    global = [ (i, i=1,8) ]
endif

call MPI_Scatter(global, 2, MPI_INTEGER, &    ! send everyone 2 ints from global
                 local,  2, MPI_INTEGER, &    ! each proc receives 2 ints into local
                 root,                   &    ! sending process is root,
                 MPI_COMM_WORLD, ierr)        ! all procs in COMM_WORLD participate

After this, the processors' data would look like

task 0:  local:[1,2]  global: [1,2,3,4,5,6,7,8]
task 1:  local:[3,4]  global: [garbage]
task 2:  local:[5,6]  global: [garbage]
task 3:  local:[7,8]  global: [garbage]

That is, the scatter operation takes the global array and sends contiguous 2-int chunks to all the processors.

To re-assemble the array, we use the MPI_Gather() operation, which works exactly the same but in reverse:

local = local + rank

call MPI_Gather (local,  2, MPI_INTEGER, &    ! everyone sends 2 ints from local
                 global, 2, MPI_INTEGER, &    ! root receives 2 ints each proc into global
                 root,                   &    ! receiving process is root,
                 MPI_COMM_WORLD, ierr)        ! all procs in COMM_WORLD participate

and now the arrays look like:

task 0:  local:[1,2]    global: [1,2,4,5,7,8,10,11]
task 1:  local:[4,5]    global: [garbage]
task 2:  local:[7,8]    global: [garbage]
task 3:  local:[10,11]  global: [garbage]

Gather brings all the data back.

What happens if the number of data points doesn't evenly divide the number of processes, and we need to send different numbers of items to each process? Then you need a generalized version of scatter, MPI_Scatterv, which lets you specify the counts for each processor, and displacements - where in the global array each piece of data starts. So let's say that with the same 4 tasks you had an array of 9 characters [a,b,c,d,e,f,g,h,i], and you were going to assign every process two characters except the last, which got three. Then you'd need

character, dimension(9) :: global
character, dimension(3) :: local
integer, dimension(4)   :: counts
integer, dimension(4)   :: displs
integer                 :: mycounts

if (rank == root) then
    global = [ (achar(i+ichar('a')), i=0,8) ]
endif
local = ['-','-','-']

counts = [2,2,2,3]
displs = [0,2,4,6]

mycounts = counts(rank+1)

call MPI_Scatterv(global, counts, displs,         & ! proc i gets counts(i) chars from displs(i)
                  MPI_CHARACTER,                  &
                  local, mycounts, MPI_CHARACTER, & ! I get mycounts chars into local
                  root,                           & ! root rank does sending
                  MPI_COMM_WORLD, ierr)             ! all procs in COMM_WORLD participate

Now the data looks like

task 0:  local:"ab-"  global: "abcdefghi"
task 1:  local:"cd-"  global: *garbage*
task 2:  local:"ef-"  global: *garbage*
task 3:  local:"ghi"  global: *garbage*

You've now used scatterv to distribute the irregular amounts of data. The displacement in each case is two*rank (measured in characters; the displacement is in units of the type being sent for a scatter, or received for a gather; it's not generally in bytes or anything like that) from the start of the array, and the counts are [2,2,2,3]. If it had been the first processor we wanted to have 3 characters, we would have set counts=[3,2,2,2] and the displacements would have been [0,3,5,7]. Gatherv works exactly the same way but in reverse; the counts and displs arrays would remain the same.

Now, for 2D, this is a bit trickier. If we want to send 2D subblocks of a 2D array, the data we're sending is no longer contiguous. If we're sending (say) 3x3 subblocks of a 6x6 array to 4 processors, the data we're sending has holes in it:

2D Array

   ---------
   |000|222|
   |000|222|
   |000|222|
   |---+---|
   |111|333|
   |111|333|
   |111|333|
   ---------

Actual layout in memory

   [000111000111000111222333222333222333]

(Note that all high-performance computing comes down to understanding the layout of data in memory.)

If we want to send the data that is marked "1" to task 1, we need to skip three values, send three values, skip three values, send three values, skip three values, send three values. A second complication is where the subregions stop and start; note that region "1" doesn't start where region "0" stops; after the last element of region "0", the next location in memory is only partway through region "1".
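
As an aside, that "send three, skip three" pattern for a single block can also be written down directly with MPI_Type_vector. A minimal, self-contained sketch under the 6x6 integer layout above (this type isn't used in the rest of the answer, which takes the more general subarray route):

program vector_sketch
    use mpi
    implicit none
    integer, dimension(6,6) :: global
    integer :: blocktype, rank, ierr

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    global = 0

    ! 3 chunks of 3 contiguous integers, with a stride of 6 integers (one full
    ! column of the 6x6 array) between chunk starts - exactly the
    ! "send three, skip three" pattern described above.
    call MPI_Type_vector(3, 3, 6, MPI_INTEGER, blocktype, ierr)
    call MPI_Type_commit(blocktype, ierr)

    ! Passing global(4,1) as the buffer with count 1 of this type would pick out
    ! region "1" of the layout shown above.

    call MPI_Type_free(blocktype, ierr)
    call MPI_Finalize(ierr)
end program vector_sketch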

Let's tackle the first layout problem first - how to pull out just the data we want to send. We could always just copy out all the "0" region data to another, contiguous array, and send that; if we planned it out carefully enough, we could even do that in such a way that we could call MPI_Scatter on the results. But we'd rather not have to transpose our entire main data structure that way.

So far, all the MPI data types we've used are simple ones - MPI_INTEGER specifies (say) 4 bytes in a row. However, MPI lets you create your own data types that describe arbitrarily complex data layouts in memory. And this case -- rectangular subregions of an array -- is common enough that there's a specific call for that. For the 2-dimensional case we're describing above,

integer :: newtype, ierr
integer, dimension(2) :: sizes, subsizes, starts

sizes    = [6,6]     ! size of global array
subsizes = [3,3]     ! size of sub-region
starts   = [0,0]     ! let's say we're looking at region "0"
                     ! which begins at offset [0,0]

call MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_commit(newtype, ierr)

This creates a type which picks out just the region "0" from the global array. Note that even in Fortran, the start parameter is given as an offset (eg, 0-based) from the start of the array, not an index (eg, 1-based).

We could send just that piece of data now to another processor

call MPI_Send(global, 1, newtype, dest, tag, MPI_COMM_WORLD, ierr)  ! send region "0"

and the receiving process could receive it into a local array. Note that the receiving process, if it's only receiving into a 3x3 array, cannot describe what it's receiving as one of these newtypes; that no longer describes the memory layout, because there aren't big skips between the end of one row and the start of the next. Instead, it's just receiving a block of 3*3 = 9 integers:

call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, ierr)

Note that we could do this for the other sub-regions, too, either by creating a different type (with a different starts array) for each of the other blocks, or just by sending starting from the first location of the particular block (the second option is shown here; a sketch of the first follows the code):

if (rank == root) then
    call MPI_Send(global(4,1), 1, newtype, 1, tag, MPI_COMM_WORLD, ierr)
    call MPI_Send(global(1,4), 1, newtype, 2, tag, MPI_COMM_WORLD, ierr)
    call MPI_Send(global(4,4), 1, newtype, 3, tag, MPI_COMM_WORLD, ierr)
    local = global(1:3, 1:3)
else
    call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, rstatus, ierr)
endif
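
For comparison, here is a hedged sketch of the first option - a dedicated type per block - where newtype1 is a hypothetical name and region "1" begins at row offset 3, column offset 0 (0-based, as noted above):

! Reusing sizes and subsizes from above; only the starting offsets change.
starts = [3, 0]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, &
                              MPI_INTEGER, newtype1, ierr)
call MPI_Type_commit(newtype1, ierr)
call MPI_Send(global, 1, newtype1, 1, tag, MPI_COMM_WORLD, ierr)  ! send region "1" to rank 1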

Now that we understand how to specify subregions, there's only one more thing to discuss before using scatter/gather operations, and that's the "size" of these types. We couldn't just use MPI_Scatter() (or even scatterv) with these types yet, because these types have an extent of 15 integers; that is, where they end is 15 integers after they start -- and where they end doesn't line up nicely with where the next block begins, so we can't just use scatter - it would pick the wrong place to start sending data to the next processor.

Of course, we could use MPI_Scatterv() and specify the displacements ourselves, and that's what we'll do - except the displacements are in units of the send-type size, and that doesn't help us either; the blocks start at offsets of (0,3,18,21) integers from the start of the global array, and the fact that a block ends 15 integers from where it starts doesn't let us express those displacements in integer multiples at all.

To deal with this, MPI lets you set the extent of the type for the purposes of these calculations. It doesn't truncate the type; it's just used for figuring out where the next element starts given the last element. For types like these with holes in them, it's frequently handy to set the extent to be something smaller than the distance in memory to the actual end of the type.

We can set the extent to be anything that's convenient for us. We could just make the extent 1 integer, and then set the displacements in units of integers. In this case, though, I like to set the extent to be 3 integers - the size of a sub-column - so that block "1" starts immediately after block "0", and block "3" starts immediately after block "2". Unfortunately, it doesn't quite work as nicely when jumping from block "1" to block "2", but that can't be helped.

So to scatter the subblocks in this case, we'd do the following:

integer(kind=MPI_ADDRESS_KIND) :: extent
integer :: intsize, resizedtype

starts   = [0,0]
sizes    = [6, 6]
subsizes = [3, 3]

call MPI_Type_create_subarray(2, sizes, subsizes, starts,        &
                              MPI_ORDER_FORTRAN, MPI_INTEGER,  &
                              newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = 3*intsize
call MPI_Type_create_resized(newtype, 0_MPI_ADDRESS_KIND, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)

Here we've created the same block type as before, but we've resized it; we haven't changed where the type "starts" (the 0), but we've changed where it "ends" (3 integers). We didn't mention this before, but MPI_Type_commit is required before you can use a type in communication; you only need to commit the final type you actually use, though, not any intermediate steps. When you're done, you use MPI_Type_free to release the committed type.

So now, finally, we can scatterv the blocks: the data manipulations above are a little complicated, but once it's done, the scatterv looks just like before:

counts = 1          ! we will send one of these new types to everyone
displs = [0,1,6,7]  ! the starting point of everyone's data
                    ! in the global array, in block extents

call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i)
        resizedtype,                      &
        local, 3*3, MPI_INTEGER,          & ! I'm receiving 3*3 int
        root, MPI_COMM_WORLD, ierr)         !... from (root, MPI_COMM_WORLD)

And now we're done, after a little tour of scatter, gather, and MPI derived types.

An example code which shows both the gather and the scatter operation, with character arrays, follows. Running the program:

$ mpirun -np 4 ./scatter2d
 global array is:
 000222
 000222
 000222
 111333
 111333
 111333
 Rank            0  received:
 000
 000
 000
 Rank            1  received:
 111
 111
 111
 Rank            2  received:
 222
 222
 222
 Rank            3  received:
 333
 333
 333
 Rank            0  sending:
 111
 111
 111
 Rank            1  sending:
 222
 222
 222
 Rank            2  sending:
 333
 333
 333
 Rank            3  sending:
 444
 444
 444
  Root received:
 111333
 111333
 111333
 222444
 222444
 222444

The code follows:

program scatter
    use mpi
    implicit none

    integer, parameter :: gridsize = 6    ! size of array
    integer, parameter :: procgridsize = 2 ! size of process grid
    character, allocatable, dimension (:,:) :: global, local
    integer, dimension(procgridsize**2)   :: counts, displs
    integer, parameter    :: root = 0
    integer :: rank, comsize
    integer :: localsize
    integer :: i, j, row, col, ierr, p, charsize
    integer, dimension(2) :: sizes, subsizes, starts

    integer :: newtype, resizedtype
    integer, parameter :: tag = 1
    integer, dimension(MPI_STATUS_SIZE) :: rstatus
    integer(kind=MPI_ADDRESS_KIND) :: extent, begin

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    if (comsize /= procgridsize**2) then
        if (rank == root) then
            print *, 'Only works with np = ', procgridsize**2, ' for now.'
        endif
        call MPI_Finalize(ierr)
        stop
    endif

    localsize = gridsize/procgridsize
    allocate( local(localsize, localsize) )
    if (rank == root) then
        allocate( global(gridsize, gridsize) )
        forall( col=1:procgridsize, row=1:procgridsize )
            global((row-1)*localsize+1:row*localsize, &
                   (col-1)*localsize+1:col*localsize) = &
                    achar(ichar('0')+(row-1)+(col-1)*procgridsize)
        end forall

        print *, 'global array is: '
        do i=1,gridsize
            print *, global(i,:)
        enddo
    endif
    starts   = [0,0]
    sizes    = [gridsize, gridsize]
    subsizes = [localsize, localsize]

    call MPI_Type_create_subarray(2, sizes, subsizes, starts,        &
                                  MPI_ORDER_FORTRAN, MPI_CHARACTER,  &
                                  newtype, ierr)
    call MPI_Type_size(MPI_CHARACTER, charsize, ierr)
    extent = localsize*charsize
    begin  = 0
    call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
    call MPI_Type_commit(resizedtype, ierr)

    counts = 1          ! we will send one of these new types to everyone
    forall( col=1:procgridsize, row=1:procgridsize )
       displs(1+(row-1)+procgridsize*(col-1)) = (row-1) + localsize*procgridsize*(col-1)
    endforall

    call MPI_Scatterv(global, counts, displs,   & ! proc i gets counts(i) types from displs(i)
            resizedtype,                        &
            local, localsize**2, MPI_CHARACTER, & ! I'm receiving localsize**2 chars
            root, MPI_COMM_WORLD, ierr)           !... from (root, MPI_COMM_WORLD)

    do p=1, comsize
        if (rank == p-1) then
            print *, 'Rank ', rank, ' received: '
            do i=1, localsize
                print *, local(i,:)
            enddo
        endif
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
    enddo

    local = achar( ichar(local) + 1 )

    do p=1, comsize
        if (rank == p-1) then
            print *, 'Rank ', rank, ' sending: '
            do i=1, localsize
                print *, local(i,:)
            enddo
        endif
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
    enddo

    call MPI_Gatherv( local, localsize**2, MPI_CHARACTER, & ! I'm sending localsize**2 chars
                      global, counts, displs, resizedtype,&
                      root, MPI_COMM_WORLD, ierr)

    if (rank == root) then
        print *, ' Root received: '
        do i=1,gridsize
            print *, global(i,:)
        enddo
    endif

    call MPI_Type_free(newtype, ierr)
    call MPI_Type_free(resizedtype, ierr)
    if (rank == root) deallocate(global)
    deallocate(local)
    call MPI_Finalize(ierr)

end program scatter

So that's the general solution. For your particular case, where we are just appending by rows, we don't need a Gatherv; we can use a plain gather, because in this case the spacing between successive blocks is always the same - before, in the 2D block case, we had one displacement going "down" and then jumps in that displacement as you went "across" to the next column of blocks. Here, each displacement is exactly one extent past the previous one, so we don't need to give the displacements explicitly. So a final code looks like:

program testmpi
    use mpi
    implicit none
    integer, dimension(:,:), allocatable :: send, recv
    integer, parameter :: nsendrows = 2, nsendcols = 3
    integer, parameter :: root = 0
    integer :: ierror, my_rank, comsize, i, j, ierr
    integer :: blocktype, resizedtype
    integer, dimension(2) :: starts, sizes, subsizes
    integer (kind=MPI_Address_kind) :: start, extent
    integer :: intsize

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierror)

    allocate( send(nsendrows, nsendcols) )

    send = my_rank

    if (my_rank==root) then
        ! we're going to append the local arrays
        ! as groups of send rows
        allocate( recv(nsendrows*comsize, nsendcols) )
    endif

    ! describe what these subblocks look like inside the full concatenated array
    sizes    = [ nsendrows*comsize, nsendcols ]
    subsizes = [ nsendrows, nsendcols ]
    starts   = [ 0, 0 ]

    call MPI_Type_create_subarray( 2, sizes, subsizes, starts,     &
                                   MPI_ORDER_FORTRAN, MPI_INTEGER, &
                                   blocktype, ierr)

    start = 0
    call MPI_Type_size(MPI_INTEGER, intsize, ierr)
    extent = intsize * nsendrows

    call MPI_Type_create_resized(blocktype, start, extent, resizedtype, ierr)
    call MPI_Type_commit(resizedtype, ierr)

    call MPI_Gather( send, nsendrows*nsendcols, MPI_INTEGER, &  ! everyone send 3*2 ints
                     recv, 1, resizedtype,                   &  ! root gets 1 resized type from everyone
                     root, MPI_COMM_WORLD, ierr)

    if (my_rank==0) then
        print*,'<><><><><>recv'
        do i=1,nsendrows*comsize
            print*,(recv(i,j),j=1,nsendcols)
        enddo
    endif
    call MPI_Finalize(ierror)

end program testmpi

Running this with 3 processes gives:

$ mpirun -np 3 ./testmpi
 <><><><><>recv
           0           0           0
           0           0           0
           1           1           1
           1           1           1
           2           2           2
           2           2           2
