c++ - I can't make sense of the "recvcounts" parameter of MPI_Gatherv

MPI_Gatherv is an MPI interface, declared as follows:

int MPI_Gatherv(
    void* sendbuf,
    int sendcount,
    MPI_Datatype sendtype,
    void* recvbuf,
    int *recvcounts,
    int *displs,
    MPI_Datatype recvtype,
    int root,
    MPI_Comm comm)

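The type of "recvcounts" is "int *", so that we can set the number of items to receive from each process individually; however, I find this impossible to achieve: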

When recvcounts[i] ...
When recvcounts[i] > sendcount, the program crashes with the following error message:

Fatal error in PMPI_Gatherv: Message truncated, error stack:
PMPI_Gatherv(386).....: MPI_Gatherv failed(sbuf=0012FD34, scount=2, MPI_CHAR, rbuf=0012FCC8, rcnts=0012FB30, displs=0012F998, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Gatherv_impl(199):
MPIR_Gatherv(103).....:
MPIR_Localcopy(332)...: Message truncated; 2 bytes received but buffer size is 1

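So does this mean that the root has to receive a fixed number of items from each process, and that the recvcounts parameter is meaningless? Or have I misunderstood something?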

Here is my code:

#include <mpi.h>
#include <cstdio>   // printf
#include <cstring>  // memset

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int n, id;
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);

    char x[100], y[100];
    memset(x, '0' + id, sizeof(x));
    memset(y, '%', sizeof(y));
    int cnts[100], offs[100] = {0};
    for (int i = 0; i < n; i++)
    {
        cnts[i] = i + 1;
        if (i > 0)
        {
            offs[i] = offs[i - 1] + cnts[i - 1];
        }
    }
    MPI_Gatherv(x, 1, MPI_CHAR, y, cnts, offs, MPI_CHAR, 0, MPI_COMM_WORLD);    // receive only 1 item from each process
    //MPI_Gatherv(x, 2, MPI_CHAR, y, cnts, offs, MPI_CHAR, 0, MPI_COMM_WORLD);    // crash
    if (id == 0)
    {
        printf("Gatherv:\n");
        for (int i = 0; i < 100; i++)
        {
            printf("%c ", y[i]);
        }
        printf("\n");
    }

    MPI_Finalize();

    return 0;
}

Best answer

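As @Alexander Molodih points out, sendcount = recvcount with sendtype = recvtype will always work; but once you start creating your own MPI types, you will often have different send and receive types, and that is why recvcount can differ from sendcount.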

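As an example, take a look at the recently asked question "MPI partition matrix into blocks", where a 2D array is broken up into blocks and scattered. There the send type (which has to pick just the necessary data out of the global array) and the receive type (which is just a contiguous block of data) are different, and so are the counts.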

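That is the general reason why send and receive types and counts can differ, for things like sendrecv, gather/scatter, or any other operation where both a send and a receive take place.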

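In your gatherv case, each process may have its own different sendcount, but the recvcounts[] array must be a list of all those counts so that the receiver can place the received data correctly. If you don't know these values beforehand (each rank only knows its own count, cnts[id]), you can do a gather first: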

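int mycnt = cnts[id];  // copy first so the send buffer does not overlap the receive buffer on the root
MPI_Gather(&mycnt, 1, MPI_INT, cnts, 1, MPI_INT, 0, MPI_COMM_WORLD);
// cnts and offs are only significant on the root
for (int i = 1; i < n; i++) {
    offs[i] = offs[i - 1] + cnts[i - 1];
}
MPI_Gatherv(x, mycnt, MPI_CHAR, y, cnts, offs, MPI_CHAR, 0, MPI_COMM_WORLD);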

Regarding "c++ - I can't make sense of the "recvcounts" parameter of MPI_Gatherv", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/7495714/