This article looks at an MPI_Gatherv memory problem (MPI + C) and how to resolve it; the question and recommended answer below should be a useful reference for anyone facing the same issue.

Problem description

As a continuation of my previous question, I have modified the code for a variable number of cores. However, the way Gatherv is implemented in my code seems to be unreliable: once every 3-4 runs, the end of the sequence in the collecting buffer ends up corrupted, apparently due to a memory leak. Sample code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main (int argc, char *argv[]) {

    MPI_Init(&argc, &argv);
    int world_size, *sendarray;
    int rank, *rbuf = NULL, count, total_counts = 0;
    int *displs = NULL, i, *rcounts = NULL;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    if (rank == 0) {
        displs = malloc((world_size + 1) * sizeof(int));
        for (int i = 1; i <= world_size; i++) displs[i] = 0;
        rcounts = malloc(world_size * sizeof(int));

        sendarray = malloc(1 * sizeof(int));
        for (int i = 0; i < 1; i++) sendarray[i] = 1111;
        count = 1;
    }

    if (rank != 0) {
        int size = rank * 2;
        sendarray = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) sendarray[i] = rank;
        count = size;
    }

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Gather(&count, 1, MPI_INT, rcounts, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        displs[0] = 0;
        for (int i = 1; i <= world_size; i++) {
            for (int j = 0; j < i; j++) displs[i] += rcounts[j];
        }

        total_counts = 0;
        for (int i = 0; i < world_size; i++) total_counts += rcounts[i];
        rbuf = malloc(10 * sizeof(int));
    }

    MPI_Gatherv(sendarray, count, MPI_INT, rbuf, rcounts,
                displs, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int SIZE = total_counts;
        for (int i = 0; i < SIZE; i++) printf("(%d) %d ", i, rbuf[i]);

        free(rbuf);
        free(displs);
        free(rcounts);
    }

    if (rank != 0) free(sendarray);
    MPI_Finalize();

}


Why is this happening and is there a way to fix it?

This becomes much worse in my actual project. Each sending buffer contains 150 doubles. The receiving buffer gets very dirty, and sometimes I get a BAD TERMINATION error with exit code 6 or 11 (i.e. SIGABRT or SIGSEGV).


Can anyone at least reproduce my errors?

My guess: I am allocating memory for sendarray on each process separately. If my virtual machine were mapped 1-to-1 to the hardware, there would probably be no such problem. But I have only 2 cores and am running the program with 4 or more processes. Could that be the reason?

Recommended answer

Change this line:

rbuf = malloc(10*sizeof(int));

to this:

rbuf = malloc(total_counts*sizeof(int));
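For completeness, here is a sketch (my addition, not part of the original answer) of how the rank-0 setup could look with that fix applied, using the same variable names as the question's code and replacing the question's nested displacement loop with an equivalent prefix sum:

/* Sketch: rank-0 setup with the fix applied (variable names follow
   the question's code). */
if (rank == 0) {
    displs  = malloc(world_size * sizeof(int));
    rcounts = malloc(world_size * sizeof(int));
}

MPI_Gather(&count, 1, MPI_INT, rcounts, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0) {
    /* Displacements as a running prefix sum of the counts. */
    displs[0] = 0;
    for (int i = 1; i < world_size; i++)
        displs[i] = displs[i - 1] + rcounts[i - 1];

    /* Total number of elements the root will receive. */
    total_counts = displs[world_size - 1] + rcounts[world_size - 1];

    /* The key fix: size rbuf for everything that will arrive. */
    rbuf = malloc(total_counts * sizeof(int));
}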

As a side note: each MPI process lives in its own address space, so processes cannot stomp on each other's data; the exception is erroneous data passed explicitly through the MPI_XXX functions, which results in undefined behavior.
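A defensive way to catch this class of bug early (again my addition, with names assumed from the code above) is a root-side sanity check before the gather, so the program aborts cleanly instead of silently corrupting the heap:

/* Hypothetical root-side sanity check: verify every incoming block
   fits inside rbuf (total_counts elements) before MPI_Gatherv. */
if (rank == 0) {
    for (int i = 0; i < world_size; i++) {
        if (displs[i] < 0 || displs[i] + rcounts[i] > total_counts) {
            fprintf(stderr, "block from rank %d overflows rbuf\n", i);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
    }
}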

That concludes this look at the MPI_Gatherv memory problem (MPI + C); hopefully the recommended answer above is helpful.