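I want to transfer a struct between processes, so I am trying to create an MPI struct datatype for it. The code is for an Ant Colony Optimization (ACO) algorithm.
The header file that defines the C struct contains: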
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>
#include <mpi.h>
/* Constants */
#define NUM_CITIES 100 // Number of cities
//among others
typedef struct {
int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
double tour_distance;
} ACO_Ant;
I tried to build the code as suggested in this thread.
Program code:
int main(int argc, char *argv[])
{
MPI_Datatype MPI_TABU, MPI_PATH, MPI_ANT;
// Initialize MPI
MPI_Init(&argc, &argv);
//Determines the size (&procs) of the group associated with a communicator (MPI_COMM_WORLD)
MPI_Comm_size(MPI_COMM_WORLD, &procs);
//Determines the rank (&rank) of the calling process in the communicator (MPI_COMM_WORLD)
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_TABU);
MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_PATH);
MPI_Type_commit(&MPI_TABU);
MPI_Type_commit(&MPI_PATH);
// Create ant struct
//int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
//double tour_distance;
int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
MPI_Datatype types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
MPI_Aint offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};
MPI_Datatype tmp_type;
MPI_Aint lb, extent;
MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
MPI_Type_get_extent( tmp_type, &lb, &extent );
//Tried all of these
MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
MPI_Type_commit(&MPI_ANT);
printf("Return: %d\n" , MPI_Bcast(ant, NUM_ANTS, MPI_ANT, 0, MPI_COMM_WORLD));
}
But as soon as the program reaches the MPI_Bcast call, it crashes with error code 11, which I assumed was MPI_ERR_TOPOLOGY as per this manual. It is actually a segfault (signal 11).
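I am also not sure why the author of the original program sized the arrays the way they did. Can someone explain why they declare
MPI_Aint displacements[3];
MPI_Datatype typelist[3];
with size 3 when the struct has only two members, while the block lengths array has size 2?
int block_lengths[2];
Code: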
void ACO_Build_best(ACO_Best_tour *tour, MPI_Datatype *mpi_type /*out*/)
{
int block_lengths[2];
MPI_Aint displacements[3];
MPI_Datatype typelist[3];
MPI_Aint start_address;
MPI_Aint address;
block_lengths[0] = 1;
block_lengths[1] = NUM_CITIES;
typelist[0] = MPI_DOUBLE;
typelist[1] = MPI_INT;
displacements[0] = 0;
MPI_Address(&(tour->distance), &start_address);
MPI_Address(tour->path, &address);
displacements[1] = address - start_address;
MPI_Type_struct(2, block_lengths, displacements, typelist, mpi_type);
MPI_Type_commit(mpi_type);
}
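Any help would be greatly appreciated.
Edit: I am looking for help with solving the problem, not marginally useful StackOverflow lingo.
Best answer
This part is wrong: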
int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
MPI_Datatype types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
MPI_Aint offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};
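The MPI_TABU and MPI_PATH datatypes already cover NUM_CITIES elements each. When you also specify the corresponding block lengths as NUM_CITIES, the resulting datatype tries to access NUM_CITIES * NUM_CITIES elements per array, which most likely causes the segfault (signal 11). Either set all elements of blocklengths to 1, or replace MPI_TABU and MPI_PATH in the types array with MPI_INT.
This part is also wrong: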
MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
MPI_Type_get_extent( tmp_type, &lb, &extent );
//Tried all of these
MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
MPI_Type_commit(&MPI_ANT);
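Calling MPI_Type_create_resized with the values returned by MPI_Type_get_extent is pointless, because it simply duplicates the type without actually resizing it. Using sizeof(MPI_ANT) is wrong because MPI_ANT is not a C type but an MPI handle, which is either an integer index or a pointer (implementation-dependent). sizeof(ant) would work if ant were of type ACO_Ant, but given that you call MPI_Bcast(ant, NUM_ANTS, ...), ant is either a pointer, in which case sizeof(ant) is just the pointer size, or an array, in which case sizeof(ant) is NUM_ANTS times larger than it should be. The correct call is:
MPI_Type_create_resized(tmp_type, 0, sizeof(ACO_Ant), &ant_type);
MPI_Type_commit(&ant_type);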
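The size arguments in this answer can be checked in plain C, without MPI. The following is only a sketch: the helper function names are made up for illustration, and NUM_ANTS is given a hypothetical value because the question does not show its definition.

```c
#include <stddef.h>

#define NUM_CITIES 100
#define NUM_ANTS 20 /* hypothetical value; not shown in the question */

typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;

/* Number of int members the struct really contains: 3 scalars + 2 arrays. */
size_t ints_in_struct(void)
{
    return 3 + 2 * (size_t)NUM_CITIES;
}

/* Number of ints the original type map describes: each array block has
   block length NUM_CITIES of a type that already spans NUM_CITIES ints. */
size_t ints_described_by_faulty_type(void)
{
    return 3 + 2 * (size_t)NUM_CITIES * NUM_CITIES;
}

/* sizeof(ant) when ant is a pointer: only the size of the pointer itself. */
size_t size_if_ant_is_pointer(void)
{
    ACO_Ant *ant = NULL;
    return sizeof(ant);
}

/* sizeof(ant) when ant is an array: the size of all NUM_ANTS elements. */
size_t size_if_ant_is_array(void)
{
    ACO_Ant ant[NUM_ANTS];
    return sizeof(ant);
}

/* The extent one element of the MPI type should have. */
size_t correct_extent(void)
{
    return sizeof(ACO_Ant);
}
```

With NUM_CITIES set to 100, the struct holds 203 ints, while the original type map describes 20003, so the broadcast reads and writes far beyond each ACO_Ant. Neither variant of sizeof(ant) matches correct_extent(), which is why sizeof(ACO_Ant) is the value to pass.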
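Also, please do not use MPI_ as a prefix in your own variable or function names. It makes the code harder to read and is misleading ("is this a predefined MPI datatype or a user-defined one?").
As for the last question, the author probably had a different structure in mind. Nothing stops you from using larger arrays as long as you call the struct constructor (MPI_Type_struct in that code, MPI_Type_create_struct in yours) with the correct number of significant elements.
Note: you do not have to commit MPI datatypes that are never used directly in communication calls. That is, these two lines are unnecessary: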
MPI_Type_commit(&MPI_TABU);
MPI_Type_commit(&MPI_PATH);
Source: "c - Problem creating an MPI struct, error 11 when calling MPI_Bcast", a similar question found on Stack Overflow: https://stackoverflow.com/questions/55678073/