c - Wait until the slaves have called MPI_Finalize

I have a question about the following code:

Master:

#include <iostream>
using namespace std;

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PB1 1
#define PB2 1

int main (int argc, char *argv[])
{
  int np[2] = { 2, 1 }, errcodes[3]; // one error code per spawned process (2 + 1)
  MPI_Comm parentcomm, intercomm;
  char *cmds[2] = { "./slave", "./slave" };
  MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
  MPI_Init(NULL, NULL);

#if PB1
  for(int i = 0 ; i<2 ; i++)
    {
      MPI_Info_create(&infos[i]);
      char hostname[] = "localhost";
      MPI_Info_set(infos[i], "host", hostname);
    }
#endif

  MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
  printf("c Creation of the workers finished\n");

#if PB2
  sleep(1);
#endif

  MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
  printf("c Creation of the workers finished\n");

  MPI_Finalize();
  return 0;
}

Slave:

#include "mpi.h"
#include <stdio.h>

using namespace std;

int main( int argc, char *argv[])
{
  int rank;
  MPI_Init(0, NULL);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("rank =  %d\n", rank);

  MPI_Finalize();
  return 0;
}

I do not understand why, when I run mpirun -np 1 ./master with PB1 and PB2 set to 1, the program stops with the following message (it works fine when they are set to 0):

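There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  ./slave
Either request fewer slots for your application, or make more slots available
for use.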

For instance, when I set PB2 to 0, the program works fine. So I suspect the cause is that MPI_Finalize has not finished its work yet...

I googled but did not find an answer. I tried various things, such as calling MPI_Comm_disconnect or adding a barrier, but nothing worked.

I work on Ubuntu (15.10) and use Open MPI version 1.10.2.

Best answer

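MPI_Finalize on the first set of slaves will not complete until MPI_Finalize is called on the master. MPI_Finalize is collective over all connected processes. You can work around that by manually disconnecting the first batch of slaves from the intercommunicator before calling MPI_Finalize. That way the slaves actually finish and exit, freeing the "slots" for the new batch of slaves. Unfortunately, I do not see a standardized way to really ensure that the slaves are finished in the sense that their slots are freed, because that is implementation defined. It is also unfortunate that Open MPI freezes inside MPI_Comm_spawn_multiple instead of returning an error; one might consider that a bug. Anyway, here is a draft of what you could do: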

In the master, every time it is done with a batch of slaves:

MPI_Barrier(intercomm); // MPI_Barrier takes the communicator by value; keeps master and slaves somewhat synchronized
MPI_Comm_disconnect(&intercomm);
sleep(1); // This is the ugly unreliable way to give the slaves some time to shut down

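Slave:

MPI_Comm parent;
MPI_Comm_get_parent(&parent); // you should have that already
MPI_Barrier(parent); // a barrier is collective, so the slaves must match the master's MPI_Barrier on the intercommunicator
MPI_Comm_disconnect(&parent);
MPI_Finalize();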

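However, you still need to make sure Open MPI knows how many slots should be reserved for the entire application (universe_size). You can do that with a hostfile, for example: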

localhost slots=4

Then run mpirun -np 1 ./master with that hostfile (Open MPI's mpirun accepts one through its --hostfile option).
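As a rough check of where slots=4 comes from, here is a minimal sketch of the slot arithmetic. It assumes every process occupies exactly one slot, which is a simplification rather than something taken from the Open MPI documentation, and slots_needed is a hypothetical helper name. With np = { 2, 1 } and one master, a single live batch needs 1 + 3 = 4 slots; if the first batch has not yet released its slots when the second spawn is issued, the demand is 1 + 3 + 3 = 7.

```c
/* Slots needed while `live_batches` spawned batches are alive at once.
 * Assumption: each process (master or slave) occupies exactly one slot. */
int slots_needed(int master_procs, const int np[], int count, int live_batches)
{
    int per_batch = 0;

    /* One batch spawns np[0] + np[1] + ... processes. */
    for (int i = 0; i < count; i++)
        per_batch += np[i];

    return master_procs + live_batches * per_batch;
}
```

Calling slots_needed(1, np, 2, 1) with np = { 2, 1 } gives 4, which fits the hostfile above, while slots_needed(1, np, 2, 2) gives 7, which does not. That is why the first batch has to disconnect and exit before the second spawn.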

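Now, this is not pretty, and I would argue that dynamically spawning MPI workers the way you do is not really what MPI is meant for. The standard may support it, but that does not help you if implementations struggle with it. However, there is not enough information about how you intend to communicate with the external processes to suggest a cleaner, more idiomatic solution.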

One last remark: do check the return codes of MPI functions, especially that of MPI_Comm_spawn_multiple.