Question

In my MPI program, I want to exchange information with adjacent processes. But if a process finishes and stops sending, its neighbors will wait forever. How can I resolve this issue? Here is what I am trying to do:

if (rank == 0) {
    // don't do anything until all slaves are done
} else {
    while (condition) {
        // send info to rank-1 and rank+1
        // if can receive info from rank-1, receive it, store received info locally
        // if cannot receive info from rank-1, use locally stored info
        // do the same for process rank+1
        // MPI_Barrier(slaves); (wait for other slaves to finish this iteration)
    }
}

I am going to check the boundaries, of course: I won't check rank-1 when the process is rank 1, and I won't check rank+1 when the process is the last one. But how can I achieve this? Should I wrap it in another while loop? I am confused.

Recommended answer

I'd start by saying that MPI wasn't originally designed with your use case in mind. In general, MPI applications all start together and all end together. Not all applications fit into this model though, so don't lose hope!

There are two relatively easy ways of doing this and probably thousands of hard ones:

  1. Use RMA to set a flag on the neighbors.

As has been pointed out in the comments, you can set up a tiny RMA window that exposes a single value to each neighbor. When a process is done working, it can do an MPI_Put on each neighbor to indicate that it's done and then MPI_Finalize. Before sending/receiving data to/from the neighbors, check to see if the flag is set.

  2. Use a special tag to signal shutdown.

The tag value often gets ignored when sending and receiving messages, but this is a great time to use it. You can have two tags in your application. The first (we'll call it DATA) just indicates that the message contains data and you can process it as normal. The second (DONE) indicates that the sending process is done and is leaving the application. When receiving messages, you'll have to change the tag argument from whatever you're using to MPI_ANY_TAG. Then, when a message is received, check which tag it carries. If it's DONE, stop communicating with that process.

There's another problem with the pseudo-code that you posted, however. If you expect to perform an MPI_Barrier at the end of every iteration, you can't have processes leaving early: the remaining processes will hang in MPI_Barrier waiting for the one that left. Unfortunately, there's not much you can do to avoid this. However, given the code you posted, I'm not sure that the barrier is really necessary. It seems to me that the only dependency between iterations is between neighboring processes. If that's the case, then the sends and receives will accomplish all of the necessary synchronization.

If you still need a way to track when all of the ranks are done, you can have each process alert a single rank (say rank 0) when it leaves. When rank 0 detects that everyone is done, it can just exit. Or, if you want everyone to leave once some other number of processes is done, you can have rank 0 send a message with a special tag, as above, to all other ranks (which then receive with MPI_ANY_SOURCE so that a message from rank 0 is matched as well).