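I am writing a C++ program using the MPI library. It deadlocks, and only one node does any work! I am not using any send or receive operations, only two collective functions (MPI_Allreduce and MPI_Bcast).
I could understand a deadlock if one node were waiting for another node to send or receive a message, but here I really do not understand what is causing it.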

void ParaStochSimulator::first_reacsimulator() {
    SimulateSingleRun();
}

double ParaStochSimulator::deterMinTau() {
    //calculate minimum tau for this process
    l_nLocalMinTau = calc_tau(); //min tau for each node
    MPI_Allreduce(&l_nLocalMinTau, &l_nGlobalMinTau, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
    //min tau for all nodes
    //check if I have the min value
    if (l_nLocalMinTau <= l_nGlobalMinTau && m_nCurrentTime < m_nOutputEndPoint) {
        FireTransition(m_nMinTransPos);
        CalculateAllHazardValues();
    }
    return l_nGlobalMinTau;
}

void ParaStochSimulator::SimulateSingleRun() {
    //prepare a run
    PrepareRun();
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) {
        deterMinTau();
        if (mnprocess_id == 0) { //master
            SimulateSingleStep();
            std::cout << "current time:*****" << m_nCurrentTime << std::endl;
            broad_casting(m_nMinTransPos);
            MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);
            //std::cout << "size of mani place :" << l_nMinplacesPos.size() << std::endl;
        }
    }
    MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);
    PostProcessRun();
}

Best answer

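The MPI_Bcast inside the loop sits in the if (mnprocess_id == 0) branch, so only your "master" process calls it. While the master is executing that MPI_Bcast, all the other processes skip the branch, go around the loop again, enter deterMinTau, and call MPI_Allreduce.

This is a deadlock: your master node is waiting for all the nodes to take part in the broadcast, while all the other nodes are waiting for the master to take part in the reduction.

I believe what you are looking for is: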

void ParaStochSimulator::SimulateSingleRun() {
    //prepare a run
    PrepareRun();
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) {
        //All the nodes reduce tau at the same time
        deterMinTau();
        if (mnprocess_id == 0) { //master
            SimulateSingleStep();
            std::cout << "current time:*****" << m_nCurrentTime << std::endl;
            broad_casting(m_nMinTransPos);
            //Removed the master-only broadcast here
        }
        //All the nodes broadcast at every loop iteration
        MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }
    PostProcessRun();
}
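
The call-order argument can be checked without MPI. The following is a minimal sketch, not part of the original code: it reduces each rank to the ordered list of collectives it would call and reports the first step at which the ranks disagree. The names CallTrace, firstMismatch, originalLoop and fixedLoop are hypothetical helpers introduced for illustration, and the model assumes broad_casting makes no collective call of its own. It only compares call order; a real MPI implementation may let a broadcast root return early, so the actual hang can surface at a later call than the one reported here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: one rank's program, reduced to the ordered list of
// collective calls it makes on MPI_COMM_WORLD.
using CallTrace = std::vector<std::string>;

// Returns the index of the first step at which some rank calls a different
// collective than rank 0 (or has no call left), or -1 if all traces match.
long firstMismatch(const std::vector<CallTrace>& traces) {
    std::size_t longest = 0;
    for (const CallTrace& t : traces) longest = std::max(longest, t.size());
    for (std::size_t step = 0; step < longest; ++step) {
        for (const CallTrace& t : traces) {
            const bool missing = step >= t.size() || step >= traces[0].size();
            if (missing || t[step] != traces[0][step]) {
                return static_cast<long>(step);
            }
        }
    }
    return -1;
}

// Trace of the question's loop: only rank 0 broadcasts inside the loop,
// then every rank broadcasts once after the loop.
CallTrace originalLoop(int rank, int iterations) {
    CallTrace t;
    for (int i = 0; i < iterations; ++i) {
        t.push_back("MPI_Allreduce");             // deterMinTau()
        if (rank == 0) t.push_back("MPI_Bcast");  // master-only branch
    }
    t.push_back("MPI_Bcast");                     // broadcast after the loop
    return t;
}

// Trace of the answer's loop: every rank reduces and then broadcasts
// in each iteration, so the rank does not affect the sequence.
CallTrace fixedLoop(int /*rank*/, int iterations) {
    CallTrace t;
    for (int i = 0; i < iterations; ++i) {
        t.push_back("MPI_Allreduce");
        t.push_back("MPI_Bcast");
    }
    return t;
}
```

With two ranks and two iterations, firstMismatch({originalLoop(0, 2), originalLoop(1, 2)}) returns 1: at that step rank 0 is in MPI_Bcast while rank 1 is already in its second MPI_Allreduce. The same call with fixedLoop returns -1, because every rank issues the same sequence.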

This question, "c++ - MPI deadlock with collective functions", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/43807532/
