The following sequence of errors is received when I try to run a problem on four processors. The MPI command I use is mpirun -np 4
I apologize for posting the error message as is (primarily due to a lack of knowledge about deciphering the information given). I would appreciate your input on the following:
What does the error message mean? At what point does one receive it? Is it caused by system memory (hardware), or by a communication error (something related to MPI_Isend/MPI_Irecv, i.e. a software issue)?
Finally, how do I fix this?
Thanks!
ERROR message received follows below:
*PLEASE NOTE: This error is received only when the time is large.* The code computes fine when the time required to compute the data is small (i.e., 300 time steps compared to 1000 time steps).
aborting job:
Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(143): MPI_Irecv(buf=0x8294a60, count=48, MPI_DOUBLE, src=2, tag=-1, MPI_COMM_WORLD, request=0xffffd68c) failed
MPID_Irecv(64): Out of memory
aborting job:
Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(143): MPI_Irecv(buf=0x8295080, count=48, MPI_DOUBLE, src=3, tag=-1, MPI_COMM_WORLD, request=0xffffd690) failed
MPID_Irecv(64): Out of memory
aborting job:Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(142): MPI_Isend(buf=0x8295208, count=48, MPI_DOUBLE, dest=3, tag=0, MPI_COMM_WORLD, request=0xffffd678) failed
(unknown)(): Internal MPI error!
aborting job:Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(143): MPI_Irecv(buf=0x82959b0, count=48, MPI_DOUBLE, src=2, tag=-1, MPI_COMM_WORLD, request=0xffffd678) failed
MPID_Irecv(64): Out of memory
rank 3 in job 1 myocyte80_37021 caused collective abort of all ranks
exit status of rank 3: return code 13
rank 1 in job 1 myocyte80_37021 caused collective abort of all ranks
exit status of rank 1: return code 13
EDIT: (SOURCE CODE)
Header files
Variable declaration
TOTAL TIME =
...
...
double *A = new double[Rows];
double *AA = new double[Rows];
double *B = new double[Rows];
double *BB = new double[Rows];
....
....
int Rmpi;
int my_rank;
int p;
int source;
int dest;
int tag = 0;
function declaration
int main (int argc, char *argv[])
{
MPI_Status status[8];
MPI_Request request[8];
MPI_Init (&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
//PROBLEM SPECIFIC PROPERTIES. VARY BASED ON NODE
if (Flag == 1)
{
if (my_rank == 0)
{
Defining boundary (start/stop) for special elements in tissue (Rows x Column)
}
if (my_rank == 2)
..
if (my_rank == 3)
..
if (my_rank == 4)
..
}
//INITIAL CONDITIONS ALSO VARY BASED ON NODE
for (Columns = 0; Columns < 48; Columns++) // Normal Direction
{
for (Rows = 0; Rows < 48; Rows++) //Transverse Direction
{
if (Flag == 1)
{
if (my_rank == 0)
{
Initial conditions for elements
}
if (my_rank == 1) //MPI
{
}
..
..
..
//SIMULATION START
while(t[0][0] < TOTAL TIME)
{
for (Columns = 0; Columns < 48; Columns++) //Normal Direction
{
for (Rows = 0; Rows < 48; Rows++) //Transverse Direction
{
//SOME MORE PROPERTIES BASED ON NODE
if (my_rank == 0)
{
if (FLAG == 1)
{
Condition 1
}
else
{
Condition 2
}
}
if (my_rank == 1)
....
....
...
//Evaluate functions (differential equations)
Function 1 ();
Function 2 ();
...
...
//Based on the output of the differential equations, different nodes estimate variable values.
//Since the problem is nearest-neighbor, corners and edges have different neighbors / boundary
//conditions.
if (my_rank == 0)
{
if (Row/Column at bottom_left)
{
Variables =
}
if (Row/Column at Bottom Right)
{
Variables =
}
}
...
...
//Keeping track of time for each element in Row and Column. Time is updated for a certain
//element.
t[Column][Row] = t[Column][Row]+dt;
}
}//END OF ROWS AND COLUMNS
// MPI IMPLEMENTATION. AT END OF EVERY TIME STEP, Nodes communicate with nearest neighbor
//First step is to populate arrays with values estimated above
for (Columns = 0; Columns < 48; Columns++)
{
for (Rows = 0; Rows < 48; Rows++)
{
if (my_rank == 0)
{
//Loading the edges of the (Row x Column) grid into variables. This one-dimensional array
//data is shared with its nearest neighbor for computation at the next time step.
if (Column == 47)
{
A[i] = V[Column][Row];
…
}
if (Row == 47)
{
B[i] = V[Column][Row];
}
}
...
...
//NON BLOCKING MPI SEND RECV TO SHARE DATA WITH NEAREST NEIGHBOR
if ((my_rank) == 0)
{
MPI_Isend(A, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &request[1]);
MPI_Irecv(AA, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[3]);
MPI_Wait(&request[3], &status[3]);
MPI_Isend(B, Rows, MPI_DOUBLE, my_rank+2, 0, MPI_COMM_WORLD, &request[5]);
MPI_Irecv(BB, Rows, MPI_DOUBLE, my_rank+2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[7]);
MPI_Wait(&request[7], &status[7]);
}
if ((my_rank) == 1)
{
MPI_Irecv(CC, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[1]);
MPI_Wait(&request[1], &status[1]);
MPI_Isend(Cmpi, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &request[3]);
MPI_Isend(D, Rows, MPI_DOUBLE, my_rank+2, 0, MPI_COMM_WORLD, &request[6]);
MPI_Irecv(DD, Rows, MPI_DOUBLE, my_rank+2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[8]);
MPI_Wait(&request[8], &status[8]);
}
if ((my_rank) == 2)
{
MPI_Isend(E, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &request[2]);
MPI_Irecv(EE, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[4]);
MPI_Wait(&request[4], &status[4]);
MPI_Irecv(FF, Rows, MPI_DOUBLE, my_rank-2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[5]);
MPI_Wait(&request[5], &status[5]);
MPI_Isend(Fmpi, Rows, MPI_DOUBLE, my_rank-2, 0, MPI_COMM_WORLD, &request[7]);
}
if ((my_rank) == 3)
{
MPI_Irecv(GG, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[2]);
MPI_Wait(&request[2], &status[2]);
MPI_Isend(G, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &request[4]);
MPI_Irecv(HH, Rows, MPI_DOUBLE, my_rank-2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[6]);
MPI_Wait(&request[6], &status[6]);
MPI_Isend(H, Rows, MPI_DOUBLE, my_rank-2, 0, MPI_COMM_WORLD, &request[8]);
}
//RELOADING Data (from MPI_IRecv array to array used to compute at next time step)
for (Columns = 0; Columns < 48; Columns++)
{
for (Rows = 0; Rows < 48; Rows++)
{
if (my_rank == 0)
{
if (Column == 47)
{
V[Column][Row]= A[i];
}
if (Row == 47)
{
V[Column][Row]=B[i];
}
}
….
//PRINT TO OUTPUT FILE AT CERTAIN POINT
printval = 100;
if ((printdata>=printval))
{
prttofile ();
printdata = 0;
}
printdata = printdata+1;
compute_dt ();
}//CLOSE ALL TIME STEPS
MPI_Finalize ();
}//CLOSE MAIN
Are you repeatedly calling MPI_Irecv? If so, you may not realize that each call allocates a request handle, and these handles are freed only when the message has been received and the request completed with (e.g.) MPI_Test or MPI_Wait. It's possible to exhaust memory through over-use of MPI_Irecv, or to exhaust the memory an MPI implementation sets aside for request handles.
Only seeing the code would confirm the problem.