c - Passing a list of structs with OpenACC

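I have a C program that finds out whether two sets of polygons overlap. The user supplies two sets of polygons (a few thousand polygons per set), and the program determines which polygons in set1 overlap which polygons in set2.
I have two structs like this: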

struct gpc_vertex  /* Polygon vertex */
{
    double          x;
    double          y;
};

struct gpc_vertex_list  /* Polygon contour */
{
    int             pid;           // polygon id
    int             num_vertices;
    double         *mbr;           // minimum bounding rectangle of the polygon, so always 4 elements
};

I have the following code snippet:

#pragma acc kernels copy(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2], listOfBoolean[0:dump])
for (i=0; i<polygonCount1; i++){
    polygon1 = listOfPolygons1[i];

    for (j=0; j<polygonCount2; j++){

        polygon2 = listOfPolygons2[j];
        idx = polygonCount2 * i + j;

        listOfBoolean[idx] = isRectOverlap(polygon1.mbr, polygon2.mbr);  // line 115

    }
}

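listOfPolygons1 and listOfPolygons2 are (as the names imply) arrays of gpc_vertex_list.
listOfBoolean is an array of int.
The mbr of each pair of polygons is checked for overlap: the function "isRectOverlap" returns 1 if they overlap and 0 if they do not, and the value is stored in listOfBoolean.
Problem
The code compiles but does not run. It returns the following error: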

call to cuEventSynchronize returned error 700: Illegal address during kernel execution

My observation
The program compiles and runs if line 115 is changed to this:

isRectOverlap(polygon1.mbr, polygon2.mbr); // without assigning value to listOfBoolean

or this:

listOfBoolean[idx] = 5; // assigning an arbitrary value

(the result is wrong, but at least it runs)

Question
Neither "isRectOverlap" nor "listOfBoolean" seems to cause the problem as long as the return value of "isRectOverlap" is not stored in "listOfBoolean".
Does anyone know why the program fails when I assign the return value of "isRectOverlap" to "listOfBoolean"?

The isRectOverlap function looks like this:

int isRectOverlap(double *shape1, double *shape2){

    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]){
        return 0;
    }

    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]){
        return 0;
    }

    return 1;

}

The program runs without problems when OpenACC is not used.
Thanks for your help.

Best answer

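When an aggregate data type is used in an OpenACC data clause, a shallow copy of the type is performed. Most likely what is happening here is that when the listOfPolygons arrays are copied to the device, "mbr" still contains a host address. Hence, when "mbr" is dereferenced on the device, the program fails with an illegal address error. This would also be consistent with the observation in the question: when the result is discarded or a constant is stored instead, "mbr" presumably never has to be read on the device.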
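The shallow-copy behaviour can be illustrated on the host alone. The following is a minimal sketch, not OpenACC code: `shallow_copy` is a hypothetical helper that copies the struct array byte for byte, which is roughly what a `copy` clause does with the array. The copied struct ends up holding the same `mbr` address as the original, and that address is only meaningful in host memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gpc_vertex_list {
    int     pid;
    int     num_vertices;
    double *mbr;   /* only the address is stored inside the struct */
};

/* Hypothetical helper: byte-wise copy of the struct array.
   The doubles that mbr points to are NOT duplicated.
   Returns NULL if the allocation fails; the caller frees the result. */
struct gpc_vertex_list *shallow_copy(const struct gpc_vertex_list *src, int n)
{
    assert(src != NULL && n > 0);
    struct gpc_vertex_list *dst = malloc((size_t)n * sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, (size_t)n * sizeof *dst);
    return dst;
}
```

After such a copy, `dst[i].mbr == src[i].mbr` holds for every element. On a device with its own memory, following that pointer means reading a host address, which matches the illegal address error reported in the question.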
Given the comment that "mbr" always has 4 elements, the simplest fix is to make "mbr" a fixed-size array of 4.
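Assuming you are using the PGI compilers with an NVIDIA device, the second option is to use CUDA Unified Memory by compiling with "-ta=tesla:managed". All dynamic memory is then handled by the CUDA runtime, which allows host addresses to be accessed on the device. The caveats are that it only applies to dynamically allocated data, that the whole program can only use as much memory as is available on the device, and that it may slow the program down. http://www.pgroup.com/lit/articles/insider/v6n2a4.htm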
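The third option is to perform a deep copy of the aggregate type to the device. I can post an example if you decide to go this route. I also talk about this topic in a presentation I gave at GTC2015: https://www.youtube.com/watch?v=rWLmZt_u5u4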
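For reference, a manual deep copy is often written along the following lines. This is a sketch under assumptions, not necessarily the example the answer had in mind: the helper names `polygons_to_device`, `polygons_from_device` and `overlap_matrix` are hypothetical, each "mbr" is assumed to point to exactly 4 doubles, and the code relies on the compiler attaching the device copy of "mbr" to the device copy of the parent struct when the member is copied in after the parent (the behaviour described for OpenACC 2.6 and later).

```c
#include <assert.h>
#include <stddef.h>

struct gpc_vertex_list {
    int     pid;
    int     num_vertices;
    double *mbr;           /* assumed to point to 4 doubles */
};

/* Copy the struct array first, then each mbr block; copying a member
   of an already-present struct is expected to fix up the device pointer. */
void polygons_to_device(struct gpc_vertex_list *list, int n)
{
    assert(list != NULL && n > 0);
    #pragma acc enter data copyin(list[0:n])
    for (int i = 0; i < n; i++) {
        #pragma acc enter data copyin(list[i].mbr[0:4])
    }
}

/* Release in reverse order: members first, then the struct array. */
void polygons_from_device(struct gpc_vertex_list *list, int n)
{
    assert(list != NULL && n > 0);
    for (int i = 0; i < n; i++) {
        #pragma acc exit data delete(list[i].mbr[0:4])
    }
    #pragma acc exit data delete(list[0:n])
}

#pragma acc routine seq
int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2])
        return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3])
        return 0;
    return 1;
}

/* Expects both lists to be on the device already (see polygons_to_device). */
void overlap_matrix(struct gpc_vertex_list *set1, int n1,
                    struct gpc_vertex_list *set2, int n2, int *out)
{
    #pragma acc parallel loop collapse(2) present(set1[0:n1], set2[0:n2]) copyout(out[0:n1*n2])
    for (int i = 0; i < n1; i++) {
        for (int j = 0; j < n2; j++) {
            out[i * n2 + j] = isRectOverlap(set1[i].mbr, set2[j].mbr);
        }
    }
}
```

The intended call order is `polygons_to_device` for both lists, then `overlap_matrix`, then `polygons_from_device`. Without OpenACC support the pragmas are ignored and everything runs on the host, so the host result can serve as a reference when checking the device build.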

Regarding c - passing a list of structs with OpenACC, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38779782/
