OpenCL: selecting/removing points from a large array

Problem description

I have an array of 2M+ points (planned to grow to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry.

How can I do this within an OpenCL kernel?

I can already:

  • identify the points that fall outside the triangle (a simple point-in-polygon test in the kernel)

  • pass their coordinates to a global output array.
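The point-in-triangle test mentioned above can be sketched on the host side as follows. This is a minimal C++ illustration, not the asker's kernel code: the `Point2` struct and the function names `edge` and `inside_triangle` are hypothetical, and the post does not show the actual point layout. It uses the sign of three edge functions, so it works for either winding order and treats boundary points as inside.

```cpp
// Hypothetical 2D point type; the original post does not show its point layout.
struct Point2 {
    float x;
    float y;
};

// Twice the signed area of triangle (a, b, c).
// Positive when c lies to the left of the directed edge a -> b.
inline float edge(const Point2& a, const Point2& b, const Point2& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True when p is inside or on the boundary of triangle (a, b, c).
// p is inside when the three edge values do not have mixed signs.
inline bool inside_triangle(const Point2& p, const Point2& a,
                            const Point2& b, const Point2& c) {
    const float d0 = edge(a, b, p);
    const float d1 = edge(b, c, p);
    const float d2 = edge(c, a, p);
    const bool has_neg = (d0 < 0.0f) || (d1 < 0.0f) || (d2 < 0.0f);
    const bool has_pos = (d0 > 0.0f) || (d1 > 0.0f) || (d2 > 0.0f);
    return !(has_neg && has_pos);
}
```

To keep only the points outside the triangle, as in the question, the keep condition would be `!inside_triangle(p, a, b, c)`. The same arithmetic carries over to OpenCL C with `float2` in place of the hypothetical struct.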

However:

  • an OpenCL global output array cannot be variable-sized, so I initialise it to the same size as the input array of points

As a result, (0,0) points appear in the final output wherever a point falls within the triangle.

The output array therefore does not result in any reduction per se.

Can the (0,0) points be deleted within the OpenCL context?

N.B. I am coding in openFrameworks, so the C++ implementation links to .cl files.

Recommended answer

Just an alternative for the case where most of the points pass the condition that triggers the atomic:

It is possible to keep a counter in local memory and increment it with a local atomic. To merge that local count into the global counter, use atomic_add(), which returns the "previous" global value. The work-group then copies its items to the output, starting at that offset and going up.

This should give a noticeable speed-up, since the threads synchronise locally and touch the global counter only once per work-group. The global copy can run in parallel because the address ranges never overlap.

For example:

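__kernel void mykernel(__global MyType * global_out, __global int * global_count, __global MyType * global_in){
   int lid = get_local_id(0);
   int lws = get_local_size(0);
   int idx = get_global_id(0);

   __local int local_count;   //Number of items kept by this work-group
   __local int global_val;    //Offset reserved for this work-group in global_out
   //I am using a local container, but a local array of pointers to global is possible as well
   //WG_SIZE is a compile-time constant (e.g. passed with -D WG_SIZE=...) and must be at least the work-group size
   __local MyType local_out[WG_SIZE];
   if(lid==0){
      local_count = 0; global_val = -1;
   }
   barrier(CLK_LOCAL_MEM_FENCE);

   //Classify them: atomic_inc takes a pointer and returns the old value, i.e. a unique local slot
   if(global_in[idx] == ....)
       local_out[atomic_inc(&local_count)] = global_in[idx];

   barrier(CLK_LOCAL_MEM_FENCE);

   //If nothing was kept in this work-group, we are done.
   //local_count is the same for every work-item, so the barrier below is reached by all or by none
   if(local_count > 0){
      //Only the first local ID does the atomic to global; it returns the previous count
      if(lid == 0)
         global_val = atomic_add(global_count,local_count);

      //Resync all the local workers here
      barrier(CLK_LOCAL_MEM_FENCE);

      //Copy all the data: each work-item starts at its own lid and strides by the work-group size
      for(int i=lid; i<local_count; i+=lws)
          global_out[global_val+i] = local_out[i];
   }
}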

Note: I have not compiled this, but it should more or less work.
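Since the kernel above is untested, a CPU model of the same reservation scheme may help when checking its output. The following is a rough C++ sketch under stated assumptions: `compact_by_groups` is a hypothetical helper name, the "work-groups" are processed one after another in a plain loop, and `std::atomic::fetch_add` stands in for `atomic_add()` on `global_count`. It is not an OpenCL host API call sequence.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Sequential CPU model of the two-level reservation (hypothetical helper).
// Each "work-group" gathers the items it keeps, reserves one contiguous block
// in `out` with a single fetch_add, then copies its items into that block.
// Returns the number of valid entries at the front of `out`.
template <typename T, typename Pred>
std::size_t compact_by_groups(const std::vector<T>& in, std::vector<T>& out,
                              std::size_t group_size, Pred keep) {
    out.assign(in.size(), T{});  // Worst case: every element is kept.
    std::atomic<std::size_t> global_count{0};
    if (group_size == 0) return 0;

    for (std::size_t start = 0; start < in.size(); start += group_size) {
        const std::size_t end = std::min(start + group_size, in.size());

        // Plays the role of local_out / local_count in the kernel.
        std::vector<T> local_out;
        for (std::size_t i = start; i < end; ++i) {
            if (keep(in[i])) local_out.push_back(in[i]);
        }
        if (local_out.empty()) continue;

        // One global atomic per group; the returned value is the old count,
        // which is the first free slot for this group.
        const std::size_t base = global_count.fetch_add(local_out.size());
        std::copy(local_out.begin(), local_out.end(),
                  out.begin() + static_cast<std::ptrdiff_t>(base));
    }
    return global_count.load();
}
```

The output buffer is still allocated at full size, as in the question, but only the first `n` entries returned by the function are meaningful, so the host can call `out.resize(n)` or read back just that many elements. On a real device the work-groups finish in no fixed order, so the kept points generally do not preserve their input order.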


05-25 21:18