Question
I have an array of 2M+ points (planned to be increased to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry.
How can I do this within an OpenCL kernel process?
I can already:
- identify those points that fall outside the triangle (a simple point-in-polygon test in the kernel)
- pass their coordinates to a global output array.
But:
- an OpenCL global output array cannot vary in size, so I initialise it to the same size as the input array of points
- as a result, 0,0 points appear in the final output wherever a point falls within the triangle
- the output array therefore does not result in any reduction per se.
Can the 0,0 points be deleted within the OpenCL context?
N.B. I am coding in OpenFrameworks, so the C++ implementation links to .cl files.
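For reference, the test and the filtering asked about above can be sketched on the host in C++. This is only a baseline for checking results, not the kernel from the question: the `Point` struct, the sign-based test in `inTriangle`, and the helper `removeInside` are all assumptions made for illustration, and the real kernel may use a different point-in-polygon test.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical 2D point type; the question does not show the real one.
struct Point { float x, y; };

// Twice the signed area of triangle (a, b, c); the sign gives the winding.
static float cross2(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if p lies inside or on an edge of triangle (a, b, c), for either winding.
bool inTriangle(const Point& p, const Point& a, const Point& b, const Point& c) {
    const float d1 = cross2(a, b, p);
    const float d2 = cross2(b, c, p);
    const float d3 = cross2(c, a, p);
    const bool hasNeg = (d1 < 0) || (d2 < 0) || (d3 < 0);
    const bool hasPos = (d1 > 0) || (d2 > 0) || (d3 > 0);
    return !(hasNeg && hasPos);  // mixed signs mean p is outside
}

// Erases every point inside the triangle, so the vector really shrinks
// and no 0,0 filler entries are left behind.
void removeInside(std::vector<Point>& pts,
                  const Point& a, const Point& b, const Point& c) {
    pts.erase(std::remove_if(pts.begin(), pts.end(),
                             [&](const Point& p) { return inTriangle(p, a, b, c); }),
              pts.end());
}
```

A baseline like this keeps the surviving points in their original order, which a parallel version is not guaranteed to do, so any comparison against GPU output should ignore ordering.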
Recommended answer
Just an alternative for the case where most of the points fall inside the atomic condition:
It is possible to have a local counter that is updated with local atomics. To merge that counter into the global value, use atomic_add(), which returns the "previous" global value. The work-group then simply copies its entries to that index and upward.
This should give a noticeable speed-up, since the threads synchronise locally and only once per work-group globally. The global copy can run in parallel because the address ranges never overlap.
For example:
    __kernel void mykernel(__global MyType * global_out, __global int * global_count, __global MyType * global_in){
        int lid = get_local_id(0);
        int lws = get_local_size(0);
        int idx = get_global_id(0);
        __local int local_count;
        __local int global_val;
        //I am using a local container, but a local array of pointers to global is possible as well
        __local MyType local_out[WG_SIZE]; //Ensure WG_SIZE is at least your work-group size

        if(lid==0){
            local_count = 0; global_val = -1;
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        //Classify them; atomic_inc takes a pointer and returns the slot to write to
        if(global_in[idx] == ....)
            local_out[atomic_inc(&local_count)] = global_in[idx];
        barrier(CLK_LOCAL_MEM_FENCE);

        //If this work-group kept nothing, we are done
        if(local_count > 0){
            //Only the first local ID does the atomic to global
            if(lid == 0)
                global_val = atomic_add(global_count,local_count);
            //Resync all the local workers here
            barrier(CLK_LOCAL_MEM_FENCE);
            //Copy all the data; each worker starts at its own local ID so nothing is copied twice
            for(int i=lid; i<local_count; i+=lws)
                global_out[global_val+i] = local_out[i];
        }
    }
Note: I have not compiled this, but it should more or less work.
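The reservation scheme can also be modelled on the CPU, which may help when checking what the kernel should produce. The sketch below is an assumption-laden stand-in written in C++, not the kernel itself: the function name `compactByGroups`, the predicate parameter `keep`, and the use of `std::atomic` in place of OpenCL atomics are all illustrative, and the "work-groups" run one after another here rather than concurrently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of the two-level scheme: each "work-group" collects its kept
// elements locally, then reserves one contiguous output range with a single
// fetch_add. Groups run sequentially here; on a GPU they run concurrently.
template <typename T, typename Pred>
std::size_t compactByGroups(const std::vector<T>& in, std::vector<T>& out,
                            std::size_t wgSize, Pred keep) {
    assert(wgSize > 0);
    out.resize(in.size());                    // worst case: everything is kept
    std::atomic<std::size_t> globalCount{0};  // stands in for *global_count

    for (std::size_t base = 0; base < in.size(); base += wgSize) {
        const std::size_t end = std::min(base + wgSize, in.size());

        // Local stage: stands in for local_out / local_count in the kernel.
        std::vector<T> local;
        local.reserve(wgSize);
        for (std::size_t i = base; i < end; ++i)
            if (keep(in[i])) local.push_back(in[i]);

        if (local.empty()) continue;          // nothing to reserve for this group

        // One atomic per group; the returned value is the group's start offset.
        const std::size_t offset = globalCount.fetch_add(local.size());
        std::copy(local.begin(), local.end(),
                  out.begin() + static_cast<std::ptrdiff_t>(offset));
    }

    out.resize(globalCount.load());           // drop the unused tail
    return out.size();
}
```

With the kernel, the equivalent of the final `resize` happens on the host: once the kernel has finished, reading back `global_count` tells you how many entries of `global_out` are valid, assuming the counter was set to zero before the launch. Unlike this sequential model, the GPU version does not guarantee the order of the kept elements, either between work-groups or within one.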