I have a problem reading a couple of positions in a double array from different threads.

I enqueue the execution with:

    nelements = nx*ny;
    err = clEnqueueNDRangeKernel(queue, kernelTvl2of, 1, NULL, &nelements, NULL, 0, NULL, NULL);

kernelTvl2of contains (among other things) this code:

    size_t k = get_global_id(0);
    (...)
    u1_[k] = (float)u1[k];
    (...)
    barrier(CLK_GLOBAL_MEM_FENCE);
    forwardgradient(u1_, u1x, u1y, k, nx, ny);
    barrier(CLK_GLOBAL_MEM_FENCE);

and forwardgradient has the code:

    void forwardgradient(global double *f, global double *fx, global double *fy,
                         int ker, int nx, int ny)
    {
        unsigned int rowsnotlast = ((nx)*(ny-1));
        if(ker<rowsnotlast){
            fx[ker] = f[ker+1] - f[ker];
            fy[ker] = f[ker+nx] - f[ker];
        }
        if(ker<nx*ny){
            fx[ker] = f[ker+1] - f[ker];
            if(ker==4607){
                fx[0] = f[4607];
                fx[1] = f[4608];
                fx[2] = f[4608] - f[4607];
                fx[3] = f[ker];
                fx[4] = f[ker+1];
                fx[5] = f[ker+1] - f[ker];
            }
        }
        if(ker==(nx*ny)-1){
            fx[ker] = 0;
            fy[ker] = 0;
        }
        if(ker%nx == nx-1){
            fx[ker]=0;
        }
        fx[6] = f[4608];
    }

When I read back the first positions of fx, they are:

    -6 0 6 -6 0 6 -6

And here's my problem: when I read f[ker+1] or f[4608] on the thread with id 4607, I get a '0' (second and fifth positions of the output array), but from other threads I get a '-6' (last position of the output array).

Does anyone have a clue what I'm doing wrong, or where I could look?

Thanks a lot,
Anton

Solution

Within a kernel, global memory consistency is only achievable within a single work-group. This means that if a work-item writes a value to global memory, a barrier(CLK_GLOBAL_MEM_FENCE) only guarantees that other work-items within the same work-group will be able to read the updated value.

If you need global memory consistency across multiple work-groups, you need to split your kernel into multiple kernels.
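The split recommended above can be sketched in plain C, with each loop standing in for one kernel launch. This is only an illustration under stated assumptions: the function names copy_item, gradient_item and forward_gradient_two_pass are hypothetical, the loops emulate what two clEnqueueNDRangeKernel calls on an in-order queue would do, and the boundary rule (zero in the last column and last row) is a guess at what the question's code intends.

```c
#include <stddef.h>

/* Body of a hypothetical first kernel: work-item k copies one element.
   In OpenCL, k would come from get_global_id(0). */
static void copy_item(const double *u1, double *u1_, size_t k)
{
    u1_[k] = u1[k];
}

/* Body of a hypothetical second kernel: forward differences for element k.
   Assumed boundary rule: fx is 0 in the last column, fy is 0 in the last row. */
static void gradient_item(const double *f, double *fx, double *fy,
                          size_t k, size_t nx, size_t ny)
{
    size_t n = nx * ny;
    if (k >= n) return;                 /* guard in case the launch is padded */
    fx[k] = (k % nx == nx - 1) ? 0.0 : f[k + 1] - f[k];
    fy[k] = (k >= n - nx)      ? 0.0 : f[k + nx] - f[k];
}

/* Host-side emulation: each loop stands in for one kernel launch. */
static void forward_gradient_two_pass(const double *u1, double *u1_,
                                      double *fx, double *fy,
                                      size_t nx, size_t ny)
{
    size_t n = nx * ny;

    /* "Kernel 1": fill u1_ completely. */
    for (size_t k = 0; k < n; ++k)
        copy_item(u1, u1_, k);

    /* Kernel boundary: every write from the first pass has completed here,
       whichever work-group produced it. */

    /* "Kernel 2": neighbouring elements of u1_ are now safe to read. */
    for (size_t k = 0; k < n; ++k)
        gradient_item(u1_, fx, fy, k, nx, ny);
}
```

The point of the split is that element k+1 or k+nx may belong to a different work-group than element k, so no barrier inside a single kernel can guarantee it has been written. Ending the first kernel before the second starts gives that guarantee.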