Question

I'm trying to copy an image using OpenCL:

std::string kernelCode =
    "void kernel copy(global const int* image, global int* result)"
    "{"
    "    result[get_global_id(0)] = image[get_global_id(0)];"
    "}";

The image contains 200 * 300 pixels.

According to CL_DEVICE_MAX_WORK_GROUP_SIZE

In the queue:

int size = _originalImage.width() * _originalImage.height();
//...
queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(size), cl::NullRange);

Gives a segfault.

queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(10000), cl::NullRange);

Runs fine, but it gives back only part of the image.

What am I missing here?

Recommended answer

As you have already stated correctly, your CL_DEVICE_MAX_WORK_GROUP_SIZE is less than the number of threads you want to start. The segfault indicates an error in the runtime. You can have the OpenCL C++ wrapper report errors as C++ exceptions if you add the following define at the beginning of your source file (before you include any OpenCL headers):

#define __CL_ENABLE_EXCEPTIONS

The second line of code clearly only copies the first 10000 pixels of your image instead of all 60000. If you want to use only 10000 threads, you need to do this call six times with an adjusted NDRange offset each time.
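The six calls with an adjusted offset could be sketched as follows. `splitRange` is a hypothetical helper, not part of the OpenCL API; it only computes the (offset, count) pairs. The enqueue loop is shown as a comment and assumes OpenCL 1.1 or later, where a non-null global offset is allowed and `get_global_id(0)` includes that offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the OpenCL API): splits `total` work-items
// into consecutive (offset, count) chunks of at most `chunk` items each.
std::vector<std::pair<std::size_t, std::size_t>>
splitRange(std::size_t total, std::size_t chunk)
{
    std::vector<std::pair<std::size_t, std::size_t>> parts;
    if (chunk == 0) return parts;  // an empty chunk size yields no work
    for (std::size_t offset = 0; offset < total; offset += chunk) {
        // The last chunk may be shorter than `chunk`.
        parts.emplace_back(offset, std::min(chunk, total - offset));
    }
    return parts;
}

// Intended use with the queue and kernel from the question:
//
//   for (const auto& part : splitRange(size, 10000)) {
//       queue.enqueueNDRangeKernel(imgProcess,
//                                  cl::NDRange(part.first),   // global offset
//                                  cl::NDRange(part.second),  // work-items in this call
//                                  cl::NullRange);
//   }
```

For the 60000-pixel image and a chunk size of 10000 this gives six chunks, starting at offsets 0, 10000, and so on up to 50000.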

Generally, I would advise either using cl::copy to copy an image or modifying your kernel to copy multiple pixels per thread.
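One way to copy multiple pixels per thread is a strided loop, sketched below. The extra kernel parameter `count` (the total number of pixels) is an assumption that is not in the original kernel; the host would have to pass it with `setArg`. `simulateStridedCopy` is a hypothetical plain C++ model of the same loop, included only so the indexing can be checked without a device.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Variant of the question's kernel: each work-item copies several pixels.
// The parameter `count` is an assumption added for this sketch.
const std::string stridedKernelCode =
    "void kernel copy(global const int* image, global int* result, const int count)"
    "{"
    "    for (int i = get_global_id(0); i < count; i += get_global_size(0))"
    "        result[i] = image[i];"
    "}";

// Plain C++ model of the kernel loop: `globalSize` plays the role of
// get_global_size(0) and `gid` the role of get_global_id(0).
std::vector<int> simulateStridedCopy(const std::vector<int>& image,
                                     std::size_t globalSize)
{
    std::vector<int> result(image.size(), -1);
    for (std::size_t gid = 0; gid < globalSize; ++gid)
        for (std::size_t i = gid; i < image.size(); i += globalSize)
            result[i] = image[i];
    return result;
}
```

With this kernel, a single enqueue with a global size of 10000 would cover all 60000 pixels, since each work-item handles six of them.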

Furthermore, I'm quite unsure about the effect of setting the local workgroup size to NullRange. As the local workgroup size does not matter in your case, I think it is best to leave out this parameter and use the version of enqueueNDRangeKernel with only 3 arguments (omitting the last one).
