How do I know whether kernels are executing concurrently?


Question

I have a GPU with compute capability (CC) 3.0, so it should support 16 concurrent kernels. I start 10 kernels by calling clEnqueueNDRangeKernel 10 times in a loop. How can I tell whether the kernels are executing concurrently?

One approach I have considered is to record the time before and after the clEnqueueNDRangeKernel call. I would probably have to use events to ensure that the kernel has finished executing. But I still suspect that the loop will start the kernels sequentially. Can someone help me out?

Recommended answer

To determine whether your kernel executions overlap, you have to profile them. This requires several steps:

1. Enable profiling

Profiling data is only collected if the command queue is created with the property CL_QUEUE_PROFILING_ENABLE. The example creates one queue per kernel, because commands in a single in-order queue execute one after another:

cl_command_queue queues[10];
for (int i = 0; i < 10; ++i) {
  queues[i] = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE,
                                   &errcode);
}

2. Making sure all kernels start at the same time

You are right that the host enqueues the kernels sequentially. However, you can create a single user event and add it to the wait list of every kernel. No kernel can then start running until the user event is set to complete:

// Create the user event
cl_event user_event = clCreateUserEvent(context, &errcode);

// Reserve space for kernel events
cl_event kernel_events[10];

// Enqueue kernels
for (int i = 0; i < 10; ++i) {
  // local_work_size may be NULL to let the implementation choose a size
  clEnqueueNDRangeKernel(queues[i], kernel, work_dim, global_work_offset,
                         global_work_size, local_work_size, 1, &user_event,
                         &kernel_events[i]);
}

// Start all kernels by completing the user event
clSetUserEventStatus(user_event, CL_COMPLETE);

3. Obtain profiling times

Finally, we can collect the timing information for the kernel events:

// Block until all kernels have run to completion
clWaitForEvents(10, kernel_events);

for (int i = 0; i < 10; ++i) {
  cl_ulong start;
  clGetEventProfilingInfo(kernel_events[i], CL_PROFILING_COMMAND_START,
                          sizeof(start), &start, NULL);
  cl_ulong end;
  clGetEventProfilingInfo(kernel_events[i], CL_PROFILING_COMMAND_END,
                          sizeof(end), &end, NULL);
  // cl_ulong is a 64-bit unsigned integer; cast so that %llu is portable
  printf("Event %d: start=%llu, end=%llu\n", i, (unsigned long long)start,
         (unsigned long long)end);
}
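The raw timestamps are large device-clock values, which makes them hard to compare by eye. As a rough sketch, they could be printed relative to the earliest start instead. The helpers below are hypothetical (earliest_start and print_relative_times are not part of OpenCL), and they assume the start and end values have already been copied into plain uint64_t arrays, with uint64_t standing in for cl_ulong:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns the smallest value in start[0..n-1]. */
uint64_t earliest_start(const uint64_t *start, size_t n) {
  assert(n > 0);
  uint64_t min = start[0];
  for (size_t i = 1; i < n; ++i) {
    if (start[i] < min) {
      min = start[i];
    }
  }
  return min;
}

/* Hypothetical helper: prints each interval in microseconds, measured
 * from the earliest start time among all events. Input is in nanoseconds. */
void print_relative_times(const uint64_t *start, const uint64_t *end,
                          size_t n) {
  const uint64_t t0 = earliest_start(start, n);
  for (size_t i = 0; i < n; ++i) {
    printf("Event %zu: start=+%.3f us, end=+%.3f us\n", i,
           (double)(start[i] - t0) / 1000.0,
           (double)(end[i] - t0) / 1000.0);
  }
}
```

With this, an event that begins before the previous one has ended is easy to spot, since all values are small offsets from a common origin.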

4. Analyzing the output

Now that you have the start and end times of all kernel runs, you can check for overlaps (either by hand or programmatically). The times are reported in nanoseconds. Note, however, that the device timer is only accurate to a certain resolution. You can query that resolution, also in nanoseconds, using:

size_t resolution;
clGetDeviceInfo(device, CL_DEVICE_PROFILING_TIMER_RESOLUTION,
                sizeof(resolution), &resolution, NULL);
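The programmatic overlap check mentioned in this step might look something like the following. This is only a sketch: intervals_overlap and count_overlapping_pairs are hypothetical helper names, the timestamps are assumed to have been copied into uint64_t arrays (standing in for cl_ulong), and the tolerance parameter is an assumption about how one might use the timer resolution, namely by ignoring overlaps no longer than the resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if [s1, e1] and [s2, e2] overlap by
 * more than `tolerance` nanoseconds, otherwise 0. */
int intervals_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2,
                      uint64_t tolerance) {
  assert(s1 <= e1 && s2 <= e2);
  const uint64_t latest_start = s1 > s2 ? s1 : s2;
  const uint64_t earliest_end = e1 < e2 ? e1 : e2;
  /* Check ordering first so the unsigned subtraction cannot wrap. */
  return earliest_end > latest_start &&
         earliest_end - latest_start > tolerance;
}

/* Hypothetical helper: counts the pairs of kernel runs whose execution
 * intervals overlap by more than `tolerance` nanoseconds. */
size_t count_overlapping_pairs(const uint64_t *start, const uint64_t *end,
                               size_t n, uint64_t tolerance) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = i + 1; j < n; ++j) {
      if (intervals_overlap(start[i], end[i], start[j], end[j], tolerance)) {
        ++count;
      }
    }
  }
  return count;
}
```

Passing the queried resolution as the tolerance is one conservative choice. A result of 0 for all 10 kernels would suggest that they ran one after another rather than concurrently.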

FWIW, I tried this on an NVIDIA device with CC 2.0 (which should support concurrent kernels) and observed that the kernels ran sequentially.
