How to profile multiple sequentially launched OpenCL kernels with one clFinish?

Question

I have multiple kernels, and they are launched sequentially like this:

        clEnqueueNDRangeKernel(..., kernel1, ...);
        clEnqueueNDRangeKernel(..., kernel2, ...);
        clEnqueueNDRangeKernel(..., kernel3, ...);

All of these kernels share one global buffer.

Now I profile each kernel execution and sum the results to get the total execution time, by adding the following code block after each clEnqueueNDRangeKernel call:

        clFinish(cmdQueue);
        status = clGetEventProfilingInfo(...,&starttime,...);
        clGetEventProfilingInfo(...,&endtime,...);
        time_spent = endtime - starttime;

My question is: how can I profile the three kernels together with a single clFinish (for example, by adding one clFinish() after the last kernel launch)?

Yes, I pass a different timing event to each clEnqueueNDRangeKernel call, and I get large negative numbers. The details:

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1);
        clFinish(cmdQueue);
        clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime1,NULL);
        clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime1,NULL);
        time_spent1 = endtime1 - starttime1;

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2);
        clFinish(cmdQueue);
        clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime2,NULL);
        clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime2,NULL);
        time_spent2 = endtime2 - starttime2;

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3);
        clFinish(cmdQueue);
        clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime3,NULL);
        clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime3,NULL);
        time_spent3 = endtime3 - starttime3;

        time_spent_all_0 = time_spent1 + time_spent2 + time_spent3;
        time_spent_all_1 = endtime3 - starttime1;

If I keep every clFinish, all profiling values are reasonable, but time_spent_all_1 is about twice time_spent_all_0. If I remove every clFinish except the last one, none of the profiling values are reasonable.
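The gap between the two totals follows from plain arithmetic on the timestamps: time_spent_all_0 counts only the time the kernels were running, while time_spent_all_1 also counts whatever happens between one kernel's end and the next kernel's start. The sketch below shows that arithmetic in plain C. It is an illustration, not the asker's code: `kernel_times`, `busy_time_ns`, `span_ns`, and `idle_time_ns` are hypothetical names, and `uint64_t` stands in for `cl_ulong` so that no OpenCL headers are needed. It also checks that each end timestamp is not smaller than its start, since unsigned subtraction wraps around silently on invalid timestamps, which may be one source of nonsensical values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start/end timestamps of one kernel in nanoseconds, as they would be read
 * with CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END.
 * Assumption: uint64_t stands in for cl_ulong. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_times;

/* Sum of the individual kernel durations (time_spent_all_0 in the question). */
uint64_t busy_time_ns(const kernel_times *k, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Unsigned subtraction wraps if end < start, so check first. */
        assert(k[i].end_ns >= k[i].start_ns);
        total += k[i].end_ns - k[i].start_ns;
    }
    return total;
}

/* First start to last end (time_spent_all_1 in the question).
 * Assumes the entries are in launch order on an in-order queue. */
uint64_t span_ns(const kernel_times *k, size_t n)
{
    assert(n > 0 && k[n - 1].end_ns >= k[0].start_ns);
    return k[n - 1].end_ns - k[0].start_ns;
}

/* Time inside the span during which no kernel was running.
 * Assumes the kernels do not overlap. */
uint64_t idle_time_ns(const kernel_times *k, size_t n)
{
    return span_ns(k, n) - busy_time_ns(k, n);
}
```

With made-up timestamps of (1000, 1400), (2000, 2300), and (3000, 3500) ns, the busy time is 1200 ns and the span is 2500 ns, roughly the factor of two reported above. With a clFinish after every launch, the idle part plausibly comes from the host waiting and then enqueueing the next kernel, though the question does not confirm this.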

Thanks to Eric Bainville, I got the result I wanted: profiling multiple clEnqueueNDRangeKernel calls with one clFinish. Here is the final code I use:

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1);
        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2);
        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3);
        clFinish(cmdQueue);

        clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime,NULL);
        clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime,NULL);
        time_spent = endtime - starttime;

Recommended answer

Each clEnqueueNDRangeKernel call can create its own cl_event: the last argument of the call is a pointer to a cl_event, and if this argument is not NULL, a new event is created.

After a command has completed, the associated event can be queried for its start/end profiling info. The event must be released after use (call clReleaseEvent).

clFinish blocks until all enqueued commands have completed.

You need only one call to clFinish, and then you can query profiling info for all events.

