Problem description
I have a program that uses OpenCL for computation. The OpenCL code is large, and compiling it takes about 2 minutes at 100% CPU load. Of course, I save the compiled binary, and on subsequent launches the program is loaded from that binary. Can I use the same binary on another video card with the same chip but different characteristics (RAM, clock, etc.)?
As far as the OpenCL specification is concerned, you only have guarantees that a program binary can be re-used on the same device on which it was created.
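One way to stay within that guarantee is to key the cached binary on the device it was built for. The sketch below is only an illustration: `binary_cache_key` is a hypothetical helper, and it assumes the caller has already queried the strings from the runtime (for example via `CL_PLATFORM_VERSION`, `CL_DEVICE_NAME` and `CL_DRIVER_VERSION`).

```python
import hashlib


def binary_cache_key(platform_version: str, device_name: str,
                     driver_version: str, source: str) -> str:
    """Hypothetical helper: derive a cache file name for a compiled program.

    The key changes whenever the device, the driver, or the kernel source
    changes, so a stale or foreign binary is never picked up by name.
    """
    h = hashlib.sha256()
    for part in (platform_version, device_name, driver_version, source):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()
```

A key this strict treats two cards with the same chip as different devices whenever their reported names differ, so it trades some reuse for safety.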
In reality, the binaries returned by many OpenCL implementations are compatible with a wider range of devices from the same vendor. For example, NVIDIA returns PTX when you request binaries from its implementation, which is a reasonably high-level intermediate representation (i.e. not native instructions). This is certainly compatible with other devices that use the same architecture as the one on which it was created (e.g. all GK110 devices, or all GF104 devices), and is quite likely to be portable across a range of other NVIDIA GPU architectures too. Other vendors also return various types of intermediate representation (usually LLVM IR based) that allow this kind of binary compatibility.
So yes, you can probably re-use binaries across different devices that have the same architecture, but you'll really just have to try it and see. You could always implement a scheme that tries to use the binary and, if that fails, falls back to the source code.
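A minimal sketch of such a fallback scheme is shown here. The names `load_program` and `make_pyopencl_builders` are hypothetical, and the glue function assumes the `pyopencl` bindings are installed; the fallback logic itself has no dependency on them.

```python
from typing import Callable, Optional, Tuple, TypeVar

P = TypeVar("P")


def load_program(
    cached_binary: Optional[bytes],
    build_from_binary: Callable[[bytes], P],
    build_from_source: Callable[[], P],
) -> Tuple[P, str]:
    """Return (program, origin), where origin is "binary" or "source"."""
    if cached_binary is not None:
        try:
            return build_from_binary(cached_binary), "binary"
        except Exception:
            # The implementation rejected the binary (different device,
            # driver update, ...), so fall through to a full compile.
            pass
    return build_from_source(), "source"


def make_pyopencl_builders(ctx, device, source: str):
    """Hypothetical glue, assuming pyopencl; imported lazily on purpose."""
    import pyopencl as cl

    def from_binary(binary: bytes):
        # Create the program from a saved binary and build it for `device`.
        return cl.Program(ctx, [device], [binary]).build()

    def from_source():
        # Slow path: compile the OpenCL C source from scratch.
        return cl.Program(ctx, source).build()

    return from_binary, from_source
```

When the origin comes back as `"source"`, the freshly built binary can be saved again so the next launch on that device takes the fast path.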
In the future, we will hopefully see a large number of vendors supporting the recently ratified SPIR specification, which is a platform-portable intermediate representation for OpenCL device programs. This would allow you to generate binaries that are not only compatible with devices from a single vendor's architecture, but also across devices from many other vendors that also support SPIR. There would clearly be some remaining compilation overhead to lower SPIR to the native instruction set, but this should still result in significant speed-ups compared to compiling raw OpenCL C code.