How to create or manipulate GPU assembler?

Question

Does anyone have experience in creating or manipulating GPU machine code, possibly at run time?

I am interested in modifying GPU assembler code, possibly at run time and with minimal overhead. Specifically, I'm interested in assembler-based genetic programming.

I understand ATI has released ISAs for some of their cards, and NVIDIA recently released a CUDA disassembler for older cards, but I am not sure whether it is possible to modify instructions in memory at run time, or even beforehand.

Is this possible? Any related information is welcome.

Recommended answer

NVIDIA PTX generation and modification

I'm not sure how low-level PTX is compared to the hardware (which is likely undocumented?), but it can be generated from C/C++-like GPU languages, then modified and reused in a few ways:

  • OpenCL clGetProgramInfo(program, CL_PROGRAM_BINARIES, ...) + clCreateProgramWithBinary. Minimal runnable example: How to use clCreateProgramWithBinary in OpenCL?

These are standardized OpenCL APIs that produce and consume implementation-defined formats, which in driver version 375.39 for Linux happen to be human-readable PTX.

So you can dump the PTX, modify it, and reload it.
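The dump, modify, reload cycle can be sketched from Python. This is only a rough illustration and not the code from the linked example: it assumes the third-party pyopencl and numpy packages (neither is mentioned in the original answer) and a working OpenCL device. The helper names replace_once and run_with_edit are hypothetical. Whether the dumped binary is editable text depends on the driver, as noted above.

```python
KERNEL_SRC = """
__kernel void kmain(__global int *out) {
    out[get_global_id(0)]++;
}
"""


def replace_once(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Replace exactly one occurrence of `old`; refuse ambiguous edits."""
    count = blob.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return blob.replace(old, new)


def run_with_edit(edit, n=4):
    """Build KERNEL_SRC, dump its binary, apply `edit`, reload it, and run it."""
    # Assumption: pyopencl and numpy are installed. They are imported lazily
    # so this file still loads on machines without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    built = cl.Program(ctx, KERNEL_SRC).build()
    devices = built.get_info(cl.program_info.DEVICES)
    # Wraps clGetProgramInfo(program, CL_PROGRAM_BINARIES, ...): one blob per device.
    binaries = built.get_info(cl.program_info.BINARIES)

    patched = [edit(b) for b in binaries]
    # Wraps clCreateProgramWithBinary, followed by a build of the loaded binary.
    reloaded = cl.Program(ctx, devices, patched).build()

    host = np.zeros(n, dtype=np.int32)
    flags = cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR
    buf = cl.Buffer(ctx, flags, hostbuf=host)
    reloaded.kmain(queue, (n,), None, buf)
    cl.enqueue_copy(queue, host, buf)
    return host
```

Calling run_with_edit(lambda b: b) reloads the binary unchanged, so the buffer should come back as all ones. A real edit would pass something like lambda b: replace_once(b, old, new), where old and new are byte strings chosen after inspecting what your own driver actually emits.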

  • nvcc: can compile CUDA GPU-side code to PTX assembly simply with:

nvcc --ptx a.cu

nvcc can also compile OpenCL C programs containing both device and host code (see: Compile and build .cl file using NVIDIA's nvcc Compiler?), but I could not find how to get the PTX out with nvcc. That kind of makes sense, since such a program is just plain C plus C strings, and not a magic C superset. This is also suggested by: https://arrayfire.com/generating-ptx-files-from-opencl-code/

I'm also not sure how to recompile the modified PTX and use it the way I did with clCreateProgramWithBinary (see: How to compile PTX code).

Using clGetProgramInfo, an input CL kernel:

__kernel void kmain(__global int *out) {
    out[get_global_id(0)]++;
}

gets compiled to some PTX like:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-21124049
// Cuda compilation tools, release 8.0, V8.0.44
// Based on LLVM 3.4svn
//

.version 5.0
.target sm_20
.address_size 64

    // .globl   _Z3incPi

.visible .entry _Z3incPi(
    .param .u64 _Z3incPi_param_0
)
{
    .reg .pred  %p<2>;
    .reg .b32   %r<4>;
    .reg .b64   %rd<5>;


    ld.param.u64    %rd1, [_Z3incPi_param_0];
    mov.u32     %r1, %ctaid.x;
    setp.gt.s32 %p1, %r1, 2;
    @%p1 bra    BB0_2;

    cvta.to.global.u64  %rd2, %rd1;
    mul.wide.s32    %rd3, %r1, 4;
    add.s64     %rd4, %rd2, %rd3;
    ldu.global.u32  %r2, [%rd4];
    add.s32     %r3, %r2, 1;
    st.global.u32   [%rd4], %r3;

BB0_2:
    ret;
}

Then if, for example, you modify the line:

add.s32     %r3, %r2, 1;

to:

add.s32     %r3, %r2, 2;

and reuse the modified PTX, it actually increments by 2 instead of 1, as expected.
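Since the dumped PTX is plain text, an edit like the one above can also be scripted, which is roughly what a mutation step in assembler-based genetic programming would need. The following is a minimal sketch under that assumption. The function name set_add_immediates and the regular expression are illustrative and only cover the simple add.s32 form with an integer immediate shown here.

```python
import re

# Matches a PTX `add.s32` whose last operand is an integer immediate, e.g.
#   add.s32     %r3, %r2, 1;
# Register-register adds and other types (add.s64, ...) are left alone.
ADD_IMM = re.compile(r"(add\.s32\s+%\w+,\s*%\w+,\s*)(-?\d+)(\s*;)")


def set_add_immediates(ptx: str, value: int) -> tuple[str, int]:
    """Rewrite the immediate of every matching add.s32.

    Returns (new_ptx, number_of_edits).
    """
    return ADD_IMM.subn(lambda m: f"{m.group(1)}{value}{m.group(3)}", ptx)


snippet = """\
    ldu.global.u32  %r2, [%rd4];
    add.s32     %r3, %r2, 1;
    st.global.u32   [%rd4], %r3;
"""

patched, edits = set_add_immediates(snippet, 2)
print(patched)
```

The patched text is what you would hand back to clCreateProgramWithBinary. Returning the edit count lets the caller detect a pattern that matched nothing, or matched more lines than intended, before reloading anything.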

