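I am having trouble passing the right arguments to the prepare function (and to prepared_call) when allocating shared memory in PyCUDA. As I read the error message, one of the variables I pass to PyCUDA is a long instead of the float32 I intended, but I cannot see where that variable comes from.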

Also, the official example and the documentation of prepare seem to contradict each other on whether block needs to be None.

from pycuda import driver, compiler, gpuarray, tools
import pycuda.autoinit
import numpy as np

kernel_code ="""
__device__ void loadVector(float *target, float* source, int dimensions )
{
    for( int i = 0; i < dimensions; i++ ) target[i] = source[i];
}
__global__ void kernel(float* data, int dimensions, float* debug)
{
    extern __shared__ float mean[];
    if(threadIdx.x == 0) loadVector( mean, &data[0], dimensions );
    debug[threadIdx.x]=  mean[threadIdx.x];
}
"""

dimensions = 12
np.random.seed(23)
data = np.random.randn(dimensions).astype(np.float32)
data_gpu = gpuarray.to_gpu(data)
debug = gpuarray.zeros(dimensions, dtype=np.float32)

mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")
kernel.prepare("PiP",block = (dimensions, 1, 1),shared=data.size)
grid = (1,1)
kernel.prepared_call(grid,data_gpu,dimensions,debug)
print debug.get()

Output:
Traceback (most recent call last):
File "shared_memory_minimal_example.py", line 28, in <module>
kernel.prepared_call(grid,data_gpu,dimensions,debug)
File "/usr/local/lib/python2.6/dist-packages/pycuda-0.94.2-py2.6-linux-x86_64.egg/pycuda/driver.py", line 230, in function_prepared_call
func.param_setv(0, pack(func.arg_format, *args))
pycuda._pvt_struct.error: cannot convert argument to long

Best answer

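I ran into the same problem and it took me a while to find the answer, so here it is. The error occurs because data_gpu is a GPUArray instance, i.e. the object you created with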

data_gpu = gpuarray.to_gpu(data)

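To pass it to prepared_call, use data_gpu.gpudata, which gives the associated DeviceAllocation instance (effectively a pointer to the device memory location). The same applies to any other GPUArray argument, such as debug in the question.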
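The wording of the error ("cannot convert argument to long") makes more sense once you see how a "P" slot is packed. The sketch below is only an analogy using Python's standard struct module; PyCUDA uses its own private variant (pycuda._pvt_struct, visible in the traceback), and the address and the NotAPointer class here are made up for illustration.

```python
import struct

# "PiP" is the format string from the question: pointer, int, pointer.
# Each "P" slot must be filled with something convertible to an integer address.
fake_ptr = 0x1000  # hypothetical address, for illustration only

packed = struct.pack("PiP", fake_ptr, 12, fake_ptr)
packed_size = len(packed)


class NotAPointer:
    """Stand-in for an object that cannot be converted to an integer."""


try:
    struct.pack("PiP", NotAPointer(), 12, fake_ptr)
    pack_failed = False
except struct.error:
    # The packer rejects the first argument, much like the traceback above.
    pack_failed = True
```

So the complaint is about a pointer slot, not about a float32 value being mistaken for a long.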

In addition, passing the block argument to prepare is now deprecated, so the correct call looks like this:
data_gpu = gpuarray.to_gpu(data)
func.prepare( "P" )
grid = (1,1)
block = (1,1,1)
func.prepared_call( grid, block, data_gpu.gpudata )
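Applied to the script from the question, the same pattern might look like the sketch below. It rests on a few assumptions that are not in the original answer: a PyCUDA version in which prepared_call takes the block as its second positional argument and accepts a shared_size keyword, a shared-memory size given in bytes (the question passed an element count), and an added __syncthreads() so the other threads wait for thread 0 to finish loading. The helper names shared_bytes and run are hypothetical.

```python
import struct

KERNEL_CODE = """
__global__ void kernel(float* data, int dimensions, float* debug)
{
    extern __shared__ float mean[];
    if (threadIdx.x == 0)
        for (int i = 0; i < dimensions; i++) mean[i] = data[i];
    __syncthreads();  // added: wait until thread 0 has filled mean[]
    debug[threadIdx.x] = mean[threadIdx.x];
}
"""


def shared_bytes(dimensions):
    # Dynamic shared memory is sized in bytes: one C float per element.
    return dimensions * struct.calcsize("f")


def run(dimensions=12):
    # Imported here so this file can be loaded on a machine without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    from pycuda import compiler, gpuarray

    data = np.random.randn(dimensions).astype(np.float32)
    data_gpu = gpuarray.to_gpu(data)
    debug = gpuarray.zeros(dimensions, dtype=np.float32)

    kernel = compiler.SourceModule(KERNEL_CODE).get_function("kernel")
    kernel.prepare("PiP")  # argument types only; no block here
    kernel.prepared_call(
        (1, 1),                 # grid
        (dimensions, 1, 1),     # block, now passed at call time
        data_gpu.gpudata,       # raw device pointer, not the GPUArray
        dimensions,
        debug.gpudata,
        shared_size=shared_bytes(dimensions),  # assumed keyword name
    )
    return data, debug.get()
```

On a machine with a working CUDA setup, calling run() should return the input array and a copy of it read back through shared memory; if your PyCUDA version uses a different signature, adjust the prepared_call line accordingly.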

Source: the Stack Overflow question "How to use the `prepare` function in PyCUDA": https://stackoverflow.com/questions/6954487/
