Problem description
I made a simple CUDA program for practice. It simply copies over data from one array to another:
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule
# Global constants
N = 2**20 # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512
# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N // block_size_x, 1)
# Create the CUDA kernel, and run it.
mod = SourceModule("""
__global__ void D2x_kernel(double* a, double* e, int N) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid > 0 && tid < N - 1) {
        e[tid] = a[tid];
    }
}
""")
func = mod.get_function('D2x_kernel')
func(a, cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
print(e)
However, I get this error:
pycuda._driver.LogicError: cuLaunchKernel failed: invalid value
When I get rid of the second argument double* e in my kernel function and invoke the kernel without the argument e, the error goes away. Why is that? What does this error mean?
Solution
Your a array does not exist in device memory, so I suspect that PyCUDA is ignoring (or otherwise handling) the first argument to your kernel invocation and only passing in e and N, so you get an error because the kernel was expecting three arguments and has only received two. Removing double* e from your kernel definition might eliminate the error message you're getting, but your kernel still won't work properly.
A quick fix should be to wrap a in a cuda.In() call, which instructs PyCUDA to copy a to the device before launching the kernel. That is, your kernel launch line should be:
func(cuda.In(a), cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
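For comparison, here is a rough sketch of the same launch with the host-to-device transfers written out by hand, which is approximately what the cuda.In() and cuda.InOut() wrappers take care of. The names grid_dim, copy_on_gpu, a_gpu and e_gpu are made up for this illustration and are not part of PyCUDA or the original post. The sketch assumes a CUDA-capable GPU, uses the full-range guard if (tid < N), and skips uploading e because the kernel overwrites every element of it.

```python
def grid_dim(n, block_size_x):
    """Blocks needed to cover n elements (ceiling division, always ints)."""
    return ((n + block_size_x - 1) // block_size_x, 1)


def copy_on_gpu(n=2**20, block_size_x=512):
    """Hypothetical helper: copy array a to array e on the GPU."""
    # Imports are local because pycuda.autoinit creates a CUDA context
    # on import, so only this function needs a GPU; grid_dim does not.
    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    a = np.linspace(0, 1, n)  # float64, matches double* in the kernel
    e = np.empty_like(a)

    mod = SourceModule("""
    __global__ void D2x_kernel(double* a, double* e, int N) {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < N) {
            e[tid] = a[tid];
        }
    }
    """)
    func = mod.get_function("D2x_kernel")

    # Allocate device buffers and upload the input array explicitly.
    a_gpu = cuda.mem_alloc(a.nbytes)
    e_gpu = cuda.mem_alloc(e.nbytes)
    cuda.memcpy_htod(a_gpu, a)

    # Every pointer argument now refers to device memory.
    func(a_gpu, e_gpu, np.int32(n),
         block=(block_size_x, 1, 1), grid=grid_dim(n, block_size_x))

    # Download the result back into the host array.
    cuda.memcpy_dtoh(e, e_gpu)
    return a, e
```

On a machine with a working CUDA setup, calling a, e = copy_on_gpu() should leave e equal to a. The wrapper form in the launch line above is shorter; the explicit form is mainly useful when the same device buffers are reused across several kernel launches.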
Edit: Also, do you realize that your kernel is not copying the first and last elements of a to e? Your if (tid > 0 && tid < N - 1) statement is preventing that. For the entire array, it should be if (tid < N).
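To make the effect of the two guards concrete, the following small host-side sketch lists which thread indices would perform a copy under each condition. It runs in plain Python with no GPU; copied_indices, original_guard and fixed_guard are hypothetical names used only for this illustration, and a tiny N is chosen so the output is easy to read.

```python
def original_guard(tid, n):
    """Guard from the question: skips the first and last element."""
    return tid > 0 and tid < n - 1


def fixed_guard(tid, n):
    """Guard suggested in the answer: covers the whole array."""
    return tid < n


def copied_indices(n, guard):
    """Indices that threads 0..n-1 would copy under the given guard."""
    return [tid for tid in range(n) if guard(tid, n)]


n = 8
print(copied_indices(n, original_guard))  # [1, 2, 3, 4, 5, 6]
print(copied_indices(n, fixed_guard))     # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the original guard, indices 0 and N - 1 never appear, so e[0] and e[N - 1] keep whatever uninitialized values np.empty_like left there.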