Why does my rather trivial CUDA program error with certain arguments?

Problem description

I made a simple CUDA program for practice. It simply copies over data from one array to another:
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule

# Global constants
N = 2**20 # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512

# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N / block_size_x, 1)

# Create the CUDA kernel, and run it.
mod = SourceModule("""
  __global__ void D2x_kernel(double* a, double* e, int N) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid > 0 && tid < N - 1) {
      e[tid] = a[tid];
    }
  }
""")
func = mod.get_function('D2x_kernel')
func(a, cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
print str(e) 
However, I get this error: pycuda._driver.LogicError: cuLaunchKernel failed: invalid value
When I get rid of the second argument double* e in my kernel function and invoke the kernel without the argument e, the error goes away. Why is that? What does this error mean?
Solution
Your a array does not exist in device memory, so I suspect that PyCUDA is ignoring (or otherwise mishandling) the first argument to your kernel invocation and only passing in e and N. You then get an error because the kernel expects three arguments and has received only two. Removing double* e from your kernel definition might eliminate the error message you're getting, but your kernel still won't work properly.
A quick fix to this should be to wrap a in a cuda.In() call, which instructs PyCUDA to copy a to the device before launching the kernel.  That is, your kernel launch line should be:
func(cuda.In(a), cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
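A related detail in the launch line is the grid tuple. The question's code targets Python 2 (note the print statement), where N / block_size_x is an integer. Under Python 3 the same expression yields a float, which PyCUDA would likely reject as a grid dimension. The following is a minimal sketch, assuming Python 3; grid_dim_x is a hypothetical helper name, not part of PyCUDA.

```python
def grid_dim_x(n, block_size_x):
    """Number of blocks needed to cover n elements (integer ceiling division)."""
    return (n + block_size_x - 1) // block_size_x


N = 2**20           # size of array a, as in the question
block_size_x = 512  # threads per block, as in the question

block_size = (block_size_x, 1, 1)
grid_size = (grid_dim_x(N, block_size_x), 1)

print(grid_size)  # (2048, 1)
```

Rounding up also keeps the launch covering every element when N is not a multiple of the block size, which is why the kernel needs a bounds check on tid.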
Edit: Also, do you realize that your kernel is not copying the first and last elements of a to e?  Your if (tid > 0 && tid < N - 1) statement is preventing that.  For the entire array, it should be if (tid < N).
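To see the effect of the two guard conditions without a GPU, here is a small pure-Python sketch that emulates one thread per element. The function simulate_copy and the five-element input are illustrative assumptions, not part of the original code.

```python
def simulate_copy(a, guard):
    """Emulate one thread per element: copy a[tid] to e[tid] when guard(tid, n) holds."""
    n = len(a)
    e = [None] * n  # None marks elements that no thread wrote to
    for tid in range(n):
        if guard(tid, n):
            e[tid] = a[tid]
    return e


a = [0.0, 0.25, 0.5, 0.75, 1.0]

# Guard from the question: skips tid == 0 and tid == n - 1.
original = simulate_copy(a, lambda tid, n: tid > 0 and tid < n - 1)
# Guard suggested in the answer: covers the whole array.
fixed = simulate_copy(a, lambda tid, n: tid < n)

print(original)  # [None, 0.25, 0.5, 0.75, None]
print(fixed)     # [0.0, 0.25, 0.5, 0.75, 1.0]
```

On the real device the skipped entries of e would hold whatever np.empty_like left there rather than None.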