Why does my fairly trivial CUDA program error with certain arguments?

Problem description

I made a simple CUDA program for practice. It simply copies over data from one array to another:

import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule

# Global constants
N = 2**20 # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512

# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N / block_size_x, 1)

# Create the CUDA kernel, and run it.
mod = SourceModule("""
  __global__ void D2x_kernel(double* a, double* e, int N) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid > 0 && tid < N - 1) {
      e[tid] = a[tid];
    }
  }
""")
func = mod.get_function('D2x_kernel')
func(a, cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
print str(e) 

However, I get this error: pycuda._driver.LogicError: cuLaunchKernel failed: invalid value

When I get rid of the second argument double* e in my kernel function and invoke the kernel without the argument e, the error goes away. Why is that? What does this error mean?

Solution

Your array a does not exist in device memory, so I suspect that PyCUDA is ignoring (or otherwise handling) the first argument to your kernel invocation and passing in only e and N. You then get an error because the kernel expects three arguments and has received only two. Removing double* e from your kernel definition might eliminate the error message you're getting, but your kernel still won't work properly.

A quick fix to this should be to wrap a in a cuda.In() call, which instructs PyCUDA to copy a to the device before launching the kernel. That is, your kernel launch line should be:

func(cuda.In(a), cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
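For reference, here is a minimal sketch of how the whole script might look with that launch line in place. The names copy_kernel, grid_size_for, and copy_on_gpu are made up for this illustration and are not from the question. The sketch also assumes Python 3 (hence the integer division for the grid size) and writes the guard as tid < N so that every element is copied. The PyCUDA imports are deferred into the function so the helpers can be loaded on a machine without a GPU.

```python
# Kernel source: copies a into e element by element.
# The guard tid < N covers the whole array (an assumption of this sketch).
KERNEL_SRC = """
__global__ void copy_kernel(const double *a, double *e, int N) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < N) {
        e[tid] = a[tid];
    }
}
"""


def grid_size_for(n, block_size_x):
    """Return a 2-D grid with enough blocks to cover n elements.

    Integer ceiling division keeps the block count an int under Python 3.
    """
    return ((n + block_size_x - 1) // block_size_x, 1)


def copy_on_gpu(n=2**20, block_size_x=512):
    """Copy a host array to another host array via the GPU (needs CUDA)."""
    # Deferred imports: these require PyCUDA and a working CUDA device.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    a = np.linspace(0, 1, n)   # float64, matches double* in the kernel
    e = np.empty_like(a)

    func = SourceModule(KERNEL_SRC).get_function("copy_kernel")
    # cuda.In copies a to the device before the launch;
    # cuda.InOut copies e to the device and back afterwards.
    func(cuda.In(a), cuda.InOut(e), np.int32(n),
         block=(block_size_x, 1, 1),
         grid=grid_size_for(n, block_size_x))
    return a, e
```

On a machine with a CUDA device, calling a, e = copy_on_gpu() should leave e holding the same values as a.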

Edit: Also, do you realize that your kernel is not copying the first and last elements of a to e? Your if (tid > 0 && tid < N - 1) statement is preventing that. For the entire array, it should be if (tid < N).
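To see the effect of the two guards without a GPU, the thread indexing can be imitated in plain Python. This is only a sketch: copied_indices is a hypothetical helper, and the tiny sizes (8 elements, blocks of 4 threads) are chosen just to keep the output readable.

```python
def copied_indices(n, block_size_x, guard):
    """List the indices a 1-D launch would copy under the given guard."""
    num_blocks = (n + block_size_x - 1) // block_size_x
    copied = []
    for block_idx in range(num_blocks):
        for thread_idx in range(block_size_x):
            # Same formula as blockDim.x * blockIdx.x + threadIdx.x
            tid = block_size_x * block_idx + thread_idx
            if guard(tid, n):
                copied.append(tid)
    return copied


# Guard from the question: if (tid > 0 && tid < N - 1)
original = copied_indices(8, 4, lambda tid, n: 0 < tid < n - 1)
# Guard suggested in the answer: if (tid < N)
fixed = copied_indices(8, 4, lambda tid, n: tid < n)

print(original)  # [1, 2, 3, 4, 5, 6]
print(fixed)     # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the original guard, indices 0 and N - 1 are never written, so those entries of e keep whatever np.empty_like left in them.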


10-23 03:58