Problem description
Here is the program
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

// Single-thread kernel: stores the sum of *a and *b in *c.
__global__ void Addition(int *a, int *b, int *c)
{
    *c = *a + *b;
}

int main()
{
    int a, b, c;
    int *dev_a, *dev_b, *dev_c;
    int size = sizeof(int);

    // Allocate one int on the device for each operand and for the result.
    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);

    a = 5;
    b = 6;

    // Copy the inputs to the device.
    cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice);

    Addition<<<1, 1>>>(dev_a, dev_b, dev_c);

    // Copy the result back to the host.
    cudaMemcpy(&c, dev_c, size, cudaMemcpyDeviceToHost);

    // cudaFree takes the device pointer itself, not its address.
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    printf("%d\n", c);
    return 0;
}
Here is how I compiled it
$ nvcc -o test test.cu
Here is my output
1
Here is the output of deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 8400 GS"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 511 MBytes (536150016 bytes)
( 1) Multiprocessors, ( 8) CUDA Cores/MP: 8 CUDA Cores
GPU Clock rate: 1350 MHz (1.35 GHz)
Memory Clock rate: 400 Mhz
Memory Bus Width: 64-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: No with 0 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce 8400 GS
Result = PASS
CUDA 6.5 compiles for a compute capability 2.0 (cc2.0) target by default. Your GeForce 8400 GS is a cc1.1 device, so kernels compiled that way will not launch on it. Since the kernel never runs, the sum is never computed and the value you print is indeterminate. Your code also lacks proper CUDA error checking, which would have given you an indication of the problem.
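As a rough illustration of what such error checking could look like, here is a minimal sketch of the same program with every runtime call checked. The `CUDA_CHECK` macro and the `add_on_device` helper are hypothetical names introduced for this example, not part of the CUDA API; only the `cuda*` calls come from the runtime.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): print a readable message
// and exit if a runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",               \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

__global__ void Addition(int *a, int *b, int *c)
{
    *c = *a + *b;
}

// Hypothetical wrapper: adds two ints on the device, checking every step.
int add_on_device(int a, int b)
{
    int c = 0;
    int *dev_a, *dev_b, *dev_c;

    CUDA_CHECK(cudaMalloc((void**)&dev_a, sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dev_b, sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dev_c, sizeof(int)));

    CUDA_CHECK(cudaMemcpy(dev_a, &a, sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dev_b, &b, sizeof(int), cudaMemcpyHostToDevice));

    Addition<<<1, 1>>>(dev_a, dev_b, dev_c);
    CUDA_CHECK(cudaGetLastError());       // reports errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(dev_a));
    CUDA_CHECK(cudaFree(dev_b));
    CUDA_CHECK(cudaFree(dev_c));
    return c;
}

int main()
{
    printf("%d\n", add_on_device(5, 6));
    return 0;
}
```

A kernel launch returns no status of its own, which is why the sketch calls `cudaGetLastError()` right after it. Built for an architecture the GPU cannot run, a program like this would be expected to stop at that check with an error message (on toolkits of this era, typically something like "invalid device function") instead of printing a meaningless number.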
If you specify a proper arch switch when compiling, your code should run properly:
nvcc -arch=sm_11 -o test test.cu
nvcc will display a warning that sm_11 is deprecated, but it should still compile your code correctly.
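If you are unsure which value to pass to -arch, one option is to ask the runtime for the compute capability of the device, as deviceQuery does. The following is a minimal sketch under a few assumptions: it looks only at device 0, and `format_arch_flag` is a hypothetical helper written for this example. It launches no kernel, so it should work regardless of the architecture it was compiled for.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: writes "-arch=sm_<major><minor>" into buf.
// Returns the value returned by snprintf.
int format_arch_flag(int major, int minor, char *buf, size_t n)
{
    return snprintf(buf, n, "-arch=sm_%d%d", major, minor);
}

int main()
{
    int device = 0;  // assumption: the GPU of interest is device 0
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    char flag[32];
    format_arch_flag(prop.major, prop.minor, flag, sizeof(flag));
    printf("%s: compute capability %d.%d, candidate flag %s\n",
           prop.name, prop.major, prop.minor, flag);
    return 0;
}
```

For the GeForce 8400 GS above, which reports compute capability 1.1, the candidate flag would be -arch=sm_11. Whether a particular toolkit still accepts a given sm_ value depends on the toolkit version, so treat the printed flag as a starting point and check it against what your nvcc supports.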