Compile-time error when using __ldg in a CUDA kernel


Question

My goal is to take advantage of cache memory in my application, and the online examples I have found suggest that using __ldg should be relatively straightforward.

NVIDIA has documentation for GPU optimization (found here: https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_Opt_Fund-CW1.pdf) which provides this straightforward example:

__global__ void kernel ( int *output, int *input)
{
  ...
  output[idx] = __ldg( &input[idx] );
}

However, when I try to compile this, I get the following error message:

error: identifier "__ldg" is undefined.

Searching Google for a solution to this error message has unfortunately not helped. Any suggestions as to what may be wrong with this simple example? Is there a compiler flag that I am missing?

For reference, my device is compute capability 3.5 and I am working with CUDA 5.5.

Thanks.

Recommended answer

The __ldg() intrinsic is only available on compute capability 3.5 (or newer) architectures.

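That means:

It must be run on a compute capability 3.5 (or newer) GPU.

It must be compiled for a compute capability 3.5 (or newer) GPU.

It cannot be compiled for an older architecture.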

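In terms of compiler flags, that means:

This won't work: nvcc -arch=sm_30 ...

This will work: nvcc -arch=sm_35 ...

This won't work either: nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 ... (the kernel still has to be compiled for the compute_30 target, where __ldg is not defined)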
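If a multi-architecture build is still needed, one common workaround is to guard the intrinsic with the __CUDA_ARCH__ macro so that older targets fall back to an ordinary load. The sketch below is not part of the original answer. The names load_readonly, copy_kernel, and run_copy are hypothetical, and the index computation and bounds check stand in for the elided part of the question's kernel.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption, not from the original answer):
// uses __ldg() only when compiling for compute capability 3.5 or newer.
template <typename T>
__device__ __forceinline__ T load_readonly(const T *ptr)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
    return __ldg(ptr);  // load through the read-only data cache
#else
    return *ptr;        // older targets: ordinary global memory load
#endif
}

// Same shape as the kernel in the question; the indexing is assumed.
__global__ void copy_kernel(int *output, const int *input, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        output[idx] = load_readonly(&input[idx]);
}

// Copies n integers through the kernel.
// Returns the number of mismatched elements, or -1 on a CUDA error.
int run_copy(int n)
{
    std::vector<int> h_in(n), h_out(n, 0);
    for (int i = 0; i < n; ++i)
        h_in[i] = i;

    int *d_in = 0, *d_out = 0;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaMemcpy(d_in, &h_in[0], bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copy_kernel<<<blocks, threads>>>(d_out, d_in, n);
    cudaError_t launch_err = cudaGetLastError();

    cudaError_t copy_err =
        cudaMemcpy(&h_out[0], d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[i])
            ++mismatches;
    return mismatches;
}

int main()
{
    int mismatches = run_copy(1024);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

With the guard in place, a build that targets both sm_30 and sm_35 should compile, provided the installed toolkit still supports those architectures. Only the sm_35 code path would use the read-only cache load.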
