Question
Shared memory ("local memory" in OpenCL terms) is only 16 KiB on most NVIDIA GPUs today.
I have an application in which I need to create an array of 10,000 integers, so I need 10,000 * 4 B = 40,000 B (about 39 KiB) of memory.
- How can I work around this?
- Are there any GPUs with more than 16 KiB of shared memory?
Recommended answer
Think of shared memory as an explicitly managed cache. Store your array in global memory and cache parts of it in shared memory as needed, either by making multiple passes or by some other scheme that minimises the number of loads and stores to and from global memory.
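As a rough illustration of the multiple-pass idea, here is a minimal CUDA sketch. The workload is an assumption chosen only for the example, since the question does not say what the array is used for: each thread has one query value and counts how many array entries are smaller. The names `count_smaller`, `count_smaller_on_gpu`, and `TILE` are hypothetical. The array stays in global memory, and each block streams it through an 8 KiB shared-memory tile.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumed tile size: 2048 ints = 8 KiB, well inside a 16 KiB shared-memory budget.
constexpr int TILE = 2048;

// For each query value, count how many entries of data[] are smaller.
// Every thread needs every element of data[], so the block streams data[]
// through shared memory one tile at a time.
__global__ void count_smaller(const int* data, int n,
                              const int* queries, int* counts, int num_queries)
{
    __shared__ int tile[TILE];

    int q = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = (q < num_queries);
    int my_query = active ? queries[q] : 0;
    int my_count = 0;

    for (int base = 0; base < n; base += TILE) {
        int tile_len = (n - base < TILE) ? (n - base) : TILE;

        // Cooperative load: all threads of the block share the copy work,
        // including threads that have no query of their own.
        for (int i = threadIdx.x; i < tile_len; i += blockDim.x)
            tile[i] = data[base + i];
        __syncthreads();  // the tile is complete before anyone reads it

        if (active)
            for (int i = 0; i < tile_len; ++i)
                my_count += (tile[i] < my_query);
        __syncthreads();  // everyone is done before the tile is overwritten
    }

    if (active) counts[q] = my_count;
}

// Host wrapper (hypothetical helper). Error checking is omitted for brevity.
std::vector<int> count_smaller_on_gpu(const std::vector<int>& data,
                                      const std::vector<int>& queries)
{
    int n = static_cast<int>(data.size());
    int m = static_cast<int>(queries.size());
    std::vector<int> counts(m, 0);
    if (n == 0 || m == 0) return counts;

    int *d_data = nullptr, *d_queries = nullptr, *d_counts = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_queries, m * sizeof(int));
    cudaMalloc(&d_counts, m * sizeof(int));
    cudaMemcpy(d_data, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_queries, queries.data(), m * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (m + threads - 1) / threads;
    count_smaller<<<blocks, threads>>>(d_data, n, d_queries, d_counts, m);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(counts.data(), d_counts, m * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_queries);
    cudaFree(d_counts);
    return counts;
}
```

With a 10,000-element `data` vector, each block makes five passes (four full tiles and one partial tile). Each block reads the array from global memory once in total, rather than once per thread, which is the saving the shared-memory cache is meant to provide.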
How you implement this will depend on your algorithm. If you can give some details of exactly what you are trying to implement, you may get more concrete suggestions.
One last point: be aware that shared memory is shared between all threads in a block. You have far less than 16 KiB per thread, unless you have a single data structure that is common to all threads in the block.
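Because the limit applies per block and varies by device, it can be queried at run time instead of assuming 16 KiB. The sketch below uses the CUDA runtime call `cudaGetDeviceProperties` and its `sharedMemPerBlock` field. The helper names `shared_mem_per_block` and `ints_fit_in_shared` are hypothetical and exist only for this example.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Shared memory available to a single block on `device`, in bytes.
// Returns 0 if the query fails (for example, when no CUDA device is present).
size_t shared_mem_per_block(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlock;
}

// True if `count` ints would fit in one block's shared memory on `device`.
// Note that this is the budget for the whole block, not for each thread.
bool ints_fit_in_shared(size_t count, int device)
{
    return count * sizeof(int) <= shared_mem_per_block(device);
}
```

For the array in the question, `ints_fit_in_shared(10000, 0)` asks whether 40,000 bytes fit in one block's shared memory on device 0. If the answer is no, the array has to stay in global memory and be cached piece by piece.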