How to perform an atomic write in CUDA?

Question

First of all, I cannot find a reliable source on whether a write is atomic in CUDA or not. For example, "Is global memory write considered atomic in CUDA?" touches on this subject, but the last remark there shows we are not talking about the same notion of atomicity. Given the code:

global_mem[0] = pick_at_random_from(1, 2);
shared_mem[0] = pick_at_random_from(1, 2);

executed by a gazillion threads, "atomic" means that in both cases the content will be 1 or 2, and it is guaranteed that nothing else can show up (like 3). Atomic means integrity.

But as I understand it, CUDA does not guarantee this, so when I run this code I could potentially get the value 3? If that is really the case, how do I perform an atomic write? There is atomicExch, but it is overkill: it does more than is needed.

The atomic functions I have already checked: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions

Recommended answer

For a write operation in each of 2 different threads in CUDA, if:

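  • the writes are to the same location (address)
  • the address is naturally aligned for the size of the write
  • the write operations are the same size in both threads (a size of 1, 2, 4, or 8 bytes)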

then you are guaranteed to get one of the values written by those two threads, and not any other value, considering the data type size that was written. This holds as long as the write is done by a single SASS instruction. The correctness here is provided by current CUDA hardware, not necessarily by the compiler, the CUDA programming model, and/or the C++ standard to which CUDA adheres.

This extends directly to any number of threads that meet the above conditions.

This assumes no other threads are doing "anything else" with respect to the written locations (i.e. they are not writing a quantity of a different size to that location, to any overlapping location, or with some other alignment).

Which actual value will end up in that location is generally undefined (except that it will be one and only one of the written values, and not anything else) unless the programmer enforces some ordering on the operations.
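A minimal sketch of the situation described above, assuming a single naturally aligned 4-byte `int` in global memory. The names `racy_store` and `final_value_is_one_or_two` are made up for illustration, and the thread-index parity is only a stand-in for the question's `pick_at_random_from(1, 2)`. On current hardware the final value should be 1 or 2, though which of the two is not defined.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread stores either 1 or 2 to the same naturally aligned 4-byte location.
__global__ void racy_store(int *target)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Hypothetical stand-in for pick_at_random_from(1, 2): odd threads write 2, even threads write 1.
    int value = (tid & 1) ? 2 : 1;
    *target = value;  // plain (non-atomic) 32-bit store
}

// Launches many competing writers and reports whether the surviving value is 1 or 2.
bool final_value_is_one_or_two()
{
    int *d_target = nullptr;
    if (cudaMalloc(&d_target, sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_target, 0, sizeof(int));

    racy_store<<<1024, 256>>>(d_target);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_target); return false; }

    int h_value = 0;
    // cudaMemcpy waits for the kernel to finish before copying the result back.
    cudaError_t err = cudaMemcpy(&h_value, d_target, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_target);
    if (err != cudaSuccess) return false;

    printf("final value: %d\n", h_value);
    return h_value == 1 || h_value == 2;
}
```

Calling `final_value_is_one_or_two()` from `main` and compiling with `nvcc` is enough to try it. A passing run does not prove the guarantee; it only illustrates the behavior the answer attributes to current hardware.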

When writing vector quantities or structures in C/C++, care should be taken to ensure that the underlying write (store) instruction in the SASS code has the appropriate size. The comments above about write operations refer to the writes as issued by the SASS code. Generally speaking, I don't expect much difference between that interpretation and "writes from C/C++ code" using POD data types. But structures could be broken into multiple transactions of a smaller size, in which case the above statements would no longer hold. Nevertheless, with appropriate programming practices in C/C++ (e.g. careful use of vector types), it is possible to ensure that writes of up to 8 bytes are used where relevant.
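As a rough illustration of the vector-type advice, the sketch below writes an 8-byte pair through the built-in `int2` type, which is 8-byte aligned and which the compiler would normally be expected to store with a single 64-bit instruction. That expectation is an assumption here and should be confirmed by inspecting the SASS (for example with `cuobjdump -sass`). The names `store_pair` and `pair_halves_match` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes a pair whose two halves are equal, using the built-in
// int2 vector type (8 bytes, 8-byte aligned).
__global__ void store_pair(int2 *target)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    *target = make_int2(tid, tid);  // assumed to compile to one 64-bit store
}

// If the pair was written by a single 8-byte store, the halves read back
// must come from the same thread and therefore match.
bool pair_halves_match()
{
    int2 *d_target = nullptr;
    if (cudaMalloc(&d_target, sizeof(int2)) != cudaSuccess) return false;

    store_pair<<<1024, 256>>>(d_target);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_target); return false; }

    int2 h_pair = make_int2(-1, -2);
    // cudaMemcpy waits for the kernel to finish before copying the result back.
    cudaError_t err = cudaMemcpy(&h_pair, d_target, sizeof(int2), cudaMemcpyDeviceToHost);
    cudaFree(d_target);
    if (err != cudaSuccess) return false;

    printf("x = %d, y = %d\n", h_pair.x, h_pair.y);
    return h_pair.x == h_pair.y;
}
```

A plain `struct { int x; int y; }` without an alignment specifier is only 4-byte aligned, so the compiler may split its store into two 32-bit writes, and the halves could then come from different threads.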


08-20 00:05