Problem description
The AVX512CD instruction families are VPCONFLICT, VPLZCNT, and VPBROADCASTM.
What are some examples that show these instructions being useful in vectorizing loops? It would be helpful if answers included scalar loops and their vectorized counterparts.
Thanks!
Recommended answer
One example where the CD instructions might be useful is histogramming. In scalar code, histogramming is just a simple loop whose body looks like this:
load bin index
load bin count at index
increment bin count
store updated bin count at index
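As a minimal sketch, the scalar loop above might be written in C as follows. The function name `histogram_scalar`, its parameters, and the choice of 32-bit counters are assumptions made for illustration; they do not come from the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar histogram: one read-modify-write of a bin per input element.
   Hypothetical helper; nbins is the number of entries in bins[]. */
void histogram_scalar(const uint32_t *idx, size_t n,
                      uint32_t *bins, uint32_t nbins)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t b = idx[i];       /* load bin index */
        assert(b < nbins);         /* guard against out-of-range indices */
        uint32_t count = bins[b];  /* load bin count at index */
        count++;                   /* increment bin count */
        bins[b] = count;           /* store updated bin count at index */
    }
}
```

Because each iteration finishes its store before the next iteration loads, repeated indices are handled correctly here.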
Normally you can't vectorize histogramming, because the same bin index might occur more than once in a vector. You might naïvely try something like this:
load vector of N bin indices
perform gathered load using N bin indices to get N bin counts
increment N bin counts
store N updated bin counts using scattered store
but if any of the indices within a vector are the same then you get a conflict, and the resulting bin update will be incorrect.
This is where the CD instructions come to the rescue:
load vector of N bin indices
use CD instruction to test for duplicate indices
set mask for all unique indices
while mask not empty
    perform masked gathered load using up to N bin indices to get up to N bin counts
    increment up to N bin counts
    store up to N updated bin counts using masked scattered store
    remove the indices just processed and update mask for the remaining ones
end
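The loop above can be sketched in portable C by modelling the conflict test in scalar code. VPCONFLICTD (available as the `_mm512_conflict_epi32` intrinsic) produces, for each lane, a bitmask of the earlier lanes that hold the same value. The function `conflict_model` below imitates that result so the sketch runs without AVX-512 hardware. It is not the instruction itself. All names here (`LANES`, `conflict_model`, `histogram_conflict_free`) are assumptions for illustration, and a real implementation would use masked gather and scatter intrinsics instead of the inner lane loops.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* assumed: 16 x 32-bit lanes, as in a 512-bit register */

/* Scalar model of the VPCONFLICTD result: bit j of out[i] is set
   when j < i and idx[j] == idx[i]. */
static void conflict_model(const uint32_t idx[LANES], uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        out[i] = 0;
        for (int j = 0; j < i; j++)
            if (idx[j] == idx[i])
                out[i] |= 1u << j;
    }
}

/* Updates bins[] for ONE vector of indices, retrying conflicting lanes. */
void histogram_conflict_free(const uint32_t idx[LANES],
                             uint32_t *bins, uint32_t nbins)
{
    uint32_t conflicts[LANES];
    uint32_t counts[LANES];
    uint32_t pending = (1u << LANES) - 1u;  /* lanes still to be processed */

    conflict_model(idx, conflicts);         /* test for duplicate indices */

    while (pending != 0) {
        /* A pending lane is ready if no earlier pending lane has its index,
           so all ready lanes hold distinct indices. */
        uint32_t ready = 0;
        for (int l = 0; l < LANES; l++)
            if (((pending >> l) & 1u) && (conflicts[l] & pending) == 0)
                ready |= 1u << l;

        for (int l = 0; l < LANES; l++)     /* masked gathered load */
            if ((ready >> l) & 1u) {
                assert(idx[l] < nbins);
                counts[l] = bins[idx[l]];
            }
        for (int l = 0; l < LANES; l++)     /* masked increment */
            if ((ready >> l) & 1u)
                counts[l]++;
        for (int l = 0; l < LANES; l++)     /* masked scattered store */
            if ((ready >> l) & 1u)
                bins[idx[l]] = counts[l];

        pending &= ~ready;                  /* drop the lanes just processed */
    }
}
```

The lowest pending lane never has an earlier pending duplicate, so each pass processes at least one lane and the loop terminates. In the worst case (all lanes equal) it takes `LANES` passes, which is in line with the remark that this pattern is no faster than scalar code for a plain histogram.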
In practice this example is quite inefficient and no better than scalar code, but there are other, more compute-intensive examples where using the CD instructions seems to be worthwhile. Typically these are simulations in which data elements are updated in a non-deterministic fashion. One example (from the LAMMPS Molecular Dynamics Simulator) is referred to in the KNL book by Jeffers et al.