How do the conflict detection instructions make it easier to vectorize loops?

Question

The AVX512CD instruction families are: VPCONFLICT, VPLZCNT and VPBROADCASTM.

The Wikipedia section on these instructions says they are meant to make loops easier to vectorize.

What are some examples that show these instructions being useful in vectorizing loops? It would be helpful if answers included scalar loops and their vectorized counterparts.

Thanks!

Recommended answer

One example where the CD instructions might be useful is histogramming. In scalar code, histogramming is just a simple loop like this:

load bin index
load bin count at index
increment bin count
store updated bin count at index
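
As a rough illustration, the four steps above could be written in C as follows. The function name `histogram_scalar`, the 32-bit counter type, and the assumption that every index is a valid position in `bins` are choices made for this sketch, not part of the original answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar histogram: one read-modify-write of a bin per input element.
 * Assumes every idx[i] is a valid position in bins (hypothetical contract). */
void histogram_scalar(const uint32_t *idx, size_t n, uint32_t *bins)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t b = idx[i];       /* load bin index */
        uint32_t count = bins[b];  /* load bin count at index */
        count += 1;                /* increment bin count */
        bins[b] = count;           /* store updated bin count at index */
    }
}
```

Because each iteration finishes its store before the next one loads, repeated indices are handled correctly here.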

Normally you can't vectorize histogramming, because the same bin index may appear more than once within a vector. You might naively try something like this:

load vector of N bin indices
perform gathered load using N bin indices to get N bin counts
increment N bin counts
store N updated bin counts using scattered store

but if any of the indices within a vector are the same then you get a conflict, and the resulting bin update will be incorrect.
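
The lost update can be reproduced without any SIMD hardware. The sketch below emulates the gather, increment and scatter steps lane by lane in portable C. The 16-lane width and the name `histogram_naive_vector_step` are assumptions made for illustration.

```c
#include <stdint.h>

enum { LANES = 16 }; /* assumed: 16 x 32-bit lanes, as in a 512-bit vector */

/* Emulates the naive vector step: all lanes gather first, then all
 * increment, then all scatter. Duplicate indices overwrite each other. */
void histogram_naive_vector_step(const uint32_t idx[LANES], uint32_t *bins)
{
    uint32_t counts[LANES];
    for (int l = 0; l < LANES; ++l)
        counts[l] = bins[idx[l]];   /* gathered load */
    for (int l = 0; l < LANES; ++l)
        counts[l] += 1;             /* increment every lane */
    for (int l = 0; l < LANES; ++l)
        bins[idx[l]] = counts[l];   /* scattered store */
}
```

If bin 3 appears in three lanes and starts at zero, every one of those lanes gathers 0 and scatters 1, so the bin ends at 1 rather than 3.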

Hence the CD instructions come to the rescue:

load vector of N bin indices
use CD instruction to test for duplicate indices
set mask for all unique indices
while mask not empty
    perform masked gathered load using <N bin indices to get <N bin counts
    increment <N bin counts
    store <N updated bin counts using masked scattered store
    drop the indices just processed and recompute the mask for the rest
end
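
The loop above can be sketched in portable C by emulating what VPCONFLICTD computes: for each lane, a bit mask of the earlier lanes that hold the same value. On real hardware this is exposed as the `_mm512_conflict_epi32` intrinsic; here it is replaced by a plain nested loop so that the example runs anywhere. The names `conflict_emulated` and `histogram_cd_step`, the 16-lane width, and the choice to release a lane once all of its earlier duplicates are done are assumptions of this sketch, one possible way to realize the mask update, not necessarily the one the answer had in mind.

```c
#include <stdint.h>

enum { LANES = 16 }; /* assumed: 16 x 32-bit lanes, as in a 512-bit vector */

/* Emulates VPCONFLICTD: bit j of out[i] is set when j < i and idx[j] == idx[i]. */
static void conflict_emulated(const uint32_t idx[LANES], uint32_t out[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        out[i] = 0;
        for (int j = 0; j < i; ++j)
            if (idx[j] == idx[i])
                out[i] |= 1u << j;
    }
}

/* Applies one vector of LANES indices to bins; returns the number of passes. */
int histogram_cd_step(const uint32_t idx[LANES], uint32_t *bins)
{
    uint32_t conf[LANES];
    conflict_emulated(idx, conf);

    uint32_t todo = (1u << LANES) - 1u; /* lanes not yet applied */
    int passes = 0;
    while (todo != 0) {
        /* A lane is ready when none of its earlier duplicates is pending. */
        uint32_t ready = 0;
        for (int l = 0; l < LANES; ++l)
            if (((todo >> l) & 1u) && (conf[l] & todo) == 0)
                ready |= 1u << l;

        uint32_t counts[LANES] = {0};
        for (int l = 0; l < LANES; ++l)   /* masked gathered load */
            if ((ready >> l) & 1u) counts[l] = bins[idx[l]];
        for (int l = 0; l < LANES; ++l)   /* masked increment */
            if ((ready >> l) & 1u) counts[l] += 1;
        for (int l = 0; l < LANES; ++l)   /* masked scattered store */
            if ((ready >> l) & 1u) bins[idx[l]] = counts[l];

        todo &= ~ready;                   /* drop the lanes just processed */
        ++passes;
    }
    return passes;
}
```

Within each pass the ready lanes hold distinct indices, so the gather and scatter cannot collide. The number of passes equals the largest number of times any single index repeats in the vector, which is why heavily duplicated input degrades toward scalar speed.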

In practice this example is quite inefficient and no better than scalar code, but there are other more compute-intensive examples where using the CD instructions seems to be worthwhile. Typically these will be simulations where the data elements are going to be updated in a non-deterministic fashion. One example (from the LAMMPS Molecular Dynamics Simulator) is referred to in the KNL book by Jeffers et al.
