Problem description
I am new to GCC's C vector extensions. I am considering using them in my project, but their utility is (somewhat) contingent on the ability to efficiently move all elements in a vector one position to the left and store the result in a new vector. How can I do this efficiently (for example, in a SIMD-accelerated way)?
So, basically:
- OriginalVector = {1, 2, 3, 4, 5, 6, 7, 8}
- ShiftedVector = {2, 3, 4, 5, 6, 7, 8, X} (where X can be anything)
Background information (you can skip this): The purpose of such a transformation is in dealing with matrices where each row is represented with vectors. Specifically, it would enable one to treat ShiftedVector as the upper-left diagonal for the row beneath, and compare all values in one SIMD operation. If there is another way to compare a vector with another vector offset by one element, that would solve the problem too. But I'm assuming not, and that the most efficient way to perform this comparison is to move all the elements leftward and do the comparison 1:1.
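For illustration, here is a minimal sketch of the comparison described above, assuming GCC and a 32-byte vector of eight ints. `shifted_equal` is a hypothetical name, not something from the manual. It compares lane i+1 of one row with lane i of another; which row plays which role depends on the diagonal you actually want.

```c
typedef int v8si __attribute__((vector_size(32)));

/* Hypothetical helper: lane i of the result is -1 (all bits set) where
   row[i + 1] == other[i], and 0 otherwise. Lane 7 compares against the
   wrapped-around element row[0] and should be ignored. */
static inline v8si shifted_equal(v8si row, v8si other)
{
    const v8si shift_left = {1, 2, 3, 4, 5, 6, 7, 0};
    v8si shifted = __builtin_shuffle(row, shift_left);
    return shifted == other;   /* element-wise compare, GCC vector extension */
}
```

The `==` operator on vector types is part of the same GCC extension, so no x86-specific intrinsic is needed for the comparison itself.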
General stipulations:
- The original vector mustn't be harmed in the process
- It is fine if I have to use an x86 intrinsic function of some sort, but I don't know which or how
- It is fine if I lose the left-most element in the vector and introduce gibberish in the right-most
- It is fine if the most efficient method is an unaligned load of the original vector from its second position to end+1, but I still would like to know how to best code this
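The unaligned-load option from the last stipulation could be sketched as follows. This assumes the row is stored in a plain `int` buffer with at least one extra readable element after the eighth; `load_shifted` is a hypothetical name. A fixed-size `memcpy` is used because it is well defined for any alignment, and with optimization enabled GCC typically turns it into a single unaligned vector load.

```c
#include <string.h>

typedef int v8si __attribute__((vector_size(32)));

/* Hypothetical helper: loads the 8 ints row[1]..row[8] into a vector.
   The caller must guarantee that the buffer holds at least 9 ints. */
static inline v8si load_shifted(const int *row)
{
    v8si v;
    memcpy(&v, row + 1, sizeof v);   /* alignment-safe copy of 32 bytes */
    return v;
}
```

The source buffer is only read, so the original data is left untouched.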
It seems the bottleneck here is the lack of general information on the process of using the intrinsics. It seems people are either using assembly (which I am no expert in) or auto-vectorization (which doesn't work well here), so vector types are the most logical choice.
Thanks!
Crawling around in the depths of the manual, I uncovered this bit of tomfoolery:
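typedef int v8si __attribute__ ((vector_size (32)));
/* Inside a function: brace lists are valid as initializers, not in plain assignments. */
v8si OriginalVector = {1, 2, 3, 4, 5, 6, 7, 8};
v8si masker = {1, 2, 3, 4, 5, 6, 7, 0};   /* ShiftedVector[i] = OriginalVector[masker[i]] */
v8si ShiftedVector = __builtin_shuffle(OriginalVector, masker);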
Where I put a 0 at the end of "masker" for no reason (any element 0-7 would work). What this does is just map the elements in the original to the positions defined in masker, and save them to the result.
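To make the mapping concrete, here is a small sketch that puts a scalar model of the one-input `__builtin_shuffle` next to the builtin itself. `shuffle_model` and `shift_left_one` are hypothetical names; the model relies on the documented behaviour that mask indices are taken modulo the number of elements.

```c
typedef int v8si __attribute__((vector_size(32)));

/* Scalar model: lane i of the result is lane mask[i] of the input,
   with indices wrapped modulo 8. */
static inline v8si shuffle_model(v8si in, v8si mask)
{
    v8si out = {0};
    for (int i = 0; i < 8; i++)
        out[i] = in[mask[i] & 7];
    return out;
}

/* The same left shift expressed through the builtin. */
static inline v8si shift_left_one(v8si in)
{
    const v8si mask = {1, 2, 3, 4, 5, 6, 7, 0};
    return __builtin_shuffle(in, mask);
}
```

With the mask above, the last lane receives element 0 of the input, which is the "X can be anything" slot.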
But although this is an answer, it may not be the "best" answer, since I imagine there is a better way than creating a new vector, taking up a register with the new vector, assigning positions, taking each element out of place and putting it in another arbitrary place, and saving the result.
Yes, we can cache the masker outside the loop or something instead of creating it every time, but I imagine there's some simple "permute left" instruction somewhere which can just slide it over...
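Hoisting the mask could look like the sketch below, assuming the rows are stored as an array of `v8si`. `shift_rows_left` is a hypothetical name. Because the mask is a compile-time constant, GCC is free to pick a suitable permute for the target (when built with something like `-mavx2`); which instruction it actually emits is best checked in the generated assembly rather than assumed.

```c
#include <stddef.h>

typedef int v8si __attribute__((vector_size(32)));

/* Hypothetical helper: writes a left-shifted copy of each source row
   into dst. The source rows are not modified. */
static void shift_rows_left(const v8si *src, v8si *dst, size_t n_rows)
{
    const v8si shift_left = {1, 2, 3, 4, 5, 6, 7, 0};   /* defined once, before the loop */
    for (size_t r = 0; r < n_rows; r++)
        dst[r] = __builtin_shuffle(src[r], shift_left);
}
```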