Problem description
I have been going through the Intel intrinsics, and every function works on integers, floats, or doubles that are packed, unpacked, or extended packed.
It seems like this question should be answered somewhere on the internet, but I can't find the answer at all.
What is that packing thing?
Well, I've just been searching for the answer to the same question, also with no success. So I can only guess.
Intel already introduced packed and scalar instructions with its MMX technology. For example, it introduced the intrinsic
__m64 _mm_add_pi8 (__m64 a, __m64 b)
At that time there was no such thing as "extended packed". The only data type was __m64, and all operations worked on integers. With SSE came 128-bit registers and operations on floating-point numbers. However, SSE2 included a superset of the MMX integer operations, performed in 128-bit registers. For example,
__m128i _mm_add_epi8 (__m128i a, __m128i b)
Here, for the first time, we see the "ep" ("extended packed") part of the function name. Why was it introduced? I believe it solved the problem of the name _mm_add_pi8 already being taken by the MMX intrinsic listed above. The SSE/AVX interface is in C, which has no function overloading, so the same operation on a different vector type needs a different name.
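To make the "epi8" naming concrete, here is a minimal sketch that adds sixteen 8-bit integers with one _mm_add_epi8 call. It assumes an x86 compiler with SSE2 available; the helper name add_bytes_16 is hypothetical and only serves the illustration.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi8 */
#include <stdint.h>

/* Hypothetical helper: adds sixteen 8-bit integers lane by lane. */
void add_bytes_16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* "epi8": the 128-bit register is treated as 16 packed 8-bit integers.
       Each lane wraps around modulo 256; there is no carry between lanes. */
    __m128i sum = _mm_add_epi8(va, vb);

    _mm_storeu_si128((__m128i *)out, sum);
}
```

The MMX counterpart _mm_add_pi8 would do the same on eight bytes held in an __m64; it is left out here because MMX intrinsics are not available on every 64-bit toolchain.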
With AVX (and its successors AVX2 and AVX-512), Intel chose a different strategy and started to put the register length right after the opening "_mm", cf.:
__m256i _mm256_add_epi8 (__m256i a, __m256i b)
__m512i _mm512_add_epi8 (__m512i a, __m512i b)
Why they chose "ep" rather than "p" here is a mystery, though one that is irrelevant for programmers. In practice, they seem to use "p" for operations on floats and doubles and "ep" for operations on integers.
__m128d _mm_add_pd (__m128d a, __m128d b); // "d": function operates on doubles
__m256 _mm256_add_ps (__m256 a, __m256 b); // "s": function operates on floats
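As a rough illustration of the "ps" and "pd" suffixes, the sketch below adds four floats and two doubles. It deliberately uses the 128-bit SSE/SSE2 forms (_mm_add_ps, _mm_add_pd) rather than the 256-bit AVX ones, so that no extra compiler flags are assumed; the function names add4_floats and add2_doubles are hypothetical.

```c
#include <xmmintrin.h>  /* SSE:  __m128,  _mm_add_ps */
#include <emmintrin.h>  /* SSE2: __m128d, _mm_add_pd */

/* "ps": four packed single-precision floats per 128-bit register. */
void add4_floats(const float a[4], const float b[4], float out[4])
{
    __m128 sum = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, sum);
}

/* "pd": two packed double-precision values per 128-bit register. */
void add2_doubles(const double a[2], const double b[2], double out[2])
{
    __m128d sum = _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, sum);
}
```

The register width is the same in both functions; only the element type, and therefore the number of lanes, differs.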
Perhaps this goes back to the transition from MMX to SSE, where "ep" was introduced for operations on integers (MMX handled no floats), and to an attempt to keep the AVX names as close to the SSE ones as possible.
Thus, basically, from the perspective of a programmer, there's no difference between "ep" ("extended packed") and "p" ("packed"), for we are already aware of the register length that we target in our code.
As for the next part of the question, "unpacking" belongs to a completely different category of notions than "scalar" and "packed". This is rather a colloquial term for a particular data rearrangement or shuffle, like rotation or shift.
The reason for using "epi" in the name of intrinsics like _mm256_unpackhi_epi16 is that it is a genuinely vector (not scalar) function on a vector of 16-bit integer elements. Notice that here "unpack" belongs to the part of the function name that describes its action (like mul, add, or permute), whereas "s" / "p" / "ep" (scalar, packed, extended packed) belong to the part describing the operation mode (scalar for "s", vector for "p" or "ep").
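The two parts of the name can be seen side by side in a small sketch. It uses the 128-bit SSE2 sibling _mm_unpackhi_epi16 instead of the 256-bit intrinsic named above, to avoid assuming AVX2 support, and it contrasts _mm_add_ss (scalar mode) with the packed mode already discussed. The helper names unpack_hi_16 and add_low_float are hypothetical.

```c
#include <xmmintrin.h>  /* SSE:  _mm_add_ss */
#include <emmintrin.h>  /* SSE2: _mm_unpackhi_epi16 */
#include <stdint.h>

/* Action "unpackhi", mode "epi16": interleaves the upper four 16-bit
   elements of a and b, giving a4 b4 a5 b5 a6 b6 a7 b7. */
void unpack_hi_16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpackhi_epi16(va, vb));
}

/* Action "add", mode "ss" (scalar single): only lane 0 is added;
   lanes 1 to 3 of the result are copied unchanged from a. */
void add_low_float(const float a[4], const float b[4], float out[4])
{
    __m128 r = _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}
```

Note that the 256-bit _mm256_unpackhi_epi16 performs this interleave separately within each 128-bit half of the register, so its output order is not simply a wider version of the one shown here.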
(There are no scalar-integer instructions that operate between two XMM registers, but "si" does appear in the intrinsic name for movd eax, xmm0: _mm_cvtsi128_si32. There are a few similar intrinsics.)
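A short sketch of those "si" intrinsics, assuming SSE2: _mm_cvtsi32_si128 moves a plain int into lane 0 of a vector, and _mm_cvtsi128_si32 moves lane 0 back out. The function names low_lane and sum_via_xmm are hypothetical, and the addition in between still has to be a packed one (_mm_add_epi32), since there is no scalar-integer add on XMM registers.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128, _mm_cvtsi128_si32 */

/* Returns the lowest 32-bit element of v as an ordinary int. */
int low_lane(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}

/* Adds two ints by routing them through vector registers. */
int sum_via_xmm(int x, int y)
{
    __m128i vx = _mm_cvtsi32_si128(x);  /* x in lane 0, upper lanes zeroed */
    __m128i vy = _mm_cvtsi32_si128(y);
    __m128i s  = _mm_add_epi32(vx, vy); /* packed add; only lane 0 matters here */
    return _mm_cvtsi128_si32(s);
}
```

This is purely illustrative; for a single pair of ints an ordinary + is the sensible choice.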