
Question

I have this function, which uses SSE2 to add some values together. It's supposed to add lhs and rhs together and store the result back into lhs:

template<typename T>
void simdAdd(T *lhs,T *rhs)
{
    asm volatile("movups %0,%%xmm0"::"m"(lhs));
    asm volatile("movups %0,%%xmm1"::"m"(rhs));

    switch(sizeof(T))
    {
        case sizeof(uint8_t):
        asm volatile("paddb %%xmm0,%%xmm1":);
        break;

        case sizeof(uint16_t):
        asm volatile("paddw %%xmm0,%%xmm1":);
        break;

        case sizeof(float):
        asm volatile("addps %%xmm0,%%xmm1":);
        break;

        case sizeof(double):
        asm volatile("addpd %%xmm0,%%xmm1":);
        break;

        default:
        std::cout<<"error"<<std::endl;
        break;
    }

    asm volatile("movups %%xmm0,%0":"=m"(lhs));
}

My code uses the function like this:

float *values=new float[4];
float *values2=new float[4];

values[0]=1.0f;
values[1]=2.0f;
values[2]=3.0f;
values[3]=4.0f;

values2[0]=1.0f;
values2[1]=2.0f;
values2[2]=3.0f;
values2[3]=4.0f;

simdAdd(values,values2);
for(uint32_t count=0;count<4;count++) std::cout<<values[count]<<std::endl;

However, this isn't working: when the code runs, it outputs 1,2,3,4 instead of 2,4,6,8.

Recommended answer

I've found that inline assembly support isn't reliable in most modern compilers (as in, the implementations are just plain buggy). You are generally better off using compiler intrinsics, which are declarations that look like C functions but actually compile to a specific opcode.

Intrinsics let you specify an exact sequence of opcodes, but leave the register coloring to the compiler. It's much more reliable than trying to move data between C variables and asm registers, which is where inline assemblers have always fallen down for me. It also lets the compiler schedule your instructions, which can provide better performance if it works around pipeline hazards. I.e., in this case you could do

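#include <xmmintrin.h> // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

void simdAdd(float *lhs,float *rhs)
{
   // Load four floats from each array, add them lane by lane, and store the sums back into lhs.
   _mm_storeu_ps( lhs, _mm_add_ps(_mm_loadu_ps( lhs ), _mm_loadu_ps( rhs )) );
}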
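The question's template also handled uint8_t, uint16_t and double. A minimal sketch of how the same intrinsics approach might cover those types is given here, assuming an x86 target with SSE2 enabled. Using one overload per element type (rather than a switch on sizeof(T)) and const-qualifying rhs are choices made for this sketch, not something taken from the original answer.

```cpp
#include <emmintrin.h> // SSE2 intrinsics; also provides the SSE float intrinsics
#include <cstdint>

// Each overload adds one 16-byte vector from rhs into lhs using unaligned loads and stores.

inline void simdAdd(float *lhs, const float *rhs) // 4 floats (addps)
{
    _mm_storeu_ps(lhs, _mm_add_ps(_mm_loadu_ps(lhs), _mm_loadu_ps(rhs)));
}

inline void simdAdd(double *lhs, const double *rhs) // 2 doubles (addpd)
{
    _mm_storeu_pd(lhs, _mm_add_pd(_mm_loadu_pd(lhs), _mm_loadu_pd(rhs)));
}

inline void simdAdd(std::uint8_t *lhs, const std::uint8_t *rhs) // 16 bytes (paddb)
{
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lhs));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(rhs));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(lhs), _mm_add_epi8(a, b));
}

inline void simdAdd(std::uint16_t *lhs, const std::uint16_t *rhs) // 8 words (paddw)
{
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lhs));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(rhs));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(lhs), _mm_add_epi16(a, b));
}
```

Overloads also avoid the ambiguity in the original switch, where float and uint32_t (or double and uint64_t) would share a sizeof and pick the same instruction. Each call processes exactly 16 bytes, so the arrays passed in must hold at least that much data.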

In your case, anyway, you have two problems:


  1. The terrible GCC inline assembly syntax, which badly confuses the difference between pointers and values. Use *lhs and *rhs instead of just lhs and rhs; an "m" or "=m" operand names the memory of the object you pass, so passing lhs makes movups read and write the pointer variable itself rather than the floats it points to.
  2. GCC's default AT&T syntax puts operands in source,destination order. addps stores its result in the second operand, so you need to output xmm1, not xmm0.
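As a rough illustration of both fixes applied to the float case, here is a sketch for GCC or Clang on x86. It is not the codepad example mentioned in this answer. The names simdAddAsm and Vec4 are hypothetical; Vec4 exists only so that each memory operand describes all 16 bytes being accessed rather than a single float.

```cpp
// Hypothetical helper type: lets the "m" operands cover four floats at once.
struct Vec4 { float v[4]; };

// Sketch only: GCC/Clang extended asm, x86 with SSE. Adds rhs[0..3] into lhs[0..3].
inline void simdAddAsm(float *lhs, const float *rhs)
{
    asm volatile(
        "movups %0, %%xmm0\n\t"        // xmm0 = lhs[0..3]
        "movups %1, %%xmm1\n\t"        // xmm1 = rhs[0..3]
        "addps  %%xmm0, %%xmm1\n\t"    // AT&T order: xmm1 = xmm1 + xmm0
        "movups %%xmm1, %0"            // the sum is in xmm1, so store xmm1 back to lhs
        : "+m"(*reinterpret_cast<Vec4 *>(lhs))      // dereferenced: the data, not the pointer
        : "m"(*reinterpret_cast<const Vec4 *>(rhs))
        : "xmm0", "xmm1");             // tell the compiler which registers are overwritten
}
```

Keeping all four instructions in one asm statement matters as well: the compiler is free to use xmm0 and xmm1 for its own purposes between separate asm statements, so values left in registers across statements are not guaranteed to survive.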

I've put a fixed example on codepad (to avoid cluttering up this answer, and to demonstrate that it works).
