SSE: reciprocal if not zero

Question

How can I take the reciprocal (inverse) of floats with SSE instructions, but only for non-zero values?

Background:

I want to normalize an array of vectors so that each dimension has the same average. In C this can be coded as:

float vectors[num * dims]; // input data

// step 1. compute the sum on each dimension
float norm[dims];
memset(norm, 0, dims * sizeof(float));
for(int i = 0; i < num; i++) for(int j = 0; j < dims; j++)
    norm[j] += vectors[i * dims + j];
// step 2. convert sums to reciprocal of average
for(int j = 0; j < dims; j++) if(norm[j]) norm[j] = (float)num / norm[j];
// step 3. normalize the data
for(int i = 0; i < num; i++) for(int j = 0; j < dims; j++)
    vectors[i * dims + j] *= norm[j];

Now, for performance reasons, I want to do this using SSE intrinsics. Step 1 and step 3 are easy, but I'm stuck at step 2. I can't find any code sample or obvious SSE instruction that takes the reciprocal of a value only if it is not zero. For the division, _mm_rcp_ps does the trick, and it could perhaps be combined with a conditional move, but how do I get a mask indicating which components are zero?

I don't need the code for the algorithm described above, just the "inverse if not zero" function:

__m128 rcp_nz_ps(__m128 input) {
    // ????
}

Thanks!

Recommended answer

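__m128 rcp_nz_ps(__m128 input) {
    // all-ones in each lane where input == 0, all-zeros elsewhere
    __m128 mask = _mm_cmpeq_ps(_mm_setzero_ps(), input);
    // approximate reciprocal of every lane (zero lanes yield infinity)
    __m128 recip = _mm_rcp_ps(input);
    // clear the lanes flagged by the mask, keep the rest
    return _mm_andnot_ps(mask, recip);
}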

Each lane of mask is set to b111...11 if the input is zero, and to b000...00 otherwise. The and-not with that mask replaces the elements of the reciprocal that correspond to a zero input with zero.
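As a rough illustration of how the masking idea could be applied to step 2 of the question, here is a minimal sketch. The helper names div_nz_ps and sums_to_scale are hypothetical and do not come from the original answer. Because _mm_rcp_ps only returns an approximation (roughly 12 bits of precision), the sketch assumes an exact result is wanted and uses _mm_div_ps instead; it also assumes unaligned data, hence the _mm_loadu_ps/_mm_storeu_ps calls.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Approximate reciprocal of each non-zero lane; zero lanes stay zero. */
static __m128 rcp_nz_ps(__m128 input) {
    __m128 is_zero = _mm_cmpeq_ps(input, _mm_setzero_ps());
    return _mm_andnot_ps(is_zero, _mm_rcp_ps(input));
}

/* Hypothetical exact variant: numerator / input for non-zero lanes,
   zero for lanes where input == 0. */
static __m128 div_nz_ps(__m128 numerator, __m128 input) {
    __m128 is_zero = _mm_cmpeq_ps(input, _mm_setzero_ps());
    /* The division yields inf or NaN in zero lanes; and-not clears them. */
    return _mm_andnot_ps(is_zero, _mm_div_ps(numerator, input));
}

/* Hypothetical helper for step 2: norm[j] = num / norm[j] where
   norm[j] != 0, leaving zero entries untouched. */
static void sums_to_scale(float *norm, size_t dims, float num) {
    assert(norm != NULL || dims == 0);
    const __m128 vnum = _mm_set1_ps(num);
    size_t j = 0;
    /* Process four floats per iteration. */
    for (; j + 4 <= dims; j += 4) {
        __m128 v = _mm_loadu_ps(norm + j);
        _mm_storeu_ps(norm + j, div_nz_ps(vnum, v));
    }
    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; j < dims; j++)
        if (norm[j] != 0.0f) norm[j] = num / norm[j];
}
```

Calling sums_to_scale(norm, dims, (float)num) after step 1 would replace each non-zero sum with num divided by that sum. If the lower precision of _mm_rcp_ps is acceptable, multiplying rcp_nz_ps(v) by the broadcast num is a possible alternative to the division.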

