Question
I am new to SSE instructions and I am trying to learn them from this article: http://www.codeproject.com/Articles/4522/Introduction-to-SSE-Programming
I am using the GCC compiler on Ubuntu 10.10 with an Intel Core i7 960 CPU.
Here is some code I attempted, based on the article.
For two arrays of length ARRAY_SIZE it calculates:
fResult[i] = sqrt(fSource1[i] * fSource1[i] + fSource2[i] * fSource2[i]) + 0.5
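For checking the SSE results, it can help to have a plain scalar version of the same formula. The following is a minimal sketch; the function name scalar_reference is hypothetical and not from the article.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference for the formula
//   fResult[i] = sqrt(fSource1[i]^2 + fSource2[i]^2) + 0.5
// Useful for comparing against the SSE version's output.
void scalar_reference(const float* fSource1, const float* fSource2,
                      float* fResult, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        // std::sqrt has a float overload in <cmath>, so no double round trip
        fResult[i] = std::sqrt(fSource1[i] * fSource1[i] +
                               fSource2[i] * fSource2[i]) + 0.5f;
    }
}
```

For example, inputs 3 and 4 give sqrt(9 + 16) + 0.5 = 5.5.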
Here is my SSE code:
#include <iostream>
#include <iomanip>
#include <ctime>
#include <stdlib.h>
#include <xmmintrin.h> // Contains the SSE compiler intrinsics
#include <malloc.h>

void myssefunction(
    float* pArray1,  // [in] first source array
    float* pArray2,  // [in] second source array
    float* pResult,  // [out] result array
    int nSize)       // [in] size of all arrays
{
    int nLoop = nSize / 4;
    __m128 m1, m2, m3, m4;
    __m128* pSrc1 = (__m128*) pArray1;
    __m128* pSrc2 = (__m128*) pArray2;
    __m128* pDest = (__m128*) pResult;
    __m128 m0_5 = _mm_set_ps1(0.5f);      // m0_5[0, 1, 2, 3] = 0.5

    for (int i = 0; i < nLoop; i++)
    {
        m1 = _mm_mul_ps(*pSrc1, *pSrc1);  // m1 = *pSrc1 * *pSrc1
        m2 = _mm_mul_ps(*pSrc2, *pSrc2);  // m2 = *pSrc2 * *pSrc2
        m3 = _mm_add_ps(m1, m2);          // m3 = m1 + m2
        m4 = _mm_sqrt_ps(m3);             // m4 = sqrt(m3)
        *pDest = _mm_add_ps(m4, m0_5);    // *pDest = m4 + 0.5
        pSrc1++;
        pSrc2++;
        pDest++;
    }
}

int main(int argc, char *argv[])
{
    int ARRAY_SIZE = atoi(argv[1]);
    float* m_fArray1 = (float*) _aligned_malloc(ARRAY_SIZE * sizeof(float), 16);
    float* m_fArray2 = (float*) _aligned_malloc(ARRAY_SIZE * sizeof(float), 16);
    float* m_fArray3 = (float*) _aligned_malloc(ARRAY_SIZE * sizeof(float), 16);

    for (int i = 0; i < ARRAY_SIZE; ++i)
    {
        m_fArray1[i] = ((float)rand()) / RAND_MAX;
        m_fArray2[i] = ((float)rand()) / RAND_MAX;
    }

    myssefunction(m_fArray1, m_fArray2, m_fArray3, ARRAY_SIZE);

    _aligned_free(m_fArray1);
    _aligned_free(m_fArray2);
    _aligned_free(m_fArray3);
    return 0;
}
I get the following compilation errors:
[Programming/SSE]$ g++ -g -Wall -msse sseintro.cpp
sseintro.cpp: In function ‘int main(int, char**)’:
sseintro.cpp:41: error: ‘_aligned_malloc’ was not declared in this scope
sseintro.cpp:53: error: ‘_aligned_free’ was not declared in this scope
[Programming/SSE]$
Where am I messing up? Am I missing some header files? I seem to have included all the relevant ones.
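Separately from the compile errors, note that myssefunction only processes the first 4 * (nSize / 4) elements: when ARRAY_SIZE is not a multiple of 4, the last nSize % 4 results are never written. Below is a minimal sketch that adds a scalar remainder loop. It assumes, as the original does, that all three arrays are 16-byte aligned, and it uses _mm_load_ps/_mm_store_ps explicitly instead of dereferencing __m128 pointers. The name myssefunction_with_tail is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical variant of myssefunction: the vector loop handles groups of
// four floats, then a scalar loop finishes the nSize % 4 leftover elements.
// All three arrays must be 16-byte aligned (_mm_load_ps/_mm_store_ps require it).
void myssefunction_with_tail(const float* pArray1, const float* pArray2,
                             float* pResult, std::size_t nSize)
{
    const __m128 m0_5 = _mm_set1_ps(0.5f);  // {0.5, 0.5, 0.5, 0.5}
    std::size_t i = 0;
    for (; i + 4 <= nSize; i += 4)
    {
        __m128 m1 = _mm_load_ps(pArray1 + i);
        __m128 m2 = _mm_load_ps(pArray2 + i);
        __m128 m3 = _mm_add_ps(_mm_mul_ps(m1, m1), _mm_mul_ps(m2, m2));
        _mm_store_ps(pResult + i, _mm_add_ps(_mm_sqrt_ps(m3), m0_5));
    }
    for (; i < nSize; ++i)  // remainder, computed one element at a time
    {
        pResult[i] = std::sqrt(pArray1[i] * pArray1[i] +
                               pArray2[i] * pArray2[i]) + 0.5f;
    }
}
```

On 32-bit x86 this still needs -msse as in the question; on x86-64, GCC enables SSE by default.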
Answer
_aligned_malloc and _aligned_free are Microsoft-isms. Use posix_memalign or memalign on Linux et al. For Mac OS X you can just use malloc, as it always returns 16-byte-aligned memory. For portable SSE code you generally want to implement wrapper functions for aligned memory allocation, e.g.
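#include <stdlib.h>   // malloc; valloc on most Unix-like systems
#if defined _WIN32 || defined __linux__
#include <malloc.h>   // _aligned_malloc (Windows), memalign (Linux)
#endif

// Return size bytes aligned to a 16-byte boundary, as SSE aligned loads/stores require
void * malloc_simd(const size_t size)
{
#if defined _WIN32          // Windows (predefined by the compiler for 32- and 64-bit)
    return _aligned_malloc(size, 16);
#elif defined __linux__     // Linux
    return memalign(16, size);
#elif defined __MACH__      // Mac OS X: malloc is already 16-byte aligned
    return malloc(size);
#else                       // other (use valloc for page-aligned memory)
    return valloc(size);
#endif
}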
Implementation of free_simd is left as an exercise for the reader.
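One possible solution, as a minimal sketch that assumes free_simd should mirror the branches of malloc_simd: memory from _aligned_malloc must be released with _aligned_free on Windows, while memory from memalign (glibc), malloc (Mac OS X) and valloc is passed to the ordinary free. The Linux man page warns that some older systems cannot reclaim memalign/valloc memory at all, so the final fallback is an assumption that holds on glibc and the BSDs rather than a guarantee.

```cpp
#include <stdlib.h>   // free
#if defined _WIN32
#include <malloc.h>   // _aligned_free
#endif

// Hypothetical counterpart to malloc_simd.
// Windows: _aligned_malloc memory must go back through _aligned_free.
// Elsewhere: memalign/malloc/valloc memory is released with free
// (assumed valid here, as it is on glibc and the BSDs).
void free_simd(void * p)
{
#if defined _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Like free, this accepts a null pointer as a no-op on both branches.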