How to determine if memory is aligned?

Question

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}

However, how do I correctly determine whether the memory that ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot make sure that every block of memory passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thanks in advance...

Recommended answer

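Casting to long is a cheap way to guard against the most likely problem nowadays: int and pointers having different sizes. On platforms where long is itself narrower than a pointer (64-bit Windows, for instance), the cast truncates the address, although the low four bits that the alignment test examines are typically preserved.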

As pointed out in the comments below, there are better solutions if you are willing to include a header...
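The answer does not name the header. One likely candidate is <stdint.h>, whose uintptr_t type is an unsigned integer wide enough to hold a pointer. The following is a minimal sketch under that assumption; the helper name is_aligned is hypothetical, and the pointer-to-integer mapping is implementation-defined, although on mainstream flat-address platforms the low bits of the integer reflect the address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer).
 * Returns true if p sits on an `alignment`-byte boundary.
 * Assumes `alignment` is a power of two, e.g. 16 for SSE loads. */
static bool is_aligned(const void *p, size_t alignment)
{
    /* For a power of two, alignment - 1 is a mask of the low bits
     * that must all be zero in an aligned address. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

With this helper, the "ptr is aligned" test in the question's pseudocode could be written as is_aligned(ptr, 16).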

A pointer p is aligned on a 16-byte boundary if and only if ((unsigned long)p & 15) == 0.
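As an illustration of how such a test might be used in a function shaped like the one in the question, here is a sketch that sums an array of floats. It is not the asker's code: the function name sum_floats and the choice of a sum are assumptions, the check uses uintptr_t from <stdint.h> rather than unsigned long, and the code assumes an x86 target with SSE. Instead of keeping a separate scalar path for unaligned input, it processes leading elements one at a time until the address reaches a 16-byte boundary, then switches to aligned loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE */

/* Hypothetical example: sum `len` floats starting at `ptr`. */
static float sum_floats(const float *ptr, size_t len)
{
    float total = 0.0f;
    size_t i = 0;

    /* Scalar prologue: advance until ptr + i is 16-byte aligned. */
    while (i < len && ((uintptr_t)(ptr + i) & 15) != 0) {
        total += ptr[i];
        ++i;
    }

    /* Aligned main loop: four floats per iteration via _mm_load_ps,
     * which requires a 16-byte-aligned address. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= len; i += 4) {
        acc = _mm_add_ps(acc, _mm_load_ps(ptr + i));
    }

    /* Reduce the four lanes of the accumulator to a single value. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total += lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar epilogue: handle the remaining 0 to 3 elements. */
    for (; i < len; ++i) {
        total += ptr[i];
    }
    return total;
}
```

Because the prologue absorbs any misalignment, the same function accepts both aligned and unaligned buffers. Whether this is faster than unaligned loads with _mm_loadu_ps depends on the processor and should be measured.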
