"Custom intrinsic" function for x64 instead of inline assembly: is it possible?

Question

I am currently experimenting with writing highly optimized, reusable functions for a library of mine. For instance, I wrote an "is power of 2" function the following way:

template<class IntType>  
inline bool is_power_of_two( const IntType x )
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

This is a portable, low-maintenance implementation as an inline C++ template. VC++ 2008 compiles it to the following code, which contains branches:

is_power_of_two PROC
    test    rcx, rcx
    je  SHORT $LN3@is_power_o
    lea rax, QWORD PTR [rcx-1]
    test    rax, rcx
    jne SHORT $LN3@is_power_o
    mov al, 1
    ret 0
$LN3@is_power_o:
    xor al, al
    ret 0
is_power_of_two ENDP

I also found an implementation in "The bit twiddler", which would be coded in x64 assembly as follows:

is_power_of_two_fast PROC
    test rcx, rcx
    je  SHORT NotAPowerOfTwo
    lea rax, [rcx-1]
    and rax, rcx
    neg rax
    sbb rax, rax
    inc rax
    ret
NotAPowerOfTwo:
    xor rax, rax
    ret
is_power_of_two_fast ENDP
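For readers less familiar with the neg / sbb / inc idiom, the routine above can be mirrored in plain C++ as a rough sketch. The function name step_by_step_is_pow2 and the explicit carry variable are assumptions made for illustration only; they model the carry flag and are not part of the original code.

```cpp
#include <cstdint>

// Emulates is_power_of_two_fast one instruction at a time.
// step_by_step_is_pow2 is a hypothetical name used only in this sketch.
inline std::uint64_t step_by_step_is_pow2(std::uint64_t x)
{
    if (x == 0)                        // test rcx, rcx / je NotAPowerOfTwo
        return 0;                      // xor rax, rax
    std::uint64_t rax = x - 1;         // lea rax, [rcx-1]
    rax &= x;                          // and rax, rcx
    std::uint64_t carry = (rax != 0);  // neg rax sets CF when rax was nonzero
    rax = 0 - carry;                   // sbb rax, rax -> 0 or all ones
    rax += 1;                          // inc rax      -> 1 or 0
    return rax;
}
```

Only the zero check remains a branch; the test for "exactly one bit set" is turned into 0 or 1 purely through the carry flag.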

I tested both subroutines, written separately from the C++ code in an assembly module (.asm file), and the second one runs about 20% faster!

Yet the overhead of the function call is considerable: when I compare the second assembly implementation, is_power_of_two_fast, to the inlined version of the template function, the latter is faster despite its branches!
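A comparison like this is usually done with a simple timing loop. The following is only a sketch of such a harness for the inlined template, not the setup actually used for the numbers above; BenchResult and bench_is_power_of_two are hypothetical names, and the measured time will depend heavily on compiler and optimization settings.

```cpp
#include <chrono>
#include <cstdint>

template<class IntType>
inline bool is_power_of_two(const IntType x)
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

// Hypothetical result type: number of powers of two found and elapsed seconds.
struct BenchResult { std::uint64_t hits; double seconds; };

// Hypothetical helper: runs the predicate over [0, limit) and times the loop.
inline BenchResult bench_is_power_of_two(std::uint64_t limit)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < limit; ++i)
        hits += is_power_of_two(i) ? 1 : 0;   // use the result so the call is not discarded
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return BenchResult{hits, elapsed.count()};
}
```

Returning the hit count alongside the time gives a quick sanity check that two implementations being compared actually agree.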

Unfortunately, Visual C++ does not support inline assembly when targeting x64. One is expected to use "intrinsic functions" instead.

Now the question: can I implement the faster version, is_power_of_two_fast, as a custom intrinsic function or something similar, so that it can be used inline? Or alternatively, is it possible to somehow force the compiler to produce the low-branch version of the function?

Recommended answer

Even VC 2005 is capable of producing code with the sbb instruction.

The C code

bool __declspec(noinline) IsPowOf2(unsigned int a)
{
    return (a>=1)&((a&(a-1))<1);
}

compiles to the following:

00401000  lea         eax,[ecx-1] 
00401003  and         eax,ecx 
00401005  cmp         eax,1 
00401008  sbb         eax,eax 
0040100A  neg         eax  
0040100C  cmp         ecx,1 
0040100F  sbb         ecx,ecx 
00401011  add         ecx,1 
00401014  and         eax,ecx 
00401016  ret          
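In this listing, each cmp reg,1 sets the carry flag exactly when the register is zero, and the following sbb turns that flag into 0 or all ones, so both comparisons are evaluated without a jump. The same idea can be carried over to the template from the question by replacing the short-circuit && with a bitwise &. This is a sketch only: the name is_power_of_two_nobranch is hypothetical, it is intended for unsigned integer types, and whether a given compiler actually emits branch-free code for it should be checked in the generated assembly.

```cpp
// Variant of the question's template that avoids short-circuit evaluation.
// is_power_of_two_nobranch is a hypothetical name used only in this sketch.
template<class IntType>
inline bool is_power_of_two_nobranch(const IntType x)
{
    // Bitwise & evaluates both operands unconditionally, unlike &&,
    // so the compiler has no need to jump over the second comparison.
    return (x != 0) & ((x & (x - 1)) == 0);
}
```

Because it stays an inline template, no call overhead is added, and no inline assembly or .asm module is needed.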


10-27 16:54