Question
I am currently experimenting with writing highly optimized, reusable functions for a library of mine. For instance, I wrote an "is power of 2" function the following way:
template<class IntType>
inline bool is_power_of_two( const IntType x )
{
    return (x != 0) && ((x & (x - 1)) == 0);
}
This is a portable, low-maintenance implementation written as an inline C++ template. VC++ 2008 compiles it to the following code, which contains branches:
is_power_of_two PROC
test rcx, rcx
je SHORT $LN3@is_power_o
lea rax, QWORD PTR [rcx-1]
test rax, rcx
jne SHORT $LN3@is_power_o
mov al, 1
ret 0
$LN3@is_power_o:
xor al, al
ret 0
is_power_of_two ENDP
I also found the implementation from "The bit twiddler", which would be coded in x64 assembly as follows:
is_power_of_two_fast PROC
test rcx, rcx
je SHORT NotAPowerOfTwo
lea rax, [rcx-1]
and rax, rcx
neg rax
sbb rax, rax
inc rax
ret
NotAPowerOfTwo:
xor rax, rax
ret
is_power_of_two_fast ENDP
I tested both subroutines, written separately from C++ in an assembly module (.asm file), and the second one is about 20% faster!
Yet the overhead of the function call is considerable: when I compare the second assembly implementation, "is_power_of_two_fast", to the inlined version of the template function, the latter is faster despite its branches!
Unfortunately, the new conventions for x64 specify that inline assembly is not allowed. One should use "intrinsic functions" instead.
Now the question: can I implement the faster version, "is_power_of_two_fast", as a custom intrinsic function or something similar, so that it can be used inline? Alternatively, is it possible to somehow force the compiler to produce the low-branch version of the function?
Recommended answer
Even VC 2005 is capable of producing code that uses the sbb instruction.
The following C code
bool __declspec(noinline) IsPowOf2(unsigned int a)
{
    return (a>=1)&((a&(a-1))<1);
}
compiles to the following:
00401000 lea eax,[ecx-1]
00401003 and eax,ecx
00401005 cmp eax,1
00401008 sbb eax,eax
0040100A neg eax
0040100C cmp ecx,1
0040100F sbb ecx,ecx
00401011 add ecx,1
00401014 and eax,ecx
00401016 ret
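The same idea can be carried back into the template form used in the question. The following is only a sketch under stated assumptions: the name is_power_of_two_branchless is hypothetical, and the restriction to unsigned types is a choice made here so that x - 1 is well defined for every input. It replaces the short-circuiting && with a bitwise &, which means the compiler is no longer required to skip the second test. Whether a particular compiler then emits branch-free code is not guaranteed and should be checked in the generated assembly.

```cpp
#include <type_traits>

// Hypothetical generalization of IsPowOf2 above to any unsigned integer type.
template <class UIntType>
constexpr bool is_power_of_two_branchless(const UIntType x)
{
    // Restricted to unsigned types so that x - 1 cannot overflow a signed value.
    static_assert(std::is_unsigned<UIntType>::value,
                  "is_power_of_two_branchless requires an unsigned integer type");
    // Both comparisons yield bool. Combining them with bitwise & (not &&)
    // evaluates both sides unconditionally, so no short-circuit jump is needed.
    return (x != 0) & ((x & (x - 1)) == 0);
}

// Compile-time checks of the definition.
static_assert(is_power_of_two_branchless(8u), "8 is a power of two");
static_assert(!is_power_of_two_branchless(0u), "0 is not a power of two");
```

Because the function is a template defined in a header, it can be inlined at each call site, which avoids the call overhead measured for the separate .asm routine.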