本文介绍了数位最快的方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

可能重复:结果
  How算在32位整数集的位数?

给一个无符号的字符类型值,计算总位it.What是最快的方法是什么?
我写了如下三个功能,什么是最好的办法,而且可以有人想出一个更快的吗?(我只是想非常快速的)

Give a unsigned char type value,count the total bits in it.What's the fastest way?I wrote three function as below,what's the best way,and can someone come up with a faster one?(I just want the extremely fast one)

const int tbl[] =
{
#define B2(n)   n, n+1, n+1, n+2
#define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
#define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
    B6(0), B6(1), B6(1), B6(2)
};

char naivecount (unsigned char val)
{
    char cnt = 0;
    while (val)
    {
        cnt += (val & 1);
        val = val >> 1;
    }
    return cnt;
}

inline tableLookUp(int val)
{
    assert(val >= 0 && val <= 255);
    return tbl[val];
}

int asmCount(int val)
{
    int res = 0;
    asm volatile("xor %0, %0\n\t"
            "begin:\n\t"
            "cmp $0x0, %1\n\t"
            "jle end\n\t"
            "movl %1, %%ecx\n\t"
            "and $0x1, %%ecx\n\t"
            "addl %%ecx, %0\n\t"
            "shrl %1\n\t"
            "jmp begin\n\t"
            "end:"
            : "=r"(res)
            : "r" (val));
    return res;
}

编辑:

我测试的所有方法中,最快的是使用 popcntl instruction.In没有指令,我将使用查表平台。

I have test all the method,the fastest one is to use the popcntl instruction.In platform without the instruction,I will use table look-up.

推荐答案

如果您想code将其手,试试这个:

If you want to code it by hand, try this:

#include <stdint.h>

int popcnt8(uint8_t x) {

    x = (x & 0x55) + (x >> 1 & 0x55);
    x = (x & 0x33) + (x >> 2 & 0x33);
    x = (x & 0x0f) + (x >> 4 & 0x0f);

    return x;
}

在x86上,这编译为(AT&amp; T公司的语法):

on x86, this compiles to (AT&T-syntax):

popcnt8:
    movl    %edi, %eax
    shrb    %dil
    andl    $85, %eax
    andl    $85, %edi
    addl    %eax, %edi
    movl    %edi, %eax
    shrb    $2, %dil
    andl    $51, %eax
    andl    $51, %edi
    addl    %eax, %edi
    movl    %edi, %eax
    shrb    $4, %dil
    andl    $15, %eax
    addl    %edi, %eax
    movzbl  %al, %eax
    ret

比较这对GCC什么用的内在生成:

Compare this to what gcc generates with the intrinsic:

#include <stdint.h>

int popcnt8_intrin(uint8_t x) { return __builtin_popcount(x); }

在86与SSE 4.2:

On x86 with SSE 4.2:

popcnt8_intrin:
movzbl  %dil, %eax
popcntl %eax, %eax
ret

这不是最佳的;铛产生:

which is not optimal; clang generates:

popcnt8_intrin:
    popcntl %edi,%eax
    ret

减少计算一(!)的指令。

reducing the calculation to one (!) instruction.

在86无4.2 SSE:

On x86 without SSE 4.2:

popcnt8_intrin:
subq    $8, %rsp
movzbl  %dil, %edi
call    __popcountdi2
addq    $8, %rsp
ret

GCC实质上这里调用它的库。不太理想。铛做一个好一点:

gcc essentially calls its library here. Not quite optimal. clang does a little better:

popcnt8_intrin:                         # @popcnt8_intrin
movl    %edi, %eax
shrl    %eax
andl    $85, %eax
subl    %eax, %edi
movl    %edi, %eax
andl    $858993459, %eax        # imm = 0x33333333
shrl    $2, %edi
andl    $858993459, %edi        # imm = 0x33333333
addl    %eax, %edi
movl    %edi, %eax
shrl    $4, %eax
addl    %edi, %eax
andl    $252645135, %eax        # imm = 0xF0F0F0F
imull   $16843009, %eax, %eax   # imm = 0x1010101
shrl    $24, %eax
ret

铛计算了整整32位的数字POPCNT。这不是最佳的IMHO

clang calculates popcnt for a whole 32 bit number. This is not optimal imho.

这篇关于数位最快的方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-16 07:15