本文介绍了为什么比for循环快检查memcmp这么多?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为什么 memcmp(A,B,大小)所以比快得多:

for(i = 0; i < nelements; i++) {
    if a[i] != b[i] return 0;
}
return 1;

时memcmp CPU的指令还是什么?它必须是pretty深,因为我使用了 memcmp 在循环大规模加速。

推荐答案

memcmp ,并且往往是在组装实施采取了一些具体的架构功能优势,它可以使的的比用C一个简单的循环速度更快。

memcmp, and is often implemented in assembly to take advantage of a number of architecture-specific features, which can make it much faster than a simple loop in C.

GCC支持 memcmp (以及大量的其他功能),为的。在GCC的一些版本/配置,以 memcmp 通话将被识别为 __ builtin_memcmp 。相反,散发出通话 memcmp 库函数,GCC会发出指令少数作为一个优化的直列版本的功能。

GCC supports memcmp (as well as a ton of other functions) as builtins. In some versions / configurations of GCC, a call to memcmp will be recognized as __builtin_memcmp. Instead of emitting a call to the memcmp library function, GCC will emit a handful of instructions to act as an optimized inline version of the function.

在x86上,这利用了 CMPSB 指令,该指令字节的字符串在一个存储位置比较到另一个。这再加上 REPE preFIX,所以比较字符串,直到他们不再是平等的,或计数用尽。 (究竟是什么 memcmp 一样)。

On x86, this leverages the use of the cmpsb instruction, which compares a string of bytes at one memory location to another. This is coupled with the repe prefix, so the strings are compared until they are no longer equal, or a count is exhausted. (Exactly what memcmp does).

由于以下code:

int test(const void* s1, const void* s2, int count)
{
    return memcmp(s1, s2, count) == 0;
}

gcc版本3.4.4 在Cygwin生成以下组件:

gcc version 3.4.4 on Cygwin generates the following assembly:

; (prologue)
mov     esi, [ebp+arg_0]    ; Move first pointer to esi
mov     edi, [ebp+arg_4]    ; Move second pointer to edi
mov     ecx, [ebp+arg_8]    ; Move length to ecx

cld                         ; Clear DF, the direction flag, so comparisons happen
                            ; at increasing addresses
cmp     ecx, ecx            ; Special case: If length parameter to memcmp is
                            ; zero, don't compare any bytes.
repe cmpsb                  ; Compare bytes at DS:ESI and ES:EDI, setting flags
                            ; Repeat this while equal ZF is set
setz    al                  ; Set al (return value) to 1 if ZF is still set
                            ; (all bytes were equal).
; (epilogue) 

参考:



高度优化的 memcmp 的版本在许多C标准库存在。这些通常会利用特殊结构的指令与大量数据并行工作。

Highly-optimized versions of memcmp exist in many C standard libraries. These will usually take advantage of architecture-specific instructions to work with lots of data in parallel.

glibc中,有 memcmp 的,可以采取下面的指令集扩展的优势:

In Glibc, there are versions of memcmp for x86_64 that can take advantage of the following instruction set extensions:


  • - 的

  • - 的

  • - 的

  • SSE2 - sysdeps/x86_64/memcmp.S
  • SSE4 - sysdeps/x86_64/multiarch/memcmp-sse4.S
  • SSSE3 - sysdeps/x86_64/multiarch/memcmp-ssse3.S

凉爽的部分是,glibc就检测(在运行时)的最新指令集你的CPU有,并执行其优化的版本。看到这个片段从:

The cool part is that glibc will detect (at run-time) the newest instruction set your CPU has, and execute the version optimized for it. See this snippet from sysdeps/x86_64/multiarch/memcmp.S:

ENTRY(memcmp)
    .type   memcmp, @gnu_indirect_function
    LOAD_RTLD_GLOBAL_RO_RDX
    HAS_CPU_FEATURE (SSSE3)
    jnz 2f
    leaq    __memcmp_sse2(%rip), %rax
    ret 

2:  HAS_CPU_FEATURE (SSE4_1)
    jz  3f  
    leaq    __memcmp_sse4_1(%rip), %rax
    ret 

3:  leaq    __memcmp_ssse3(%rip), %rax
    ret 

END(memcmp)

在Linux内核

的Linux似乎并不具有 memcmp 为x86_64的优化版本,但它确实为的memcpy ,在。请注意,是使用的替代的基础设施(的)不仅决定在运行时使用的版本,但实际上补丁本身仅在启动时曾经做出这个决定。

In the Linux kernel

Linux does not seem to have an optimized version of memcmp for x86_64, but it does for memcpy, in arch/x86/lib/memcpy_64.S. Note that is uses the alternatives infrastructure (arch/x86/kernel/alternative.c) for not only deciding at runtime which version to use, but actually patching itself to only make this decision once at boot-up.

这篇关于为什么比for循环快检查memcmp这么多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-22 13:43