本文介绍了有效地实现地板/欧几里德整数除法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

地板事业部是当结果总是难倒向下(朝-∞),不计入0:

Floored division is when the result is always floored down (towards −∞), not towards 0:

是否有可能有效地实现用C或地板欧几里德整数除法/ C ++?

Is it possible to efficiently implement floored or euclidean integer division in C/C++?

(显而易见的解决办法是检查分红的符号)

(the obvious solution is to check the dividend's sign)

推荐答案

我在五年后重新审视这个问题,因为这是有关对我来说太。我做了两个纯-C版本和两个内联汇编版本的x86-64,结果一些性能测量可能是有趣的。

I'm revisiting this question five years later, as this is relevant for me too. I did some performance measurements on two pure-C versions and two inline-assembly versions for x86-64, and the results may be interesting.

地板事业部的测试变体是:

The tested variants of floored division are:


  1. 我一直在使用了一段时间的实施情况;

  2. 在高于该psented了$ P $的轻微变异只使用一个师;

  3. 的previous之一,但另一方面,在实施的直列组装;和

  4. A CMOV 版本装配实现。

  1. The implementation I've been using for some time now;
  2. The slight variant on that presented above which only uses one division;
  3. The previous one, but hand-implemented in inline-assembly; and
  4. A CMOV version implemented in assembly.

以下是我的基准测试程序:

The following is my benchmark program:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#ifndef VARIANT
#define VARIANT 3
#endif

#if VARIANT == 0
#define floordiv(a, b) (((a) < 0)?((((a) + 1) / (b)) - 1):((a) / (b)))
#elif VARIANT == 1
#define floordiv(a, b) ((((a) < 0)?((a) - ((b) - 1)):(a)) / (b))
#elif VARIANT == 2
#define floordiv(a, b) ({                                   \
    int result;                                             \
    asm("test %%eax, %%eax; jns 1f; sub %1, %%eax;"         \
        "add $1, %%eax; 1: cltd; idivl %1;"                 \
        : "=a" (result)                                     \
        : "r" (b),                                          \
          "0" (a)                                           \
        : "rdx");                                           \
    result;})
#elif VARIANT == 3
#define floordiv(a, b) ({                                           \
    int result;                                                     \
    asm("mov %%eax, %%edx; sub %1, %%edx; add $1, %%edx;"           \
        "test %%eax, %%eax; cmovs %%edx, %%eax; cltd;"              \
        "idivl %1;"                                                 \
        : "=a" (result)                                             \
        : "r" (b),                                                  \
          "0" (a)                                                   \
        : "rdx");                                                   \
    result;})
#endif

double ntime(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return(tv.tv_sec + (((double)tv.tv_usec) / 1000000.0));
}

void timediv(int n, int *p, int *q, int *r)
{
    int i;

    for(i = 0; i < n; i++)
        r[i] = floordiv(p[i], q[i]);
}

int main(int argc, char **argv)
{
    int n, i, *q, *p, *r;
    double st;

    n = 10000000;
    p = malloc(sizeof(*p) * n);
    q = malloc(sizeof(*q) * n);
    r = malloc(sizeof(*r) * n);
    for(i = 0; i < n; i++) {
        p[i] = (rand() % 1000000) - 500000;
        q[i] = (rand() % 1000000) + 1;
    }

    st = ntime();
    for(i = 0; i < 100; i++)
        timediv(n, p, q, r);
    printf("%g\n", ntime() - st);
    return(0);
}

我编这与 GCC -march =本地-Ofast 使用GCC 4.9.2,结果,在我的酷睿i5-2400,如下。结果是相当重复性的运行,从运行 - 他们总是降落在同一顺序,至少

I compiled this with gcc -march=native -Ofast using GCC 4.9.2, and the results, on my Core i5-2400, were as follows. The results are fairly reproducible from run to run -- they always land in the same order, at least.


  • 0变7.21秒。

  • 变1 7.26秒。

  • 法2:6.73秒

  • 变3:4.32秒

所以 CMOV 实施打击别人出来的水,至少。令我惊讶的是,变种2由一个相当大幅超出执行其纯C版本(变种1)。我还以为编译器应该能够至少有效放出code作为我的。

So the CMOV implementation blows the others out of the water, at least. What surprises me is that variant 2 out-does its pure-C version (variant 1) by a fairly wide margin. I'd have thought the compiler should be able to emit code at least as efficient as mine.

下面是一些其他的平台,以进行比较:

Here are some other platforms, for comparison:

AMD的Athlon 64 X2 4200+,GCC 4.7.2:

AMD Athlon 64 X2 4200+, GCC 4.7.2:


  • 变0:26.33秒

  • 变式1:25.38秒

  • 法2:25.19秒

  • 变3:22.39秒

至强E3-1271 V3,GCC 4.9.2:

Xeon E3-1271 v3, GCC 4.9.2:


  • 变0普通:5.95秒。

  • 变式1:5.62秒

  • 法2 5.40秒。

  • 变3:3.44秒

这篇关于有效地实现地板/欧几里德整数除法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-24 10:26