I am trying to measure the number of CPU cycles it takes to run a single asm instruction. To do that, I wrote the following function:

measure_register_op:
    # Calculate time required for movl operation

    # function setup
    pushl %ebp
    movl %esp, %ebp
    pushl %ebx
    pushl %edi

    xor %edi, %edi

    # first time measurement
    xorl %eax, %eax
    cpuid               # sync of threads
    rdtsc               # result in edx:eax

    # we are measuring the instruction below
    movl %eax, %edi

    # second time measurement
    cpuid               # sync of threads
    rdtsc               # result in edx:eax

    # time difference
    sub %eax, %edi

    # move to EAX. Value of EAX is what function returns
    movl %edi, %eax

    # End of function
    popl %edi
    popl %ebx
    mov %ebp, %esp
    popl %ebp

    ret

I use it from a *.c file:
#include <stdio.h>

extern unsigned int measure_register_op(void);

int main(void)
{

    for (int a = 0; a < 10; a++)
    {
        printf("Instruction took %u cycles \n", measure_register_op());
    }

    return 0;
}

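The problem is that the values I see are far too large. I'm currently getting 3684414156. What could be going wrong here?
EDIT:
Changed from ebx to edi, but the results are still similar. It must be something with RDTSC itself. In the debugger I can see that the second measurement is 0x7f61e078 and the first is 0x42999940, which after subtraction still gives 1019758392.
EDIT:
Here is my makefile. Maybe I'm compiling it incorrectly: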
compile: measurement.s measurement.c
    gcc -g measurement.s measurement.c -o ./build/measurement -m32

EDIT:
Here are the exact results I see:
Instruction took 4294966680 cycles
Instruction took 4294966696 cycles
Instruction took 4294966688 cycles
Instruction took 4294966672 cycles
Instruction took 4294966680 cycles
Instruction took 4294966688 cycles
Instruction took 4294966688 cycles
Instruction took 4294966696 cycles
Instruction took 4294966688 cycles
Instruction took 4294966680 cycles

Best answer

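In the updated version, which no longer clobbers the start time (the bug @R. pointed out):
sub %eax, %edi computes start - end. In AT&T syntax the destination is the second operand, so EDI (the start time) has EAX (the end time) subtracted from it. The result is a negative number, i.e. a huge unsigned number just below 2^32. For example, the printed 4294966680 is 2^32 - 616, which is -616. If you are going to print with %u, get used to interpreting its output back to a bit pattern when you are debugging.
You want end - start.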
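By the way, use lfence; it is more efficient than cpuid. It is guaranteed to serialize instruction execution on Intel (without flushing the store buffer the way a fully serializing instruction does). It is also safe on AMD CPUs with Spectre mitigation enabled.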
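See http://akaros.cs.berkeley.edu/lxr/akaros/kern/arch/x86/rdtsc_test.c for some different ways of serializing rdtsc and/or rdtscp.
See also "Get CPU cycle count?" for more about rdtsc, especially the fact that it does not count core clock cycles, only reference cycles. So idle and turbo states will affect your results.
Also, the cost of one instruction is not one-dimensional. Timing a single instruction with rdtsc like this is not particularly useful. See "RDTSCP in NASM always returns the same value" for more about how to measure throughput, latency, and uops for a single instruction.
rdtsc can be useful for timing a whole loop, or a sequence of instructions longer than your CPU's out-of-order execution window.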

Regarding "c - Measuring time difference using RDTSC - results too large", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56200229/
