Difference between rdtscp, rdtsc : memory and cpuid / rdtsc?

Question

Assume we're trying to use the TSC for performance monitoring and we want to prevent instruction reordering.

These are our options:

1: rdtscp is a serializing call. It prevents reordering around the call to rdtscp.

__asm__ __volatile__("rdtscp; "         // serializing read of tsc
                     "shl $32,%%rdx; "  // shift higher 32 bits stored in rdx up
                     "or %%rdx,%%rax"   // and or onto rax
                     : "=a"(tsc)        // output to tsc variable
                     :
                     : "%rcx", "%rdx"); // rcx and rdx are clobbered

However, rdtscp is only available on newer CPUs. So in this case we have to use rdtsc. But rdtsc is non-serializing, so using it alone will not prevent the CPU from reordering it.
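
Whether rdtscp is available can be checked at run time before falling back to rdtsc. The following is a minimal sketch, assuming GCC or Clang and their <cpuid.h> helper header; the function name has_rdtscp is made up for illustration. It relies on the RDTSCP feature flag being reported in bit 27 of EDX for CPUID leaf 0x80000001.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid */
#endif

/* Hypothetical helper: returns true if the CPU reports RDTSCP support
   (CPUID leaf 0x80000001, EDX bit 27). */
static bool has_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return ((edx >> 27) & 1u) != 0;
#else
    return false;    /* not an x86 target, so there is no rdtscp */
#endif
}
```

A caller could evaluate has_rdtscp() once at startup and pick the rdtscp or the rdtsc code path accordingly.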

So we can use either of these two options to prevent reordering:

2: This is a call to cpuid and then rdtsc. cpuid is a serializing call.

volatile int dont_remove __attribute__((unused)); // volatile to stop optimizing
unsigned tmp;
__cpuid(0, tmp, tmp, tmp, tmp);                   // cpuid is a serialising call (macro from <cpuid.h>)
dont_remove = tmp;                                // prevent optimizing out cpuid

__asm__ __volatile__("rdtsc; "          // read of tsc
                     "shl $32,%%rdx; "  // shift higher 32 bits stored in rdx up
                     "or %%rdx,%%rax"   // and or onto rax
                     : "=a"(tsc)        // output to tsc
                     :
                     : "%rdx");         // rdx is clobbered (rdtsc does not write rcx)

3: This is a call to rdtsc with memory in the clobber list, which prevents reordering

__asm__ __volatile__("rdtsc; "          // read of tsc
                     "shl $32,%%rdx; "  // shift higher 32 bits stored in rdx up
                     "or %%rdx,%%rax"   // and or onto rax
                     : "=a"(tsc)        // output to tsc
                     :
                     : "%rdx", "memory"); // rdx is clobbered (rdtsc does not write rcx)
                                          // memory to prevent reordering

My understanding for the 3rd option is as follows:

Making the call __volatile__ prevents the optimizer from removing the asm or moving it across any instructions that could need the results (or change the inputs) of the asm. However it could still move it with respect to unrelated operations. So __volatile__ is not enough.

Tell the compiler that memory is being clobbered by adding "memory" to the clobber list. The "memory" clobber means that GCC cannot make any assumptions about memory contents remaining the same across the asm, and thus will not reorder around it.
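
To make the effect of the "memory" clobber concrete, here is a minimal sketch of an empty asm statement used purely as a compiler barrier, assuming GCC or Clang. The names shared_value, compiler_barrier and store_then_reload are hypothetical. The asm emits no machine instruction at all, so it constrains only what the compiler may do with memory accesses around it.

```c
#include <stdint.h>

static uint64_t shared_value;   /* hypothetical variable for illustration */

/* Empty asm with a "memory" clobber: generates no instruction, but the
   compiler must assume any memory may have been read or written here. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* The store must be emitted before the barrier, and the value must be
   reloaded from memory after it rather than reused from a register. */
static uint64_t store_then_reload(uint64_t v)
{
    shared_value = v;
    compiler_barrier();
    return shared_value;
}
```

Because nothing is emitted for the barrier itself, this construct says nothing about how the processor orders the surrounding instructions at run time.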

So my questions are:


  • 1: Is my understanding of __volatile__ and "memory" correct?
  • 2: Do the second two calls do the same thing?
  • 3: Using "memory" looks much simpler than using another serializing instruction. Why would anyone use the 3rd option over the 2nd option?

Recommended answer

As mentioned in a comment, there's a difference between a compiler barrier and a processor barrier. volatile and memory in the asm statement act as a compiler barrier, but the processor is still free to reorder instructions.

Processor barriers are special instructions that must be given explicitly, e.g. rdtscp, cpuid, memory fence instructions (mfence, lfence, ...) etc.
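
As an illustration of combining both kinds of barrier, the sketch below issues lfence immediately before rdtsc and also lists "memory" as a clobber. It assumes GCC or Clang on x86; the function name rdtsc_lfence is made up, and the non-x86 branch exists only so the snippet compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: lfence acts as the processor barrier, while
   __volatile__ plus the "memory" clobber act as the compiler barrier. */
static inline uint64_t rdtsc_lfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("lfence\n\t"   /* wait for earlier instructions */
                         "rdtsc"        /* TSC is returned in edx:eax */
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;   /* placeholder on non-x86 targets */
#endif
}
```

Taking the two 32-bit halves as separate outputs lets the compiler do the shift and or, so no extra register has to be listed as clobbered.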

As an aside, while using cpuid as a barrier before rdtsc is common, it can also be very bad from a performance perspective, since virtual machine platforms often trap and emulate the cpuid instruction in order to impose a common set of CPU features across multiple machines in a cluster (to ensure that live migration works). Thus it's better to use one of the memory fence instructions.

The Linux kernel uses mfence;rdtsc on AMD platforms and lfence;rdtsc on Intel. If you don't want to bother with distinguishing between these, mfence;rdtsc works on both although it's slightly slower as mfence is a stronger barrier than lfence.
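
The vendor-neutral mfence;rdtsc variant could be wrapped for timing a piece of code roughly as follows. This is a sketch under assumptions, not the kernel's implementation: it assumes GCC or Clang on x86, and fenced_rdtsc, sample_work, sink and tsc_ticks_for are hypothetical names introduced for the example.

```c
#include <stdint.h>

/* Hypothetical helper: full memory fence, then read the TSC. */
static inline uint64_t fenced_rdtsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("mfence\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;   /* placeholder on non-x86 targets */
#endif
}

static volatile unsigned sink;  /* volatile so the loop is not optimized away */

/* Hypothetical workload: adds 0..999 into sink. */
static void sample_work(void)
{
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}

/* Returns the number of TSC ticks that elapsed while fn ran. */
static uint64_t tsc_ticks_for(void (*fn)(void))
{
    uint64_t start = fenced_rdtsc();
    fn();
    uint64_t end = fenced_rdtsc();
    return end - start;
}
```

A call such as tsc_ticks_for(sample_work) yields a tick count that includes the cost of the fences themselves, so very short regions are better measured over many iterations.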
