Question
Assume we're trying to use the TSC for performance monitoring and we want to prevent instruction reordering.
These are our options:
1: rdtscp is a serializing call. It prevents reordering around the call to rdtscp.
__asm__ __volatile__("rdtscp; " // serializing read of tsc
"shl $32,%%rdx; " // shift higher 32 bits stored in rdx up
"or %%rdx,%%rax" // and or onto rax
: "=a"(tsc) // output to tsc variable
:
: "%rcx", "%rdx"); // rcx and rdx are clobbered
However, rdtscp is only available on newer CPUs. So in this case we have to use rdtsc. But rdtsc is non-serializing, so using it alone will not prevent the CPU from reordering it.
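To make the "only available on newer CPUs" point concrete, here is a minimal sketch of a run-time check. It assumes GCC or Clang on x86 with the compiler-provided <cpuid.h> header; the helper name has_rdtscp is hypothetical and not part of the original code.

```c
#include <cpuid.h>    /* GCC/Clang helper header (assumed available) */
#include <stdbool.h>

/* Hypothetical helper: returns true when the CPU reports RDTSCP support. */
static bool has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;

    /* CPUID leaf 0x80000001, EDX bit 27 is the RDTSCP feature flag. */
    return ((edx >> 27) & 1u) != 0;
}
```

A caller could branch on has_rdtscp() once at startup and pick either the rdtscp variant or one of the rdtsc variants.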
So we can use either of these two options to prevent reordering:
2: This is a call to cpuid and then rdtsc. cpuid is a serializing call.
volatile int dont_remove __attribute__((unused)); // volatile to stop optimizing
unsigned tmp;
__cpuid(0, tmp, tmp, tmp, tmp); // cpuid is a serialising call
dont_remove = tmp; // prevent optimizing out cpuid
__asm__ __volatile__("rdtsc; " // read of tsc
"shl $32,%%rdx; " // shift higher 32 bits stored in rdx up
"or %%rdx,%%rax" // and or onto rax
: "=a"(tsc) // output to tsc
:
: "%rdx"); // rdx is clobbered (rdtsc does not write rcx)
3: This is a call to rdtsc with "memory" in the clobber list, which prevents reordering
__asm__ __volatile__("rdtsc; " // read of tsc
"shl $32,%%rdx; " // shift higher 32 bits stored in rdx up
"or %%rdx,%%rax" // and or onto rax
: "=a"(tsc) // output to tsc
:
: "%rdx", "memory"); // rdx is clobbered (rdtsc does not write rcx)
// memory to prevent reordering
My understanding for the 3rd option is as follows:
Making the call __volatile__ prevents the optimizer from removing the asm or moving it across any instructions that could need the results (or change the inputs) of the asm. However, it could still move it with respect to unrelated operations. So __volatile__ is not enough.
Tell the compiler that memory is being clobbered with : "memory". The "memory" clobber means that GCC cannot make any assumptions about memory contents remaining the same across the asm, and thus will not reorder around it.
So my questions are:
- 1: Is my understanding of __volatile__ and "memory" correct?
- 2: Do the second two calls do the same thing?
- 3: Using "memory" looks much simpler than using another serializing instruction. Why would anyone use the 3rd option over the 2nd option?
Recommended answer
As mentioned in a comment, there's a difference between a compiler barrier and a processor barrier. volatile and "memory" in the asm statement act as a compiler barrier, but the processor is still free to reorder instructions.
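To illustrate what a compiler barrier alone gives you, here is a minimal sketch assuming GCC or Clang on x86. The names compiler_barrier, read_tsc, busy_work and time_region are hypothetical. The empty asm with a "memory" clobber emits no instruction at all: it only stops the compiler from moving memory accesses across it, and the CPU can still execute rdtsc out of order.

```c
#include <stdint.h>

/* Compiler-only barrier: generates no machine instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Plain, unfenced read of the time-stamp counter. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical workload, used only for illustration. */
static void busy_work(void)
{
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 1000; i++)
        sink += i;
}

/* Times fn() in TSC ticks. The barriers constrain the compiler only;
 * they do not stop the processor from reordering the rdtsc reads. */
static uint64_t time_region(void (*fn)(void))
{
    compiler_barrier();
    uint64_t start = read_tsc();
    compiler_barrier();
    fn();
    compiler_barrier();
    uint64_t end = read_tsc();
    compiler_barrier();
    return end - start;
}
```

Calling time_region(busy_work) returns an elapsed tick count, but without a processor barrier the measurement may include or exclude instructions near the boundaries.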
Processor barriers are special instructions that must be explicitly given, e.g. rdtscp, cpuid, memory fence instructions (mfence, lfence, ...) etc.
As an aside, while using cpuid as a barrier before rdtsc is common, it can also be very bad from a performance perspective, since virtual machine platforms often trap and emulate the cpuid instruction in order to impose a common set of CPU features across multiple machines in a cluster (to ensure that live migration works). Thus it's better to use one of the memory fence instructions.
The Linux kernel uses mfence;rdtsc on AMD platforms and lfence;rdtsc on Intel. If you don't want to bother with distinguishing between these, mfence;rdtsc works on both, although it's slightly slower as mfence is a stronger barrier than lfence.
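The fence-then-rdtsc sequences described above could be written roughly as follows. This is a sketch assuming GCC or Clang on x86-64, not the kernel's actual code; the function names rdtsc_lfence and rdtsc_mfence are hypothetical.

```c
#include <stdint.h>

/* lfence; rdtsc. The "memory" clobber additionally acts as a
 * compiler barrier around the asm statement. */
static inline uint64_t rdtsc_lfence(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("lfence\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)   /* rdtsc writes edx:eax */
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* mfence; rdtsc. A stronger and slower fence, usable when you do not
 * want to distinguish between vendors. */
static inline uint64_t rdtsc_mfence(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("mfence\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Returning the two halves through "=a" and "=d" lets the compiler combine them in C, so no shl/or instructions or extra clobbers are needed in the asm.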