Problem description
I'm looking to do some very basic micro-benchmarking of small code paths, such as tight loops, that I've written in C++. I'm running on Linux and OS X, and using GCC. What facilities are there for sub-millisecond accuracy? I'm thinking that a simple test that runs the code path many times (several tens of millions?) will give me enough consistency to get a good reading. If anyone knows of preferable methods, please feel free to suggest them.
Recommended answer
You can use the "rdtsc" processor instruction on x86/x86_64. On multicore systems, check for the "constant_tsc" capability in CPUID (listed in /proc/cpuinfo on Linux): it means that all cores use the same tick counter, even with dynamic frequency changes and sleep states.
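As a rough illustration of the rdtsc approach, here is a minimal sketch that times a loop with the `__rdtsc()` intrinsic from `<x86intrin.h>` (available in GCC and Clang on x86/x86_64). The names `tight_loop` and `ticks_per_call` are hypothetical placeholders, not part of the original answer, and the result is in TSC ticks rather than seconds.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc(); GCC/Clang on x86 and x86_64 only

// Hypothetical workload standing in for the code path under test.
// The volatile accumulator keeps the compiler from removing the loop.
inline std::uint64_t tight_loop(std::uint64_t n) {
    volatile std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        sum = sum + i;
    }
    return sum;
}

// Runs the workload `runs` times and returns the average number of
// TSC ticks per run. Ticks are not nanoseconds; converting them needs
// the TSC frequency, which this sketch does not attempt to determine.
inline double ticks_per_call(std::uint64_t runs, std::uint64_t n) {
    const std::uint64_t start = __rdtsc();
    for (std::uint64_t r = 0; r < runs; ++r) {
        tight_loop(n);
    }
    const std::uint64_t stop = __rdtsc();
    return static_cast<double>(stop - start) / static_cast<double>(runs);
}
```

Calling something like `ticks_per_call(1000000, 100)` averages over many runs, which is the "run it many times" idea from the question; a larger `runs` value generally smooths out noise from interrupts and cache effects.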
If your processor does not support constant_tsc, be sure to bind your program to a single core (with the taskset utility on Linux).
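The answer suggests the taskset utility for this. If you would rather pin from inside the program, a Linux-only sketch using `sched_getaffinity` and `sched_setaffinity` could look like the following. This is an alternative to taskset rather than something the answer itself proposes, the helper name `pin_to_first_allowed_core` is hypothetical, and OS X does not provide these calls.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes cpu_set_t and the CPU_* macros in glibc
#endif
#include <sched.h>

// Pins the calling thread to the first core it is currently allowed to
// run on. Returns that core's index, or -1 on failure.
// Assumption: Linux with glibc and at most CPU_SETSIZE logical CPUs.
inline int pin_to_first_allowed_core() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;  // could not read the current affinity mask
    }
    for (int core = 0; core < CPU_SETSIZE; ++core) {
        if (CPU_ISSET(core, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(core, &one);
            // pid 0 means "the calling thread"
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? core : -1;
        }
    }
    return -1;  // empty mask; should not happen in practice
}
```

Picking the first allowed core instead of hard-coding core 0 avoids failures inside containers or cgroups where core 0 may not be available to the process.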
When using rdtsc on out-of-order CPUs (everything besides Intel Atom and perhaps some other low-end CPUs), add an "ordering" instruction before it, e.g. "cpuid". It is a serializing instruction, so it temporarily prevents instruction reordering across the timestamp read.
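A minimal sketch of the cpuid-then-rdtsc pairing, written as GCC/Clang inline assembly so the compiler cannot drop or reorder the cpuid. It assumes x86_64, and the function name `serialized_rdtsc` is a hypothetical choice for illustration.

```cpp
#include <cstdint>

#if !defined(__x86_64__)
#error "This sketch assumes x86_64 with GCC/Clang inline assembly."
#endif

// Executes CPUID (a serializing instruction) immediately followed by
// RDTSC, so instructions issued earlier have completed before the
// counter is read. Returns the 64-bit timestamp counter value.
inline std::uint64_t serialized_rdtsc() {
    std::uint32_t lo, hi;
    asm volatile(
        "cpuid\n\t"                 // leaf 0; overwrites eax, ebx, ecx, edx
        "rdtsc"                     // low 32 bits in eax, high 32 bits in edx
        : "=a"(lo), "=d"(hi)
        : "a"(0)
        : "rbx", "rcx", "memory");  // registers cpuid writes, plus a compiler barrier
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Note that cpuid itself takes a noticeable and somewhat variable number of cycles, so it can help to measure the cost of two back-to-back `serialized_rdtsc()` calls and subtract that overhead from short measurements.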
Also, Mac OS X has "Shark", which can measure some hardware events in your code.
On RDTSC and out-of-order CPUs, see section 18 of this great manual by Fog (its main site is http://www.agner.org/optimize/).