Problem description
In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):
results[] = ...
for (iteration = 0; iteration < num_iterations; iteration++) {
    pctr_start = sample_pctr();
    the_benchmark();
    pctr_stop = sample_pctr();
    results[iteration] = pctr_stop - pctr_start;
}
FWIW, the performance counter I'm thinking of using is CPU_CLK_UNHALTED.THREAD_ALL, to read the number of core cycles independent of clock frequency changes. [...]
- Perf tools for Linux. These seem geared towards sampling over the whole lifetime of a process, not at specific points within a process (before and after each iteration).
- Use the perf syscalls directly (i.e. perf_event_open). It looks like the counter value will only update periodically (using a sample rate) or after the counter exceeds a threshold. I need the counter value precisely at the moment I ask, which is why RDPMC seemed so attractive. I imagine that sampling frequently will itself skew the performance counter readings.
- PAPI builds on perf, so it probably inherits the above problem.
- Write a kernel module -- too much effort, too error prone.
Ideally I would like a solution which works on OpenBSD and Linux, but somehow I think that is a tall order. Perhaps just for Linux for now.
Any help is most appreciated. Thanks.
I just found the Linux msr device node, which would probably suffice. I'll leave the question up in case a better answer shows up.
Recommended answer
It seems the best way -- for Linux at least -- is to use the msr device node.
You simply open a device node, seek to the address of the MSR required, and read or write 8 bytes.
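As a rough illustration of the open/seek/read steps, here is a minimal sketch in C. It assumes the Linux msr driver is loaded (modprobe msr) and that the process has permission to open a node such as /dev/cpu/0/msr, which normally means running as root. The helper names read_msr and write_msr are made up for this example, and the MSR address you pass in has to come from the Intel manuals for your CPU; nothing here is specific to one counter.

```c
#define _XOPEN_SOURCE 700   /* exposes pread() and pwrite() */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 64-bit MSR through an already-open
 * msr device node (e.g. a descriptor for /dev/cpu/0/msr).
 * The driver treats the file offset as the MSR address, so pread()
 * does the "seek" and the 8-byte read in a single call.
 * Returns 0 on success, -1 on failure.
 */
static int read_msr(int fd, uint32_t reg, uint64_t *value)
{
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/*
 * Hypothetical helper: write one 64-bit MSR, e.g. to program a
 * counter's control register before the benchmark loop starts.
 * Returns 0 on success, -1 on failure.
 */
static int write_msr(int fd, uint32_t reg, uint64_t value)
{
    ssize_t n = pwrite(fd, &value, sizeof value, (off_t)reg);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```

In the benchmark loop from the question, sample_pctr() would then boil down to a read_msr() call on a descriptor opened once up front with open("/dev/cpu/N/msr", O_RDWR). Each node is tied to one CPU, so the benchmark should be pinned to that CPU, and every sample is a system call, so some overhead ends up in the measurement.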
OpenBSD is harder, since (at the time of writing) there is no user-space proxy to the MSRs. So you would need to write a kernel module or implement a sysctl by hand.