Problem description
Okay, so this really bugs me.
I'm using perf to record the cpu-clock event (a software event):
$ > perf record -e cpu-clock srun -n 1 ./stream
... and the table produced by perf report is empty.
I'm using perf to record all available software events listed in perf list:
$ > perf record -e alignment-faults,context-switches,cpu-clock,cpu-migrations,\
dummy,emulation-faults,major-faults,minor-faults,page-faults,task-clock \
srun -n 1 ./stream
... the table gives me a list of available samples:
0 alignment-faults
125 context-switches
255 cpu-clock
21 cpu-migrations
0 dummy
0 emulation-faults
0 major-faults
128 minor-faults
132 page-faults
254 task-clock
I can look at the samples collected in cpu-clock and it gives me information. Why?! Why does it not work if I only measure cpu-clock? Why were there no samples collected in four events?
This is a follow-up to this question: error: perf.data file has no samples
Recommended answer
Probably srun does not start the target process with a direct fork. It may use some variant of a remote shell, such as ssh, or a daemon to start processes.
perf record (without the -a option) tracks only directly forked subprocesses, not processes started (forked) by sshd or another daemon. And it will never profile a remote machine, if srun can dispatch to one, when the perf record ... srun command is used (that form profiles the srun application and everything it forks).
Try perf stat first to get total (raw) performance counters, and put perf as the srun argument; this is the correct usage with tools that use a remote shell or daemons (probably with the full path to perf):
srun -n 1 perf stat ./stream
srun -n 1 /usr/bin/perf stat ./stream
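Since the ordering of perf and srun is the whole point, it may help to see the two command lines side by side. The sketch below only prints them and runs nothing; wrap_cmd is a hypothetical helper made up for illustration, not part of perf or Slurm.

```shell
# Hypothetical helper (not part of perf or Slurm): print, without running,
# the two possible orderings of perf and srun for a given target.
# $1 = ordering: "outside" (perf wraps srun) or "inside" (srun wraps perf)
# $2 = perf subcommand with options, e.g. "stat" or "record -e cpu-clock"
# $3 = target program, e.g. ./stream
wrap_cmd() {
    case "$1" in
        outside) echo "perf $2 srun -n 1 $3" ;;  # perf profiles srun and what it forks directly
        inside)  echo "srun -n 1 perf $2 $3" ;;  # perf is started next to the target by srun
        *)       echo "unknown ordering: $1" >&2; return 1 ;;
    esac
}

wrap_cmd outside "record -e cpu-clock" ./stream
wrap_cmd inside  "record -e cpu-clock" ./stream
```

The "inside" form is the one suggested in this answer; the "outside" form is the one from the question that produced an empty report.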
perf stat will print the running time of the target task. Then select some event with a high raw counter (perf record usually tunes the sample rate to around several kHz, so thousands of samples will be generated if there are enough raw event counts):
srun -n 1 perf record -e cpu-clock ./stream
srun -n 1 /usr/bin/perf record -e cpu-clock ./stream
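As a rough sanity check on the "thousands of samples" remark, the expected sample count for a frequency-based event is about the sampling frequency times the CPU time of the target. This is a minimal sketch with a hypothetical helper name; the 4000 Hz and 2 s figures are illustrative assumptions, not values from the original answer (the frequency can be set explicitly with perf record -F).

```shell
# Hypothetical helper: estimate how many samples a frequency-based
# perf record run should produce.
# $1 = sampling frequency in Hz (e.g. the value passed to perf record -F)
# $2 = CPU time of the target in whole seconds (e.g. taken from perf stat)
expected_samples() {
    echo $(( $1 * $2 ))
}

# Illustrative numbers only: 4000 Hz for a 2-second run.
expected_samples 4000 2   # prints 8000
```

If the real count in perf report is far below such an estimate, perf is probably not attached to the process that does the work.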