Question
Some time ago, I asked the question "How to count number of executed instructions of a process id including child processes", and @M-Iduoad kindly provided a solution that uses pgrep to capture all child PIDs and pass them to perf stat with -p. It works great!
However, I ran into a problem with multi-threaded applications when a new thread is spawned. Since I'm not a fortune teller (too bad!), I don't know the tid of the newly created threads, and therefore I can't add them to perf stat's -p or -t parameter.
As an example, let's assume I have a multithreaded nodejs server (deployed as a container on top of Kubernetes) with the following pstree:
root@node2:/home/m# pstree -p 4037791
node(4037791)─┬─sh(4037824)───node(4037825)─┬─{node}(4037826)
│ ├─{node}(4037827)
│ ├─{node}(4037828)
│ ├─{node}(4037829)
│ ├─{node}(4037830)
│ └─{node}(4037831)
├─{node}(4037805)
├─{node}(4037806)
├─{node}(4037807)
├─{node}(4037808)
├─{node}(4037809)
├─{node}(4037810)
├─{node}(4037811)
├─{node}(4037812)
├─{node}(4037813)
└─{node}(4037814)
Of course, I can use the following perf stat command to watch its threads:
perf stat --per-thread -e instructions,cycles,task-clock,cpu-clock,cpu-migrations,context-switches,cache-misses,duration_time -p $(pgrep --ns 4037791 | paste -s -d ",")
It works fine with a single-threaded nodejs application. But in the case of a multi-threaded service, as soon as it receives a request, the pstree output looks like this:
root@node2:/home/m# pstree -p 4037791
node(4037791)─┬─sh(4037824)───node(4037825)─┬─{node}(4037826)
│ ├─{node}(4037827)
│ ├─{node}(4037828)
│ ├─{node}(4037829)
│ ├─{node}(4037830)
│ ├─{node}(4037831)
│ ├─{node}(1047898)
│ ├─{node}(1047899)
│ ├─{node}(1047900)
│ ├─{node}(1047901)
│ ├─{node}(1047902)
│ ├─{node}(1047903)
│ ├─{node}(1047904)
│ ├─{node}(1047905)
│ ├─{node}(1047906)
│ ├─{node}(1047907)
│ ├─{node}(1047908)
│ ├─{node}(1047909)
│ ├─{node}(1047910)
│ ├─{node}(1047911)
│ ├─{node}(1047913)
│ ├─{node}(1047914)
│ ├─{node}(1047919)
│ ├─{node}(1047920)
│ ├─{node}(1047921)
│ └─{node}(1047922)
├─{node}(4037805)
├─{node}(4037806)
├─{node}(4037807)
├─{node}(4037808)
├─{node}(4037809)
├─{node}(4037810)
├─{node}(4037811)
├─{node}(4037812)
├─{node}(4037813)
└─{node}(4037814)
Therefore, my previous perf stat command does not capture the stats of the newly created threads. I mean, it may capture the accumulated instructions, but it definitely does not show them in a "per-thread" format.
Is there any way to use --per-thread in perf stat and capture stats of the newly spawned threads in a multithreaded application? With -p or -t it seems to follow only the fixed set of threads that already exist when perf starts, and it won't follow new ones.
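To make the "fixed set of threads" point concrete, here is a rough Python sketch (an illustration only, not something perf provides) that reads the thread IDs of a process from /proc/<pid>/task on Linux and diffs two snapshots. The helper names list_tids and new_tids are made up for this example.

```python
import os


def list_tids(pid):
    """Return the set of thread IDs currently listed under /proc/<pid>/task (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    try:
        return {int(name) for name in os.listdir(task_dir) if name.isdigit()}
    except OSError:
        # The process is gone, or /proc is not available on this system.
        return set()


def new_tids(before, after):
    """Threads present in the later snapshot but not in the earlier one."""
    return set(after) - set(before)


# Snapshot of this script's own threads, just to show the call.
print(sorted(list_tids(os.getpid())))
```

Any TID returned by new_tids between two snapshots is one that a perf stat -p or -t invocation started at the time of the first snapshot would not list on its own line.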
There's a similar question here for perf record, but I'm using perf stat. Also, that doesn't seem to separate the recorded profile by thread, so it's just equivalent to perf stat node ... Unless there's a way to process the recorded data to separate it out by thread after the fact?
Any other potential solution that helps me dynamically count "instructions,cycles,task-clock,cpu-clock,cpu-migrations,context-switches,cache-misses" per thread of a given PID (including newly spawned threads) is acceptable, whether it uses perf or anything else!
Recommended answer
The combination of perf record -s and perf report -T should give you the information you need.
To demonstrate, take the following example code, which uses threads with well-defined instruction counts:
#include <cstdint>
#include <thread>
void work(int64_t count) {
for (int64_t i = 0; i < count; i++);
}
int main() {
std::thread first(work, 100000000ll);
std::thread second(work, 400000000ll);
std::thread third(work, 800000000ll);
first.join();
second.join();
third.join();
}
(Compile without optimization, otherwise the compiler may remove the empty loops!)
Now, use perf record as a prefix command. It will follow all spawned processes and threads.
$ perf record -s -e instructions -c 1000000000 ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (5 samples) ]
To display the statistics nicely:
$ perf report -T
[... snip ...]
# PID TID instructions:u
270682 270683 500003888
270682 270684 2000001866
270682 270685 4000002177
The parameters for perf record are a little bit tricky. -s writes separate records with fairly precise numbers; they do not depend on the instruction samples (generated every 1000000000 instructions). However, perf report, even with -T, fails when it does not find a single sample. So you need to set an instruction sample count -c (or a frequency) that triggers at least once. Any sample will do; you do not need a sample per thread.
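As a quick sanity check on the "fairly precise" claim, dividing the counts from the perf report -T output above by the loop iteration counts from the example program gives almost exactly 5 instructions per iteration for every thread. The Python sketch below just redoes that arithmetic with the numbers copied from the output. It assumes the TIDs are listed in thread creation order, and the figure of 5 is what this particular unoptimized build produced, not a general constant.

```python
# Loop iteration counts from the example program, in thread creation order.
iterations = [100_000_000, 400_000_000, 800_000_000]
# instructions:u values copied from the perf report -T output above
# (assumed to be listed in the same order as the threads were created).
instructions = [500_003_888, 2_000_001_866, 4_000_002_177]

for n, counted in zip(iterations, instructions):
    per_iteration = counted / n
    extra = counted - 5 * n  # instructions not explained by 5 per iteration
    print(f"{n:>11} iterations: {per_iteration:.6f} instr/iter, {extra} extra")
```

The small remainder per thread is plausibly thread startup and teardown code, which is the same order of magnitude for all three threads regardless of the loop length.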
Alternatively, you could look at the raw records in perf.data. Then you can actually tell perf record not to collect any samples.
$ perf record -s -e instructions -n ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data ]
But you need to filter out the relevant records, and there might be additional records you need to sum up.
$ perf script -D | grep PERF_RECORD_READ | grep -v " 0$"
# Annotation by me PID TID
213962455637481 0x760 [0x40]: PERF_RECORD_READ: 270887 270888 instructions:u 500003881
213963194850657 0x890 [0x40]: PERF_RECORD_READ: 270887 270889 instructions:u 2000001874
213964190418415 0x9c0 [0x40]: PERF_RECORD_READ: 270887 270890 instructions:u 4000002175
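The filtering and summing step could be scripted. The following is a minimal Python sketch, assuming the PERF_RECORD_READ lines look exactly like the ones shown above (timestamp, offset, size, then PID, TID, event name and value); the format of perf script -D may differ between perf versions, and the function name sum_read_records is made up for this example.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the output shown above:
#   <time> <offset> [<size>]: PERF_RECORD_READ: <pid> <tid> <event> <value>
READ_RE = re.compile(r"PERF_RECORD_READ:\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$")


def sum_read_records(lines):
    """Sum PERF_RECORD_READ values per (pid, tid, event)."""
    totals = defaultdict(int)
    for line in lines:
        match = READ_RE.search(line)
        if match:
            pid, tid, event, value = match.groups()
            totals[(int(pid), int(tid), event)] += int(value)
    return dict(totals)


# Sample lines copied from the perf script -D output above.
sample = [
    "213962455637481 0x760 [0x40]: PERF_RECORD_READ: 270887 270888 instructions:u 500003881",
    "213963194850657 0x890 [0x40]: PERF_RECORD_READ: 270887 270889 instructions:u 2000001874",
    "213964190418415 0x9c0 [0x40]: PERF_RECORD_READ: 270887 270890 instructions:u 4000002175",
]

for (pid, tid, event), value in sorted(sum_read_records(sample).items()):
    print(pid, tid, event, value)
```

To use it on real data, pass the lines of perf script -D (for example by iterating over sys.stdin) to sum_read_records instead of the sample list. Records that appear more than once for the same thread and event are added together.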