问题描述
我刚刚开始使用google性能工具(ubuntu中的google-perftools
和libgoogle-perftools4
软件包),我发誓要搜索大约一天,但没有找到答案!!问题是我无法通过CPU分析获得所有函数的结果.这是我的代码:
I just started using google performance tools (google-perftools
and libgoogle-perftools4
packages in ubuntu), I swear that I'm googling for around a day and I didn't find an answer!!The problem is that I do not get the result for ALL of my functions with CPU profiling. This is my code:
#include "gperftools/profiler.h"
#include <iostream>
#include <math.h>
using namespace std;
void bar()
{
int a,b,c,d,j,k;
a=0;
int z=0;
b = 1000;
while(z < b)
{
while (a < b)
{
d = sin(a);
c = cos(a);
j = tan(a);
k = tan(a);
k = d * c + j *k;
a++;
}
a = 0;
z++;
}
}
void foo()
{
cout << "hey " << endl;
}
int main()
{
ProfilerStart("/home/mohammad/gperf/dump.txt");
int a = 1000;
while(a--){foo();}
bar();
ProfilerFlush();
ProfilerStop();
}
编译为g++ test.cc -lprofiler -o a.out
这是我运行代码的方式:
this is how I run the code:
CPUPROFILE=dump.txt ./a.out
我也尝试过:
CPUPROFILE_FREQUENCY=10000 LD_PRELOAD=/usr/local/lib/libprofiler.so.0.3.0 CPUPROFILE=dump.txt ./a.out
这就是我从google-pprof --text a.out dump.txt
那里得到的:
And this is what I get from google-pprof --text a.out dump.txt
:
Using local file ./a.out.
Using local file ./dump.txt.
Total: 22 samples
8 36.4% 36.4% 8 36.4% 00d8cb04
6 27.3% 63.6% 6 27.3% bar
3 13.6% 77.3% 3 13.6% __cos (inline)
2 9.1% 86.4% 2 9.1% 00d8cab4
1 4.5% 90.9% 1 4.5% 00d8cab6
1 4.5% 95.5% 1 4.5% 00d8cb06
1 4.5% 100.0% 1 4.5% __write_nocancel
0 0.0% 100.0% 3 13.6% __cos
但是没有有关foo函数的信息!
But there is no information about the foo function!
我的系统信息:Ubuntu 12.04g ++ 4.6.3
my system information:ubuntu 12.04g++ 4.6.3
仅此而已!
推荐答案
TL; DR:foo
要获取概要分析事件又快又小,请再运行100次.频率设置带有错别字,pprof
的采样频率不会超过CONFIG_HZ(通常为250).最好改用更现代的Linux perf
分析器(作者的教程,维基百科).
TL;DR: foo
is to fast and small to get profiling events, run it 100 more times. Frequency setting was with typo, and pprof
will not sample more often than CONFIG_HZ (usually 250). It is better to switch to more modern Linux perf
profiler (tutorial from its authors, wikipedia).
长版:
您的foo
函数太短和太简单-只需调用两个函数即可.使用g++ test.cc -lprofiler -o test.s -S -g
编译测试,并使用c++filt
程序过滤test.s
,以使c ++名称可读:
Your foo
function is just too short and simple - just call two functions. Compiled the test with g++ test.cc -lprofiler -o test.s -S -g
, with filtering of test.s
with c++filt
program to make c++ names readable:
foo():
.LFB972:
.loc 1 27 0
pushq %rbp
movq %rsp, %rbp
.loc 1 28 0
movl $.LC0, %esi
movl std::cout, %edi
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
movl std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi
movq %rax, %rdi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&))
.loc 1 29 0
popq %rbp
ret
.LFE972:
.size foo(), .-foo()
因此,要在配置文件中看到它,您应该运行foo
更多次,将main中的int a = 1000;
更改为更大的值,例如10000或更好的100000(我在测试中也是如此).
So, to see it in the profile you should run foo
for more times, changing int a = 1000;
in main to something much greater, like 10000 or better 100000 (as did I for the test).
此外,您还可以修复不正确的"CPUPROFILE_FREQUENC=10000
"来更正CPUPROFILE_FREQUENCY
(请注意Y
).我应该说CPUPROFILE_FREQUENCY的设置太高了10000,因为它通常每秒可能仅生成1000或250个事件,具体取决于内核配置CONFIG_HZ
(大多数3.x内核有250个,请检查grep CONFIG_HZ= /boot/config*
). pprof中的CPUPROFILE_FREQUENCY的默认设置为100.
Also you may fix the incorrect "CPUPROFILE_FREQUENC=10000
" to correct CPUPROFILE_FREQUENCY
(note the Y
). I should say that 10000 is too high setting for CPUPROFILE_FREQUENCY, because it usually may only generate 1000 or 250 events per second depending on kernel configuration CONFIG_HZ
(most 3.x kernels have 250, check grep CONFIG_HZ= /boot/config*
). And default setting for CPUPROFILE_FREQUENCY in pprof is 100.
我在Ubuntu 14.04上使用bash脚本测试了CPUPROFILE_FREQUENCY的不同值:100000、10000、1000、250
I tested different values of CPUPROFILE_FREQUENCY: 100000, 10000, 1000, 250 with bash script on Ubuntu 14.04
for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do
echo -n "$a ";
CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null;
done
结果是相同的120-140个事件和每个./test的运行时间大约为0.5秒,因此google-perftools的cpuprofiler不能为单线程每秒执行的事件要比内核中设置的CONFIG_HZ多(我有250).
And the results were the same 120-140 events and runtime of every ./test around 0.5 seconds, so cpuprofiler from google-perftools can't do more events per second for single thread, than CONFIG_HZ set in kernel (my has 250).
100000 PROFILE: interrupts/evictions/bytes = 124/1/6584
100000 PROFILE: interrupts/evictions/bytes = 134/0/7864
10000 PROFILE: interrupts/evictions/bytes = 125/0/7488
10000 PROFILE: interrupts/evictions/bytes = 123/0/6960
1000 PROFILE: interrupts/evictions/bytes = 134/0/6264
1000 PROFILE: interrupts/evictions/bytes = 125/2/7272
300 PROFILE: interrupts/evictions/bytes = 137/2/7984
300 PROFILE: interrupts/evictions/bytes = 126/0/7216
250 PROFILE: interrupts/evictions/bytes = 123/3/6680
250 PROFILE: interrupts/evictions/bytes = 137/2/7352
原始a = 1000 foo
且cout的函数运行太快而无法在每次运行中获得任何概要分析事件(即使是250个事件/秒),因此您没有foo
,也没有任何Input/Output函数.在少量运行中,__write_nocancel
可能会发生采样事件,然后将报告由libstdc ++组成的foo
和I/O函数(某些地方不在最顶端,因此请使用pprof
或google-pprof
),自我事件计数为零,子事件计数为非零:
With original a=1000 foo
and cout's functions runs too fast to get any profiling event (even on 250 events/s) on them in every run, so you have no foo
, nor any Input/Output functions. In small amount of runs, the __write_nocancel
may got sampling event, and then foo
and I/O functions form libstdc++ will be reported (somewhere not in the very top, so use --text
option of pprof
or google-pprof
) with zero self event count, and non-zero child event count:
....
1 0.9% 99.1% 1 0.9% __write_nocancel
....
0 0.0% 100.0% 1 0.9% _IO_new_file_overflow
0 0.0% 100.0% 1 0.9% _IO_new_file_write
0 0.0% 100.0% 1 0.9% __GI__IO_putc
0 0.0% 100.0% 1 0.9% foo
0 0.0% 100.0% 1 0.9% new_do_write
0 0.0% 100.0% 1 0.9% std::endl
0 0.0% 100.0% 1 0.9% std::ostream::put
使用a=100000
时,foo仍然太短和太快而无法获得自己的事件,但是I/O函数却得到了几个.这是我从长--text
输出中提取的列表:
With a=100000
, foo is still too short and fast to get own events, but I/O functions got several. This is list I grepped from long --text
output:
34 24.6% 24.6% 34 24.6% __write_nocancel
1 0.7% 95.7% 35 25.4% __GI__IO_fflush
1 0.7% 96.4% 1 0.7% __GI__IO_putc
1 0.7% 97.8% 2 1.4% std::operator<<
1 0.7% 98.6% 36 26.1% std::ostream::flush
1 0.7% 99.3% 2 1.4% std::ostream::put
0 0.0% 100.0% 34 24.6% _IO_new_file_sync
0 0.0% 100.0% 34 24.6% _IO_new_file_write
0 0.0% 100.0% 40 29.0% foo
0 0.0% 100.0% 34 24.6% new_do_write
0 0.0% 100.0% 2 1.4% std::endl
仅凭pprof
读取调用链的功能(只有在不省略帧信息的情况下,谁才能调用获得样本的函数),才能看到拥有自己的计数器为零的函数.
Functions with zero own counters seen only thanks to pprof
capability of reading call chains (it knows who calls the functions which got sample, if frame info is not omitted).
我还可以推荐更现代,更强大的功能(软件和硬件事件,高达5 kHz或更高的频率;用户空间和内核性能分析)以及更好支持的探查器Linux perf
探查器(来自其作者的教程,维基百科).
I can also recommend more modern, more capable (both software and hardware events, up to 5 kHz frequency or more; both user-space and kernel profiling) and better supported profiler, the Linux perf
profiler (tutorial from its authors, wikipedia).
从perf
到a=10000
的结果:
$ perf record ./test3 >/dev/null
... skip some perf's spam about inaccessibility of kernel symbols
... note the 3 kHz frequency here VVVV
Lowering default frequency rate to 3250.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]
要查看来自perf.data
输出文件的文本报告,我将减少使用(因为默认情况下perf report
启动交互式配置文件浏览器):
To see text report from perf.data
output file I'll use less (because perf report
by default starts interactive profile browser):
$ perf report |less
... skip some extra info about the machine, kernel, and perf starting command
# Samples: 1K of event 'cycles'
# Event count (approx.): 1155264208
# Overhead Command Shared Object Symbol
41.94% test3 libm-2.19.so [.] __tan_sse2
16.95% test3 libm-2.19.so [.] __sin_sse2
13.40% test3 libm-2.19.so [.] __cos_sse2
4.93% test3 test3 [.] bar()
2.90% test3 libc-2.19.so [.] __GI___libc_write
....
0.20% test3 test3 [.] foo()
或perf report -n | less
查看原始事件(样本)计数器:
Or perf report -n | less
to see raw event (sample) counters:
# Overhead Samples Command Shared Object
41.94% 663 test3 libm-2.19.so [.] __tan_sse2
16.95% 268 test3 libm-2.19.so [.] __sin_sse2
13.40% 212 test3 libm-2.19.so [.] __cos_sse2
4.93% 78 test3 test3 [.] bar()
2.90% 62 test3 libc-2.19.so [.] __GI___libc_write
....
0.20% 4 test3 test3 [.] foo()
这篇关于如何使用Google Perf工具的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!