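I am running some experiments with cachegrind, callgrind and gem5. I noticed that a number of accesses are counted as reads by cachegrind, as writes by callgrind, and as both reads and writes by gem5.
Let's take a very simple example: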
int main() {
    int i, l;
    for (i = 0; i < 1000; i++) {
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        /* ... (100 times) */
    }
}
I compile with:
gcc ex.c --static -o ex
So basically, according to the asm file,
addl $1, -8(%rbp)
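is executed 100,000 times. Since it is both a read and a write, I expected 100k reads and 100k writes. However, cachegrind counts them only as reads, and callgrind counts them only as writes.
% valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex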
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356==
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356==
==15356== I refs: 111,535
==15356== I1 misses: 475
==15356== LLi misses: 280
==15356== I1 miss rate: 0.42%
==15356== LLi miss rate: 0.25%
==15356==
==15356== D refs: 104,894 (103,791 rd + 1,103 wr)
==15356== D1 misses: 557 ( 414 rd + 143 wr)
==15356== LLd misses: 172 ( 89 rd + 83 wr)
==15356== D1 miss rate: 0.5% ( 0.3% + 12.9% )
==15356== LLd miss rate: 0.1% ( 0.0% + 7.5% )
==15356==
==15356== LL refs: 1,032 ( 889 rd + 143 wr)
==15356== LL misses: 452 ( 369 rd + 83 wr)
==15356== LL miss rate: 0.2% ( 0.1% + 7.5% )
% valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376==
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376==
==15376== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376==
==15376== I refs: 111,532
==15376== I1 misses: 474
==15376== LLi misses: 279
==15376== I1 miss rate: 0.42%
==15376== LLi miss rate: 0.25%
==15376==
==15376== D refs: 104,894 (2,777 rd + 102,117 wr)
==15376== D1 misses: 557 ( 406 rd + 151 wr)
==15376== LLd misses: 172 ( 87 rd + 85 wr)
==15376== D1 miss rate: 0.5% ( 14.6% + 0.1% )
==15376== LLd miss rate: 0.1% ( 3.1% + 0.0% )
==15376==
==15376== LL refs: 1,031 ( 880 rd + 151 wr)
==15376== LL misses: 451 ( 366 rd + 85 wr)
==15376== LL miss rate: 0.2% ( 0.3% + 0.0% )
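Could someone give me a reasonable explanation? Am I right to consider that there are in fact ~100k reads and ~100k writes (i.e. two cache accesses for each addl)?
Best answer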
From cachegrind manual: 5.7.1. Cache Simulation Specifics
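Instructions that modify a memory location (e.g. inc and dec) are
counted as doing just a read, i.e. a single data reference. This may
seem strange, but since the write can never cause a miss (the read
guarantees the block is in the cache) it's not very interesting.
Thus it measures not the number of times the data cache is accessed,
but the number of times a data cache miss could occur.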
The cache simulation logic in callgrind seems to differ from cachegrind's. I would think that callgrind should produce the same results as cachegrind, so maybe this is a bug?
Regarding "c - Different read and write counts using cachegrind and callgrind", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15790541/