Question
I found mtrace by Dr. Clements. Although it is useful, it does not work properly in the situation I need it for. I intend to use the trace to understand memory access patterns in different scenarios.
Can someone share related experience? Any suggestions would be appreciated.
Update (03/13): I'm trying to use qemu-mtrace to boot Ubuntu 16.04 with linux-mtrace (3.8.0), but it only shows several error messages and terminates. I hope there is some tool that can log every access.
$ ./qemu-system-x86_64 -mtrace-enable -mtrace-file mtrace.out -hda ubuntu.img -m 1024
Error: mtrace_entry_ascope (exit, syscall:xx) with no stack tag!
mtrace_entry_register: mtrace_host_addr failed (10)
mtrace_inst_exec: bad call 140734947607728
Aborted (core dumped)
Recommended answer
There is a perf mem tool implemented for some modern x86/EM64T CPUs (probably Intel-only; Ivy Bridge and newer desktop/server CPUs). The man page of perf mem is http://man7.org/linux/man-pages/man1/perf-mem.1.html, and the same text is in the kernel docs directory: http://lxr.free-electrons.com/source/tools/perf/Documentation/perf-mem.txt. The text is incomplete; the best documentation is the source: tools/perf/builtin-mem.c and, partially, tools/perf/builtin-report.c. There are no details in https://perf.wiki.kernel.org/index.php/Tutorial.
Unlike qemu-mtrace, it will not log every memory access, but only every Nth access, where N is something like 10000 or 100000. In exchange, it works at native speed with low overhead. Use perf mem record ./program to record the pattern; try adding -a for system-wide sampling, or -C cpulist to sample only some CPU cores. There is no way to log (trace) every single memory access from inside the system: the tool has to write its records to memory, and would then have to log that access too, which is infinite recursion with finite memory. There are, however, very costly proprietary system-specific external tracing solutions such as JTAG or an SDRAM sniffer ($5k or more).
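To illustrate why a 1-in-N sample can still reveal the access pattern, here is a minimal Python sketch. It does not use perf at all: the address stream, the 4 KiB page size, the 75% hot-page ratio and the function names are all assumptions made up for the illustration, and real hardware sampling does not necessarily pick exactly every Nth access.

```python
import random
from collections import Counter

PAGE_SHIFT = 12  # assumption: 4 KiB pages
HOT_PAGE_BASE = 0x7FFF0000  # hypothetical address of a frequently used page
ARRAY_BASE = 0x10000000  # hypothetical address of a 1 MiB array


def synthetic_accesses(count, seed=0):
    """Yield made-up addresses: about 75% hit one hot page, the rest a 1 MiB array."""
    rng = random.Random(seed)
    for _ in range(count):
        if rng.random() < 0.75:
            yield HOT_PAGE_BASE + rng.randrange(0, 4096, 8)
        else:
            yield ARRAY_BASE + rng.randrange(0, 1 << 20, 64)


def every_nth(addresses, n):
    """Keep only every nth address, mimicking a sampling tool."""
    for index, addr in enumerate(addresses):
        if index % n == 0:
            yield addr


def page_shares(addresses):
    """Return the fraction of accesses that fall on each page."""
    counts = Counter(addr >> PAGE_SHIFT for addr in addresses)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}


full = page_shares(synthetic_accesses(200_000))
sampled = page_shares(every_nth(synthetic_accesses(200_000), 100))
hot = HOT_PAGE_BASE >> PAGE_SHIFT
print(f"hot page share, full trace:      {full[hot]:.3f}")
print(f"hot page share, 1-in-100 sample: {sampled[hot]:.3f}")
```

The two shares should come out close to each other, which is the sense in which sampling preserves the overall pattern. Note that a strictly periodic access stream can alias with a fixed sampling period and bias the result.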
The perf mem tools were added around 2013 (Linux kernel version 3.10). Searching for perf mem on LWN gives several results: https://lwn.net/Articles/531766/
The current patches implement the feature on Intel processors starting with Nehalem. The patches leverage the PEBS Load Latency and Precise Store mechanisms. Precise Store is present only on Sandy Bridge and Ivy Bridge based processors.
Physical address sampling support was added: https://lwn.net/Articles/555890/ (perf mem --phys-addr -t load rec). There is also the somewhat related c2c perf tool from 2016, "to track down cacheline contention": https://lwn.net/Articles/704125/, with examples at https://joemario.github.io/blog/2016/09/01/c2c-blog/
Some random slides on perf mem:
- http://indico.cern.ch/event/280897/contributions/1628882/attachments/515361/711133/SE-CERN_PMU_workshop_2013.pdf#page=4
- http://www.linuxtag.org/2013/fileadmin/www.linuxtag.org/slides/Arnaldo_Melo_-_Linux__perf__tools__Overview_and_Current_Developments.e323.pdf#page=10
- https://people.netfilter.org/pablo/netdev0.1/slides/sowa-perf-analytics.pdf#page=19
Some info on decoding perf mem -D report: see the question "perf mem -D report":
# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
2054 2054 0xffffffff811186bf 0x016ffffe8fbffc804b0 49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx
What does "ADDR", "DSRC", "SYMBOL" mean?
(answered by the same user as this answer)
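As a rough illustration of how such a dump line can be taken apart, here is a Python sketch that splits the sample line above into its seven columns and decodes two parts of DSRC. The bit layout and flag values are my reading of union perf_mem_data_src and the PERF_MEM_OP_* / PERF_MEM_LVL_* constants in the kernel's include/uapi/linux/perf_event.h; treat them as an assumption and check your own kernel headers. The names MemSample, parse_line and decode_dsrc are made up for this example.

```python
from typing import NamedTuple


class MemSample(NamedTuple):
    pid: int
    tid: int
    ip: int
    addr: int
    weight: int
    dsrc: int
    symbol: str


# Assumed flag values, taken from the PERF_MEM_OP_* and PERF_MEM_LVL_* constants.
MEM_OPS = {0x01: "N/A", 0x02: "load", 0x04: "store", 0x08: "prefetch", 0x10: "exec"}
MEM_LEVELS = {
    0x01: "N/A", 0x02: "hit", 0x04: "miss", 0x08: "L1", 0x10: "LFB",
    0x20: "L2", 0x40: "L3", 0x80: "local RAM",
    0x100: "remote RAM (1 hop)", 0x200: "remote RAM (2 hops)",
    0x400: "remote cache (1 hop)", 0x800: "remote cache (2 hops)",
    0x1000: "I/O", 0x2000: "uncached",
}


def parse_line(line):
    """Split one whitespace-separated dump line into typed fields."""
    pid, tid, ip, addr, weight, dsrc, symbol = line.split(None, 6)
    return MemSample(int(pid), int(tid), int(ip, 16), int(addr, 16),
                     int(weight), int(dsrc, 16), symbol.strip())


def flag_names(value, table):
    """List the names of all flags from table that are set in value."""
    return [name for bit, name in table.items() if value & bit]


def decode_dsrc(dsrc):
    """Decode the opcode type (bits 0-4) and memory level (bits 5-18)."""
    return {
        "op": flag_names(dsrc & 0x1F, MEM_OPS),
        "level": flag_names((dsrc >> 5) & 0x3FFF, MEM_LEVELS),
    }


LINE = ("2054 2054 0xffffffff811186bf 0x016ffffe8fbffc804b0 49 0x68100842 "
        "/lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx")
sample = parse_line(LINE)
print(sample.symbol, decode_dsrc(sample.dsrc))
```

Under the assumed layout, the remaining DSRC bits (snoop, lock, TLB) could be decoded the same way with further shifts and tables.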
There is also sorting to get some basic stats: perf mem rep --sort=mem
- http://thread.gmane.org/gmane.linux.kernel.perf.user/1438
Other tools: there is the (slow) cachegrind emulator, based on valgrind, for simulating cache memory for userspace programs; see "7.2 Simulating CPU Caches" of https://lwn.net/Articles/257209/. There should also be something for low-level (slowest) models related to DRAMsim/DRAMsim2: http://eng.umd.edu/~blj/dramsim/
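To give an idea of what "simulating cache memory" means, here is a toy direct-mapped cache model in Python. It is not how cachegrind is implemented; the 32 KiB size, the 64-byte line and the function name are assumptions chosen only to keep the sketch short.

```python
def simulate_direct_mapped(addresses, cache_bytes=32 * 1024, line_bytes=64):
    """Count hits and misses of an address stream in a direct-mapped cache."""
    num_lines = cache_bytes // line_bytes
    slots = [None] * num_lines  # which memory line each cache slot currently holds
    hits = misses = 0
    for addr in addresses:
        line = addr // line_bytes  # memory line number of this access
        index = line % num_lines  # the only slot this line may occupy
        if slots[index] == line:
            hits += 1
        else:
            misses += 1
            slots[index] = line  # evict whatever was there before
    return hits, misses


# Two hypothetical access patterns over 8192 accesses each.
sequential = [i * 8 for i in range(8192)]  # walk 64 KiB in 8-byte steps
strided = [i * 4096 for i in range(8192)]  # touch one word per 4 KiB page

print("sequential (hits, misses):", simulate_direct_mapped(sequential))
print("strided    (hits, misses):", simulate_direct_mapped(strided))
```

The sequential walk reuses each 64-byte line eight times, while the strided walk never touches the same line twice, so the same number of accesses produces very different miss counts.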