Which perf events can use PEBS?

Question

I want to understand which events can have the precise modifier on my CPU (Sandy Bridge).

The Intel Software Developer's Manual (Table 18-32, "PEBS Performance Events for Intel Microarchitecture Code Name Sandy Bridge") contains only the following events: INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED. SandyBridge_core_V15.json lists the same events with PEBS > 0.

However, there are some examples of using perf that add :p to the cycles event, and I can successfully run perf record -e cycles:p on my machine.

Also, perf record -e cycles:p -vv -- sleep 1 prints precise_ip 1. Does that mean that the CPU_CLK_UNHALTED event actually uses PEBS?

Is it possible to get the full list of events, which support :p?

Answer

There is a hack to support cycles:p on Sandy Bridge, which has no PEBS for CPU_CLK_UNHALTED.*. The hack is implemented in the kernel part of perf, in intel_pebs_aliases_snb(). When the user requests -e cycles, which is PERF_COUNT_HW_CPU_CYCLES (translated to CPU_CLK_UNHALTED.CORE), with a nonzero precise modifier, this function changes the hardware event to UOPS_RETIRED.ALL with PEBS:

[PERF_COUNT_HW_CPU_CYCLES]      = 0x003c,

static void intel_pebs_aliases_snb(struct perf_event *event)
{
    if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
        /*
         * Use an alternative encoding for CPU_CLK_UNHALTED.THREAD_P
         * (0x003c) so that we can use it with PEBS.
         *
         * The regular CPU_CLK_UNHALTED.THREAD_P event (0x003c) isn't
         * PEBS capable. However we can use UOPS_RETIRED.ALL
         * (0x01c2), which is a PEBS capable event, to get the same
         * count.
         *
         * UOPS_RETIRED.ALL counts the number of cycles that retires
         * CNTMASK micro-ops. By setting CNTMASK to a value (16)
         * larger than the maximum number of micro-ops that can be
         * retired per cycle (4) and then inverting the condition, we
         * count all cycles that retire 16 or less micro-ops, which
         * is every cycle.
         *
         * Thereby we gain a PEBS capable cycle counter.
         */
        u64 alt_config = X86_CONFIG(.event=0xc2, .umask=0x01, .inv=1, .cmask=16);

        alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
        event->hw.config = alt_config;
    }
}
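To make the alternative encoding concrete, here is a minimal user-space sketch of what X86_CONFIG(.event=0xc2, .umask=0x01, .inv=1, .cmask=16) packs into the raw config value. The helper names evtsel_pack() and snb_cycles_alias_config() are hypothetical and do not exist in the kernel; the bit positions are assumed to follow the IA32_PERFEVTSELx register layout documented in the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions, assumed from the IA32_PERFEVTSELx layout. */
#define EVTSEL_EVENT_SHIFT 0   /* bits 0-7:   event select */
#define EVTSEL_UMASK_SHIFT 8   /* bits 8-15:  unit mask */
#define EVTSEL_INV_SHIFT   23  /* bit 23:     invert the counter-mask comparison */
#define EVTSEL_CMASK_SHIFT 24  /* bits 24-31: counter mask (CNTMASK) */

/* Hypothetical helper (not a kernel API): pack the fields into a raw config. */
uint64_t evtsel_pack(unsigned event, unsigned umask, unsigned inv, unsigned cmask)
{
    /* Each field must fit into the width reserved for it. */
    assert(event <= 0xff && umask <= 0xff && inv <= 1 && cmask <= 0xff);

    return ((uint64_t)event << EVTSEL_EVENT_SHIFT) |
           ((uint64_t)umask << EVTSEL_UMASK_SHIFT) |
           ((uint64_t)inv   << EVTSEL_INV_SHIFT)   |
           ((uint64_t)cmask << EVTSEL_CMASK_SHIFT);
}

/* UOPS_RETIRED.ALL with inv=1 and cmask=16, as used by the alias above. */
uint64_t snb_cycles_alias_config(void)
{
    return evtsel_pack(0xc2, 0x01, 1, 16);
}
```

Under these assumptions the alias evaluates to 0x108001c2. On systems whose perf exposes the event, umask, inv and cmask format fields, the same encoding can probably be requested directly from user space as a raw event such as cpu/event=0xc2,umask=0x01,inv,cmask=16/p.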

The intel_pebs_aliases_snb hack is registered in intel_pmu_init() for case INTEL_FAM6_SANDYBRIDGE: / case INTEL_FAM6_SANDYBRIDGE_X:

x86_pmu.event_constraints = intel_snb_event_constraints;
x86_pmu.pebs_constraints = intel_snb_pebs_event_constraints;
x86_pmu.pebs_aliases = intel_pebs_aliases_snb;

pebs_aliases is called from intel_pmu_hw_config() when precise_ip is set to a non-zero value:

static int intel_pmu_hw_config(struct perf_event *event)
{
    ...
    if (event->attr.precise_ip) {
        ...
        if (x86_pmu.pebs_aliases)
            x86_pmu.pebs_aliases(event);
    }
    ...
}
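The dispatch described above can be modelled in a few lines of stand-alone C. This is only a toy sketch: struct toy_event, struct toy_pmu, toy_aliases_snb() and toy_hw_config() are hypothetical stand-ins for the kernel structures and functions, and TOY_RAW_EVENT_MASK assumes that the raw event mask covers the event, umask, edge, inv and cmask bits (the kernel's X86_RAW_EVENT_MASK may include further bits).

```c
#include <stdint.h>

/* Assumed event-selection bits: event | umask | edge | inv | cmask. */
#define TOY_RAW_EVENT_MASK 0xff84ffffULL

/* Hypothetical stand-in for the parts of struct perf_event used here. */
struct toy_event {
    uint64_t config;      /* plays the role of event->hw.config */
    unsigned precise_ip;  /* plays the role of event->attr.precise_ip */
};

/* Hypothetical stand-in for x86_pmu: only the per-model alias hook. */
struct toy_pmu {
    void (*pebs_aliases)(struct toy_event *event);
};

/* Mirrors the Sandy Bridge alias: rewrite cycles (0x003c) to the
 * UOPS_RETIRED.ALL encoding with inv=1, cmask=16, keeping all other bits. */
void toy_aliases_snb(struct toy_event *event)
{
    if ((event->config & TOY_RAW_EVENT_MASK) == 0x003c) {
        uint64_t alt_config = 0x108001c2ULL;

        alt_config |= event->config & ~TOY_RAW_EVENT_MASK;
        event->config = alt_config;
    }
}

/* Mirrors the excerpt: the hook runs only for precise (PEBS) requests. */
void toy_hw_config(const struct toy_pmu *pmu, struct toy_event *event)
{
    if (event->precise_ip && pmu->pebs_aliases)
        pmu->pebs_aliases(event);
}
```

In this model a plain cycles request (precise_ip 0) keeps config 0x003c, while cycles:p is rewritten to the PEBS-capable encoding; bits outside the mask, such as the user/kernel filter bits, are carried over unchanged.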

The hack was implemented in 2012; see the LKML threads "[PATCH] perf, x86: Make cycles:p working on SNB" and "[tip:perf/core] perf/x86: Implement cycles:p for SNB/IVB", and commit cccb9ba9e4ee0d750265f53de9258df69655c40b (http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=cccb9ba9e4ee0d750265f53de9258df69655c40b):

Now that there's finally a chip with working PEBS (IvyBridge), we can enable the hardware and implement cycles:p for SNB/IVB.

I think there is no full list of such "precise" conversion hacks other than the Linux source code in arch/x86/events/intel/core.c. Grep for static void intel_pebs_aliases (usually cycles:p / CPU_CLK_UNHALTED 0x003c is what gets aliased), then check intel_pmu_init for your actual model to see which x86_pmu.pebs_aliases variant is selected:

  • intel_pebs_aliases_core2: INST_RETIRED.ANY_P (0x00c0) with CNTMASK=16 instead of cycles:p
  • intel_pebs_aliases_snb: UOPS_RETIRED.ALL (0x01c2) with CNTMASK=16 instead of cycles:p
  • intel_pebs_aliases_precdist: for the highest values of precise_ip, INST_RETIRED.PREC_DIST (0x01c0) instead of cycles:ppp on SKL, IVB, HSW, BDW


08-29 05:51