Using Performance Counters
In [1]:
!mkdir -p tmp
Using perf
perf is a Linux tool for accessing performance counters.
See also the Wiki documentation for perf.
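The next cell prints every event that `perf list` knows about on this machine. As a minimal sketch, assuming the raw output puts one event per line followed by a bracketed type tag (for example `cpu-cycles OR cycles [Hardware event]`), such lines could be turned into Python data as shown here. The names `parse_perf_list`, `EVENT_LINE`, and `SAMPLE` are illustrative and not part of perf; the sample text is copied from the listing rather than read from a live `perf` run.

```python
import re

# Matches lines such as "cpu-cycles OR cycles   [Hardware event]".
# Assumption: one event per line, with the type tag ending in "event".
EVENT_LINE = re.compile(r"^\s*(?P<names>.+?)\s*\[(?P<kind>[^\[\]]*event)\]\s*$")


def parse_perf_list(text):
    """Return a list of (names, kind) tuples for tagged event lines.

    `names` holds every alias of the event (perf separates them with " OR ").
    Lines without a "[... event]" tag, such as category headers, are skipped.
    """
    events = []
    for line in text.splitlines():
        match = EVENT_LINE.match(line)
        if match is None:
            continue
        names = [name.strip() for name in match.group("names").split(" OR ")]
        events.append((names, match.group("kind")))
    return events


# Hypothetical sample: a few lines in the shape shown by the listing.
SAMPLE = """\
  branch-instructions OR branches                    [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  context-switches OR cs                             [Software event]
  msr/tsc/                                           [Kernel PMU event]
"""

for names, kind in parse_perf_list(SAMPLE):
    print(kind, names)
```

In a notebook, the text could instead come from capturing the shell command's output; the long descriptive entries (those with a sentence inside the brackets) are deliberately ignored by this pattern.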
In [2]:
!perf list
  branch-instructions OR branches  [Hardware event]
  branch-misses  [Hardware event]
  bus-cycles  [Hardware event]
  cache-misses  [Hardware event]
  cache-references  [Hardware event]
  cpu-cycles OR cycles  [Hardware event]
  instructions  [Hardware event]
  ref-cycles  [Hardware event]
  alignment-faults  [Software event]
  bpf-output  [Software event]
  cgroup-switches  [Software event]
  context-switches OR cs  [Software event]
  cpu-clock  [Software event]
  cpu-migrations OR migrations  [Software event]
  dummy  [Software event]
  emulation-faults  [Software event]
  major-faults  [Software event]
  minor-faults  [Software event]
  page-faults OR faults  [Software event]
  task-clock  [Software event]
tool:
  duration_time
  user_time
  system_time
cache:
  L1-dcache-loads OR cpu_atom/L1-dcache-loads/
  L1-dcache-stores OR cpu_atom/L1-dcache-stores/
  L1-icache-loads OR cpu_atom/L1-icache-loads/
  L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
  LLC-loads OR cpu_atom/LLC-loads/
  LLC-load-misses OR cpu_atom/LLC-load-misses/
  LLC-stores OR cpu_atom/LLC-stores/
  LLC-store-misses OR cpu_atom/LLC-store-misses/
  dTLB-loads OR cpu_atom/dTLB-loads/
  dTLB-load-misses OR cpu_atom/dTLB-load-misses/
  dTLB-stores OR cpu_atom/dTLB-stores/
  dTLB-store-misses OR cpu_atom/dTLB-store-misses/
  iTLB-load-misses OR cpu_atom/iTLB-load-misses/
  branch-loads OR cpu_atom/branch-loads/
  branch-load-misses OR cpu_atom/branch-load-misses/
  L1-dcache-loads OR cpu_core/L1-dcache-loads/
  L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
  L1-dcache-stores OR cpu_core/L1-dcache-stores/
  L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
  LLC-loads OR cpu_core/LLC-loads/
  LLC-load-misses OR cpu_core/LLC-load-misses/
  LLC-stores OR cpu_core/LLC-stores/
  LLC-store-misses OR cpu_core/LLC-store-misses/
  dTLB-loads OR cpu_core/dTLB-loads/
  dTLB-load-misses OR cpu_core/dTLB-load-misses/
  dTLB-stores OR cpu_core/dTLB-stores/
  dTLB-store-misses OR cpu_core/dTLB-store-misses/
  iTLB-load-misses OR cpu_core/iTLB-load-misses/
  branch-loads OR cpu_core/branch-loads/
  branch-load-misses OR cpu_core/branch-load-misses/
  node-loads OR cpu_core/node-loads/
  node-load-misses OR cpu_core/node-load-misses/
  branch-instructions OR cpu_atom/branch-instructions/  [Kernel PMU event]
  branch-misses OR cpu_atom/branch-misses/  [Kernel PMU event]
  bus-cycles OR cpu_atom/bus-cycles/  [Kernel PMU event]
  cache-misses OR cpu_atom/cache-misses/  [Kernel PMU event]
  cache-references OR cpu_atom/cache-references/  [Kernel PMU event]
  cpu-cycles OR cpu_atom/cpu-cycles/  [Kernel PMU event]
  instructions OR cpu_atom/instructions/  [Kernel PMU event]
  mem-loads OR cpu_atom/mem-loads/  [Kernel PMU event]
  mem-stores OR cpu_atom/mem-stores/  [Kernel PMU event]
  ref-cycles OR cpu_atom/ref-cycles/  [Kernel PMU event]
  topdown-bad-spec OR cpu_atom/topdown-bad-spec/  [Kernel PMU event]
  topdown-be-bound OR cpu_atom/topdown-be-bound/  [Kernel PMU event]
  topdown-fe-bound OR cpu_atom/topdown-fe-bound/  [Kernel PMU event]
  topdown-retiring OR cpu_atom/topdown-retiring/  [Kernel PMU event]
  branch-instructions OR cpu_core/branch-instructions/  [Kernel PMU event]
  branch-misses OR cpu_core/branch-misses/  [Kernel PMU event]
  bus-cycles OR cpu_core/bus-cycles/  [Kernel PMU event]
  cache-misses OR cpu_core/cache-misses/  [Kernel PMU event]
  cache-references OR cpu_core/cache-references/  [Kernel PMU event]
  cpu-cycles OR cpu_core/cpu-cycles/  [Kernel PMU event]
  instructions OR cpu_core/instructions/  [Kernel PMU event]
  mem-loads OR cpu_core/mem-loads/  [Kernel PMU event]
  mem-loads-aux OR cpu_core/mem-loads-aux/  [Kernel PMU event]
  mem-stores OR cpu_core/mem-stores/  [Kernel PMU event]
  ref-cycles OR cpu_core/ref-cycles/  [Kernel PMU event]
  slots OR cpu_core/slots/  [Kernel PMU event]
  topdown-bad-spec OR cpu_core/topdown-bad-spec/  [Kernel PMU event]
  topdown-be-bound OR cpu_core/topdown-be-bound/  [Kernel PMU event]
  topdown-br-mispredict OR cpu_core/topdown-br-mispredict/  [Kernel PMU event]
  topdown-fe-bound OR cpu_core/topdown-fe-bound/  [Kernel PMU event]
  topdown-fetch-lat OR cpu_core/topdown-fetch-lat/  [Kernel PMU event]
  topdown-heavy-ops OR cpu_core/topdown-heavy-ops/  [Kernel PMU event]
  topdown-mem-bound OR cpu_core/topdown-mem-bound/  [Kernel PMU event]
  topdown-retiring OR cpu_core/topdown-retiring/  [Kernel PMU event]
  cstate_core/c1-residency/  [Kernel PMU event]
  cstate_core/c6-residency/  [Kernel PMU event]
  cstate_core/c7-residency/  [Kernel PMU event]
  cstate_pkg/c10-residency/  [Kernel PMU event]
  cstate_pkg/c2-residency/  [Kernel PMU event]
  cstate_pkg/c3-residency/  [Kernel PMU event]
  cstate_pkg/c6-residency/  [Kernel PMU event]
  cstate_pkg/c8-residency/  [Kernel PMU event]
  i915/actual-frequency/  [Kernel PMU event]
  i915/bcs0-busy/  [Kernel PMU event]
  i915/bcs0-sema/  [Kernel PMU event]
  i915/bcs0-wait/  [Kernel PMU event]
  i915/interrupts/  [Kernel PMU event]
  i915/rc6-residency/  [Kernel PMU event]
  i915/rcs0-busy/  [Kernel PMU event]
  i915/rcs0-sema/  [Kernel PMU event]
  i915/rcs0-wait/  [Kernel PMU event]
  i915/requested-frequency/  [Kernel PMU event]
  i915/software-gt-awake-time/  [Kernel PMU event]
  i915/vcs0-busy/  [Kernel PMU event]
  i915/vcs0-sema/  [Kernel PMU event]
  i915/vcs0-wait/  [Kernel PMU event]
  i915/vcs1-busy/  [Kernel PMU event]
  i915/vcs1-sema/  [Kernel PMU event]
  i915/vcs1-wait/  [Kernel PMU event]
  i915/vecs0-busy/  [Kernel PMU event]
  i915/vecs0-sema/  [Kernel PMU event]
  i915/vecs0-wait/  [Kernel PMU event]
  intel_bts//  [Kernel PMU event]
  intel_pt//  [Kernel PMU event]
  msr/aperf/  [Kernel PMU event]
  msr/cpu_thermal_margin/  [Kernel PMU event]
  msr/mperf/  [Kernel PMU event]
  msr/pperf/  [Kernel PMU event]
  msr/smi/  [Kernel PMU event]
  msr/tsc/  [Kernel PMU event]
  power/energy-cores/  [Kernel PMU event]
  power/energy-gpu/  [Kernel PMU event]
  power/energy-pkg/  [Kernel PMU event]
  power/energy-psys/  [Kernel PMU event]
  uncore_clock/clockticks/  [Kernel PMU event]
  uncore_imc_free_running/data_read/  [Kernel PMU event]
  uncore_imc_free_running/data_total/  [Kernel PMU event]
  uncore_imc_free_running/data_write/  [Kernel PMU event]
cache:
  longest_lat_cache.miss [Counts the number of cacheable memory requests that miss
in the LLC. Counts on a per core basis. Unit: cpu_atom] longest_lat_cache.reference [Counts the number of cacheable memory requests that access the LLC. Counts on a per core basis. Unit: cpu_atom] mem_bound_stalls.ifetch [Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in the L2,LLC,DRAM or MMIO (Non-DRAM). Unit: cpu_atom] mem_bound_stalls.ifetch_dram_hit [Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in DRAM or MMIO (Non-DRAM). Unit: cpu_atom] mem_bound_stalls.ifetch_l2_hit [Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in the L2 cache. Unit: cpu_atom] mem_bound_stalls.ifetch_llc_hit [Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in the LLC or other core with HITE/F/M. Unit: cpu_atom] mem_bound_stalls.load [Counts the number of cycles the core is stalled due to a demand load miss which hit in the L2,LLC,DRAM or MMIO (Non-DRAM). Unit: cpu_atom] mem_bound_stalls.load_dram_hit [Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM). Unit: cpu_atom] mem_bound_stalls.load_l2_hit [Counts the number of cycles the core is stalled due to a demand load which hit in the L2 cache. Unit: cpu_atom] mem_bound_stalls.load_llc_hit [Counts the number of cycles the core is stalled due to a demand load which hit in the LLC or other core with HITE/F/M. Unit: cpu_atom] mem_load_uops_retired.dram_hit [Counts the number of load uops retired that hit in DRAM Supports address when precise (Precise event). Unit: cpu_atom] mem_load_uops_retired.l2_hit [Counts the number of load uops retired that hit in the L2 cache Supports address when precise (Precise event). Unit: cpu_atom] mem_load_uops_retired.l3_hit [Counts the number of load uops retired that hit in the L3 cache Supports address when precise (Precise event). 
Unit: cpu_atom] mem_scheduler_block.all [Counts the number of cycles that uops are blocked for any of the following reasons: load buffer,store buffer or RSV full. Unit: cpu_atom] mem_scheduler_block.ld_buf [Counts the number of cycles that uops are blocked due to a load buffer full condition. Unit: cpu_atom] mem_scheduler_block.rsv [Counts the number of cycles that uops are blocked due to an RSV full condition. Unit: cpu_atom] mem_scheduler_block.st_buf [Counts the number of cycles that uops are blocked due to a store buffer full condition. Unit: cpu_atom] mem_uops_retired.all_loads [Counts the number of load uops retired Supports address when precise (Precise event). Unit: cpu_atom] mem_uops_retired.all_stores [Counts the number of store uops retired Supports address when precise (Precise event). Unit: cpu_atom] mem_uops_retired.load_latency_gt_128 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 128 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.load_latency_gt_16 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 16 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.load_latency_gt_256 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 256 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.load_latency_gt_32 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 32 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). 
Unit: cpu_atom] mem_uops_retired.load_latency_gt_4 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 4 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.load_latency_gt_512 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 512 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.load_latency_gt_64 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 64 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.load_latency_gt_8 [Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 8 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] mem_uops_retired.lock_loads [Counts the number of load uops retired that performed one or more locks Supports address when precise (Precise event). Unit: cpu_atom] mem_uops_retired.split_loads [Counts the number of retired split load uops Supports address when precise (Precise event). Unit: cpu_atom] mem_uops_retired.store_latency [Counts the number of stores uops retired. Counts with or without PEBS enabled Supports address when precise (Must be precise). Unit: cpu_atom] ocr.demand_data_rd.l3_hit [Counts demand data reads that were supplied by the L3 cache. Unit: cpu_atom] ocr.demand_data_rd.l3_hit.snoop_hit_no_fwd [Counts demand data reads that were supplied by the L3 cache where a snoop was sent,the snoop hit,but no data was forwarded. 
Unit: cpu_atom] ocr.demand_data_rd.l3_hit.snoop_hit_with_fwd [Counts demand data reads that were supplied by the L3 cache where a snoop was sent,the snoop hit,and non-modified data was forwarded. Unit: cpu_atom] ocr.demand_data_rd.l3_hit.snoop_hitm [Counts demand data reads that were supplied by the L3 cache where a snoop was sent,the snoop hit,and modified data was forwarded. Unit: cpu_atom] ocr.demand_rfo.l3_hit [Counts demand reads for ownership (RFO) and software prefetches for exclusive ownership (PREFETCHW) that were supplied by the L3 cache. Unit: cpu_atom] ocr.demand_rfo.l3_hit.snoop_hitm [Counts demand reads for ownership (RFO) and software prefetches for exclusive ownership (PREFETCHW) that were supplied by the L3 cache where a snoop was sent,the snoop hit,and modified data was forwarded. Unit: cpu_atom] topdown_fe_bound.icache [Counts the number of issue slots every cycle that were not delivered by the frontend due to instruction cache misses. Unit: cpu_atom] l1d.hwpf_miss [L1D.HWPF_MISS. Unit: cpu_core] l1d.replacement [Counts the number of cache lines replaced in L1 data cache. Unit: cpu_core] l1d_pend_miss.fb_full [Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability. Unit: cpu_core] l1d_pend_miss.fb_full_periods [Number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailability. Unit: cpu_core] l1d_pend_miss.l2_stalls [Number of cycles a demand request has waited due to L1D due to lack of L2 resources. Unit: cpu_core] l1d_pend_miss.pending [Number of L1D misses that are outstanding. Unit: cpu_core] l1d_pend_miss.pending_cycles [Cycles with L1D load Misses outstanding. Unit: cpu_core] l2_lines_in.all [L2 cache lines filling L2. Unit: cpu_core] l2_lines_out.useless_hwpf [Cache lines that have been L2 hardware prefetched but not used by demand accesses. Unit: cpu_core] l2_request.all [All accesses to L2 cache [This event is alias to L2_RQSTS.REFERENCES]. 
Unit: cpu_core] l2_request.miss [Read requests with true-miss in L2 cache. [This event is alias to L2_RQSTS.MISS]. Unit: cpu_core] l2_rqsts.all_code_rd [L2 code requests. Unit: cpu_core] l2_rqsts.all_demand_data_rd [Demand Data Read access L2 cache. Unit: cpu_core] l2_rqsts.all_demand_miss [Demand requests that miss L2 cache. Unit: cpu_core] l2_rqsts.all_hwpf [L2_RQSTS.ALL_HWPF. Unit: cpu_core] l2_rqsts.all_rfo [RFO requests to L2 cache. Unit: cpu_core] l2_rqsts.code_rd_hit [L2 cache hits when fetching instructions,code reads. Unit: cpu_core] l2_rqsts.code_rd_miss [L2 cache misses when fetching instructions. Unit: cpu_core] l2_rqsts.demand_data_rd_hit [Demand Data Read requests that hit L2 cache. Unit: cpu_core] l2_rqsts.demand_data_rd_miss [Demand Data Read miss L2 cache. Unit: cpu_core] l2_rqsts.hwpf_miss [L2_RQSTS.HWPF_MISS. Unit: cpu_core] l2_rqsts.miss [Read requests with true-miss in L2 cache. [This event is alias to L2_REQUEST.MISS]. Unit: cpu_core] l2_rqsts.references [All accesses to L2 cache [This event is alias to L2_REQUEST.ALL]. Unit: cpu_core] l2_rqsts.rfo_hit [RFO requests that hit L2 cache. Unit: cpu_core] l2_rqsts.rfo_miss [RFO requests that miss L2 cache. Unit: cpu_core] l2_rqsts.swpf_hit [SW prefetch requests that hit L2 cache. Unit: cpu_core] l2_rqsts.swpf_miss [SW prefetch requests that miss L2 cache. Unit: cpu_core] l2_trans.l2_wb [L2 writebacks that access L2 cache. Unit: cpu_core] longest_lat_cache.miss [Core-originated cacheable requests that missed L3 (Except hardware prefetches to the L3). Unit: cpu_core] longest_lat_cache.reference [Core-originated cacheable requests that refer to L3 (Except hardware prefetches to the L3). Unit: cpu_core] mem_inst_retired.all_loads [Retired load instructions Supports address when precise (Precise event). Unit: cpu_core] mem_inst_retired.all_stores [Retired store instructions Supports address when precise (Precise event). 
Unit: cpu_core] mem_inst_retired.any [All retired memory instructions Supports address when precise (Precise event). Unit: cpu_core] mem_inst_retired.lock_loads [Retired load instructions with locked access Supports address when precise (Precise event). Unit: cpu_core] mem_inst_retired.split_loads [Retired load instructions that split across a cacheline boundary Supports address when precise (Precise event). Unit: cpu_core] mem_inst_retired.split_stores [Retired store instructions that split across a cacheline boundary Supports address when precise (Precise event). Unit: cpu_core] mem_inst_retired.stlb_miss_loads [Retired load instructions that miss the STLB Supports address when precise (Precise event). Unit: cpu_core] mem_inst_retired.stlb_miss_stores [Retired store instructions that miss the STLB Supports address when precise (Precise event). Unit: cpu_core] mem_load_completed.l1_miss_any [Completed demand load uops that miss the L1 d-cache. Unit: cpu_core] mem_load_l3_hit_retired.xsnp_fwd [Retired load instructions whose data sources were HitM responses from shared L3 Supports address when precise (Precise event). Unit: cpu_core] mem_load_l3_hit_retired.xsnp_hit [Retired load instructions whose data sources were L3 and cross-core snoop hits in on-pkg core cache Supports address when precise (Precise event). Unit: cpu_core] mem_load_l3_hit_retired.xsnp_hitm [Retired load instructions whose data sources were HitM responses from shared L3 Supports address when precise (Precise event). Unit: cpu_core] mem_load_l3_hit_retired.xsnp_miss [Retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache Supports address when precise (Precise event). Unit: cpu_core] mem_load_l3_hit_retired.xsnp_no_fwd [Retired load instructions whose data sources were L3 and cross-core snoop hits in on-pkg core cache Supports address when precise (Precise event). 
Unit: cpu_core] mem_load_l3_hit_retired.xsnp_none [Retired load instructions whose data sources were hits in L3 without snoops required Supports address when precise (Precise event). Unit: cpu_core] mem_load_l3_miss_retired.local_dram [Retired load instructions which data sources missed L3 but serviced from local dram Supports address when precise (Precise event). Unit: cpu_core] mem_load_misc_retired.uc [Retired instructions with at least 1 uncacheable load or lock Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.fb_hit [Number of completed demand load requests that missed the L1,but hit the FB(fill buffer),because a preceding miss to the same cacheline initiated the line to be brought into L1,but data is not yet ready in L1 Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.l1_hit [Retired load instructions with L1 cache hits as data sources Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.l1_miss [Retired load instructions missed L1 cache as data sources Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.l2_hit [Retired load instructions with L2 cache hits as data sources Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.l2_miss [Retired load instructions missed L2 cache as data sources Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.l3_hit [Retired load instructions with L3 cache hits as data sources Supports address when precise (Precise event). Unit: cpu_core] mem_load_retired.l3_miss [Retired load instructions missed L3 cache as data sources Supports address when precise (Precise event). Unit: cpu_core] mem_store_retired.l2_hit [MEM_STORE_RETIRED.L2_HIT. Unit: cpu_core] mem_uop_retired.any [Retired memory uops for any access. 
Unit: cpu_core] ocr.demand_data_rd.l3_hit.snoop_hit_with_fwd [Counts demand data reads that resulted in a snoop hit in another cores caches which forwarded the unmodified data to the requesting core. Unit: cpu_core] ocr.demand_data_rd.l3_hit.snoop_hitm [Counts demand data reads that resulted in a snoop hit in another cores caches,data forwarding is required as the data is modified. Unit: cpu_core] ocr.demand_rfo.l3_hit.snoop_hitm [Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that resulted in a snoop hit in another cores caches,data forwarding is required as the data is modified. Unit: cpu_core] offcore_requests.all_requests [OFFCORE_REQUESTS.ALL_REQUESTS. Unit: cpu_core] offcore_requests.data_rd [Demand and prefetch data reads. Unit: cpu_core] offcore_requests.demand_code_rd [Cacheable and noncacheable code read requests. Unit: cpu_core] offcore_requests.demand_data_rd [Demand Data Read requests sent to uncore. Unit: cpu_core] offcore_requests.demand_rfo [Demand RFO requests including regular RFOs,locks,ItoM. Unit: cpu_core] offcore_requests_outstanding.cycles_with_data_rd [OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD Spec update: ADL038. Unit: cpu_core] offcore_requests_outstanding.cycles_with_demand_code_rd [Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ),queue to uncore. Unit: cpu_core] offcore_requests_outstanding.cycles_with_demand_data_rd [Cycles where at least 1 outstanding demand data read request is pending. Unit: cpu_core] offcore_requests_outstanding.cycles_with_demand_rfo [For every cycle where the core is waiting on at least 1 outstanding Demand RFO request,increments by 1. Unit: cpu_core] offcore_requests_outstanding.data_rd [OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Spec update: ADL038. Unit: cpu_core] offcore_requests_outstanding.demand_code_rd [Offcore outstanding Code Reads transactions in the SuperQueue (SQ),queue to uncore,every cycle. 
Unit: cpu_core] offcore_requests_outstanding.demand_data_rd [For every cycle,increments by the number of outstanding demand data read requests pending. Unit: cpu_core] sq_misc.bus_lock [Counts bus locks,accounts for cache line split locks and UC locks. Unit: cpu_core] sw_prefetch_access.any [Counts the number of PREFETCHNTA,PREFETCHW,PREFETCHT0,PREFETCHT1 or PREFETCHT2 instructions executed. Unit: cpu_core] sw_prefetch_access.nta [Number of PREFETCHNTA instructions executed. Unit: cpu_core] sw_prefetch_access.prefetchw [Number of PREFETCHW instructions executed. Unit: cpu_core] sw_prefetch_access.t0 [Number of PREFETCHT0 instructions executed. Unit: cpu_core] sw_prefetch_access.t1_t2 [Number of PREFETCHT1 or PREFETCHT2 instructions executed. Unit: cpu_core] floating point: machine_clears.fp_assist [Counts the number of floating point operations retired that required microcode assist. Unit: cpu_atom] uops_retired.fpdiv [Counts the number of floating point divide uops retired (x87 and SSE, including x87 sqrt) (Precise event). Unit: cpu_atom] arith.fpdiv_active [ARITH.FPDIV_ACTIVE. Unit: cpu_core] assists.fp [Counts all microcode FP assists. Unit: cpu_core] assists.sse_avx_mix [ASSISTS.SSE_AVX_MIX. Unit: cpu_core] fp_arith_dispatched.port_0 [FP_ARITH_DISPATCHED.PORT_0 [This event is alias to FP_ARITH_DISPATCHED.V0]. Unit: cpu_core] fp_arith_dispatched.port_1 [FP_ARITH_DISPATCHED.PORT_1 [This event is alias to FP_ARITH_DISPATCHED.V1]. Unit: cpu_core] fp_arith_dispatched.port_5 [FP_ARITH_DISPATCHED.PORT_5 [This event is alias to FP_ARITH_DISPATCHED.V2]. Unit: cpu_core] fp_arith_dispatched.v0 [FP_ARITH_DISPATCHED.V0 [This event is alias to FP_ARITH_DISPATCHED.PORT_0]. Unit: cpu_core] fp_arith_dispatched.v1 [FP_ARITH_DISPATCHED.V1 [This event is alias to FP_ARITH_DISPATCHED.PORT_1]. Unit: cpu_core] fp_arith_dispatched.v2 [FP_ARITH_DISPATCHED.V2 [This event is alias to FP_ARITH_DISPATCHED.PORT_5]. 
Unit: cpu_core] fp_arith_inst_retired.128b_packed_double [Counts number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations,one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. Unit: cpu_core] fp_arith_inst_retired.128b_packed_single [Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations,one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. Unit: cpu_core] fp_arith_inst_retired.256b_packed_double [Counts number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations,one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. Unit: cpu_core] fp_arith_inst_retired.256b_packed_single [Counts number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations,one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. Unit: cpu_core] fp_arith_inst_retired.4_flops [Number of SSE/AVX computational 128-bit packed single and 256-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations,1 for each element. Applies to SSE* and AVX* packed single precision and packed double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element. Unit: cpu_core] fp_arith_inst_retired.scalar [Number of SSE/AVX computational scalar floating-point instructions retired; some instructions will count twice as noted below. Applies to SSE* and AVX* scalar,double and single precision floating-point: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element. Unit: cpu_core] fp_arith_inst_retired.scalar_double [Counts number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. Unit: cpu_core] fp_arith_inst_retired.scalar_single [Counts number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
Unit: cpu_core] fp_arith_inst_retired.vector [Number of any Vector retired FP arithmetic instructions. Unit: cpu_core] frontend: baclears.any [Counts the total number of BACLEARS due to all branch types including conditional and unconditional jumps,returns,and indirect branches. Unit: cpu_atom] icache.accesses [Counts the number of requests to the instruction cache for one or more bytes of a cache line. Unit: cpu_atom] icache.misses [Counts the number of instruction cache misses. Unit: cpu_atom] baclears.any [Clears due to Unknown Branches. Unit: cpu_core] decode.lcp [Stalls caused by changing prefix length of the instruction. Unit: cpu_core] decode.ms_busy [Cycles the Microcode Sequencer is busy. Unit: cpu_core] dsb2mite_switches.penalty_cycles [DSB-to-MITE switch true penalty cycles. Unit: cpu_core] frontend_retired.any_dsb_miss [Retired Instructions who experienced DSB miss (Precise event). Unit: cpu_core] frontend_retired.dsb_miss [Retired Instructions who experienced a critical DSB miss (Precise event). Unit: cpu_core] frontend_retired.itlb_miss [Retired Instructions who experienced iTLB true miss (Precise event). Unit: cpu_core] frontend_retired.l1i_miss [Retired Instructions who experienced Instruction L1 Cache true miss (Precise event). Unit: cpu_core] frontend_retired.l2_miss [Retired Instructions who experienced Instruction L2 Cache true miss (Precise event). Unit: cpu_core] frontend_retired.latency_ge_1 [Retired instructions after front-end starvation of at least 1 cycle (Precise event). Unit: cpu_core] frontend_retired.latency_ge_128 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_16 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 16 cycles which was not interrupted by a back-end stall (Precise event). 
Unit: cpu_core] frontend_retired.latency_ge_2 [Retired instructions after front-end starvation of at least 2 cycles (Precise event). Unit: cpu_core] frontend_retired.latency_ge_256 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_2_bubbles_ge_1 [Retired instructions that are fetched after an interval where the front-end had at least 1 bubble-slot for a period of 2 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_32 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 32 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_4 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_512 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_64 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.latency_ge_8 [Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 8 cycles which was not interrupted by a back-end stall (Precise event). Unit: cpu_core] frontend_retired.ms_flows [FRONTEND_RETIRED.MS_FLOWS (Precise event). Unit: cpu_core] frontend_retired.stlb_miss [Retired Instructions who experienced STLB (2nd level TLB) true miss (Precise event). 
Unit: cpu_core] frontend_retired.unknown_branch [FRONTEND_RETIRED.UNKNOWN_BRANCH (Precise event). Unit: cpu_core] icache_data.stall_periods [ICACHE_DATA.STALL_PERIODS. Unit: cpu_core] icache_data.stalls [Cycles where a code fetch is stalled due to L1 instruction cache miss. Unit: cpu_core] icache_tag.stalls [Cycles where a code fetch is stalled due to L1 instruction cache tag miss. Unit: cpu_core] idq.dsb_cycles_any [Cycles Decode Stream Buffer (DSB) is delivering any Uop. Unit: cpu_core] idq.dsb_cycles_ok [Cycles DSB is delivering optimal number of Uops. Unit: cpu_core] idq.dsb_uops [Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Unit: cpu_core] idq.mite_cycles_any [Cycles MITE is delivering any Uop. Unit: cpu_core] idq.mite_cycles_ok [Cycles MITE is delivering optimal number of Uops. Unit: cpu_core] idq.mite_uops [Uops delivered to Instruction Decode Queue (IDQ) from MITE path. Unit: cpu_core] idq.ms_cycles_any [Cycles when uops are being delivered to IDQ while MS is busy. Unit: cpu_core] idq.ms_switches [Number of switches from DSB or MITE to the MS. Unit: cpu_core] idq.ms_uops [Uops delivered to IDQ while MS is busy. Unit: cpu_core] idq_bubbles.core [Uops not delivered by IDQ when backend of the machine is not stalled [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE]. Unit: cpu_core] idq_bubbles.cycles_0_uops_deliv.core [Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE]. Unit: cpu_core] idq_bubbles.cycles_fe_was_ok [Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK]. Unit: cpu_core] idq_uops_not_delivered.core [Uops not delivered by IDQ when backend of the machine is not stalled [This event is alias to IDQ_BUBBLES.CORE]. 
Unit: cpu_core] idq_uops_not_delivered.cycles_0_uops_deliv.core [Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE]. Unit: cpu_core] idq_uops_not_delivered.cycles_fe_was_ok [Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled [This event is alias to IDQ_BUBBLES.CYCLES_FE_WAS_OK]. Unit: cpu_core] memory: ld_head.any_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer is stalled due to any number of reasons,including an L1 miss,WCB full,pagewalk,store address block or store data block,on a load that retires. Unit: cpu_atom] ld_head.l1_bound_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer is stalled due to a core bound stall including a store address match,a DTLB miss or a page walk that detains the load from retiring. Unit: cpu_atom] ld_head.l1_miss_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer and retirement are both stalled due to a DL1 miss. Unit: cpu_atom] ld_head.other_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer and retirement are both stalled due to other block cases. Unit: cpu_atom] ld_head.pgwalk_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer and retirement are both stalled due to a pagewalk. Unit: cpu_atom] ld_head.st_addr_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer and retirement are both stalled due to a store address match. Unit: cpu_atom] machine_clears.memory_ordering [Counts the number of machine clears due to memory ordering caused by a snoop from an external agent. Does not count internally generated machine clears such as those due to memory disambiguation. Unit: cpu_atom] ocr.demand_data_rd.l3_miss [Counts demand data reads that were not supplied by the L3 cache. 
Unit: cpu_atom] ocr.demand_data_rd.l3_miss_local [Counts demand data reads that were not supplied by the L3 cache. [L3_MISS_LOCAL is alias to L3_MISS]. Unit: cpu_atom] ocr.demand_rfo.l3_miss [Counts demand reads for ownership (RFO) and software prefetches for exclusive ownership (PREFETCHW) that were not supplied by the L3 cache. Unit: cpu_atom] ocr.demand_rfo.l3_miss_local [Counts demand reads for ownership (RFO) and software prefetches for exclusive ownership (PREFETCHW) that were not supplied by the L3 cache. [L3_MISS_LOCAL is alias to L3_MISS]. Unit: cpu_atom] cycle_activity.stalls_l3_miss [Execution stalls while L3 cache miss demand load is outstanding. Unit: cpu_core] machine_clears.memory_ordering [Number of machine clears due to memory ordering conflicts. Unit: cpu_core] mem_trans_retired.load_latency_gt_1024 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 1024 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_128 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_16 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_256 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_32 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles Supports address when precise (Must be precise). 
Unit: cpu_core] mem_trans_retired.load_latency_gt_4 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_512 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_64 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.load_latency_gt_8 [Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles Supports address when precise (Must be precise). Unit: cpu_core] mem_trans_retired.store_sample [Retired memory store access operations. A PDist event for PEBS Store Latency Facility Supports address when precise (Must be precise). Unit: cpu_core] memory_activity.cycles_l1d_miss [Cycles while L1 cache miss demand load is outstanding. Unit: cpu_core] memory_activity.stalls_l1d_miss [Execution stalls while L1 cache miss demand load is outstanding. Unit: cpu_core] memory_activity.stalls_l2_miss [Execution stalls while L2 cache miss demand cacheable load request is outstanding. Unit: cpu_core] memory_activity.stalls_l3_miss [Execution stalls while L3 cache miss demand cacheable load request is outstanding. Unit: cpu_core] ocr.demand_data_rd.l3_miss [Counts demand data reads that were not supplied by the L3 cache. Unit: cpu_core] ocr.demand_rfo.l3_miss [Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that were not supplied by the L3 cache. Unit: cpu_core] offcore_requests.l3_miss_demand_data_rd [Counts demand data read requests that miss the L3 cache. 
Unit: cpu_core] offcore_requests_outstanding.l3_miss_demand_data_rd [For every cycle,increments by the number of demand data read requests pending that are known to have missed the L3 cache. Unit: cpu_core] other: ocr.corewb_m.any_response [Counts modified writebacks from L1 cache and L2 cache that have any type of response. Unit: cpu_atom] ocr.demand_data_rd.any_response [Counts demand data reads that have any type of response. Unit: cpu_atom] ocr.demand_rfo.any_response [Counts demand reads for ownership (RFO) and software prefetches for exclusive ownership (PREFETCHW) that have any type of response. Unit: cpu_atom] ocr.streaming_wr.any_response [Counts streaming stores that have any type of response. Unit: cpu_atom] serialization.c01_ms_scb [Counts the number of issue slots in a UMWAIT or TPAUSE instruction where no uop issues due to the instruction putting the CPU into the C0.1 activity state. For Tremont,UMWAIT and TPAUSE will only put the CPU into C0.1 activity state (not C0.2 activity state). Unit: cpu_atom] assists.hardware [ASSISTS.HARDWARE. Unit: cpu_core] assists.page_fault [ASSISTS.PAGE_FAULT. Unit: cpu_core] core_power.license_1 [CORE_POWER.LICENSE_1. Unit: cpu_core] core_power.license_2 [CORE_POWER.LICENSE_2. Unit: cpu_core] core_power.license_3 [CORE_POWER.LICENSE_3. Unit: cpu_core] ocr.demand_data_rd.any_response [Counts demand data reads that have any type of response. Unit: cpu_core] ocr.demand_data_rd.dram [Counts demand data reads that were supplied by DRAM. Unit: cpu_core] ocr.demand_rfo.any_response [Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that have any type of response. Unit: cpu_core] ocr.streaming_wr.any_response [Counts streaming stores that have any type of response. Unit: cpu_core] rs.empty [Cycles when Reservation Station (RS) is empty for the thread. Unit: cpu_core] rs.empty_count [Counts end of periods where the Reservation Station (RS) was empty. 
Unit: cpu_core] rs.empty_resource [Cycles when Reservation Station (RS) is empty due to a resource in the back-end. Unit: cpu_core] xq.full_cycles [Cycles the uncore cannot take further requests. Unit: cpu_core] pipeline: br_inst_retired.all_branches [Counts the total number of branch instructions retired for all branch types (Precise event). Unit: cpu_atom] br_inst_retired.cond [Counts the number of retired JCC (Jump on Conditional Code) branch instructions retired,includes both taken and not taken branches (Precise event). Unit: cpu_atom] br_inst_retired.cond_taken [Counts the number of taken JCC (Jump on Conditional Code) branch instructions retired (Precise event). Unit: cpu_atom] br_inst_retired.far_branch [Counts the number of far branch instructions retired,includes far jump, far call and return,and interrupt call and return (Precise event). Unit: cpu_atom] br_inst_retired.indirect [Counts the number of near indirect JMP and near indirect CALL branch instructions retired (Precise event). Unit: cpu_atom] br_inst_retired.indirect_call [Counts the number of near indirect CALL branch instructions retired (Precise event). Unit: cpu_atom] br_inst_retired.near_call [Counts the number of near CALL branch instructions retired (Precise event). Unit: cpu_atom] br_inst_retired.near_return [Counts the number of near RET branch instructions retired (Precise event). Unit: cpu_atom] br_inst_retired.near_taken [Counts the number of near taken branch instructions retired (Precise event). Unit: cpu_atom] br_inst_retired.rel_call [Counts the number of near relative CALL branch instructions retired (Precise event). Unit: cpu_atom] br_misp_retired.all_branches [Counts the total number of mispredicted branch instructions retired for all branch types (Precise event). Unit: cpu_atom] br_misp_retired.cond [Counts the number of mispredicted JCC (Jump on Conditional Code) branch instructions retired (Precise event). 
Unit: cpu_atom] br_misp_retired.cond_taken [Counts the number of mispredicted taken JCC (Jump on Conditional Code) branch instructions retired (Precise event). Unit: cpu_atom] br_misp_retired.indirect [Counts the number of mispredicted near indirect JMP and near indirect CALL branch instructions retired (Precise event). Unit: cpu_atom] br_misp_retired.indirect_call [Counts the number of mispredicted near indirect CALL branch instructions retired (Precise event). Unit: cpu_atom] br_misp_retired.near_taken [Counts the number of mispredicted near taken branch instructions retired (Precise event). Unit: cpu_atom] br_misp_retired.return [Counts the number of mispredicted near RET branch instructions retired (Precise event). Unit: cpu_atom] cpu_clk_unhalted.core [Counts the number of unhalted core clock cycles. (Fixed event). Unit: cpu_atom] cpu_clk_unhalted.core_p [Counts the number of unhalted core clock cycles. Unit: cpu_atom] cpu_clk_unhalted.ref_tsc [Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event). Unit: cpu_atom] cpu_clk_unhalted.ref_tsc_p [Counts the number of unhalted reference clock cycles at TSC frequency. Unit: cpu_atom] cpu_clk_unhalted.thread [Counts the number of unhalted core clock cycles. (Fixed event). Unit: cpu_atom] cpu_clk_unhalted.thread_p [Counts the number of unhalted core clock cycles. Unit: cpu_atom] inst_retired.any [Counts the total number of instructions retired. (Fixed event) (Precise event). Unit: cpu_atom] inst_retired.any_p [Counts the total number of instructions retired (Precise event). Unit: cpu_atom] ld_blocks.address_alias [Counts the number of retired loads that are blocked because it initially appears to be store forward blocked,but subsequently is shown not to be blocked based on 4K alias check (Precise event). Unit: cpu_atom] ld_blocks.data_unknown [Counts the number of retired loads that are blocked because its address exactly matches an older store whose data is not ready (Precise event). 
Unit: cpu_atom] machine_clears.disambiguation [Counts the number of machine clears due to memory ordering in which an internal load passes an older store within the same CPU. Unit: cpu_atom] machine_clears.mrn_nuke [Counts the number of machines clears due to memory renaming. Unit: cpu_atom] machine_clears.page_fault [Counts the number of machine clears due to a page fault. Counts both I-Side and D-Side (Loads/Stores) page faults. A page fault occurs when either the page is not present,or an access violation occurs. Unit: cpu_atom] machine_clears.slow [Counts the number of machine clears that flush the pipeline and restart the machine with the use of microcode due to SMC,MEMORY_ORDERING, FP_ASSISTS,PAGE_FAULT,DISAMBIGUATION,and FPC_VIRTUAL_TRAP. Unit: cpu_atom] machine_clears.smc [Counts the number of machine clears due to program modifying data (self modifying code) within 1K of a recently fetched code page. Unit: cpu_atom] misc_retired.lbr_inserts [Counts the number of LBR entries recorded. Requires LBRs to be enabled in IA32_LBR_CTL. [This event is alias to LBR_INSERTS.ANY] (Precise event). Unit: cpu_atom] serialization.non_c01_ms_scb [Counts the number of issue slots not consumed by the backend due to a micro-sequencer (MS) scoreboard,which stalls the front-end from issuing from the UROM until a specified older uop retires. Unit: cpu_atom] topdown_bad_speculation.all [Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Unit: cpu_atom] topdown_bad_speculation.fastnuke [Counts the number of issue slots every cycle that were not consumed by the backend due to fast nukes such as memory ordering and memory disambiguation machine clears. 
Unit: cpu_atom] topdown_bad_speculation.machine_clears [Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a machine clear (nuke) of any kind including memory ordering and memory disambiguation. Unit: cpu_atom] topdown_bad_speculation.mispredict [Counts the number of issue slots every cycle that were not consumed by the backend due to branch mispredicts. Unit: cpu_atom] topdown_bad_speculation.nuke [Counts the number of issue slots every cycle that were not consumed by the backend due to a machine clear (nuke). Unit: cpu_atom] topdown_be_bound.all [Counts the total number of issue slots every cycle that were not consumed by the backend due to backend stalls. Unit: cpu_atom] topdown_be_bound.alloc_restrictions [Counts the number of issue slots every cycle that were not consumed by the backend due to certain allocation restrictions. Unit: cpu_atom] topdown_be_bound.mem_scheduler [Counts the number of issue slots every cycle that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops. Unit: cpu_atom] topdown_be_bound.non_mem_scheduler [Counts the number of issue slots every cycle that were not consumed by the backend due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation stalls in which the integer,floating point or SIMD scheduler is not able to accept uops. Unit: cpu_atom] topdown_be_bound.register [Counts the number of issue slots every cycle that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls). Unit: cpu_atom] topdown_be_bound.reorder_buffer [Counts the number of issue slots every cycle that were not consumed by the backend due to the reorder buffer being full (ROB stalls). 
Unit: cpu_atom] topdown_be_bound.serialization [Counts the number of issue slots every cycle that were not consumed by the backend due to scoreboards from the instruction queue (IQ),jump execution unit (JEU),or microcode sequencer (MS). Unit: cpu_atom] topdown_fe_bound.all [Counts the total number of issue slots every cycle that were not consumed by the backend due to frontend stalls. Unit: cpu_atom] topdown_fe_bound.branch_detect [Counts the number of issue slots every cycle that were not delivered by the frontend due to BACLEARS. Unit: cpu_atom] topdown_fe_bound.branch_resteer [Counts the number of issue slots every cycle that were not delivered by the frontend due to BTCLEARS. Unit: cpu_atom] topdown_fe_bound.cisc [Counts the number of issue slots every cycle that were not delivered by the frontend due to the microcode sequencer (MS). Unit: cpu_atom] topdown_fe_bound.decode [Counts the number of issue slots every cycle that were not delivered by the frontend due to decode stalls. Unit: cpu_atom] topdown_fe_bound.frontend_bandwidth [Counts the number of issue slots every cycle that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode,cisc,and other limitations. Unit: cpu_atom] topdown_fe_bound.frontend_latency [Counts the number of issue slots every cycle that were not delivered by the frontend due to a latency related stalls including BACLEARs,BTCLEARs, ITLB misses,and ICache misses. Unit: cpu_atom] topdown_fe_bound.itlb [Counts the number of issue slots every cycle that were not delivered by the frontend due to ITLB misses. Unit: cpu_atom] topdown_fe_bound.other [Counts the number of issue slots every cycle that were not delivered by the frontend due to other common frontend stalls not categorized. Unit: cpu_atom] topdown_fe_bound.predecode [Counts the number of issue slots every cycle that were not delivered by the frontend due to wrong predecodes. 
Unit: cpu_atom] topdown_retiring.all [Counts the total number of consumed retirement slots (Precise event). Unit: cpu_atom] uops_issued.any [Counts the number of uops issued by the front end every cycle. Unit: cpu_atom] uops_retired.all [Counts the total number of uops retired (Precise event). Unit: cpu_atom] uops_retired.idiv [Counts the number of integer divide uops retired (Precise event). Unit: cpu_atom] uops_retired.ms [Counts the number of uops that are from complex flows issued by the micro-sequencer (MS) (Precise event). Unit: cpu_atom] uops_retired.x87 [Counts the number of x87 uops retired,includes those in MS flows (Precise event). Unit: cpu_atom] arith.div_active [Cycles when divide unit is busy executing divide or square root operations. Unit: cpu_core] arith.idiv_active [This event counts the cycles the integer divider is busy. Unit: cpu_core] assists.any [Number of occurrences where a microcode assist is invoked by hardware. Unit: cpu_core] br_inst_retired.all_branches [All branch instructions retired (Precise event). Unit: cpu_core] br_inst_retired.cond [Conditional branch instructions retired (Precise event). Unit: cpu_core] br_inst_retired.cond_ntaken [Not taken branch instructions retired (Precise event). Unit: cpu_core] br_inst_retired.cond_taken [Taken conditional branch instructions retired (Precise event). Unit: cpu_core] br_inst_retired.far_branch [Far branch instructions retired (Precise event). Unit: cpu_core] br_inst_retired.indirect [Indirect near branch instructions retired (excluding returns) (Precise event). Unit: cpu_core] br_inst_retired.near_call [Direct and indirect near call instructions retired (Precise event). Unit: cpu_core] br_inst_retired.near_return [Return instructions retired (Precise event). Unit: cpu_core] br_inst_retired.near_taken [Taken branch instructions retired (Precise event). Unit: cpu_core] br_misp_retired.all_branches [All mispredicted branch instructions retired (Precise event). 
Unit: cpu_core] br_misp_retired.cond [Mispredicted conditional branch instructions retired (Precise event). Unit: cpu_core] br_misp_retired.cond_ntaken [Mispredicted non-taken conditional branch instructions retired (Precise event). Unit: cpu_core] br_misp_retired.cond_taken [number of branch instructions retired that were mispredicted and taken (Precise event). Unit: cpu_core] br_misp_retired.indirect [Miss-predicted near indirect branch instructions retired (excluding returns) (Precise event). Unit: cpu_core] br_misp_retired.indirect_call [Mispredicted indirect CALL retired (Precise event). Unit: cpu_core] br_misp_retired.near_taken [Number of near branch instructions retired that were mispredicted and taken (Precise event). Unit: cpu_core] br_misp_retired.ret [This event counts the number of mispredicted ret instructions retired. Non PEBS (Precise event). Unit: cpu_core] cpu_clk_unhalted.c01 [Core clocks when the thread is in the C0.1 light-weight slower wakeup time but more power saving optimized state. Unit: cpu_core] cpu_clk_unhalted.c02 [Core clocks when the thread is in the C0.2 light-weight faster wakeup time but less power saving optimized state. Unit: cpu_core] cpu_clk_unhalted.c0_wait [Core clocks when the thread is in the C0.1 or C0.2 or running a PAUSE in C0 ACPI state. Unit: cpu_core] cpu_clk_unhalted.distributed [Cycle counts are evenly distributed between active threads in the Core. Unit: cpu_core] cpu_clk_unhalted.one_thread_active [Core crystal clock cycles when this thread is unhalted and the other thread is halted. Unit: cpu_core] cpu_clk_unhalted.pause [CPU_CLK_UNHALTED.PAUSE. Unit: cpu_core] cpu_clk_unhalted.pause_inst [CPU_CLK_UNHALTED.PAUSE_INST. Unit: cpu_core] cpu_clk_unhalted.ref_distributed [Core crystal clock cycles. Cycle counts are evenly distributed between active threads in the Core. Unit: cpu_core] cpu_clk_unhalted.ref_tsc [Reference cycles when the core is not in halt state. 
Unit: cpu_core] cpu_clk_unhalted.ref_tsc_p [Reference cycles when the core is not in halt state. Unit: cpu_core] cpu_clk_unhalted.thread [Core cycles when the thread is not in halt state. Unit: cpu_core] cpu_clk_unhalted.thread_p [Thread cycles when thread is not in halt state. Unit: cpu_core] cycle_activity.cycles_l1d_miss [Cycles while L1 cache miss demand load is outstanding. Unit: cpu_core] cycle_activity.cycles_l2_miss [Cycles while L2 cache miss demand load is outstanding. Unit: cpu_core] cycle_activity.cycles_mem_any [Cycles while memory subsystem has an outstanding load. Unit: cpu_core] cycle_activity.stalls_l1d_miss [Execution stalls while L1 cache miss demand load is outstanding. Unit: cpu_core] cycle_activity.stalls_l2_miss [Execution stalls while L2 cache miss demand load is outstanding. Unit: cpu_core] cycle_activity.stalls_total [Total execution stalls. Unit: cpu_core] exe_activity.1_ports_util [Cycles total of 1 uop is executed on all ports and Reservation Station was not empty. Unit: cpu_core] exe_activity.2_3_ports_util [Cycles total of 2 or 3 uops are executed on all ports and Reservation Station (RS) was not empty. Unit: cpu_core] exe_activity.2_ports_util [Cycles total of 2 uops are executed on all ports and Reservation Station was not empty. Unit: cpu_core] exe_activity.3_ports_util [Cycles total of 3 uops are executed on all ports and Reservation Station was not empty. Unit: cpu_core] exe_activity.4_ports_util [Cycles total of 4 uops are executed on all ports and Reservation Station was not empty. Unit: cpu_core] exe_activity.bound_on_loads [Execution stalls while memory subsystem has an outstanding load. Unit: cpu_core] exe_activity.bound_on_stores [Cycles where the Store Buffer was full and no loads caused an execution stall. Unit: cpu_core] exe_activity.exe_bound_0_ports [Cycles no uop executed while RS was not empty,the SB was not full and there was no outstanding load. 
Unit: cpu_core] inst_decoded.decoders [Instruction decoders utilized in a cycle. Unit: cpu_core] inst_retired.any [Number of instructions retired. Fixed Counter - architectural event (Precise event). Unit: cpu_core] inst_retired.any_p [Number of instructions retired. General Counter - architectural event (Precise event). Unit: cpu_core] inst_retired.macro_fused [INST_RETIRED.MACRO_FUSED (Precise event). Unit: cpu_core] inst_retired.nop [Retired NOP instructions (Precise event). Unit: cpu_core] inst_retired.prec_dist [Precise instruction retired with PEBS precise-distribution (Precise event). Unit: cpu_core] inst_retired.rep_iteration [Iterations of Repeat string retired instructions (Precise event). Unit: cpu_core] int_misc.clear_resteer_cycles [Counts cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path. Unit: cpu_core] int_misc.clears_count [Clears speculative count. Unit: cpu_core] int_misc.recovery_cycles [Core cycles the allocator was stalled due to recovery from earlier clear event for this thread. Unit: cpu_core] int_misc.unknown_branch_cycles [Bubble cycles of BAClear (Unknown Branch). Unit: cpu_core] int_misc.uop_dropping [TMA slots where uops got dropped. Unit: cpu_core] int_vec_retired.128bit [INT_VEC_RETIRED.128BIT. Unit: cpu_core] int_vec_retired.256bit [INT_VEC_RETIRED.256BIT. Unit: cpu_core] int_vec_retired.add_128 [integer ADD,SUB,SAD 128-bit vector instructions. Unit: cpu_core] int_vec_retired.add_256 [integer ADD,SUB,SAD 256-bit vector instructions. Unit: cpu_core] int_vec_retired.mul_256 [INT_VEC_RETIRED.MUL_256. Unit: cpu_core] int_vec_retired.shuffles [INT_VEC_RETIRED.SHUFFLES. Unit: cpu_core] int_vec_retired.vnni_128 [INT_VEC_RETIRED.VNNI_128. Unit: cpu_core] int_vec_retired.vnni_256 [INT_VEC_RETIRED.VNNI_256. Unit: cpu_core] ld_blocks.address_alias [False dependencies in MOB due to partial compare on address. 
Unit: cpu_core] ld_blocks.no_sr [The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use. Unit: cpu_core] ld_blocks.store_forward [Loads blocked due to overlapping with a preceding store that cannot be forwarded. Unit: cpu_core] load_hit_prefetch.swpf [Counts the number of demand load dispatches that hit L1D fill buffer (FB) allocated for software prefetch. Unit: cpu_core] lsd.cycles_active [Cycles Uops delivered by the LSD,but didn't come from the decoder. Unit: cpu_core] lsd.cycles_ok [Cycles optimal number of Uops delivered by the LSD,but did not come from the decoder. Unit: cpu_core] lsd.uops [Number of Uops delivered by the LSD. Unit: cpu_core] machine_clears.count [Number of machine clears (nukes) of any type. Unit: cpu_core] machine_clears.smc [Self-modifying code (SMC) detected. Unit: cpu_core] misc2_retired.lfence [LFENCE instructions retired. Unit: cpu_core] misc_retired.lbr_inserts [Increments whenever there is an update to the LBR array. Unit: cpu_core] resource_stalls.sb [Cycles stalled due to no store buffers available. (not including draining form sync). Unit: cpu_core] resource_stalls.scoreboard [Counts cycles where the pipeline is stalled due to serializing operations. Unit: cpu_core] topdown.backend_bound_slots [TMA slots where no uops were being issued due to lack of back-end resources. Unit: cpu_core] topdown.bad_spec_slots [TMA slots wasted due to incorrect speculations. Unit: cpu_core] topdown.br_mispredict_slots [TMA slots wasted due to incorrect speculation by branch mispredictions. Unit: cpu_core] topdown.memory_bound_slots [TOPDOWN.MEMORY_BOUND_SLOTS. Unit: cpu_core] topdown.slots [TMA slots available for an unhalted logical processor. Fixed counter - architectural event. Unit: cpu_core] topdown.slots_p [TMA slots available for an unhalted logical processor. General counter - architectural event. Unit: cpu_core] uops_decoded.dec0_uops [UOPS_DECODED.DEC0_UOPS. 
Unit: cpu_core] uops_dispatched.port_0 [Uops executed on port 0. Unit: cpu_core] uops_dispatched.port_1 [Uops executed on port 1. Unit: cpu_core] uops_dispatched.port_2_3_10 [Uops executed on ports 2,3 and 10. Unit: cpu_core] uops_dispatched.port_4_9 [Uops executed on ports 4 and 9. Unit: cpu_core] uops_dispatched.port_5_11 [Uops executed on ports 5 and 11. Unit: cpu_core] uops_dispatched.port_6 [Uops executed on port 6. Unit: cpu_core] uops_dispatched.port_7_8 [Uops executed on ports 7 and 8. Unit: cpu_core] uops_executed.core_cycles_ge_1 [Cycles at least 1 micro-op is executed from any thread on physical core. Unit: cpu_core] uops_executed.core_cycles_ge_2 [Cycles at least 2 micro-op is executed from any thread on physical core. Unit: cpu_core] uops_executed.core_cycles_ge_3 [Cycles at least 3 micro-op is executed from any thread on physical core. Unit: cpu_core] uops_executed.core_cycles_ge_4 [Cycles at least 4 micro-op is executed from any thread on physical core. Unit: cpu_core] uops_executed.cycles_ge_1 [Cycles where at least 1 uop was executed per-thread. Unit: cpu_core] uops_executed.cycles_ge_2 [Cycles where at least 2 uops were executed per-thread. Unit: cpu_core] uops_executed.cycles_ge_3 [Cycles where at least 3 uops were executed per-thread. Unit: cpu_core] uops_executed.cycles_ge_4 [Cycles where at least 4 uops were executed per-thread. Unit: cpu_core] uops_executed.stalls [Counts number of cycles no uops were dispatched to be executed on this thread. Unit: cpu_core] uops_executed.thread [Counts the number of uops to be executed per-thread each cycle. Unit: cpu_core] uops_executed.x87 [Counts the number of x87 uops dispatched. Unit: cpu_core] uops_issued.any [Uops that RAT issues to RS. Unit: cpu_core] uops_issued.cycles [UOPS_ISSUED.CYCLES. Unit: cpu_core] uops_retired.cycles [Cycles with retired uop(s). Unit: cpu_core] uops_retired.heavy [Retired uops except the last uop of each instruction. Unit: cpu_core] uops_retired.ms [UOPS_RETIRED.MS. 
Unit: cpu_core] uops_retired.slots [Retirement slots used. Unit: cpu_core] uops_retired.stalls [Cycles without actually retired uops. Unit: cpu_core] uncore interconnect: unc_arb_coh_trk_requests.all [Number of requests allocated in Coherency Tracker. Unit: uncore_arb] unc_arb_dat_occupancy.all [Each cycle counts number of any coherent request at memory controller that were issued by any core. Unit: uncore_arb] unc_arb_dat_occupancy.rd [Each cycle counts number of coherent reads pending on data return from memory controller that were issued by any core. Unit: uncore_arb] unc_arb_req_trk_occupancy.drd [Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches [This event is alias to UNC_ARB_TRK_OCCUPANCY.RD]. Unit: uncore_arb] unc_arb_req_trk_request.drd [Number of all coherent Data Read entries. Doesn't include prefetches [This event is alias to UNC_ARB_TRK_REQUESTS.RD]. Unit: uncore_arb] unc_arb_trk_occupancy.all [Each cycle counts number of all outgoing valid entries in ReqTrk. Such entry is defined as valid from its allocation in ReqTrk till deallocation. Accounts for Coherent and non-coherent traffic. Unit: uncore_arb] unc_arb_trk_occupancy.rd [Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches [This event is alias to UNC_ARB_REQ_TRK_OCCUPANCY.DRD]. Unit: uncore_arb] unc_arb_trk_requests.all [Counts the number of coherent and in-coherent requests initiated by IA cores,processor graphic units,or LLC. Unit: uncore_arb] unc_arb_trk_requests.rd [Number of all coherent Data Read entries. Doesn't include prefetches [This event is alias to UNC_ARB_REQ_TRK_REQUEST.DRD]. Unit: uncore_arb] uncore memory: unc_m_act_count_rd [ACT command for a read request sent to DRAM. Unit: uncore_imc] unc_m_act_count_total [ACT command sent to DRAM. 
Unit: uncore_imc] unc_m_act_count_wr [ACT command for a write request sent to DRAM. Unit: uncore_imc] unc_m_cas_count_rd [Read CAS command sent to DRAM. Unit: uncore_imc] unc_m_cas_count_wr [Write CAS command sent to DRAM. Unit: uncore_imc] unc_m_clockticks [Number of clocks. Unit: uncore_imc] unc_m_dram_page_empty_rd [incoming read request page status is Page Empty. Unit: uncore_imc] unc_m_dram_page_empty_wr [incoming write request page status is Page Empty. Unit: uncore_imc] unc_m_dram_page_hit_rd [incoming read request page status is Page Hit. Unit: uncore_imc] unc_m_dram_page_hit_wr [incoming write request page status is Page Hit. Unit: uncore_imc] unc_m_dram_page_miss_rd [incoming read request page status is Page Miss. Unit: uncore_imc] unc_m_dram_page_miss_wr [incoming write request page status is Page Miss. Unit: uncore_imc] unc_m_dram_thermal_hot [Any Rank at Hot state. Unit: uncore_imc] unc_m_dram_thermal_warm [Any Rank at Warm state. Unit: uncore_imc] unc_m_pre_count_idle [PRE command sent to DRAM due to page table idle timer expiration. Unit: uncore_imc] unc_m_pre_count_page_miss [PRE command sent to DRAM for a read/write request. Unit: uncore_imc] unc_m_prefetch_rd [Incoming read prefetch request from IA. Unit: uncore_imc] unc_m_vc0_requests_rd [Incoming VC0 read request. Unit: uncore_imc] unc_m_vc0_requests_wr [Incoming VC0 write request. Unit: uncore_imc] unc_m_vc1_requests_rd [Incoming VC1 read request. Unit: uncore_imc] unc_m_vc1_requests_wr [Incoming VC1 write request. Unit: uncore_imc] unc_mc0_rdcas_count_freerun [Counts every 64B read request entering the Memory Controller 0 to DRAM (sum of all channels). Unit: uncore_imc_free_running_0] unc_mc0_wrcas_count_freerun [Counts every 64B write request entering the Memory Controller 0 to DRAM (sum of all channels). Each write request counts as a new request incrementing this counter. However,same cache line write requests (both full and partial) are combined to a single 64 byte data transfer to DRAM. 
Unit: uncore_imc_free_running_0] uncore other: unc_clock.socket [This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_clock] virtual memory: dtlb_load_misses.walk_completed [Counts the number of page walks completed due to load DTLB misses to any page size. Unit: cpu_atom] dtlb_store_misses.walk_completed [Counts the number of page walks completed due to store DTLB misses to any page size. Unit: cpu_atom] itlb_misses.miss_caused_walk [Counts the number of page walks initiated by a instruction fetch that missed the first and second level TLBs. Unit: cpu_atom] itlb_misses.pde_cache_miss [Counts the number of page walks due to an instruction fetch that miss the PDE (Page Directory Entry) cache. Unit: cpu_atom] itlb_misses.walk_completed [Counts the number of page walks completed due to instruction fetch misses to any page size. Unit: cpu_atom] ld_head.dtlb_miss_at_ret [Counts the number of cycles that the head (oldest load) of the load buffer and retirement are both stalled due to a DTLB miss. Unit: cpu_atom] dtlb_load_misses.stlb_hit [Loads that miss the DTLB and hit the STLB. Unit: cpu_core] dtlb_load_misses.walk_active [Cycles when at least one PMH is busy with a page walk for a demand load. Unit: cpu_core] dtlb_load_misses.walk_completed [Load miss in all TLB levels causes a page walk that completes. (All page sizes). Unit: cpu_core] dtlb_load_misses.walk_completed_1g [Page walks completed due to a demand data load to a 1G page. Unit: cpu_core] dtlb_load_misses.walk_completed_2m_4m [Page walks completed due to a demand data load to a 2M/4M page. Unit: cpu_core] dtlb_load_misses.walk_completed_4k [Page walks completed due to a demand data load to a 4K page. Unit: cpu_core] dtlb_load_misses.walk_pending [Number of page walks outstanding for a demand load in the PMH each cycle. Unit: cpu_core] dtlb_store_misses.stlb_hit [Stores that miss the DTLB and hit the STLB. 
Unit: cpu_core] dtlb_store_misses.walk_active [Cycles when at least one PMH is busy with a page walk for a store. Unit: cpu_core] dtlb_store_misses.walk_completed [Store misses in all TLB levels causes a page walk that completes. (All page sizes). Unit: cpu_core] dtlb_store_misses.walk_completed_1g [Page walks completed due to a demand data store to a 1G page. Unit: cpu_core] dtlb_store_misses.walk_completed_2m_4m [Page walks completed due to a demand data store to a 2M/4M page. Unit: cpu_core] dtlb_store_misses.walk_completed_4k [Page walks completed due to a demand data store to a 4K page. Unit: cpu_core] dtlb_store_misses.walk_pending [Number of page walks outstanding for a store in the PMH each cycle. Unit: cpu_core] itlb_misses.stlb_hit [Instruction fetch requests that miss the ITLB and hit the STLB. Unit: cpu_core] itlb_misses.walk_active [Cycles when at least one PMH is busy with a page walk for code (instruction fetch) request. Unit: cpu_core] itlb_misses.walk_completed [Code miss in all TLB levels causes a page walk that completes. (All page sizes). Unit: cpu_core] itlb_misses.walk_completed_2m_4m [Code miss in all TLB levels causes a page walk that completes. (2M/4M). Unit: cpu_core] itlb_misses.walk_completed_4k [Code miss in all TLB levels causes a page walk that completes. (4K). Unit: cpu_core] itlb_misses.walk_pending [Number of page walks outstanding for an outstanding code request in the PMH each cycle. 
Unit: cpu_core] rNNN [Raw event descriptor] cpu_atom/event=0..255,pc,edge,.../modifier [Raw event descriptor] [(see 'man perf-list' or 'man perf-record' on how to encode it)] cpu_core/event=0..255,pc,edge,.../modifier [Raw event descriptor] [(see 'man perf-list' or 'man perf-record' on how to encode it)] breakpoint//modifier [Raw event descriptor] cstate_core/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_pkg/event=0..0xffffffffffffffff/modifier [Raw event descriptor] i915/i915_eventid=0..0x1fffff/modifier [Raw event descriptor] intel_bts//modifier [Raw event descriptor] intel_pt/ptw,event,cyc_thresh=0..15,.../modifier [Raw event descriptor] kprobe/retprobe/modifier [Raw event descriptor] msr/event=0..0xffffffffffffffff/modifier [Raw event descriptor] power/event=0..255/modifier [Raw event descriptor] software//modifier [Raw event descriptor] tracepoint//modifier [Raw event descriptor] uncore_arb/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_cbox/event=0..255,edge,threshold=0..63,.../modifier[Raw event descriptor] uncore_clock/event=0..255/modifier [Raw event descriptor] uncore_imc_free_running/event=0..255,umask=0..255/modifier[Raw event descriptor] uncore_imc/event=0..255,edge,chmask=0..15/modifier [Raw event descriptor] uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier[Raw event descriptor] mem:<addr>[/len][:access] [Hardware breakpoint] Metric Groups: Backend: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_core_bound [This metric represents fraction of slots where Core non-memory issues were of a bottleneck] tma_info_core_ilp [Instruction-Level-Parallelism (average number of uops executed when there is execution) per thread (logical-processor)] tma_info_memory_l2mpki [L2 cache true misses per kilo instruction for retired demand loads] tma_memory_bound [This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck] Bad: 
tma_info_bad_spec_branch_misprediction_cost [Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)] tma_info_bad_spec_ipmisp_cond_ntaken [Instructions per retired mispredicts for conditional non-taken branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmisp_cond_taken [Instructions per retired mispredicts for conditional taken branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmisp_indirect [Instructions per retired mispredicts for indirect CALL or JMP branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmisp_ret [Instructions per retired mispredicts for return branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmispredict [Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)] tma_info_bottleneck_irregular_overhead [Total pipeline cost of irregular execution (e.g] tma_info_bottleneck_mispredictions [Total pipeline cost of Branch Misprediction related bottlenecks] tma_info_branches_callret [Fraction of branches that are CALL or RET] tma_info_branches_cond_nt [Fraction of branches that are non-taken conditionals] tma_info_branches_cond_tk [Fraction of branches that are taken conditionals] tma_info_branches_jump [Fraction of branches that are unconditional (direct or indirect) jumps] tma_info_branches_other_branches [Fraction of branches of other types (not individually covered by other metrics in Info.Branches group)] BadSpec: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction] tma_clears_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears] tma_info_bad_spec_ipmispredict [Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower 
number means higher occurrence rate)] tma_info_bottleneck_mispredictions [Total pipeline cost of Branch Misprediction related bottlenecks] tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] BigFootprint: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses] tma_info_bottleneck_big_code [Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)] tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses] tma_unknown_branches [This metric represents fraction of cycles the CPU was stalled due to new branch address clears] BrMispredicts: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction] tma_info_bad_spec_branch_misprediction_cost [Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)] tma_info_bad_spec_ipmisp_cond_ntaken [Instructions per retired mispredicts for conditional non-taken branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmisp_cond_taken [Instructions per retired mispredicts for conditional taken branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmisp_indirect [Instructions per retired mispredicts for indirect CALL or JMP branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmisp_ret [Instructions per retired mispredicts for return branches (lower number means higher occurrence rate)] tma_info_bad_spec_ipmispredict [Number of 
Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)] tma_info_bad_spec_spec_clears_ratio [Speculative to Retired ratio of all clears (covering mispredicts and nukes)] tma_info_bottleneck_mispredictions [Total pipeline cost of Branch Misprediction related bottlenecks] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] tma_other_mispredicts [This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types)] Branches: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_fused_instructions [This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions] tma_info_branches_callret [Fraction of branches that are CALL or RET] tma_info_branches_cond_nt [Fraction of branches that are non-taken conditionals] tma_info_branches_cond_tk [Fraction of branches that are taken conditionals] tma_info_branches_jump [Fraction of branches that are unconditional (direct or indirect) jumps] tma_info_branches_other_branches [Fraction of branches of other types (not individually covered by other metrics in Info.Branches group)] tma_info_inst_mix_bptkbranch [Branch instructions per taken branch] tma_info_inst_mix_ipbranch [Instructions per Branch (lower number means higher occurrence rate)] tma_info_inst_mix_ipcall [Instructions per (near) call (lower number means higher occurrence rate)] tma_info_inst_mix_iptb [Instructions per taken branch] tma_info_system_ipfarbranch [Instructions per Far Branch ( Far Branches apply upon transition from application to operating system,handling interrupts,exceptions) [lower number means higher occurrence rate]] tma_info_thread_uptb [Uops per taken branch] tma_non_fused_branches [This metric represents 
fraction of slots where the CPU was retiring branch instructions that were not fused] BvBC: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses] tma_info_bottleneck_big_code [Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)] tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses] tma_unknown_branches [This metric represents fraction of cycles the CPU was stalled due to new branch address clears] BvBO: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_fused_instructions [This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions] tma_info_bottleneck_branching_overhead [Total pipeline cost of instructions used for program control-flow - a subset of the Retiring category in TMA] tma_non_fused_branches [This metric represents fraction of slots where the CPU was retiring branch instructions that were not fused] tma_nop_instructions [This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions] BvCB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_divider [This metric represents fraction of cycles where the Divider unit was active] tma_info_bottleneck_compute_bound_est [Total pipeline cost when the execution is compute-bound - an estimation] tma_ports_utilized_3m [This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] BvFB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its 
Backend] tma_info_bottleneck_instruction_fetch_bw [Total pipeline cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the back-end)] BvIO: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_assists [This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists] tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its Backend] tma_info_bottleneck_irregular_overhead [Total pipeline cost of irregular execution (e.g] tma_other_mispredicts [This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types)] tma_other_nukes [This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering] tma_serializing_operation [This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations] BvMB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_bottleneck_cache_memory_bandwidth [Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks] BvML: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_bottleneck_cache_memory_latency [Total pipeline cost of external Memory- or Cache-Latency related bottlenecks] tma_l1_hit_latency [This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache] tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads] tma_l3_hit_latency [This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)] tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or 
HBM)] tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses] BvMP: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction] tma_info_bottleneck_mispredictions [Total pipeline cost of Branch Misprediction related bottlenecks] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] BvMS: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses] tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses] tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_info_bottleneck_memory_synchronization [Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors)] tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears] tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)] tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)] BvMT: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_dtlb_load [This metric roughly estimates the fraction of cycles 
where the Data TLB (DTLB) was missed by load accesses] tma_dtlb_store [This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses] tma_info_bottleneck_memory_data_tlbs [Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)] BvOB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_backend_bound [This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend] tma_info_bottleneck_other_bottlenecks [Total pipeline cost of remaining bottlenecks in the back-end] BvUW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_bottleneck_useful_work [Total pipeline cost of "useful operations" - the portion of Retiring category not covered by Branching_Overhead nor Irregular_Overhead] tma_retiring [This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired] C0Wait: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_c01_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.1 power-performance optimized state (Faster wakeup time; Smaller power savings)] tma_c02_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.2 power-performance optimized state (Slower wakeup time; Larger power savings)] tma_info_system_c0_wait [Fraction of cycles the processor is waiting yet unhalted; covering legacy PAUSE instruction,as well as C0.1 / C0.2 power-performance optimized states] CacheHits: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_memory_fb_hpki [Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)] tma_info_memory_l1mpki [L1 cache true misses per kilo instruction for retired demand loads] tma_info_memory_l1mpki_load [L1 cache 
true misses per kilo instruction for all demand loads (including speculative)] tma_info_memory_l2hpki_all [L2 cache hits per kilo instruction for all request types (including speculative)] tma_info_memory_l2hpki_load [L2 cache hits per kilo instruction for all demand loads (including speculative)] tma_info_memory_l2mpki [L2 cache true misses per kilo instruction for retired demand loads] tma_info_memory_l2mpki_all [L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)] tma_info_memory_l2mpki_load [L2 cache ([RKL+] true) misses per kilo instruction for all demand loads (including speculative)] tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache] tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads] tma_l3_bound [This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core] CacheMisses: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_memory_l2mpki_rfo [Offcore requests (L2 cache miss) per kilo instruction for demand RFOs] CodeGen: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_branches_cond_nt [Fraction of branches that are non-taken conditionals] tma_info_branches_cond_tk [Fraction of branches that are taken conditionals] Compute: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_core_bound [This metric represents fraction of slots where Core non-memory issues were of a bottleneck] tma_fp_scalar [This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired] tma_fp_vector [This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths] tma_fp_vector_128b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors] 
tma_fp_vector_256b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors] tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_port_0 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)] tma_x87_use [This metric serves as an approximation of legacy x87 usage] Cor: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_botlnk_l0_core_bound_likely [Probability of Core Bound bottleneck hidden by SMT-profiling artifacts] tma_info_bottleneck_compute_bound_est [Total pipeline cost when the execution is compute-bound - an estimation] tma_info_bottleneck_irregular_overhead [Total pipeline cost of irregular execution (e.g] tma_info_bottleneck_other_bottlenecks [Total pipeline cost of remaining bottlenecks in the back-end] tma_info_core_fp_arith_utilization [Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)] tma_info_core_ilp [Instruction-Level-Parallelism (average number of uops executed when there is execution) per thread (logical-processor)] tma_info_pipeline_execute [Instruction-Level-Parallelism (average number of uops executed when there is execution) per core] tma_info_system_gflops [Giga Floating Point Operations Per Second] tma_info_thread_execute_per_issue [The ratio of Executed- by Issued-Uops] DSB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_dsb [This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline] tma_info_botlnk_l2_dsb_bandwidth [Total pipeline cost of DSB (uop cache) hits - 
subset of the Instruction_Fetch_BW Bottleneck] tma_info_frontend_dsb_coverage [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)] DSBmiss: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_decoder0_alone [This metric represents fraction of cycles where decoder-0 was the only active decoder] tma_dsb_switches [This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines] tma_info_botlnk_l2_dsb_misses [Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck] tma_info_frontend_dsb_switch_cost [Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details] tma_info_frontend_ipdsb_miss_ret [Instructions per non-speculative DSB miss (lower number means higher occurrence rate)] tma_mite [This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)] DataSharing: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses] tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing] Default: tma_backend_bound [This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend] tma_bad_speculation [This category represents fraction of slots wasted due to incorrect speculations] tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its Backend] tma_retiring [This category represents fraction of slots utilized by useful work i.e. 
issued uops that eventually get retired] Fed: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_botlnk_l2_dsb_misses [Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck] tma_info_botlnk_l2_ic_misses [Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck] tma_info_bottleneck_big_code [Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)] tma_info_bottleneck_instruction_fetch_bw [Total pipeline cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the back-end)] tma_info_frontend_dsb_coverage [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)] tma_info_frontend_fetch_upc [Average number of Uops issued by front-end when it issued something] tma_info_frontend_icache_miss_latency [Average Latency for L1 instruction cache misses] tma_info_frontend_ipdsb_miss_ret [Instructions per non-speculative DSB miss (lower number means higher occurrence rate)] tma_info_frontend_ipunknown_branch [Instructions per speculative Unknown Branch Misprediction (BAClear) (lower number means higher occurrence rate)] tma_info_frontend_lsd_coverage [Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)] tma_info_frontend_unknown_branch_cost [Average number of cycles the front-end was delayed due to an Unknown Branch detection] tma_info_inst_mix_bptkbranch [Branch instructions per taken branch] tma_info_inst_mix_ipbranch [Instructions per Branch (lower number means higher occurrence rate)] tma_info_inst_mix_ipcall [Instructions per (near) call (lower number means higher occurrence rate)] tma_info_inst_mix_iptb [Instructions per taken branch] tma_info_memory_tlb_code_stlb_mpki [STLB (2nd level TLB) code speculative misses per kilo instruction (misses of any page-size that complete the page walk)] 
tma_info_pipeline_fetch_dsb [Average number of uops fetched from DSB per cycle] tma_info_pipeline_fetch_lsd [Average number of uops fetched from LSD per cycle] tma_info_pipeline_fetch_mite [Average number of uops fetched from MITE per cycle] tma_info_thread_uptb [Uops per taken branch] FetchBW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_decoder0_alone [This metric represents fraction of cycles where decoder-0 was the only active decoder] tma_dsb [This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline] tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues] tma_info_botlnk_l2_dsb_bandwidth [Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch_BW Bottleneck] tma_info_bottleneck_instruction_fetch_bw [Total pipeline cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the back-end)] tma_info_frontend_dsb_coverage [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)] tma_info_frontend_fetch_upc [Average number of Uops issued by front-end when it issued something] tma_info_inst_mix_iptb [Instructions per taken branch] tma_info_pipeline_fetch_dsb [Average number of uops fetched from DSB per cycle] tma_info_pipeline_fetch_lsd [Average number of uops fetched from LSD per cycle] tma_info_pipeline_fetch_mite [Average number of uops fetched from MITE per cycle] tma_info_thread_uptb [Uops per taken branch] tma_lsd [This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit] tma_mite [This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)] FetchLat: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_branch_resteers [This metric represents fraction of 
cycles the CPU was stalled due to Branch Resteers] tma_dsb_switches [This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines] tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses] tma_info_botlnk_l2_ic_misses [Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck] tma_info_frontend_icache_miss_latency [Average Latency for L1 instruction cache misses] tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses] tma_lcp [This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_unknown_branches [This metric represents fraction of cycles the CPU was stalled due to new branch address clears] Flops: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_fp_scalar [This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired] tma_fp_vector [This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths] tma_fp_vector_128b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors] tma_fp_vector_256b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors] tma_info_core_flopc [Floating Point Operations Per Cycle] tma_info_core_fp_arith_utilization [Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)] tma_info_inst_mix_iparith [Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)] tma_info_inst_mix_iparith_avx128 [Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower 
number means higher occurrence rate)] tma_info_inst_mix_iparith_avx256 [Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)] tma_info_inst_mix_iparith_scalar_dp [Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate)] tma_info_inst_mix_iparith_scalar_sp [Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate)] tma_info_inst_mix_ipflop [Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)] tma_info_inst_mix_ippause [Instructions per PAUSE (lower number means higher occurrence rate)] tma_info_system_gflops [Giga Floating Point Operations Per Second] FpScalar: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_inst_mix_iparith_scalar_dp [Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate)] tma_info_inst_mix_iparith_scalar_sp [Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate)] FpVector: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_inst_mix_iparith_avx128 [Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate)] tma_info_inst_mix_iparith_avx256 [Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)] tma_info_inst_mix_ippause [Instructions per PAUSE (lower number means higher occurrence rate)] Frontend: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues] tma_fetch_latency [This metric represents fraction of slots the CPU was stalled due to Frontend latency issues] tma_info_bottleneck_big_code [Total pipeline cost of instruction fetch related bottlenecks by large code 
footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_bottleneck_instruction_fetch_bw [Total pipeline cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the back-end)]
  tma_info_inst_mix_iptb [Instructions per taken branch]
HPC: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_avx_assists [This metric estimates fraction of slots the CPU retired uops as a result of handing SSE to AVX* or AVX* to SSE transition Assists]
  tma_fp_arith [This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)]
  tma_fp_assists [This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Floating Point (FP) Assists]
  tma_info_core_fp_arith_utilization [Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)]
  tma_info_system_cpu_utilization [Average CPU Utilization (percentage)]
  tma_info_system_dram_bw_use [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_info_system_gflops [Giga Floating Point Operations Per Second]
  tma_shuffles_256b [This metric represents fraction of slots where the CPU was retiring Shuffle operations of 256-bit vector size (FP or Integer)]
IcMiss: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses]
  tma_info_botlnk_l2_ic_misses [Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck]
  tma_info_bottleneck_big_code [Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_frontend_icache_miss_latency [Average Latency for L1 instruction cache misses]
  tma_info_frontend_l2mpki_code [L2 cache true code cacheline misses per kilo instruction]
  tma_info_frontend_l2mpki_code_all
[L2 cache speculative code cacheline misses per kilo instruction]
Ifetch: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_%_ifetch_miss_bound_cycles [Percentage of time that allocation and retirement is stalled by the Frontend Cluster due to an Ifetch Miss,either Icache or ITLB Miss]
InsType: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_iparith [Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_avx128 [Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_avx256 [Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_dp [Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_sp [Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipbranch [Instructions per Branch (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipflop [Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipload [Instructions per Load (lower number means higher occurrence rate)]
  tma_info_inst_mix_ippause [Instructions per PAUSE (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipstore [Instructions per Store (lower number means higher occurrence rate)]
IntVector: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired]
LSD: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_frontend_lsd_coverage [Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)]
  tma_lsd [This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit]
Load_Store_Miss: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_%_load_miss_bound_cycles [Percentage of time that retirement is stalled due to an L1 miss]
MachineClears: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_clears_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears]
  tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears]
Machine_Clears: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_other_nukes [This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering]
Mem:
  tma_info_bottleneck_cache_memory_bandwidth [Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks]
  tma_info_bottleneck_cache_memory_latency [Total pipeline cost of external Memory- or Cache-Latency related bottlenecks]
  tma_info_bottleneck_memory_data_tlbs [Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)]
  tma_info_bottleneck_memory_synchronization [Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors)]
  tma_info_memory_core_l1d_cache_fill_bw_2t [Average per-core data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_core_l2_cache_fill_bw_2t [Average per-core data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_core_l3_cache_access_bw_2t [Average per-core data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_core_l3_cache_fill_bw_2t [Average
per-core data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_fb_hpki [Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)]
  tma_info_memory_l1d_cache_fill_bw [Average per-thread data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_l1mpki [L1 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_l1mpki_load [L1 cache true misses per kilo instruction for all demand loads (including speculative)]
  tma_info_memory_l2_cache_fill_bw [Average per-thread data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_l2hpki_all [L2 cache hits per kilo instruction for all request types (including speculative)]
  tma_info_memory_l2hpki_load [L2 cache hits per kilo instruction for all demand loads (including speculative)]
  tma_info_memory_l2mpki [L2 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_l2mpki_all [L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)]
  tma_info_memory_l2mpki_load [L2 cache ([RKL+] true) misses per kilo instruction for all demand loads (including speculative)]
  tma_info_memory_l3_cache_access_bw [Average per-thread data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l3_cache_fill_bw [Average per-thread data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l3mpki [L3 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_load_miss_real_latency [Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)]
  tma_info_memory_mix_bus_lock_pki ["Bus lock" per kilo instruction]
  tma_info_memory_mix_uc_load_pki [Un-cacheable retired load per kilo instruction]
  tma_info_memory_mlp [Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss]
  tma_info_memory_tlb_load_stlb_mpki [STLB (2nd level TLB) data load speculative misses per kilo instruction
(misses of any page-size that complete the page walk)]
  tma_info_memory_tlb_page_walks_utilization [Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses]
  tma_info_memory_tlb_store_stlb_mpki [STLB (2nd level TLB) data store speculative misses per kilo instruction (misses of any page-size that complete the page walk)]
  tma_info_system_mem_parallel_reads [Average number of parallel data read requests to external memory]
  tma_info_system_mem_read_latency [Average latency of data read request to external memory (in nanoseconds)]
  tma_info_thread_cpi [Cycles Per Instruction (per Logical Processor)]
MemOffcore: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_system_dram_bw_use [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
Mem_Exec: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_%_mem_exec_bound_cycles [Percentage of time that retirement is stalled by the Memory Cluster due to a pipeline stall]
MemoryBW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed]
  tma_info_bottleneck_cache_memory_bandwidth [Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks]
  tma_info_memory_core_l1d_cache_fill_bw_2t [Average per-core data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_core_l2_cache_fill_bw_2t [Average per-core data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_core_l3_cache_access_bw_2t [Average per-core data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_core_l3_cache_fill_bw_2t [Average per-core data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l1d_cache_fill_bw [Average per-thread data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_l2_cache_fill_bw
[Average per-thread data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_l3_cache_access_bw [Average per-thread data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l3_cache_fill_bw [Average per-thread data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_mlp [Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss]
  tma_info_system_dram_bw_use [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_info_system_mem_parallel_reads [Average number of parallel data read requests to external memory]
  tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)]
  tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)]
  tma_streaming_stores [This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores]
MemoryBound: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dram_bound [This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads]
  tma_info_memory_load_miss_real_latency [Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)]
  tma_info_memory_mlp [Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss]
  tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache]
  tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads]
  tma_l3_bound [This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core]
  tma_store_bound [This metric estimates how often CPU was
stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write]
MemoryLat: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_cache_memory_latency [Total pipeline cost of external Memory- or Cache-Latency related bottlenecks]
  tma_info_memory_load_miss_real_latency [Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)]
  tma_info_system_mem_read_latency [Average latency of data read request to external memory (in nanoseconds)]
  tma_l1_hit_latency [This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache]
  tma_l3_hit_latency [This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)]
  tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses]
MemoryTLB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dtlb_load [This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses]
  tma_dtlb_store [This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses]
  tma_info_bottleneck_big_code [Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_bottleneck_memory_data_tlbs [Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)]
  tma_info_memory_tlb_code_stlb_mpki [STLB (2nd level TLB) code speculative misses per kilo instruction (misses of any page-size that complete the page walk)]
  tma_info_memory_tlb_load_stlb_mpki [STLB (2nd level TLB) data load speculative misses per kilo instruction (misses of any page-size that
complete the page walk)]
  tma_info_memory_tlb_page_walks_utilization [Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses]
  tma_info_memory_tlb_store_stlb_mpki [STLB (2nd level TLB) data store speculative misses per kilo instruction (misses of any page-size that complete the page walk)]
  tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses]
  tma_load_stlb_hit [This metric roughly estimates the fraction of cycles where the (first level) DTLB was missed by load accesses,that later on hit in second-level TLB (STLB)]
  tma_load_stlb_miss [This metric estimates the fraction of cycles where the Second-level TLB (STLB) was missed by load accesses,performing a hardware page walk]
  tma_store_stlb_hit [This metric roughly estimates the fraction of cycles where the TLB was missed by store accesses,hitting in the second-level TLB (STLB)]
  tma_store_stlb_miss [This metric estimates the fraction of cycles where the STLB was missed by store accesses,performing a hardware page walk]
Memory_BW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_memory_latency_data_l2_mlp [Average Parallel L2 cache miss data reads]
  tma_info_memory_latency_load_l2_mlp [Average Parallel L2 cache miss demand Loads]
Memory_Lat: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_memory_latency_load_l2_miss_latency [Average Latency for L2 cache miss demand Loads]
  tma_info_memory_latency_load_l3_miss_latency [Average Latency for L3 cache miss demand Loads]
MicroSeq: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_pipeline_ipassist [Instructions per a microcode Assist invocation]
  tma_info_pipeline_strings_cycles [Estimated fraction of retirement-cycles dealing with repeat instructions]
  tma_microcode_sequencer [This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode
Sequencer (MS) unit]
  tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)]
OS: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_system_ipfarbranch [Instructions per Far Branch ( Far Branches apply upon transition from application to operating system,handling interrupts,exceptions) [lower number means higher occurrence rate]]
  tma_info_system_kernel_cpi [Cycles Per Instruction for the Operating System (OS) Kernel mode]
  tma_info_system_kernel_utilization [Fraction of cycles spent in the Operating System (OS) Kernel mode]
Offcore: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses]
  tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses]
  tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing]
  tma_info_bottleneck_cache_memory_bandwidth [Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks]
  tma_info_bottleneck_cache_memory_latency [Total pipeline cost of external Memory- or Cache-Latency related bottlenecks]
  tma_info_bottleneck_memory_data_tlbs [Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)]
  tma_info_bottleneck_memory_synchronization [Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors)]
  tma_info_bottleneck_other_bottlenecks [Total pipeline cost of remaining bottlenecks in the back-end]
  tma_info_memory_core_l3_cache_access_bw_2t [Average per-core data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l2mpki_all [L2 cache ([RKL+] true) misses per kilo instruction for all request types
(including speculative)]
  tma_info_memory_l2mpki_rfo [Offcore requests (L2 cache miss) per kilo instruction for demand RFOs]
  tma_info_memory_l3_cache_access_bw [Average per-thread data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_latency_data_l2_mlp [Average Parallel L2 cache miss data reads]
  tma_info_memory_latency_load_l2_miss_latency [Average Latency for L2 cache miss demand Loads]
  tma_info_memory_latency_load_l2_mlp [Average Parallel L2 cache miss demand Loads]
  tma_info_memory_latency_load_l3_miss_latency [Average Latency for L3 cache miss demand Loads]
  tma_lock_latency [This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations]
  tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)]
  tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)]
  tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)]
  tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses]
  tma_streaming_stores [This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores]
PGO: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its Backend]
  tma_info_branches_cond_nt [Fraction of branches that are non-taken conditionals]
  tma_info_branches_cond_tk [Fraction of branches that are taken conditionals]
  tma_info_inst_mix_bptkbranch [Branch instructions per taken branch]
  tma_info_inst_mix_ipcall [Instructions per (near) call (lower
number means higher occurrence rate)]
  tma_info_inst_mix_iptb [Instructions per taken branch]
Pipeline: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fused_instructions [This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions]
  tma_info_core_ilp [Instruction-Level-Parallelism (average number of uops executed when there is execution) per thread (logical-processor)]
  tma_info_pipeline_execute [Instruction-Level-Parallelism (average number of uops executed when there is execution) per core]
  tma_info_pipeline_ipassist [Instructions per a microcode Assist invocation]
  tma_info_pipeline_retire [Average number of Uops retired in cycles where at least one uop has retired]
  tma_info_pipeline_strings_cycles [Estimated fraction of retirement-cycles dealing with repeat instructions]
  tma_info_thread_clks [Per-Logical Processor actual clocks when the Logical Processor is active]
  tma_info_thread_cpi [Cycles Per Instruction (per Logical Processor)]
  tma_info_thread_execute_per_issue [The ratio of Executed- by Issued-Uops]
  tma_info_thread_uoppi [Uops Per Instruction]
  tma_int_operations [This metric represents overall Integer (Int) select operations fraction the CPU has executed (retired)]
  tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_memory_operations [This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses]
  tma_non_fused_branches [This metric represents fraction of slots where the CPU was retiring branch instructions that were not fused]
  tma_nop_instructions [This metric represents fraction of slots where the CPU was
retiring NOP (no op) instructions]
  tma_other_light_ops [This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes]
  tma_shuffles_256b [This metric represents fraction of slots where the CPU was retiring Shuffle operations of 256-bit vector size (FP or Integer)]
PortsUtil: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_core_ilp [Instruction-Level-Parallelism (average number of uops executed when there is execution) per thread (logical-processor)]
  tma_info_pipeline_execute [Instruction-Level-Parallelism (average number of uops executed when there is execution) per core]
  tma_ports_utilization [This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)]
  tma_ports_utilized_0 [This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL,Physical Core cycles otherwise)]
  tma_ports_utilized_1 [This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)]
  tma_ports_utilized_2 [This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)]
  tma_ports_utilized_3m [This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)]
  tma_serializing_operation [This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations]
Power: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  C10_Pkg_Residency [C10 residency percent per package]
  C1_Core_Residency [C1 residency percent per core]
  C2_Pkg_Residency [C2 residency percent per package]
  C3_Pkg_Residency [C3 residency percent per package]
  C6_Core_Residency [C6 residency percent per core]
  C6_Pkg_Residency [C6 residency percent per package]
  C7_Core_Residency [C7 residency percent per core]
  C7_Pkg_Residency [C7 residency percent per package]
  C8_Pkg_Residency [C8 residency percent per package]
  C9_Pkg_Residency [C9 residency percent per package]
  tma_info_core_epc [uops Executed per Cycle]
  tma_info_system_core_frequency [Measured Average Core Frequency for unhalted processors [GHz]]
  tma_info_system_turbo_utilization [Average Frequency Utilization relative nominal frequency]
Prefetches: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_ipswpf [Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)]
Ret: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_branching_overhead [Total pipeline cost of instructions used for program control-flow - a subset of the Retiring category in TMA]
  tma_info_bottleneck_irregular_overhead [Total pipeline cost of irregular execution (e.g]
  tma_info_bottleneck_useful_work [Total pipeline cost of "useful operations" - the portion of Retiring category not covered by Branching_Overhead nor Irregular_Overhead]
  tma_info_core_coreipc [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_core_flopc [Floating Point Operations Per Cycle]
  tma_info_pipeline_ipassist [Instructions per a microcode Assist invocation]
  tma_info_pipeline_retire [Average number of Uops retired in cycles where at least one uop has retired]
  tma_info_pipeline_strings_cycles [Estimated fraction of retirement-cycles dealing with repeat instructions]
  tma_info_thread_ipc [Instructions Per Cycle (per Logical Processor)]
  tma_info_thread_uoppi [Uops Per Instruction]
Retire: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_heavy_operations [This metric represents fraction of
slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences]
  tma_info_pipeline_ipassist [Instructions per a microcode Assist invocation]
  tma_info_thread_uoppi [Uops Per Instruction]
  tma_light_operations [This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)]
SMT: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_botlnk_l0_core_bound_likely [Probability of Core Bound bottleneck hidden by SMT-profiling artifacts]
  tma_info_core_core_clks [Core actual clocks when any Logical Processor is active on the Physical Core]
  tma_info_core_coreipc [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_pipeline_execute [Instruction-Level-Parallelism (average number of uops executed when there is execution) per core]
  tma_info_system_smt_2t_utilization [Fraction of cycles where both hardware Logical Processors were active]
  tma_info_thread_slots_utilization [Fraction of Physical Core issue-slots utilized by this Logical Processor]
Snoop: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses]
  tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses]
  tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing]
SoC: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  UNCORE_FREQ [Uncore frequency per die [GHZ]]
  tma_info_system_dram_bw_use [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_info_system_mem_parallel_reads [Average number of parallel data read requests to external memory]
  tma_info_system_mem_read_latency
[Average latency of data read request to external memory (in nanoseconds)]
  tma_info_system_socket_clks [Socket actual clocks when any core is active on that socket]
Summary: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_instructions [Total number of retired Instructions]
  tma_info_system_core_frequency [Measured Average Core Frequency for unhalted processors [GHz]]
  tma_info_system_cpu_utilization [Average CPU Utilization (percentage)]
  tma_info_system_cpus_utilized [Average number of utilized CPUs]
  tma_info_system_kernel_utilization [Fraction of cycles spent in Kernel mode]
  tma_info_thread_ipc [Instructions Per Cycle (per Logical Processor)]
TmaL1: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_backend_bound [This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend]
  tma_bad_speculation [This category represents fraction of slots wasted due to incorrect speculations]
  tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its Backend]
  tma_info_core_coreipc [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_inst_mix_instructions [Total number of retired Instructions]
  tma_info_thread_slots [Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)]
  tma_info_thread_slots_utilization [Fraction of Physical Core issue-slots utilized by this Logical Processor]
  tma_retiring [This category represents fraction of slots utilized by useful work i.e.
issued uops that eventually get retired]
TmaL2: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction]
  tma_core_bound [This metric represents fraction of slots where Core non-memory issues were of a bottleneck]
  tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues]
  tma_fetch_latency [This metric represents fraction of slots the CPU was stalled due to Frontend latency issues]
  tma_heavy_operations [This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences]
  tma_light_operations [This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)]
  tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears]
  tma_memory_bound [This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck]
TmaL3mem: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dram_bound [This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads]
  tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache]
  tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads]
  tma_l3_bound [This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core]
  tma_store_bound [This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write]
TopdownL1: [Metrics for top-down breakdown at level 1]
  tma_backend_bound [This category represents fraction of slots where no uops are
being delivered due to a lack of required resources for accepting new uops in the Backend]
  tma_bad_speculation [This category represents fraction of slots wasted due to incorrect speculations]
  tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its Backend]
  tma_retiring [This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired]
TopdownL2: [Metrics for top-down breakdown at level 2]
  tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction]
  tma_core_bound [This metric represents fraction of slots where Core non-memory issues were of a bottleneck]
  tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues]
  tma_fetch_latency [This metric represents fraction of slots the CPU was stalled due to Frontend latency issues]
  tma_heavy_operations [This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences]
  tma_ifetch_bandwidth [Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode,predecode,cisc,and other limitations]
  tma_ifetch_latency [Counts the number of issue slots that were not delivered by the frontend due to frontend latency restrictions due to icache misses,itlb misses, branch detection,and resteer limitations]
  tma_light_operations [This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)]
  tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears]
  tma_memory_bound [This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck]
  tma_resource_bound [Counts the number of cycles the core is stalled due
to a resource limitation] TopdownL3: [Metrics for top-down breakdown at level 3] tma_allocation_restriction [Counts the number of issue slots that were not consumed by the backend due to certain allocation restrictions] tma_branch_detect [Counts the number of issue slots that were not delivered by the frontend due to BACLEARS,which occurs when the Branch Target Buffer (BTB) prediction or lack thereof,was corrected by a later branch predictor in the frontend] tma_branch_resteer [Counts the number of issue slots that were not delivered by the frontend due to BTCLEARS,which occurs when the Branch Target Buffer (BTB) predicts a taken branch] tma_branch_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers] tma_cisc [Counts the number of issue slots that were not delivered by the frontend due to the microcode sequencer (MS)] tma_decode [Counts the number of issue slots that were not delivered by the frontend due to decode stalls] tma_divider [This metric represents fraction of cycles where the Divider unit was active] tma_dram_bound [This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads] tma_dsb [This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline] tma_dsb_switches [This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines] tma_fast_nuke [Counts the number of issue slots that were not consumed by the backend due to a machine clear that does not require the use of microcode, classified as a fast nuke,due to memory ordering,memory disambiguation and memory renaming] tma_few_uops_instructions [This metric represents fraction of slots where the CPU was retiring instructions that that are decoder into two or up to ([SNB+] four; [ADL+] five) uops] tma_fp_arith [This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)] 
tma_fused_instructions [This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions] tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses] tma_int_operations [This metric represents overall Integer (Int) select operations fraction the CPU has executed (retired)] tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses] tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache] tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads] tma_l3_bound [This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core] tma_lcp [This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)] tma_lsd [This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit] tma_mem_scheduler [Counts the number of issue slots that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops] tma_memory_operations [This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses] tma_microcode_sequencer [This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit] tma_mite [This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_non_fused_branches [This metric represents fraction of slots where the CPU was retiring branch instructions that were not fused] 
tma_non_mem_scheduler [Counts the number of issue slots that were not consumed by the backend due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation stalls in which the integer,floating point or SIMD scheduler is not able to accept uops] tma_nuke [Counts the number of issue slots that were not consumed by the backend due to a machine clear that requires the use of microcode (slow nuke)] tma_other_fb [Counts the number of issue slots that were not delivered by the frontend due to other common frontend stalls not categorized] tma_other_light_ops [This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes] tma_other_mispredicts [This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types)] tma_other_nukes [This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering] tma_ports_utilization [This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)] tma_predecode [Counts the number of issue slots that were not delivered by the frontend due to wrong predecodes] tma_register [Counts the number of issue slots that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls)] tma_reorder_buffer [Counts the number of issue slots that were not consumed by the backend due to the reorder buffer being full (ROB stalls)] tma_serialization [Counts the number of issue slots that were not consumed by the backend due to scoreboards from the instruction queue (IQ),jump execution unit (JEU),or microcode sequencer (MS)] tma_serializing_operation [This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations] tma_store_bound [This metric estimates how often CPU was stalled due to RFO store memory 
accesses; RFO store issue a read-for-ownership request before the write] TopdownL4: [Metrics for top-down breakdown at level 4] tma_assists [This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists] tma_c01_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.1 power-performance optimized state (Faster wakeup time; Smaller power savings)] tma_c02_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.2 power-performance optimized state (Slower wakeup time; Larger power savings)] tma_cisc [This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction] tma_clears_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears] tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses] tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses] tma_decoder0_alone [This metric represents fraction of cycles where decoder-0 was the only active decoder] tma_dtlb_load [This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses] tma_dtlb_store [This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses] tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_fp_scalar [This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired] tma_fp_vector [This metric approximates arithmetic floating-point (FP) vector 
uops fraction the CPU has retired aggregated across all vector widths] tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_l1_hit_latency [This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache] tma_l3_hit_latency [This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)] tma_lock_latency [This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations] tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)] tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)] tma_memory_fence [This metric represents fraction of cycles the CPU was stalled due to LFENCE Instructions] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] tma_nop_instructions [This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions] tma_ports_utilized_0 [This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_ports_utilized_1 [This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_ports_utilized_2 [This metric represents fraction of cycles CPU executed 
total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)] tma_ports_utilized_3m [This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_shuffles_256b [This metric represents fraction of slots where the CPU was retiring Shuffle operations of 256-bit vector size (FP or Integer)] tma_slow_pause [This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions] tma_split_loads [This metric estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache line boundary] tma_split_stores [This metric represents rate of split store accesses] tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)] tma_store_fwd_blk [This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores] tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses] tma_streaming_stores [This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores] tma_unknown_branches [This metric represents fraction of cycles the CPU was stalled due to new branch address clears] tma_x87_use [This metric serves as an approximation of legacy x87 usage] TopdownL5: [Metrics for top-down breakdown at level 5] tma_alu_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution ports for ALU operations] tma_avx_assists [This metric estimates fraction of slots the CPU retired uops as a result of handing SSE to AVX* or AVX* to SSE transition Assists] tma_fp_assists [This metric 
roughly estimates fraction of slots the CPU retired uops as a result of handing Floating Point (FP) Assists] tma_fp_vector_128b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors] tma_fp_vector_256b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors] tma_load_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution port for Load operations] tma_load_stlb_hit [This metric roughly estimates the fraction of cycles where the (first level) DTLB was missed by load accesses,that later on hit in second-level TLB (STLB)] tma_load_stlb_miss [This metric estimates the fraction of cycles where the Second-level TLB (STLB) was missed by load accesses,performing a hardware page walk] tma_mixing_vectors [This metric estimates penalty in terms of percentage of([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)] tma_page_faults [This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Page Faults] tma_store_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution port for Store operations] tma_store_stlb_hit [This metric roughly estimates the fraction of cycles where the TLB was missed by store accesses,hitting in the second-level TLB (STLB)] tma_store_stlb_miss [This metric estimates the fraction of cycles where the STLB was missed by store accesses,performing a hardware page walk] TopdownL6: [Metrics for top-down breakdown at level 6] tma_port_0 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)] tma_port_1 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU)] tma_port_6 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 6 ([HSW+] Primary Branch and simple ALU)] 
load_store_bound: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] tma_info_load_miss_bound_%_loadmissbound_with_l2hit [Percentage of memory bound stalls where retirement is stalled due to an L1 miss that hit the L2] tma_info_load_miss_bound_%_loadmissbound_with_l3hit [Percentage of memory bound stalls where retirement is stalled due to an L1 miss that hit the L3] tma_info_load_miss_bound_%_loadmissbound_with_l3miss [Percentage of memory bound stalls where retirement is stalled due to an L1 miss that subsequently misses the L3] tma_info_load_store_bound_l1_bound [Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a pipeline block] tma_info_load_store_bound_load_bound [Counts the number of cycles that the oldest load of the load buffer is stalled at retirement] tma_info_load_store_bound_store_bound [Counts the number of cycles the core is stalled due to store buffer full] smi: smi_cycles [Percentage of cycles spent in System Management Interrupts] smi_num [Number of SMI interrupts] tma_L1_group: [Metrics for top-down breakdown at level 1] tma_backend_bound [This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend] tma_bad_speculation [This category represents fraction of slots wasted due to incorrect speculations] tma_frontend_bound [This category represents fraction of slots where the processor's Frontend undersupplies its Backend] tma_info_core_coreipc [Instructions Per Cycle across hyper-threads (per physical core)] tma_info_inst_mix_instructions [Total number of retired Instructions] tma_info_thread_slots [Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)] tma_info_thread_slots_utilization [Fraction of Physical Core issue-slots utilized by this Logical Processor] tma_retiring [This category represents fraction of slots utilized by useful work i.e. 
issued uops that eventually get retired] tma_L2_group: [Metrics for top-down breakdown at level 2] tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction] tma_core_bound [This metric represents fraction of slots where Core non-memory issues were of a bottleneck] tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues] tma_fetch_latency [This metric represents fraction of slots the CPU was stalled due to Frontend latency issues] tma_heavy_operations [This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences] tma_ifetch_bandwidth [Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode,predecode,cisc,and other limitations] tma_ifetch_latency [Counts the number of issue slots that were not delivered by the frontend due to frontend latency restrictions due to icache misses,itlb misses, branch detection,and resteer limitations] tma_light_operations [This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)] tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears] tma_memory_bound [This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck] tma_resource_bound [Counts the number of cycles the core is stalled due to a resource limitation] tma_L3_group: [Metrics for top-down breakdown at level 3] tma_allocation_restriction [Counts the number of issue slots that were not consumed by the backend due to certain allocation restrictions] tma_branch_detect [Counts the number of issue slots that were not delivered by the frontend due to BACLEARS,which occurs when the Branch Target Buffer (BTB) prediction or lack 
thereof,was corrected by a later branch predictor in the frontend] tma_branch_resteer [Counts the number of issue slots that were not delivered by the frontend due to BTCLEARS,which occurs when the Branch Target Buffer (BTB) predicts a taken branch] tma_branch_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers] tma_cisc [Counts the number of issue slots that were not delivered by the frontend due to the microcode sequencer (MS)] tma_decode [Counts the number of issue slots that were not delivered by the frontend due to decode stalls] tma_divider [This metric represents fraction of cycles where the Divider unit was active] tma_dram_bound [This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads] tma_dsb [This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline] tma_dsb_switches [This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines] tma_fast_nuke [Counts the number of issue slots that were not consumed by the backend due to a machine clear that does not require the use of microcode, classified as a fast nuke,due to memory ordering,memory disambiguation and memory renaming] tma_few_uops_instructions [This metric represents fraction of slots where the CPU was retiring instructions that that are decoder into two or up to ([SNB+] four; [ADL+] five) uops] tma_fp_arith [This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)] tma_fused_instructions [This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions] tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses] tma_int_operations [This metric represents overall Integer (Int) select operations fraction the CPU has 
executed (retired)] tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses] tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache] tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads] tma_l3_bound [This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core] tma_lcp [This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)] tma_lsd [This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit] tma_mem_scheduler [Counts the number of issue slots that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops] tma_memory_operations [This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses] tma_microcode_sequencer [This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit] tma_mite [This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_non_fused_branches [This metric represents fraction of slots where the CPU was retiring branch instructions that were not fused] tma_non_mem_scheduler [Counts the number of issue slots that were not consumed by the backend due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation stalls in which the integer,floating point or SIMD scheduler is not able to accept uops] tma_nuke [Counts the number of issue slots that were not consumed by the backend due to a machine clear that requires the use of microcode 
(slow nuke)] tma_other_fb [Counts the number of issue slots that were not delivered by the frontend due to other common frontend stalls not categorized] tma_other_light_ops [This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes] tma_other_mispredicts [This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types)] tma_other_nukes [This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering] tma_ports_utilization [This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)] tma_predecode [Counts the number of issue slots that were not delivered by the frontend due to wrong predecodes] tma_register [Counts the number of issue slots that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls)] tma_reorder_buffer [Counts the number of issue slots that were not consumed by the backend due to the reorder buffer being full (ROB stalls)] tma_serialization [Counts the number of issue slots that were not consumed by the backend due to scoreboards from the instruction queue (IQ),jump execution unit (JEU),or microcode sequencer (MS)] tma_serializing_operation [This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations] tma_store_bound [This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write] tma_L4_group: [Metrics for top-down breakdown at level 4] tma_assists [This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists] tma_c01_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.1 power-performance optimized 
state (Faster wakeup time; Smaller power savings)] tma_c02_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.2 power-performance optimized state (Slower wakeup time; Larger power savings)] tma_cisc [This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction] tma_clears_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears] tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses] tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses] tma_decoder0_alone [This metric represents fraction of cycles where decoder-0 was the only active decoder] tma_dtlb_load [This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses] tma_dtlb_store [This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses] tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_fp_scalar [This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired] tma_fp_vector [This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths] tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has 
retired] tma_l1_hit_latency [This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache] tma_l3_hit_latency [This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)] tma_lock_latency [This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations] tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)] tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)] tma_memory_fence [This metric represents fraction of cycles the CPU was stalled due to LFENCE Instructions] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] tma_nop_instructions [This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions] tma_ports_utilized_0 [This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_ports_utilized_1 [This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_ports_utilized_2 [This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)] tma_ports_utilized_3m [This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_shuffles_256b [This metric represents fraction of slots 
where the CPU was retiring Shuffle operations of 256-bit vector size (FP or Integer)] tma_slow_pause [This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions] tma_split_loads [This metric estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache line boundary] tma_split_stores [This metric represents rate of split store accesses] tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)] tma_store_fwd_blk [This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores] tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses] tma_streaming_stores [This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores] tma_unknown_branches [This metric represents fraction of cycles the CPU was stalled due to new branch address clears] tma_x87_use [This metric serves as an approximation of legacy x87 usage] tma_L5_group: [Metrics for top-down breakdown at level 5] tma_alu_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution ports for ALU operations] tma_avx_assists [This metric estimates fraction of slots the CPU retired uops as a result of handing SSE to AVX* or AVX* to SSE transition Assists] tma_fp_assists [This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Floating Point (FP) Assists] tma_fp_vector_128b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors] tma_fp_vector_256b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors] tma_load_op_utilization 
[This metric represents Core fraction of cycles CPU dispatched uops on execution port for Load operations] tma_load_stlb_hit [This metric roughly estimates the fraction of cycles where the (first level) DTLB was missed by load accesses,that later on hit in second-level TLB (STLB)] tma_load_stlb_miss [This metric estimates the fraction of cycles where the Second-level TLB (STLB) was missed by load accesses,performing a hardware page walk] tma_mixing_vectors [This metric estimates penalty in terms of percentage of([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)] tma_page_faults [This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Page Faults] tma_store_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution port for Store operations] tma_store_stlb_hit [This metric roughly estimates the fraction of cycles where the TLB was missed by store accesses,hitting in the second-level TLB (STLB)] tma_store_stlb_miss [This metric estimates the fraction of cycles where the STLB was missed by store accesses,performing a hardware page walk] tma_L6_group: [Metrics for top-down breakdown at level 6] tma_port_0 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)] tma_port_1 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU)] tma_port_6 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 6 ([HSW+] Primary Branch and simple ALU)] tma_alu_op_utilization_group: [Metrics contributing to tma_alu_op_utilization category] tma_port_0 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)] tma_port_1 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU)] tma_port_6 [This metric represents Core fraction of cycles CPU 
dispatched uops on execution port 6 ([HSW+] Primary Branch and simple ALU)] tma_assists_group: [Metrics contributing to tma_assists category] tma_avx_assists [This metric estimates fraction of slots the CPU retired uops as a result of handing SSE to AVX* or AVX* to SSE transition Assists] tma_fp_assists [This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Floating Point (FP) Assists] tma_page_faults [This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Page Faults] tma_backend_bound_group: [Metrics contributing to tma_backend_bound category] tma_core_bound [This metric represents fraction of slots where Core non-memory issues were of a bottleneck] tma_memory_bound [This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck] tma_resource_bound [Counts the number of cycles the core is stalled due to a resource limitation] tma_bad_speculation_group: [Metrics contributing to tma_bad_speculation category] tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction] tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears] tma_branch_mispredicts_group: [Metrics contributing to tma_branch_mispredicts category] tma_other_mispredicts [This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types)] tma_branch_resteers_group: [Metrics contributing to tma_branch_resteers category] tma_clears_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] tma_unknown_branches [This metric represents fraction of cycles the CPU was stalled due to new branch address 
clears] tma_core_bound_group: [Metrics contributing to tma_core_bound category] tma_allocation_restriction [Counts the number of issue slots that were not consumed by the backend due to certain allocation restrictions] tma_divider [This metric represents fraction of cycles where the Divider unit was active] tma_ports_utilization [This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)] tma_serializing_operation [This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations] tma_dram_bound_group: [Metrics contributing to tma_dram_bound category] tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)] tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)] tma_dtlb_load_group: [Metrics contributing to tma_dtlb_load category] tma_load_stlb_hit [This metric roughly estimates the fraction of cycles where the (first level) DTLB was missed by load accesses,that later on hit in second-level TLB (STLB)] tma_load_stlb_miss [This metric estimates the fraction of cycles where the Second-level TLB (STLB) was missed by load accesses,performing a hardware page walk] tma_dtlb_store_group: [Metrics contributing to tma_dtlb_store category] tma_store_stlb_hit [This metric roughly estimates the fraction of cycles where the TLB was missed by store accesses,hitting in the second-level TLB (STLB)] tma_store_stlb_miss [This metric estimates the fraction of cycles where the STLB was missed by store accesses,performing a hardware page walk] tma_fetch_bandwidth_group: [Metrics contributing to tma_fetch_bandwidth category] tma_dsb [This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded 
uop cache) fetch pipeline] tma_lsd [This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit] tma_mite [This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)] tma_fetch_latency_group: [Metrics contributing to tma_fetch_latency category] tma_branch_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers] tma_dsb_switches [This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines] tma_icache_misses [This metric represents fraction of cycles the CPU was stalled due to instruction cache misses] tma_itlb_misses [This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses] tma_lcp [This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_fp_arith_group: [Metrics contributing to tma_fp_arith category] tma_fp_scalar [This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired] tma_fp_vector [This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths] tma_x87_use [This metric serves as an approximation of legacy x87 usage] tma_fp_vector_group: [Metrics contributing to tma_fp_vector category] tma_fp_vector_128b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors] tma_fp_vector_256b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors] tma_frontend_bound_group: [Metrics contributing to tma_frontend_bound category] tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to 
Frontend bandwidth issues] tma_fetch_latency [This metric represents fraction of slots the CPU was stalled due to Frontend latency issues] tma_ifetch_bandwidth [Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode,predecode,cisc,and other limitations] tma_ifetch_latency [Counts the number of issue slots that were not delivered by the frontend due to frontend latency restrictions due to icache misses,itlb misses, branch detection,and resteer limitations] tma_heavy_operations_group: [Metrics contributing to tma_heavy_operations category] tma_few_uops_instructions [This metric represents fraction of slots where the CPU was retiring instructions that that are decoder into two or up to ([SNB+] four; [ADL+] five) uops] tma_microcode_sequencer [This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit] tma_ifetch_bandwidth_group: [Metrics contributing to tma_ifetch_bandwidth category] tma_cisc [Counts the number of issue slots that were not delivered by the frontend due to the microcode sequencer (MS)] tma_decode [Counts the number of issue slots that were not delivered by the frontend due to decode stalls] tma_other_fb [Counts the number of issue slots that were not delivered by the frontend due to other common frontend stalls not categorized] tma_predecode [Counts the number of issue slots that were not delivered by the frontend due to wrong predecodes] tma_ifetch_latency_group: [Metrics contributing to tma_ifetch_latency category] tma_branch_detect [Counts the number of issue slots that were not delivered by the frontend due to BACLEARS,which occurs when the Branch Target Buffer (BTB) prediction or lack thereof,was corrected by a later branch predictor in the frontend] tma_branch_resteer [Counts the number of issue slots that were not delivered by the frontend due to BTCLEARS,which occurs when the Branch Target Buffer (BTB) predicts a taken 
branch] tma_icache_misses [Counts the number of issue slots that were not delivered by the frontend due to instruction cache misses] tma_itlb_misses [Counts the number of issue slots that were not delivered by the frontend due to Instruction Table Lookaside Buffer (ITLB) misses] tma_info_bottleneck_%_dtlb_miss_bound_cycles: tma_info_bottleneck_%_dtlb_miss_bound_cycles [Percentage of time that retirement is stalled due to a first level data TLB miss] tma_info_br_inst_mix_ipbranch: tma_info_br_inst_mix_ipbranch [Instructions per Branch (lower number means higher occurrence rate)] tma_info_br_inst_mix_ipcall: tma_info_br_inst_mix_ipcall [Instruction per (near) call (lower number means higher occurrence rate)] tma_info_br_inst_mix_ipfarbranch: tma_info_br_inst_mix_ipfarbranch [Instructions per Far Branch ( Far Branches apply upon transition from application to operating system,handling interrupts,exceptions) [lower number means higher occurrence rate]] tma_info_br_inst_mix_ipmisp_cond_ntaken: tma_info_br_inst_mix_ipmisp_cond_ntaken [Instructions per retired conditional Branch Misprediction where the branch was not taken] tma_info_br_inst_mix_ipmisp_cond_taken: tma_info_br_inst_mix_ipmisp_cond_taken [Instructions per retired conditional Branch Misprediction where the branch was taken] tma_info_br_inst_mix_ipmisp_indirect: tma_info_br_inst_mix_ipmisp_indirect [Instructions per retired indirect call or jump Branch Misprediction] tma_info_br_inst_mix_ipmisp_ret: tma_info_br_inst_mix_ipmisp_ret [Instructions per retired return Branch Misprediction] tma_info_br_inst_mix_ipmispredict: tma_info_br_inst_mix_ipmispredict [Instructions per retired Branch Misprediction] tma_info_br_mispredict_bound_branch_mispredict_ratio: tma_info_br_mispredict_bound_branch_mispredict_ratio [Ratio of all branches which mispredict] tma_info_br_mispredict_bound_branch_mispredict_to_unknown_branch_ratio: tma_info_br_mispredict_bound_branch_mispredict_to_unknown_branch_ratio [Ratio between 
Mispredicted branches and unknown branches] tma_info_buffer_stalls_%_load_buffer_stall_cycles: tma_info_buffer_stalls_%_load_buffer_stall_cycles [Percentage of time that allocation is stalled due to load buffer full] tma_info_buffer_stalls_%_mem_rsv_stall_cycles: tma_info_buffer_stalls_%_mem_rsv_stall_cycles [Percentage of time that allocation is stalled due to memory reservation stations full] tma_info_buffer_stalls_%_store_buffer_stall_cycles: tma_info_buffer_stalls_%_store_buffer_stall_cycles [Percentage of time that allocation is stalled due to store buffer full] tma_info_core_cpi: tma_info_core_cpi [Cycles Per Instruction] tma_info_core_ipc: tma_info_core_ipc [Instructions Per Cycle] tma_info_core_upi: tma_info_core_upi [Uops Per Instruction] tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l2hit: tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l2hit [Percentage of ifetch miss bound stalls,where the ifetch miss hits in the L2] tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3hit: tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3hit [Percentage of ifetch miss bound stalls,where the ifetch miss hits in the L3] tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3miss: tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3miss [Percentage of ifetch miss bound stalls,where the ifetch miss subsequently misses in the L3] tma_info_machine_clear_bound_machine_clears_disamb_pki: tma_info_machine_clear_bound_machine_clears_disamb_pki [Counts the number of machine clears relative to thousands of instructions retired,due to memory disambiguation] tma_info_machine_clear_bound_machine_clears_fp_assist_pki: tma_info_machine_clear_bound_machine_clears_fp_assist_pki [Counts the number of machine clears relative to thousands of instructions retired,due to floating point assists] tma_info_machine_clear_bound_machine_clears_monuke_pki: tma_info_machine_clear_bound_machine_clears_monuke_pki [Counts the number of machine clears relative to thousands of instructions 
retired,due to memory ordering] tma_info_machine_clear_bound_machine_clears_mrn_pki: tma_info_machine_clear_bound_machine_clears_mrn_pki [Counts the number of machine clears relative to thousands of instructions retired,due to memory renaming] tma_info_machine_clear_bound_machine_clears_page_fault_pki: tma_info_machine_clear_bound_machine_clears_page_fault_pki [Counts the number of machine clears relative to thousands of instructions retired,due to page faults] tma_info_machine_clear_bound_machine_clears_smc_pki: tma_info_machine_clear_bound_machine_clears_smc_pki [Counts the number of machine clears relative to thousands of instructions retired,due to self-modifying code] tma_info_mem_exec_blocks_%_loads_with_adressaliasing: tma_info_mem_exec_blocks_%_loads_with_adressaliasing [Percentage of total non-speculative loads with an address aliasing block] tma_info_mem_exec_blocks_%_loads_with_storefwdblk: tma_info_mem_exec_blocks_%_loads_with_storefwdblk [Percentage of total non-speculative loads with a store forward or unknown store address block] tma_info_mem_exec_bound_%_loadhead_with_l1miss: tma_info_mem_exec_bound_%_loadhead_with_l1miss [Percentage of Memory Execution Bound due to a first level data cache miss] tma_info_mem_exec_bound_%_loadhead_with_otherpipelineblks: tma_info_mem_exec_bound_%_loadhead_with_otherpipelineblks [Percentage of Memory Execution Bound due to other block cases,such as pipeline conflicts,fences,etc] tma_info_mem_exec_bound_%_loadhead_with_pagewalk: tma_info_mem_exec_bound_%_loadhead_with_pagewalk [Percentage of Memory Execution Bound due to a pagewalk] tma_info_mem_exec_bound_%_loadhead_with_stlbhit: tma_info_mem_exec_bound_%_loadhead_with_stlbhit [Percentage of Memory Execution Bound due to a second level TLB miss] tma_info_mem_exec_bound_%_loadhead_with_storefwding: tma_info_mem_exec_bound_%_loadhead_with_storefwding [Percentage of Memory Execution Bound due to a store forward address match] tma_info_mem_mix_ipload: 
tma_info_mem_mix_ipload [Instructions per Load] tma_info_mem_mix_ipstore: tma_info_mem_mix_ipstore [Instructions per Store] tma_info_mem_mix_load_locks_ratio: tma_info_mem_mix_load_locks_ratio [Percentage of total non-speculative loads that perform one or more locks] tma_info_mem_mix_load_splits_ratio: tma_info_mem_mix_load_splits_ratio [Percentage of total non-speculative loads that are splits] tma_info_mem_mix_memload_ratio: tma_info_mem_mix_memload_ratio [Ratio of mem load uops to all uops] tma_info_serialization _%_tpause_cycles: tma_info_serialization _%_tpause_cycles [Percentage of time that the core is stalled due to a TPAUSE or UMWAIT instruction] tma_info_system_cpu_utilization: tma_info_system_cpu_utilization [Average CPU Utilization] tma_info_uop_mix_fpdiv_uop_ratio: tma_info_uop_mix_fpdiv_uop_ratio [Percentage of all uops which are FPDiv uops] tma_info_uop_mix_idiv_uop_ratio: tma_info_uop_mix_idiv_uop_ratio [Percentage of all uops which are IDiv uops] tma_info_uop_mix_microcode_uop_ratio: tma_info_uop_mix_microcode_uop_ratio [Percentage of all uops which are microcode ops] tma_info_uop_mix_x87_uop_ratio: tma_info_uop_mix_x87_uop_ratio [Percentage of all uops which are x87 uops] tma_int_operations_group: [Metrics contributing to tma_int_operations category] tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_issue2P: [Metrics related by the issue $issue2P] tma_fp_scalar [This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired] tma_fp_vector [This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths] tma_fp_vector_128b [This metric approximates arithmetic FP vector uops 
fraction the CPU has retired for 128-bit wide vectors] tma_fp_vector_256b [This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors] tma_int_vector_128b [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_int_vector_256b [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired] tma_port_0 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)] tma_port_1 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU)] tma_port_6 [This metric represents Core fraction of cycles CPU dispatched uops on execution port 6 ([HSW+] Primary Branch and simple ALU)] tma_ports_utilized_2 [This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)] tma_issueBM: [Metrics related by the issue $issueBM] tma_branch_mispredicts [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction] tma_info_bad_spec_branch_misprediction_cost [Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)] tma_info_bottleneck_mispredictions [Total pipeline cost of Branch Misprediction related bottlenecks] tma_mispredicts_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage] tma_issueBW: [Metrics related by the issue $issueBW] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_info_bottleneck_cache_memory_bandwidth [Total pipeline cost of external Memory- or Cache-Bandwidth 
related bottlenecks] tma_info_system_dram_bw_use [Average external Memory Bandwidth Use for reads and writes [GB / sec]] tma_mem_bandwidth [This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)] tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)] tma_issueComp: [Metrics related by the issue $issueComp] tma_info_bottleneck_compute_bound_est [Total pipeline cost when the execution is compute-bound - an estimation] tma_issueD0: [Metrics related by the issue $issueD0] tma_decoder0_alone [This metric represents fraction of cycles where decoder-0 was the only active decoder] tma_few_uops_instructions [This metric represents fraction of slots where the CPU was retiring instructions that that are decoder into two or up to ([SNB+] four; [ADL+] five) uops] tma_issueFB: [Metrics related by the issue $issueFB] tma_dsb_switches [This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines] tma_fetch_bandwidth [This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues] tma_info_botlnk_l2_dsb_bandwidth [Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch_BW Bottleneck] tma_info_botlnk_l2_dsb_misses [Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck] tma_info_frontend_dsb_coverage [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)] tma_info_inst_mix_iptb [Instructions per taken branch] tma_lcp [This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)] tma_issueFL: [Metrics related by the issue $issueFL] tma_info_botlnk_l2_ic_misses [Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck] tma_issueL1: 
[Metrics related by the issue $issueL1] tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache] tma_ports_utilized_1 [This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_issueLat: [Metrics related by the issue $issueLat] tma_info_bottleneck_cache_memory_latency [Total pipeline cost of external Memory- or Cache-Latency related bottlenecks] tma_l3_hit_latency [This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)] tma_mem_latency [This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)] tma_issueMC: [Metrics related by the issue $issueMC] tma_clears_resteers [This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears] tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache] tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears] tma_microcode_sequencer [This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_issueMS: [Metrics related by the issue $issueMS] tma_info_bottleneck_irregular_overhead [Total pipeline cost of irregular execution (e.g] tma_microcode_sequencer [This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_issueMV: [Metrics 
related by the issue $issueMV] tma_mixing_vectors [This metric estimates penalty in terms of percentage of([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_issueRFO: [Metrics related by the issue $issueRFO] tma_lock_latency [This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations] tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses] tma_issueSL: [Metrics related by the issue $issueSL] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses] tma_issueSO: [Metrics related by the issue $issueSO] tma_ms_switches [This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)] tma_serializing_operation [This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations] tma_issueSmSt: [Metrics related by the issue $issueSmSt] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_streaming_stores [This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores] tma_issueSpSt: [Metrics related by the issue $issueSpSt] tma_split_stores [This metric represents rate of split store accesses] tma_issueSyncxn: [Metrics related by the issue $issueSyncxn] tma_contested_accesses [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested 
accesses] tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses] tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing] tma_machine_clears [This metric represents fraction of slots the CPU has wasted due to Machine Clears] tma_issueTLB: [Metrics related by the issue $issueTLB] tma_dtlb_load [This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses] tma_dtlb_store [This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses] tma_info_bottleneck_memory_data_tlbs [Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)] tma_info_bottleneck_memory_synchronization [Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors)] tma_l1_bound_group: [Metrics contributing to tma_l1_bound category] tma_dtlb_load [This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses] tma_fb_full [This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed] tma_l1_hit_latency [This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache] tma_lock_latency [This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations] tma_split_loads [This metric estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache line boundary] tma_store_fwd_blk [This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores] tma_l3_bound_group: [Metrics contributing to tma_l3_bound category] tma_contested_accesses [This metric 
estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses] tma_data_sharing [This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses] tma_l3_hit_latency [This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)] tma_sq_full [This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)] tma_light_operations_group: [Metrics contributing to tma_light_operations category] tma_fp_arith [This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)] tma_fused_instructions [This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions] tma_int_operations [This metric represents overall Integer (Int) select operations fraction the CPU has executed (retired)] tma_memory_operations [This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses] tma_non_fused_branches [This metric represents fraction of slots where the CPU was retiring branch instructions that were not fused] tma_other_light_ops [This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes] tma_machine_clears_group: [Metrics contributing to tma_machine_clears category] tma_fast_nuke [Counts the number of issue slots that were not consumed by the backend due to a machine clear that does not require the use of microcode, classified as a fast nuke,due to memory ordering,memory disambiguation and memory renaming] tma_nuke [Counts the number of issue slots that were not consumed by the backend due to a machine clear that requires the 
use of microcode (slow nuke)] tma_other_nukes [This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering] tma_memory_bound_group: [Metrics contributing to tma_memory_bound category] tma_dram_bound [This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads] tma_l1_bound [This metric estimates how often the CPU was stalled without loads missing the L1 data cache] tma_l2_bound [This metric estimates how often the CPU was stalled due to L2 cache accesses by loads] tma_l3_bound [This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core] tma_store_bound [This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write] tma_microcode_sequencer_group: [Metrics contributing to tma_microcode_sequencer category] tma_assists [This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists] tma_cisc [This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction] tma_mite_group: [Metrics contributing to tma_mite category] tma_decoder0_alone [This metric represents fraction of cycles where decoder-0 was the only active decoder] tma_other_light_ops_group: [Metrics contributing to tma_other_light_ops category] tma_nop_instructions [This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions] tma_shuffles_256b [This metric represents fraction of slots where the CPU was retiring Shuffle operations of 256-bit vector size (FP or Integer)] tma_ports_utilization_group: [Metrics contributing to tma_ports_utilization category] tma_ports_utilized_0 [This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL,Physical Core cycles 
otherwise)] tma_ports_utilized_1 [This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_ports_utilized_2 [This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)] tma_ports_utilized_3m [This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL,Physical Core cycles otherwise)] tma_ports_utilized_0_group: [Metrics contributing to tma_ports_utilized_0 category] tma_mixing_vectors [This metric estimates penalty in terms of percentage of([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)] tma_ports_utilized_3m_group: [Metrics contributing to tma_ports_utilized_3m category] tma_alu_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution ports for ALU operations] tma_load_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution port for Load operations] tma_store_op_utilization [This metric represents Core fraction of cycles CPU dispatched uops on execution port for Store operations] tma_resource_bound_group: [Metrics contributing to tma_resource_bound category] tma_mem_scheduler [Counts the number of issue slots that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops] tma_non_mem_scheduler [Counts the number of issue slots that were not consumed by the backend due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation stalls in which the integer,floating point or SIMD scheduler is not able to accept uops] tma_register [Counts the number of issue slots that were not consumed by the backend due to the physical register file unable to accept an entry 
(marble stalls)] tma_reorder_buffer [Counts the number of issue slots that were not consumed by the backend due to the reorder buffer being full (ROB stalls)] tma_serialization [Counts the number of issue slots that were not consumed by the backend due to scoreboards from the instruction queue (IQ),jump execution unit (JEU),or microcode sequencer (MS)] tma_retiring_group: [Metrics contributing to tma_retiring category] tma_heavy_operations [This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences] tma_light_operations [This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)] tma_serializing_operation_group: [Metrics contributing to tma_serializing_operation category] tma_c01_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.1 power-performance optimized state (Faster wakeup time; Smaller power savings)] tma_c02_wait [This metric represents fraction of cycles the CPU was stalled due staying in C0.2 power-performance optimized state (Slower wakeup time; Larger power savings)] tma_memory_fence [This metric represents fraction of cycles the CPU was stalled due to LFENCE Instructions] tma_slow_pause [This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions] tma_store_bound_group: [Metrics contributing to tma_store_bound category] tma_dtlb_store [This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses] tma_false_sharing [This metric roughly estimates how often CPU was handling synchronizations due to False Sharing] tma_split_stores [This metric represents rate of split store accesses] tma_store_latency [This metric estimates fraction of cycles the CPU spent handling L1D store misses] tma_streaming_stores [This metric estimates how often CPU was stalled 
due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores] transaction: tsx_aborted_cycles [Percentage of cycles in aborted transactions] tsx_cycles_per_elision [Number of cycles within a transaction divided by the number of elisions] tsx_cycles_per_transaction [Number of cycles within a transaction divided by the number of transactions] tsx_transactional_cycles [Percentage of cycles within a transaction region]
Examine the output of the following in a terminal:
perf top
perf top -z
perf top -e cache-misses
perf top -e cache-misses,cycles
In [3]:
%%writefile tmp/transpose.c
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const int m = 1024;  /* rows of matrix */
    const int n = 1024;  /* columns of matrix */
    int *matrix = malloc(sizeof(int) * m * n);
    int *transpose = malloc(sizeof(int) * m * n);
    if (!matrix || !transpose)
        return 1;

    /* matrix is m x n, stored row-major */
    for (int c = 0; c < m; c++)
        for (int d = 0; d < n; d++)
            matrix[c*n + d] = c+d;

    /* repeat the transpose so that it dominates the run time */
    for (int i = 0; i < 300; ++i)
        for (int c = 0; c < m; c++)
            for (int d = 0; d < n; d++)
                transpose[d*m + c] = matrix[c*n + d];

    printf("Transpose of the matrix:\n");
    int sum = 0;
    for (int c = 0; c < m; c++)
        for (int d = 0; d < n; d++)
            sum += transpose[d*m + c];
    printf("sum: %d\n", sum);

    free(matrix);
    free(transpose);
    return 0;
}
Overwriting tmp/transpose.c
In [4]:
!(cd tmp; gcc transpose.c -O3 -o transpose)
!bash -c "time ./tmp/transpose"
Transpose of the matrix: sum: 1072693248 real 0m1,016s user 0m1,012s sys 0m0,004s
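One way to see why this loop nest is a natural target for the cache-misses counter is to count how many cache lines each access pattern touches. The sketch below is not part of the original notebook. It assumes 64-byte cache lines, 4-byte int, and arrays that start on a line boundary; lines_touched is a hypothetical helper name. In the innermost loop, matrix is read with unit stride while transpose is written with a stride of 1024 elements.

```python
def lines_touched(stride_elems, count, elem_bytes=4, line_bytes=64):
    """Count distinct cache lines hit by `count` accesses at a fixed element stride.

    Assumes the array starts on a cache-line boundary (an assumption, not measured).
    """
    lines = {(i * stride_elems * elem_bytes) // line_bytes for i in range(count)}
    return len(lines)


n = 1024  # matrix dimension used in transpose.c

# One pass of the innermost loop (d = 0 .. n-1):
read_lines = lines_touched(stride_elems=1, count=n)   # reads of matrix: unit stride
write_lines = lines_touched(stride_elems=n, count=n)  # writes to transpose: stride n

print(f"reads: {read_lines} lines, writes: {write_lines} lines")
# prints "reads: 64 lines, writes: 1024 lines"
```

Under these assumptions, one pass of the inner loop reads 64 lines but writes to 1024 different lines, so each store lands on a line that the previous store did not use.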
In [5]:
!perf record -e cycles,instructions ./tmp/transpose
Transpose of the matrix: sum: 1072693248 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0,427 MB perf.data (8461 samples) ]
- Examine perf report in the terminal.
- Now retry, this time building with -g instead of -O3.
In [6]:
%%writefile tmp/matvec.py
import numpy as np
n = 4096
A = np.random.randn(n, n)
b = np.random.randn(n)
for i in range(10):
    A @ b
Writing tmp/matvec.py
In [9]:
!OPENBLAS_NUM_THREADS=1 perf record python tmp/matvec.py
[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0,112 MB perf.data (1501 samples) ]
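When comparing several runs, it can help to pull the sample count out of the perf record summary line with a script. The following is a minimal sketch, not part of the original notebook. It assumes the summary has the shape shown in the output above, including a locale-dependent decimal comma; parse_record_summary is a hypothetical helper name.

```python
import re

# Matches e.g. "Captured and wrote 0,112 MB perf.data (1501 samples)"
SUMMARY = re.compile(r"Captured and wrote ([\d.,]+) MB (\S+) \((\d+) samples\)")


def parse_record_summary(text):
    """Return size, file name, and sample count from a perf record summary, or None."""
    match = SUMMARY.search(text)
    if match is None:
        return None
    # Treat a comma as the decimal separator (as in the output above).
    # This would misread sizes that use thousands separators.
    size_mb = float(match.group(1).replace(",", "."))
    return {
        "size_mb": size_mb,
        "file": match.group(2),
        "samples": int(match.group(3)),
    }


line = ("[ perf record: Woken up 1 times to write data ] "
        "[ perf record: Captured and wrote 0,112 MB perf.data (1501 samples) ]")
print(parse_record_summary(line))
```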
In [23]:
%%writefile tmp/matmat.py
import numpy as np
n = 2048
A = np.random.randn(n, n)
B = np.random.randn(n, n)
for i in range(20):
    A @ B
Overwriting tmp/matmat.py
In [24]:
!OPENBLAS_NUM_THREADS=1 perf record python tmp/matmat.py
[ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 1,377 MB perf.data (29133 samples) ]
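A back-of-the-envelope estimate suggests why the two scripts are worth profiling side by side. The sketch below is not from the original notebook. It counts floating-point operations and the minimum data traffic for one product, assuming 8-byte doubles and that every operand is moved to or from memory exactly once (an idealization). The function names are hypothetical.

```python
def matvec_cost(n, elem_bytes=8):
    """Flops and idealized bytes moved for one n x n matrix-vector product."""
    flops = 2 * n * n                            # one multiply and one add per entry of A
    bytes_moved = elem_bytes * (n * n + 2 * n)   # read A and b, write the result
    return flops, bytes_moved


def matmat_cost(n, elem_bytes=8):
    """Flops and idealized bytes moved for one n x n matrix-matrix product."""
    flops = 2 * n ** 3
    bytes_moved = elem_bytes * 3 * n * n         # read A and B, write the result
    return flops, bytes_moved


for name, cost, n in (("matvec", matvec_cost, 4096), ("matmat", matmat_cost, 2048)):
    flops, bytes_moved = cost(n)
    print(f"{name}: n={n}, flops/byte = {flops / bytes_moved:.2f}")
```

With these assumptions the matrix-vector product performs about a quarter of a flop per byte moved, while the matrix-matrix product performs over a hundred. It is therefore plausible that memory-related counters dominate for matvec and execution-related counters for matmat.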
Run in shell separately:
perf record \
-e cycles,L1-dcache-load-misses \
-e fp_arith_inst_retired.256b_packed_double \
-c 10 \
python tmp/matvec.py
- Also try -c 100
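The -c option sets the sample period: perf takes one sample each time the event count advances by that many events. A rough sketch of what changing the period does, using a made-up total event count (expected_samples is a hypothetical helper, not part of perf):

```python
def expected_samples(event_count, period):
    """Approximate number of samples when one is taken every `period` events."""
    return event_count // period


# Hypothetical total event count, chosen only for illustration.
total_events = 1_000_000

for period in (10, 100):
    print(f"-c {period}: about {expected_samples(total_events, period)} samples")
```

A smaller period gives finer attribution but more samples, which means a larger perf.data file and more overhead. A larger period is cheaper but coarser.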
Look at:
perf help
perf help record
Aspects to mention:
- Measuring parts of a program?
- Granularity for ratios?
- Scope of collection
- Call graph collection (-g)
- Precise events
Using pmu-tools / toplev¶
This uses toplev.py from Andi Kleen's pmu-tools.
- Try the command below for a few different levels.
- Try the command below for the matvec and the matmat.
In [25]:
%%bash
OPENBLAS_NUM_THREADS=1 python ~/pack/pmu-tools/toplev.py -l4 python tmp/matmat.py
Consider disabling nmi watchdog to minimize multiplexing (echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog or echo kernel.nmi_watchdog=0 >> /etc/sysctl.conf ; sysctl -p as root) BR_MISP_RETIRED.COND_NTAKEN_COST event not found for cpu_core BR_MISP_RETIRED.COND_TAKEN_COST event not found for cpu_core BR_MISP_RETIRED.INDIRECT_CALL_COST event not found for cpu_core BR_MISP_RETIRED.INDIRECT_COST event not found for cpu_core BR_MISP_RETIRED.RET_COST event not found for cpu_core MEM_INST_RETIRED.STLB_HIT_LOADS event not found for cpu_core MEM_INST_RETIRED.STLB_HIT_STORES event not found for cpu_core TOPDOWN_FE_BOUND.ALL_P event not found for cpu_atom TOPDOWN_FE_BOUND.ITLB_MISS event not found for cpu_atom TOPDOWN_BAD_SPECULATION.ALL_P event not found for cpu_atom TOPDOWN_BE_BOUND.ALL_P event not found for cpu_atom TOPDOWN_RETIRING.ALL_P event not found for cpu_atom 14 events not counted # 5.01-full-perf, 4 on 13th Gen Intel(R) Core(TM) i7-1365U [mtl] core BE Backend_Bound % Slots 44.9 [ 8.0%] core BE/Core Backend_Bound.Core_Bound % Slots 31.8 [ 8.0%] core RET Retiring.Light_Operations.FP_Arith.FP_Scalar % Uops 0.2 [ 4.0%] core RET Retiring.Light_Operations.Int_Operations.Int_Vector_128b % Uops 0.0 [ 4.0%] core RET Retiring.Light_Operations.Int_Operations.Int_Vector_256b % Uops 0.0 [ 4.0%] core BE/Core Backend_Bound.Core_Bound.Ports_Utilization % Clocks 19.6 [ 4.0%] core BE/Core Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_2 % Clocks 25.1 [ 4.0%] This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)... Sampling events: exe_activity.2_ports_util core BE/Core Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m % Clocks 57.5 [ 4.0%]<== This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)... 
Sampling events: uops_executed.cycles_ge_3 No node for atom crossed threshold Run toplev --describe Ports_Utilized_3m^ to get more information on bottleneck for core Some events not found. Consider running event_download to update event lists
Mismeasured (out of bound values):FP_Arith FP_Vector 13 nodes had zero counts: Branch_Detect Branch_Resteer Cisc Decode Fast_Nuke Mem_Scheduler Non_Mem_Scheduler Nuke Other_FB Predecode Register Reorder_Buffer Serialization Add --run-sample to find locations Add --nodes '!+Ports_Utilized_3m*/5,+MUX' for breakdown.
Using LIKWID¶
In [14]:
!likwid-perfctr -e
This architecture has 39 counters. Counter tags(name, type<, options>): BBOX0C1, Home Agent box 0, EDGEDETECT|THRESHOLD|INVERT BBOX0C2, Home Agent box 0, EDGEDETECT|THRESHOLD|INVERT BBOX0C3, Home Agent box 0, EDGEDETECT|THRESHOLD|INVERT BBOX1C1, Home Agent box 1, EDGEDETECT|THRESHOLD|INVERT BBOX1C2, Home Agent box 1, EDGEDETECT|THRESHOLD|INVERT BBOX1C3, Home Agent box 1, EDGEDETECT|THRESHOLD|INVERT MBOX2C1, Memory Controller 0 Channel 2, EDGEDETECT|THRESHOLD|INVERT MBOX2C2, Memory Controller 0 Channel 2, EDGEDETECT|THRESHOLD|INVERT MBOX2C3, Memory Controller 0 Channel 2, EDGEDETECT|THRESHOLD|INVERT MBOX2FIX, Memory Controller 0 Channel 2 Fixed Counter, INVERT MBOX3C1, Memory Controller 0 Channel 3, EDGEDETECT|THRESHOLD|INVERT MBOX3C2, Memory Controller 0 Channel 3, EDGEDETECT|THRESHOLD|INVERT MBOX3C3, Memory Controller 0 Channel 3, EDGEDETECT|THRESHOLD|INVERT MBOX3FIX, Memory Controller 0 Channel 3 Fixed Counter, INVERT MBOX6C1, Memory Controller 1 Channel 2, EDGEDETECT|THRESHOLD|INVERT MBOX6C2, Memory Controller 1 Channel 2, EDGEDETECT|THRESHOLD|INVERT MBOX6C3, Memory Controller 1 Channel 2, EDGEDETECT|THRESHOLD|INVERT MBOX6FIX, Memory Controller 1 Channel 2 Fixed Counter, INVERT MBOX7C1, Memory Controller 1 Channel 3, EDGEDETECT|THRESHOLD|INVERT MBOX7C2, Memory Controller 1 Channel 3, EDGEDETECT|THRESHOLD|INVERT MBOX7C3, Memory Controller 1 Channel 3, EDGEDETECT|THRESHOLD|INVERT MBOX7FIX, Memory Controller 1 Channel 3 Fixed Counter, INVERT PBOX1, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT PBOX2, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT PBOX3, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT RBOX0C1, Routing box 0, EDGEDETECT|THRESHOLD|INVERT RBOX0C2, Routing box 0, EDGEDETECT|THRESHOLD|INVERT RBOX1C1, Routing box 1, EDGEDETECT|THRESHOLD|INVERT RBOX1C2, Routing box 1, EDGEDETECT|THRESHOLD|INVERT QBOX0C1, QPI Link Layer 0, EDGEDETECT|THRESHOLD|INVERT QBOX0C2, QPI Link Layer 0, EDGEDETECT|THRESHOLD|INVERT QBOX0C3, QPI Link Layer 0, 
EDGEDETECT|THRESHOLD|INVERT QBOX1C1, QPI Link Layer 1, EDGEDETECT|THRESHOLD|INVERT QBOX1C2, QPI Link Layer 1, EDGEDETECT|THRESHOLD|INVERT QBOX1C3, QPI Link Layer 1, EDGEDETECT|THRESHOLD|INVERT QBOX0FIX1, QPI Link Layer rate status 0 QBOX0FIX2, QPI Link Layer rate status 0 QBOX1FIX1, QPI Link Layer rate status 1 QBOX1FIX2, QPI Link Layer rate status 1 This architecture has 1668 events. Event tags (tag, id, umask, counters<, options>): TEMP_CORE, 0x0, 0x0, TMP0 PWR_PKG_ENERGY, 0x2, 0x0, PWR0 PWR_PP0_ENERGY, 0x1, 0x0, PWR1 PWR_PP1_ENERGY, 0x4, 0x0, PWR2 PWR_DRAM_ENERGY, 0x3, 0x0, PWR3 INSTR_RETIRED_ANY, 0x0, 0x0, FIXC0 CPU_CLK_UNHALTED_CORE, 0x0, 0x0, FIXC1 CPU_CLK_UNHALTED_REF, 0x0, 0x0, FIXC2 LD_BLOCKS_STORE_FORWARD, 0x3, 0x2, PMC LD_BLOCKS_NO_SR, 0x3, 0x8, PMC MISALIGN_MEM_REF_LOADS, 0x5, 0x1, PMC MISALIGN_MEM_REF_STORES, 0x5, 0x2, PMC MISALIGN_MEM_REF_ANY, 0x5, 0x3, PMC LD_BLOCKS_PARTIAL_ADDRESS_ALIAS, 0x7, 0x1, PMC DTLB_LOAD_MISSES_CAUSES_A_WALK, 0x8, 0x1, PMC DTLB_LOAD_MISSES_STLB_HIT, 0x8, 0x60, PMC DTLB_LOAD_MISSES_WALK_COMPLETED, 0x8, 0xE, PMC DTLB_LOAD_MISSES_STLB_HIT_4K, 0x8, 0x20, PMC DTLB_LOAD_MISSES_WALK_COMPLETED_4K, 0x8, 0x2, PMC DTLB_LOAD_MISSES_WALK_DURATION, 0x8, 0x10, PMC INT_MISC_RECOVERY_CYCLES, 0xD, 0x3, PMC INT_MISC_RECOVERY_COUNT, 0xD, 0x3, PMC INT_MISC_RAT_STALL_CYCLES, 0xD, 0x8, PMC INT_MISC_RAT_STALL_COUNT, 0xD, 0x8, PMC UOPS_ISSUED_ANY, 0xE, 0x1, PMC UOPS_ISSUED_FLAGS_MERGE, 0xE, 0x10, PMC UOPS_ISSUED_SLOW_LEA, 0xE, 0x20, PMC UOPS_ISSUED_SINGLE_MUL, 0xE, 0x40, PMC UOPS_ISSUED_USED_CYCLES, 0xE, 0x1, PMC UOPS_ISSUED_STALL_CYCLES, 0xE, 0x1, PMC UOPS_ISSUED_TOTAL_CYCLES, 0xE, 0x1, PMC UOPS_ISSUED_CORE_USED_CYCLES, 0xE, 0x1, PMC UOPS_ISSUED_CORE_STALL_CYCLES, 0xE, 0x1, PMC UOPS_ISSUED_CORE_TOTAL_CYCLES, 0xE, 0x1, PMC UOPS_ISSUED_CYCLES_GE_1_UOPS_EXEC, 0xE, 0x1, PMC UOPS_ISSUED_CYCLES_GE_2_UOPS_EXEC, 0xE, 0x1, PMC UOPS_ISSUED_CYCLES_GE_3_UOPS_EXEC, 0xE, 0x1, PMC UOPS_ISSUED_CYCLES_GE_4_UOPS_EXEC, 0xE, 0x1, PMC UOPS_ISSUED_CYCLES_GE_5_UOPS_EXEC, 
0xE, 0x1, PMC UOPS_ISSUED_CYCLES_GE_6_UOPS_EXEC, 0xE, 0x1, PMC ARITH_FPU_DIV_ACTIVE, 0x14, 0x1, PMC L2_RQSTS_DEMAND_DATA_RD_MISS, 0x24, 0x21, PMC L2_RQSTS_DEMAND_DATA_RD_HIT, 0x24, 0x41, PMC L2_RQSTS_RFO_MISS, 0x24, 0x22, PMC L2_RQSTS_RFO_HIT, 0x24, 0x42, PMC L2_RQSTS_CODE_RD_MISS, 0x24, 0x24, PMC L2_RQSTS_CODE_RD_HIT, 0x24, 0x44, PMC L2_RQSTS_L2_PF_HIT, 0x24, 0x50, PMC L2_RQSTS_L2_PF_MISS, 0x24, 0x30, PMC L2_RQSTS_ALL_DEMAND_DATA_RD, 0x24, 0xE1, PMC L2_RQSTS_ALL_DEMAND_MISS, 0x24, 0x27, PMC L2_RQSTS_ALL_RFO, 0x24, 0xE2, PMC L2_RQSTS_ALL_CODE_RD, 0x24, 0xE4, PMC L2_RQSTS_ALL_DEMAND_REFERENCES, 0x24, 0xE7, PMC L2_RQSTS_ALL_PF, 0x24, 0xF8, PMC L2_RQSTS_MISS, 0x24, 0x3F, PMC L2_RQSTS_REFERENCES, 0x24, 0xFF, PMC L2_DEMAND_RQST_WB_HIT, 0x27, 0x50, PMC LONGEST_LAT_CACHE_REFERENCE, 0x2E, 0x4F, PMC LONGEST_LAT_CACHE_MISS, 0x2E, 0x41, PMC CPU_CLOCK_UNHALTED_THREAD_P, 0x3C, 0x0, PMC CPU_CLOCK_UNHALTED_REF_XCLK, 0x3C, 0x1, PMC CPU_CLOCK_UNHALTED_ONE_THREAD_ACTIVE, 0x3C, 0x2, PMC L1D_PEND_MISS_PENDING, 0x48, 0x1, PMC2 L1D_PEND_MISS_PENDING_CYCLES, 0x48, 0x1, PMC2 L1D_PEND_MISS_OCCURRENCES, 0x48, 0x1, PMC2 DTLB_STORE_MISSES_CAUSES_A_WALK, 0x49, 0x1, PMC DTLB_STORE_MISSES_STLB_HIT, 0x49, 0x60, PMC DTLB_STORE_MISSES_WALK_COMPLETED, 0x49, 0xE, PMC DTLB_STORE_MISSES_STLB_HIT_4K, 0x49, 0x20, PMC DTLB_STORE_MISSES_WALK_COMPLETED_4K, 0x49, 0x2, PMC DTLB_STORE_MISSES_WALK_DURATION, 0x49, 0x10, PMC LOAD_HIT_PRE_HW_PF, 0x4C, 0x2, PMC EPT_WALK_CYCLES, 0x4F, 0x10, PMC L1D_REPLACEMENT, 0x51, 0x1, PMC L1D_M_EVICT, 0x51, 0x4, PMC TX_MEM_ABORT_CONFLICT, 0x54, 0x1, PMC TX_MEM_ABORT_CAPACITY_WRITE, 0x54, 0x2, PMC TX_MEM_ABORT_HLE_STORE_TO_ELIDED_LOCK, 0x54, 0x4, PMC TX_MEM_ABORT_HLE_ELISION_BUFFER_NOT_EMPTY, 0x54, 0x8, PMC TX_MEM_ABORT_HLE_ELISION_BUFFER_MISMATCH, 0x54, 0x10, PMC TX_MEM_ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT, 0x54, 0x20, PMC TX_MEM_HLE_ELISION_BUFFER_FULL, 0x54, 0x40, PMC MOVE_ELIMINATION_INT_NOT_ELIMINATED, 0x58, 0x4, PMC MOVE_ELIMINATION_SIMD_NOT_ELIMINATED, 0x58, 0x8, 
PMC MOVE_ELIMINATION_INT_ELIMINATED, 0x58, 0x1, PMC MOVE_ELIMINATION_SIMD_ELIMINATED, 0x58, 0x2, PMC CPL_CYCLES_RING0, 0x5C, 0x1, PMC CPL_CYCLES_RING123, 0x5C, 0x2, PMC CPL_CYCLES_RING0_TRANS, 0x5C, 0x1, PMC RS_EVENTS_EMPTY_CYCLES, 0x5E, 0x1, PMC OFFCORE_REQUESTS_OUTSTANDING_DEMAND_DATA_RD, 0x60, 0x1, PMC OFFCORE_REQUESTS_OUTSTANDING_DEMAND_CODE_RD, 0x60, 0x2, PMC OFFCORE_REQUESTS_OUTSTANDING_DEMAND_RFO, 0x60, 0x4, PMC OFFCORE_REQUESTS_OUTSTANDING_ALL_DATA_RD, 0x60, 0x8, PMC LOCK_CYCLES_SPLIT_LOCK_UC_LOCK_DURATION, 0x63, 0x1, PMC LOCK_CYCLES_CACHE_LOCK_DURATION, 0x63, 0x2, PMC IDQ_EMPTY, 0x79, 0x2, PMC IDQ_MITE_UOPS, 0x79, 0x4, PMC IDQ_DSB_UOPS, 0x79, 0x8, PMC IDQ_MS_DSB_UOPS, 0x79, 0x10, PMC IDQ_MS_MITE_UOPS, 0x79, 0x20, PMC IDQ_MS_UOPS, 0x79, 0x30, PMC IDQ_DSB_UOPS, 0x79, 0x18, PMC IDQ_MITE_ALL_UOPS, 0x79, 0x24, PMC IDQ_ALL_UOPS, 0x79, 0x3C, PMC IDQ_MITE_CYCLES, 0x79, 0x4, PMC IDQ_MITE_CYCLES_1_UOPS, 0x79, 0x4, PMC IDQ_MITE_CYCLES_2_UOPS, 0x79, 0x4, PMC IDQ_MITE_CYCLES_3_UOPS, 0x79, 0x4, PMC IDQ_MITE_CYCLES_4_UOPS, 0x79, 0x4, PMC IDQ_DSB_CYCLES, 0x79, 0x8, PMC IDQ_DSB_CYCLES_1_UOPS, 0x79, 0x8, PMC IDQ_DSB_CYCLES_2_UOPS, 0x79, 0x8, PMC IDQ_DSB_CYCLES_3_UOPS, 0x79, 0x8, PMC IDQ_DSB_CYCLES_4_UOPS, 0x79, 0x8, PMC IDQ_MS_DSB_CYCLES, 0x79, 0x10, PMC IDQ_MS_DSB_CYCLES_1_UOPS, 0x79, 0x10, PMC IDQ_MS_DSB_CYCLES_2_UOPS, 0x79, 0x10, PMC IDQ_MS_DSB_CYCLES_3_UOPS, 0x79, 0x10, PMC IDQ_MS_DSB_CYCLES_4_UOPS, 0x79, 0x10, PMC IDQ_MS_DSB_OCCUR, 0x79, 0x10, PMC IDQ_MS_MITE_CYCLES, 0x79, 0x20, PMC IDQ_MS_MITE_CYCLES_1_UOPS, 0x79, 0x20, PMC IDQ_MS_MITE_CYCLES_2_UOPS, 0x79, 0x20, PMC IDQ_MS_MITE_CYCLES_3_UOPS, 0x79, 0x20, PMC IDQ_MS_MITE_CYCLES_4_UOPS, 0x79, 0x20, PMC IDQ_MS_CYCLES, 0x79, 0x30, PMC IDQ_MS_CYCLES_1_UOPS, 0x79, 0x30, PMC IDQ_MS_CYCLES_2_UOPS, 0x79, 0x30, PMC IDQ_MS_CYCLES_3_UOPS, 0x79, 0x30, PMC IDQ_MS_CYCLES_4_UOPS, 0x79, 0x30, PMC IDQ_MS_SWITCHES, 0x79, 0x30, PMC IDQ_ALL_DSB_CYCLES_ANY_UOPS, 0x79, 0x18, PMC IDQ_ALL_DSB_CYCLES_1_UOPS, 0x79, 0x18, PMC 
IDQ_ALL_DSB_CYCLES_2_UOPS, 0x79, 0x18, PMC IDQ_ALL_DSB_CYCLES_3_UOPS, 0x79, 0x18, PMC IDQ_ALL_DSB_CYCLES_4_UOPS, 0x79, 0x18, PMC IDQ_ALL_MITE_CYCLES_ANY_UOPS, 0x79, 0x24, PMC IDQ_ALL_MITE_CYCLES_1_UOPS, 0x79, 0x24, PMC IDQ_ALL_MITE_CYCLES_2_UOPS, 0x79, 0x24, PMC IDQ_ALL_MITE_CYCLES_3_UOPS, 0x79, 0x24, PMC IDQ_ALL_MITE_CYCLES_4_UOPS, 0x79, 0x24, PMC IDQ_ALL_CYCLES_ANY_UOPS, 0x79, 0x3C, PMC IDQ_ALL_CYCLES_1_UOPS, 0x79, 0x3C, PMC IDQ_ALL_CYCLES_2_UOPS, 0x79, 0x3C, PMC IDQ_ALL_CYCLES_3_UOPS, 0x79, 0x3C, PMC IDQ_ALL_CYCLES_4_UOPS, 0x79, 0x3C, PMC ICACHE_HIT, 0x80, 0x1, PMC ICACHE_MISSES, 0x80, 0x2, PMC ICACHE_ACCESSES, 0x80, 0x3, PMC ITLB_MISSES_CAUSES_A_WALK, 0x85, 0x1, PMC ITLB_MISSES_STLB_HIT, 0x85, 0x60, PMC ITLB_MISSES_WALK_COMPLETED, 0x85, 0xE, PMC ITLB_MISSES_STLB_HIT_4K, 0x85, 0x20, PMC ITLB_MISSES_WALK_COMPLETED_4K, 0x85, 0x2, PMC ITLB_MISSES_WALK_DURATION, 0x85, 0x10, PMC ILD_STALL_LCP, 0x87, 0x1, PMC BR_INST_EXEC_COND_TAKEN, 0x88, 0x81, PMC BR_INST_EXEC_COND_NON_TAKEN, 0x88, 0x41, PMC BR_INST_EXEC_DIRECT_JMP_TAKEN, 0x88, 0x82, PMC BR_INST_EXEC_INDIRECT_JMP_NON_CALL_RET_TAKEN, 0x88, 0x84, PMC BR_INST_EXEC_RETURN_NEAR_TAKEN, 0x88, 0x88, PMC BR_INST_EXEC_DIRECT_NEAR_CALL_TAKEN, 0x88, 0x90, PMC BR_INST_EXEC_INDIRECT_NEAR_CALL_TAKEN, 0x88, 0xA0, PMC BR_INST_EXEC_ALL_CONDITIONAL, 0x88, 0xC1, PMC BR_INST_EXEC_ALL_DIRECT_JMP, 0x88, 0xC2, PMC BR_INST_EXEC_ALL_DIRECT_NEAR_CALL, 0x88, 0xD0, PMC BR_INST_EXEC_ALL_INDIRECT_JUMP_NON_CALL_RET, 0x88, 0xC4, PMC BR_INST_EXEC_ALL_INDIRECT_NEAR_RETURN, 0x88, 0xC8, PMC BR_INST_EXEC_ALL_BRANCHES, 0x88, 0xFF, PMC BR_MISP_EXEC_COND_TAKEN, 0x89, 0x81, PMC BR_MISP_EXEC_COND_NON_TAKEN, 0x89, 0x41, PMC BR_MISP_EXEC_INDIRECT_JMP_NON_CALL_RET_TAKEN, 0x89, 0x84, PMC BR_MISP_EXEC_RETURN_NEAR_TAKEN, 0x89, 0x88, PMC BR_MISP_EXEC_DIRECT_NEAR_CALL_TAKEN, 0x89, 0x90, PMC BR_MISP_EXEC_INDIRECT_NEAR_CALL_TAKEN, 0x89, 0xA0, PMC BR_MISP_EXEC_ALL_CONDITIONAL, 0x89, 0xC1, PMC BR_MISP_EXEC_ALL_INDIRECT_JUMP_NON_CALL_RET, 0x89, 0xC4, PMC 
BR_MISP_EXEC_ALL_BRANCHES, 0x89, 0xFF, PMC IDQ_UOPS_NOT_DELIVERED_CORE, 0x9C, 0x1, PMC IDQ_UOPS_NOT_DELIVERED_CYCLES_0_UOPS_DELIV_CORE, 0x9C, 0x1, PMC IDQ_UOPS_NOT_DELIVERED_CYCLES_LE_1_UOP_DELIV_CORE, 0x9C, 0x1, PMC IDQ_UOPS_NOT_DELIVERED_CYCLES_LE_2_UOP_DELIV_CORE, 0x9C, 0x1, PMC IDQ_UOPS_NOT_DELIVERED_CYCLES_LE_3_UOP_DELIV_CORE, 0x9C, 0x1, PMC IDQ_UOPS_NOT_DELIVERED_CYCLES_FE_WAS_OK, 0x9C, 0x1, PMC UOP_DISPATCHES_CANCELLED_SIMD_PRF, 0xA0, 0x3, PMC UOPS_EXECUTED_PORT_PORT_0, 0xA1, 0x1, PMC UOPS_EXECUTED_PORT_PORT_1, 0xA1, 0x2, PMC UOPS_EXECUTED_PORT_PORT_2, 0xA1, 0x4, PMC UOPS_EXECUTED_PORT_PORT_3, 0xA1, 0x8, PMC UOPS_EXECUTED_PORT_PORT_4, 0xA1, 0x10, PMC UOPS_EXECUTED_PORT_PORT_5, 0xA1, 0x20, PMC UOPS_EXECUTED_PORT_PORT_6, 0xA1, 0x40, PMC UOPS_EXECUTED_PORT_PORT_7, 0xA1, 0x80, PMC UOPS_EXECUTED_PORT_PORT_0_CORE, 0xA1, 0x1, PMC UOPS_EXECUTED_PORT_PORT_1_CORE, 0xA1, 0x2, PMC UOPS_EXECUTED_PORT_PORT_2_CORE, 0xA1, 0x4, PMC UOPS_EXECUTED_PORT_PORT_3_CORE, 0xA1, 0x8, PMC UOPS_EXECUTED_PORT_PORT_4_CORE, 0xA1, 0x10, PMC UOPS_EXECUTED_PORT_PORT_5_CORE, 0xA1, 0x20, PMC UOPS_EXECUTED_PORT_PORT_6_CORE, 0xA1, 0x40, PMC UOPS_EXECUTED_PORT_PORT_7_CORE, 0xA1, 0x80, PMC RESOURCE_STALLS_ANY, 0xA2, 0x1, PMC RESOURCE_STALLS_RS, 0xA2, 0x4, PMC RESOURCE_STALLS_SB, 0xA2, 0x8, PMC RESOURCE_STALLS_ROB, 0xA2, 0x10, PMC CYCLE_ACTIVITY_CYCLES_L1D_MISS, 0xA3, 0x8, PMC2 CYCLE_ACTIVITY_CYCLES_L2_MISS, 0xA3, 0x1, PMC CYCLE_ACTIVITY_CYCLES_L2_PENDING, 0xA3, 0x1, PMC CYCLE_ACTIVITY_CYCLES_MEM_ANY, 0xA3, 0x2, PMC CYCLE_ACTIVITY_CYCLES_LDM_PENDING, 0xA3, 0x2, PMC CYCLE_ACTIVITY_CYCLES_NO_EXECUTE, 0xA3, 0x4, PMC CYCLE_ACTIVITY_STALLS_L1D_MISS, 0xA3, 0xC, PMC2 CYCLE_ACTIVITY_STALLS_L2_MISS, 0xA3, 0x5, PMC CYCLE_ACTIVITY_STALLS_L2_PENDING, 0xA3, 0x5, PMC CYCLE_ACTIVITY_STALLS_MEM_ANY, 0xA3, 0x6, PMC CYCLE_ACTIVITY_STALLS_LDM_PENDING, 0xA3, 0x6, PMC LSD_UOPS, 0xA8, 0x1, PMC LSD_CYCLES_1_UOPS, 0xA8, 0x1, PMC LSD_CYCLES_2_UOPS, 0xA8, 0x1, PMC LSD_CYCLES_3_UOPS, 0xA8, 0x1, PMC LSD_CYCLES_4_UOPS, 0xA8, 
0x1, PMC LSD_CYCLES_ACTIVE, 0xA8, 0x1, PMC LSD_CYCLES_INACTIVE, 0xA8, 0x1, PMC DSB2MITE_SWITCHES_PENALTY_CYCLES, 0xAB, 0x2, PMC ITLB_ITLB_FLUSH, 0xAE, 0x1, PMC OFFCORE_REQUESTS_DEMAND_DATA_RD, 0xB0, 0x1, PMC OFFCORE_REQUESTS_DEMAND_CODE_RD, 0xB0, 0x2, PMC OFFCORE_REQUESTS_DEMAND_RFO, 0xB0, 0x4, PMC OFFCORE_REQUESTS_ALL_DATA_RD, 0xB0, 0x8, PMC UOPS_EXECUTED_THREAD, 0xB1, 0x1, PMC UOPS_EXECUTED_USED_CYCLES, 0xB1, 0x1, PMC UOPS_EXECUTED_STALL_CYCLES, 0xB1, 0x1, PMC UOPS_EXECUTED_TOTAL_CYCLES, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_1_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_2_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_3_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_4_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_5_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_6_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_7_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CYCLES_GE_8_UOPS_EXEC, 0xB1, 0x1, PMC UOPS_EXECUTED_CORE, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_USED_CYCLES, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_STALL_CYCLES, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_TOTAL_CYCLES, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_1_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_2_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_3_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_4_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_5_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_6_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_7_UOPS_EXEC, 0xB1, 0x2, PMC UOPS_EXECUTED_CORE_CYCLES_GE_8_UOPS_EXEC, 0xB1, 0x2, PMC OFFCORE_REQUESTS_BUFFER_SQ_FULL, 0xB2, 0x1, PMC PAGE_WALKER_LOADS_DTLB_L1, 0xBC, 0x11, PMC PAGE_WALKER_LOADS_ITLB_L1, 0xBC, 0x21, PMC PAGE_WALKER_LOADS_DTLB_L2, 0xBC, 0x12, PMC PAGE_WALKER_LOADS_ITLB_L2, 0xBC, 0x22, PMC PAGE_WALKER_LOADS_DTLB_L3, 0xBC, 0x14, PMC PAGE_WALKER_LOADS_ITLB_L3, 0xBC, 0x24, PMC PAGE_WALKER_LOADS_DTLB_MEMORY, 0xBC, 0x18, PMC INST_RETIRED_ANY_P, 0xC0, 0x0, PMC INST_RETIRED_X87, 0xC0, 0x2, PMC 
INST_RETIRED_PREC_DIST, 0xC0, 0x1, PMC1 OTHER_ASSISTS_AVX_TO_SSE, 0xC1, 0x8, PMC OTHER_ASSISTS_SSE_TO_AVX, 0xC1, 0x10, PMC OTHER_ASSISTS_ANY_WB_ASSIST, 0xC1, 0x40, PMC UOPS_RETIRED_ALL, 0xC2, 0x1, PMC UOPS_RETIRED_CORE_ALL, 0xC2, 0x1, PMC UOPS_RETIRED_RETIRE_SLOTS, 0xC2, 0x2, PMC UOPS_RETIRED_CORE_RETIRE_SLOTS, 0xC2, 0x2, PMC UOPS_RETIRED_USED_CYCLES, 0xC2, 0x1, PMC UOPS_RETIRED_STALL_CYCLES, 0xC2, 0x1, PMC UOPS_RETIRED_TOTAL_CYCLES, 0xC2, 0x1, PMC UOPS_RETIRED_CORE_ALL, 0xC2, 0x1, PMC UOPS_RETIRED_CORE_RETIRE_SLOTS, 0xC2, 0x2, PMC UOPS_RETIRED_CORE_USED_CYCLES, 0xC2, 0x1, PMC UOPS_RETIRED_CORE_STALL_CYCLES, 0xC2, 0x1, PMC UOPS_RETIRED_CORE_TOTAL_CYCLES, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_1_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_2_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_3_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_4_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_5_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_6_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_7_UOPS_EXEC, 0xC2, 0x1, PMC UOPS_RETIRED_CYCLES_GE_8_UOPS_EXEC, 0xC2, 0x1, PMC MACHINE_CLEARS_COUNT, 0xC3, 0x1, PMC MACHINE_CLEARS_CYCLES, 0xC3, 0x1, PMC MACHINE_CLEARS_MEMORY_ORDERING, 0xC3, 0x2, PMC MACHINE_CLEARS_SMC, 0xC3, 0x4, PMC MACHINE_CLEARS_MASKMOV, 0xC3, 0x20, PMC BR_INST_RETIRED_ALL_BRANCHES, 0xC4, 0x0, PMC BR_INST_RETIRED_CONDITIONAL, 0xC4, 0x1, PMC BR_INST_RETIRED_NEAR_CALL, 0xC4, 0x2, PMC BR_INST_RETIRED_NEAR_RETURN, 0xC4, 0x8, PMC BR_INST_RETIRED_NOT_TAKEN, 0xC4, 0x10, PMC BR_INST_RETIRED_NEAR_TAKEN, 0xC4, 0x20, PMC BR_INST_RETIRED_FAR_BRANCH, 0xC4, 0x40, PMC BR_MISP_RETIRED_ALL_BRANCHES, 0xC5, 0x0, PMC BR_MISP_RETIRED_CONDITIONAL, 0xC5, 0x1, PMC BR_MISP_RETIRED_NEAR_TAKEN, 0xC5, 0x20, PMC FP_ARITH_INST_RETIRED_SCALAR_DOUBLE, 0xC7, 0x1, PMC FP_ARITH_INST_RETIRED_SCALAR_SINGLE, 0xC7, 0x2, PMC FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE, 0xC7, 0x4, PMC FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE, 0xC7, 0x8, PMC FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE, 0xC7, 0x10, PMC 
FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE, 0xC7, 0x20, PMC FP_ARITH_INST_RETIRED_SCALAR, 0xC7, 0x3, PMC FP_ARITH_INST_RETIRED_PACKED, 0xC7, 0x3C, PMC FP_ARITH_INST_RETIRED_DOUBLE, 0xC7, 0x15, PMC FP_ARITH_INST_RETIRED_SINGLE, 0xC7, 0x2A, PMC HLE_RETIRED_START, 0xC8, 0x1, PMC HLE_RETIRED_COMMIT, 0xC8, 0x2, PMC HLE_RETIRED_ABORTED, 0xC8, 0x4, PMC HLE_RETIRED_ABORTED_MISC1, 0xC8, 0x8, PMC HLE_RETIRED_ABORTED_MISC2, 0xC8, 0x10, PMC HLE_RETIRED_ABORTED_MISC3, 0xC8, 0x20, PMC HLE_RETIRED_ABORTED_MISC4, 0xC8, 0x40, PMC HLE_RETIRED_ABORTED_MISC5, 0xC8, 0x80, PMC RTM_RETIRED_START, 0xC9, 0x1, PMC RTM_RETIRED_COMMIT, 0xC9, 0x2, PMC RTM_RETIRED_ABORTED, 0xC9, 0x4, PMC RTM_RETIRED_ABORTED_MISC1, 0xC9, 0x8, PMC RTM_RETIRED_ABORTED_MISC2, 0xC9, 0x10, PMC RTM_RETIRED_ABORTED_MISC3, 0xC9, 0x20, PMC RTM_RETIRED_ABORTED_MISC4, 0xC9, 0x40, PMC RTM_RETIRED_ABORTED_MISC5, 0xC9, 0x80, PMC FP_ASSIST_X87_OUTPUT, 0xCA, 0x2, PMC FP_ASSIST_X87_INPUT, 0xCA, 0x4, PMC FP_ASSIST_SIMD_OUTPUT, 0xCA, 0x8, PMC FP_ASSIST_SIMD_INPUT, 0xCA, 0x10, PMC FP_ASSIST_ANY, 0xCA, 0x1E, PMC ROB_MISC_EVENT_LBR_INSERTS, 0xCC, 0x20, PMC MEM_UOPS_RETIRED_LOADS_ALL, 0xD0, 0x81, PMC MEM_UOPS_RETIRED_STORES_ALL, 0xD0, 0x82, PMC MEM_UOPS_RETIRED_LOADS_LOCK, 0xD0, 0x21, PMC MEM_UOPS_RETIRED_LOADS_STLB_MISS, 0xD0, 0x11, PMC MEM_UOPS_RETIRED_STORES_STLB_MISS, 0xD0, 0x12, PMC MEM_UOPS_RETIRED_LOADS_SPLIT, 0xD0, 0x41, PMC MEM_UOPS_RETIRED_STORES_SPLIT, 0xD0, 0x42, PMC MEM_LOAD_UOPS_RETIRED_L1_HIT, 0xD1, 0x1, PMC MEM_LOAD_UOPS_RETIRED_L1_MISS, 0xD1, 0x8, PMC MEM_LOAD_UOPS_RETIRED_L1_ALL, 0xD1, 0x9, PMC MEM_LOAD_UOPS_RETIRED_L2_HIT, 0xD1, 0x2, PMC MEM_LOAD_UOPS_RETIRED_L2_MISS, 0xD1, 0x10, PMC MEM_LOAD_UOPS_RETIRED_L2_ALL, 0xD1, 0x12, PMC MEM_LOAD_UOPS_RETIRED_L3_HIT, 0xD1, 0x4, PMC MEM_LOAD_UOPS_RETIRED_L3_MISS, 0xD1, 0x20, PMC MEM_LOAD_UOPS_RETIRED_L3_ALL, 0xD1, 0x24, PMC MEM_LOAD_UOPS_RETIRED_HIT_LFB, 0xD1, 0x40, PMC MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_MISS, 0xD2, 0x1, PMC MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_HIT, 0xD2, 0x2, PMC 
MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_HITM, 0xD2, 0x4, PMC MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_NONE, 0xD2, 0x8, PMC MEM_LOAD_UOPS_L3_MISS_RETIRED_LOCAL_DRAM, 0xD3, 0x1, PMC MEM_LOAD_UOPS_L3_MISS_RETIRED_REMOTE_DRAM, 0xD3, 0x4, PMC MEM_LOAD_UOPS_L3_MISS_RETIRED_REMOTE_HITM, 0xD3, 0x10, PMC MEM_LOAD_UOPS_L3_MISS_RETIRED_REMOTE_FWD, 0xD3, 0x20, PMC BACLEARS_ANY, 0xE6, 0x1F, PMC L2_TRANS_DEMAND_DATA_RD, 0xF0, 0x1, PMC L2_TRANS_RFO, 0xF0, 0x2, PMC L2_TRANS_CODE_RD, 0xF0, 0x4, PMC L2_TRANS_ALL_PF, 0xF0, 0x8, PMC L2_TRANS_L1D_WB, 0xF0, 0x10, PMC L2_TRANS_L2_FILL, 0xF0, 0x20, PMC L2_TRANS_L2_WB, 0xF0, 0x40, PMC L2_TRANS_ALL_REQUESTS, 0xF0, 0x80, PMC L2_LINES_IN_I, 0xF1, 0x1, PMC L2_LINES_IN_S, 0xF1, 0x2, PMC L2_LINES_IN_E, 0xF1, 0x4, PMC L2_LINES_IN_ALL, 0xF1, 0x7, PMC L2_LINES_OUT_DEMAND_CLEAN, 0xF2, 0x5, PMC L2_LINES_OUT_DEMAND_DIRTY, 0xF2, 0x6, PMC OFFCORE_RESPONSE_0_OPTIONS, 0xB7, 0x1, PMC, MATCH0|MATCH1 OFFCORE_RESPONSE_0_DMND_DATA_RD_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_DMND_RFO_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_DMND_CODE_RD_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_WB_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_PF_L2_DATA_RD_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_PF_L2_RFO_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_PF_L2_CODE_RD_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_PF_L3_DATA_RD_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_PF_L3_RFO_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_PF_L3_CODE_RD_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_SPLIT_LOCK_UC_LOCK_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_STREAMING_STORES_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_0_OTHER_ANY, 0xB7, 0x1, PMC OFFCORE_RESPONSE_1_OPTIONS, 0xBB, 0x1, PMC, MATCH0|MATCH1 OFFCORE_RESPONSE_1_DMND_DATA_RD_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_DMND_RFO_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_DMND_CODE_RD_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_WB_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_PF_L2_DATA_RD_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_PF_L2_RFO_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_PF_L2_CODE_RD_ANY, 0xBB, 0x1, PMC 
OFFCORE_RESPONSE_1_PF_L3_DATA_RD_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_PF_L3_RFO_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_PF_L3_CODE_RD_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_SPLIT_LOCK_UC_LOCK_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_STREAMING_STORES_ANY, 0xBB, 0x1, PMC OFFCORE_RESPONSE_1_OTHER_ANY, 0xBB, 0x1, PMC EVENT_MSG_DOORBELL_RCVD, 0x42, 0x8, UBOX PHOLD_CYCLES_ASSERT_TO_ACK, 0x45, 0x1, UBOX RACU_REQUESTS, 0x46, 0x0, UBOX UNCORE_CLOCK, 0x0, 0x0, UBOXFIX CBOX_CLOCKTICKS, 0x0, 0x0, CBOX TXR_INSERTS_AD_CACHE, 0x2, 0x1, CBOX TXR_INSERTS_AK_CACHE, 0x2, 0x2, CBOX TXR_INSERTS_BL_CACHE, 0x2, 0x4, CBOX TXR_INSERTS_IV_CACHE, 0x2, 0x8, CBOX TXR_INSERTS_AD_CORE, 0x2, 0x10, CBOX TXR_INSERTS_AK_CORE, 0x2, 0x20, CBOX TXR_INSERTS_BL_CORE, 0x2, 0x40, CBOX TXR_ADS_USED_AD, 0x4, 0x1, CBOX TXR_ADS_USED_AK, 0x4, 0x2, CBOX TXR_ADS_USED_BL, 0x4, 0x4, CBOX RING_BOUNCES_AD, 0x5, 0x1, CBOX RING_BOUNCES_AK, 0x5, 0x2, CBOX RING_BOUNCES_BL, 0x5, 0x4, CBOX RING_BOUNCES_IV, 0x5, 0x10, CBOX RING_SRC_THRTL, 0x7, 0x0, CBOX FAST_ASSERTED, 0x9, 0x0, CBOX0C0|CBOX0C1|CBOX1C0|CBOX1C1|CBOX2C0|CBOX2C1|CBOX3C0|CBOX3C1|CBOX4C0|CBOX4C1|CBOX5C0|CBOX5C1|CBOX6C0|CBOX6C1|CBOX7C0|CBOX7C1|CBOX8C0|CBOX8C1|CBOX9C0|CBOX9C1|CBOX10C0|CBOX10C1|CBOX11C0|CBOX11C1|CBOX12C0|CBOX12C1|CBOX13C0|CBOX13C1|CBOX14C0|CBOX14C1|CBOX15C0|CBOX15C1|CBOX16C0|CBOX16C1|CBOX17C0|CBOX17C1|CBOX18C0|CBOX18C1|CBOX19C0|CBOX19C1|CBOX20C0|CBOX20C1|CBOX21C0|CBOX21C1|CBOX22C0|CBOX22C1|CBOX23C0|CBOX23C1 BOUNCE_CONTROL, 0xA, 0x0, CBOX RING_AD_USED_UP_EVEN, 0x1B, 0x1, CBOX RING_AD_USED_UP_ODD, 0x1B, 0x2, CBOX RING_AD_USED_UP, 0x1B, 0x3, CBOX RING_AD_USED_DOWN_EVEN, 0x1B, 0x4, CBOX RING_AD_USED_DOWN_ODD, 0x1B, 0x8, CBOX RING_AD_USED_DOWN, 0x1B, 0xC, CBOX RING_AD_USED_ANY, 0x1B, 0xF, CBOX RING_AK_USED_UP_EVEN, 0x1C, 0x1, CBOX RING_AK_USED_UP_ODD, 0x1C, 0x2, CBOX RING_AK_USED_UP, 0x1C, 0x3, CBOX RING_AK_USED_DOWN_EVEN, 0x1C, 0x4, CBOX RING_AK_USED_DOWN_ODD, 0x1C, 0x8, CBOX RING_AK_USED_DOWN, 0x1C, 0xC, CBOX RING_AK_USED_ANY, 0x1C, 0xF, CBOX 
RING_BL_USED_UP_EVEN, 0x1D, 0x1, CBOX RING_BL_USED_UP_ODD, 0x1D, 0x2, CBOX RING_BL_USED_UP, 0x1D, 0x3, CBOX RING_BL_USED_DOWN_EVEN, 0x1D, 0x4, CBOX RING_BL_USED_DOWN_ODD, 0x1D, 0x8, CBOX RING_BL_USED_DOWN, 0x1D, 0xC, CBOX RING_BL_USED_ANY, 0x1D, 0xF, CBOX RING_IV_USED_UP, 0x1E, 0x3, CBOX RING_IV_USED_DN, 0x1E, 0xC, CBOX RING_IV_USED_ANY, 0x1E, 0xF, CBOX RING_IV_USED_DOWN, 0x1E, 0x33, CBOX COUNTER0_OCCUPANCY, 0x1F, 0x0, CBOX COUNTER0_OCCUPANCY_COUNT, 0x1F, 0x0, CBOX RXR_OCCUPANCY_IRQ, 0x11, 0x1, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 RXR_OCCUPANCY_IRQ_REJ, 0x11, 0x2, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 RXR_OCCUPANCY_IPQ, 0x11, 0x4, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 RXR_OCCUPANCY_PRQ_REJ, 0x11, 0x20, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 RXR_EXT_STARVED_IRQ, 0x12, 0x1, CBOX RXR_EXT_STARVED_IPQ, 0x12, 0x2, CBOX RXR_EXT_STARVED_PRQ, 0x12, 0x4, CBOX RXR_EXT_STARVED_ISMQ_BIDS, 0x12, 0x8, CBOX RXR_INSERTS_IRQ, 0x13, 0x1, CBOX RXR_INSERTS_IRQ_REJ, 0x13, 0x2, CBOX RXR_INSERTS_IPQ, 0x13, 0x4, CBOX RXR_INSERTS_PRQ, 0x13, 0x10, CBOX RXR_INSERTS_PRQ_REJ, 0x13, 0x20, CBOX RXR_IPQ_RETRY_ANY, 0x31, 0x1, CBOX RXR_IPQ_RETRY_FULL, 0x31, 0x2, CBOX RXR_IPQ_RETRY_ADDR_CONFLICT, 0x31, 0x4, CBOX RXR_IPQ_RETRY_QPI_CREDITS, 0x31, 0x10, CBOX RXR_IPQ_RETRY2_AD_SBO, 0x28, 0x1, CBOX RXR_IPQ_RETRY2_TARGET, 0x28, 0x40, CBOX, NID 
RXR_IRQ_RETRY_ANY, 0x32, 0x1, CBOX RXR_IRQ_RETRY_FULL, 0x32, 0x2, CBOX RXR_IRQ_RETRY_ADDR_CONFLICT, 0x32, 0x4, CBOX RXR_IRQ_RETRY_RTID, 0x32, 0x8, CBOX RXR_IRQ_RETRY_QPI_CREDITS, 0x32, 0x10, CBOX RXR_IRQ_RETRY_IIO_CREDITS, 0x32, 0x20, CBOX RXR_IRQ_RETRY_NID, 0x32, 0x40, CBOX, NID RXR_IRQ_RETRY2_AD_SBO, 0x29, 0x1, CBOX RXR_IRQ_RETRY2_BL_SBO, 0x29, 0x2, CBOX RXR_IRQ_RETRY2_TARGET, 0x29, 0x40, CBOX, NID RXR_ISMQ_RETRY_ANY, 0x33, 0x1, CBOX RXR_ISMQ_RETRY_FULL, 0x33, 0x2, CBOX RXR_ISMQ_RETRY_RTID, 0x33, 0x8, CBOX RXR_ISMQ_RETRY_QPI_CREDITS, 0x33, 0x10, CBOX RXR_ISMQ_RETRY_IIO_CREDITS, 0x33, 0x20, CBOX RXR_ISMQ_RETRY_NID, 0x33, 0x40, CBOX, NID RXR_ISMQ_RETRY_WB_CREDITS, 0x33, 0x80, CBOX, NID RXR_ISMQ_RETRY2_AD_SBO, 0x2A, 0x1, CBOX RXR_ISMQ_RETRY2_BL_SBO, 0x2A, 0x2, CBOX RXR_ISMQ_RETRY2_TARGET, 0x2A, 0x40, CBOX, NID LLC_LOOKUP_DATA_READ, 0x34, 0x3, CBOX, STATE LLC_LOOKUP_WRITE, 0x34, 0x5, CBOX, STATE LLC_LOOKUP_REMOTE_SNOOP, 0x34, 0x9, CBOX, STATE LLC_LOOKUP_ANY, 0x34, 0x11, CBOX, STATE LLC_LOOKUP_READ, 0x34, 0x21, CBOX, STATE LLC_LOOKUP_NID, 0x34, 0x41, CBOX, NID|STATE LLC_VICTIMS_M, 0x37, 0x1, CBOX LLC_VICTIMS_E, 0x37, 0x2, CBOX LLC_VICTIMS_S, 0x37, 0x4, CBOX LLC_VICTIMS_F, 0x37, 0x8, CBOX LLC_VICTIMS_MISS, 0x37, 0x10, CBOX LLC_VICTIMS_NID, 0x37, 0x40, CBOX, NID|STATE TOR_INSERTS_OPCODE, 0x35, 0x1, CBOX, OPCODE TOR_INSERTS_MISS_OPCODE, 0x35, 0x3, CBOX, OPCODE TOR_INSERTS_EVICTION, 0x35, 0x4, CBOX TOR_INSERTS_ALL, 0x35, 0x8, CBOX TOR_INSERTS_WB, 0x35, 0x10, CBOX TOR_INSERTS_LOCAL_OPCODE, 0x35, 0x21, CBOX, OPCODE TOR_INSERTS_MISS_LOCAL_OPCODE, 0x35, 0x23, CBOX, OPCODE TOR_INSERTS_LOCAL, 0x35, 0x28, CBOX TOR_INSERTS_MISS_LOCAL, 0x35, 0x2A, CBOX TOR_INSERTS_NID_OPCODE, 0x35, 0x41, CBOX, OPCODE|NID TOR_INSERTS_NID_MISS_OPCODE, 0x35, 0x43, CBOX, OPCODE|NID TOR_INSERTS_NID_EVICION, 0x35, 0x44, CBOX, NID TOR_INSERTS_NID_ALL, 0x35, 0x48, CBOX, NID TOR_INSERTS_NID_MISS_ALL, 0x35, 0x4A, CBOX, NID TOR_INSERTS_NID_WB, 0x35, 0x50, CBOX, NID TOR_INSERTS_REMOTE_OPCODE, 0x35, 0x81, 
CBOX, OPCODE TOR_INSERTS_MISS_REMOTE_OPCODE, 0x35, 0x83, CBOX, OPCODE TOR_INSERTS_REMOTE, 0x35, 0x88, CBOX TOR_INSERTS_MISS_REMOTE, 0x35, 0x8A, CBOX TOR_OCCUPANCY_OPCODE, 0x36, 0x1, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE TOR_OCCUPANCY_MISS_OPCODE, 0x36, 0x3, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE TOR_OCCUPANCY_EVICTION, 0x36, 0x4, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_ALL, 0x36, 0x8, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_MISS_ALL, 0x36, 0xA, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_WB, 0x36, 0x10, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_LOCAL_OPCODE, 0x36, 0x21, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_MISS_LOCAL_OPCODE, 0x36, 0x23, 
CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_LOCAL, 0x36, 0x28, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_MISS_LOCAL, 0x36, 0x2A, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_NID_OPCODE, 0x36, 0x41, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE|NID TOR_OCCUPANCY_NID_MISS_OPCODE, 0x36, 0x43, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE|NID TOR_OCCUPANCY_NID_EVICTION, 0x36, 0x44, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID TOR_OCCUPANCY_NID_ALL, 0x36, 0x48, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID TOR_OCCUPANCY_NID_MISS_ALL, 0x36, 0x4A, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID TOR_OCCUPANCY_NID_WB, 0x36, 0x50, 
CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID TOR_OCCUPANCY_REMOTE_OPCODE, 0x36, 0x81, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE TOR_OCCUPANCY_MISS_REMOTE_OPCODE, 0x36, 0x83, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE TOR_OCCUPANCY_REMOTE, 0x36, 0x88, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 TOR_OCCUPANCY_MISS_REMOTE, 0x36, 0x8A, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0 MISC_RSPI_WAS_FSE, 0x39, 0x1, CBOX MISC_WC_ALIASING, 0x39, 0x2, CBOX MISC_STARTED, 0x39, 0x4, CBOX MISC_RFO_HIT_S, 0x39, 0x8, CBOX MISC_CVZERO_PREFETCH_VICTIM, 0x39, 0x10, CBOX MISC_CVZERO_PREFETCH_MISS, 0x39, 0x20, CBOX SBO_CREDITS_ACQUIRED_AD, 0x3D, 0x1, CBOX SBO_CREDITS_ACQUIRED_BL, 0x3D, 0x2, CBOX SBO_CREDITS_ACQUIRED_ANY, 0x3D, 0x3, CBOX SBO_CREDIT_OCCUPANCY_AD, 0x3E, 0x1, CBOX SBO_CREDIT_OCCUPANCY_BL, 0x3E, 0x2, CBOX SBO_CREDIT_OCCUPANCY_ANY, 0x3E, 0x3, CBOX WBOX_CLOCKTICKS, 0x0, 0x0, WBOX CORE0_TRANSITION_CYCLES, 0x60, 0x0, WBOX CORE1_TRANSITION_CYCLES, 0x61, 0x0, WBOX CORE2_TRANSITION_CYCLES, 0x62, 0x0, WBOX CORE3_TRANSITION_CYCLES, 0x63, 0x0, WBOX CORE4_TRANSITION_CYCLES, 0x64, 0x0, WBOX CORE5_TRANSITION_CYCLES, 0x65, 0x0, WBOX CORE6_TRANSITION_CYCLES, 0x66, 0x0, WBOX 
CORE7_TRANSITION_CYCLES, 0x67, 0x0, WBOX CORE8_TRANSITION_CYCLES, 0x68, 0x0, WBOX CORE9_TRANSITION_CYCLES, 0x69, 0x0, WBOX CORE10_TRANSITION_CYCLES, 0x6A, 0x0, WBOX CORE11_TRANSITION_CYCLES, 0x6B, 0x0, WBOX CORE12_TRANSITION_CYCLES, 0x6C, 0x0, WBOX CORE13_TRANSITION_CYCLES, 0x6D, 0x0, WBOX CORE14_TRANSITION_CYCLES, 0x6E, 0x0, WBOX CORE15_TRANSITION_CYCLES, 0x6F, 0x0, WBOX CORE16_TRANSITION_CYCLES, 0x70, 0x0, WBOX CORE17_TRANSITION_CYCLES, 0x71, 0x0, WBOX FIVR_PS_PS0_CYCLES, 0x75, 0x0, WBOX FIVR_PS_PS1_CYCLES, 0x75, 0x0, WBOX FIVR_PS_PS2_CYCLES, 0x75, 0x0, WBOX FIVR_PS_PS3_CYCLES, 0x75, 0x0, WBOX DEMOTIONS_CORE0, 0x30, 0x0, WBOX DEMOTIONS_CORE1, 0x31, 0x0, WBOX DEMOTIONS_CORE2, 0x32, 0x0, WBOX DEMOTIONS_CORE3, 0x33, 0x0, WBOX DEMOTIONS_CORE4, 0x34, 0x0, WBOX DEMOTIONS_CORE5, 0x35, 0x0, WBOX DEMOTIONS_CORE6, 0x36, 0x0, WBOX DEMOTIONS_CORE7, 0x37, 0x0, WBOX DEMOTIONS_CORE8, 0x38, 0x0, WBOX DEMOTIONS_CORE9, 0x39, 0x0, WBOX DEMOTIONS_CORE10, 0x3A, 0x0, WBOX DEMOTIONS_CORE11, 0x3B, 0x0, WBOX DEMOTIONS_CORE12, 0x3C, 0x0, WBOX DEMOTIONS_CORE13, 0x3D, 0x0, WBOX DEMOTIONS_CORE14, 0x3E, 0x0, WBOX DEMOTIONS_CORE15, 0x3F, 0x0, WBOX DEMOTIONS_CORE16, 0x40, 0x0, WBOX DEMOTIONS_CORE17, 0x41, 0x0, WBOX FREQ_BAND0_CYCLES, 0xB, 0x0, WBOX, OCCUPANCY_FILTER FREQ_BAND1_CYCLES, 0xC, 0x0, WBOX, OCCUPANCY_FILTER FREQ_BAND2_CYCLES, 0xD, 0x0, WBOX, OCCUPANCY_FILTER FREQ_BAND3_CYCLES, 0xE, 0x0, WBOX, OCCUPANCY_FILTER FREQ_MAX_LIMIT_THERMAL_CYCLES, 0x4, 0x0, WBOX FREQ_MAX_OS_CYCLES, 0x6, 0x0, WBOX FREQ_MAX_POWER_CYCLES, 0x5, 0x0, WBOX FREQ_MIN_IO_P_CYCLES, 0x73, 0x0, WBOX FREQ_TRANS_CYCLES, 0x74, 0x0, WBOX MEMORY_PHASE_SHEDDING_CYCLES, 0x2F, 0x0, WBOX POWER_STATE_OCCUPANCY_CORES_C0, 0x80, 0x40, WBOX POWER_STATE_OCCUPANCY_CORES_C3, 0x80, 0x80, WBOX POWER_STATE_OCCUPANCY_CORES_C6, 0x80, 0xC0, WBOX PROCHOT_EXTERNAL_CYCLES, 0xA, 0x0, WBOX PROCHOT_INTERNAL_CYCLES, 0x9, 0x0, WBOX TOTAL_TRANSITION_CYCLES, 0x72, 0x0, WBOX VR_HOT_CYCLES, 0x42, 0x0, WBOX UFS_BANDWIDTH_MAX_RANGE, 0x7E, 0x0, WBOX 
UFS_TRANSITIONS_DOWN, 0x7C, 0x0, WBOX UFS_TRANSITIONS_IO_P_LIMIT, 0x7D, 0x0, WBOX UFS_TRANSITIONS_NO_CHANGE, 0x79, 0x0, WBOX UFS_TRANSITIONS_UP_RING, 0x7A, 0x0, WBOX UFS_TRANSITIONS_UP_STALL, 0x7B, 0x0, WBOX CORES_IN_C3, 0x0, 0x0, WBOX0FIX CORES_IN_C6, 0x0, 0x0, WBOX1FIX BBOX_CLOCKTICKS, 0x0, 0x0, BBOX ADDR_OPC_MATCH_ADDR, 0x20, 0x1, BBOX, MATCH0|MATCH1 ADDR_OPC_MATCH_OPC, 0x20, 0x2, BBOX, OPCODE ADDR_OPC_MATCH_FILT, 0x20, 0x3, BBOX, OPCODE|MATCH0|MATCH1 ADDR_OPC_MATCH_AD, 0x20, 0x4, BBOX, OPCODE ADDR_OPC_MATCH_BL, 0x20, 0x8, BBOX, OPCODE ADDR_OPC_MATCH_AK, 0x20, 0x10, BBOX, OPCODE BT_CYCLES_NE, 0x42, 0x0, BBOX BT_OCCUPANCY, 0x43, 0x0, BBOX BYPASS_IMC_TAKEN, 0x14, 0x1, BBOX BYPASS_IMC_NOT_TAKEN, 0x14, 0x2, BBOX CONFLICT_CYCLES, 0xB, 0x0, BBOX0C1|BBOX1C1 DIRECT2CORE_COUNT, 0x11, 0x0, BBOX DIRECT2CORE_CYCLES_DISABLED, 0x12, 0x0, BBOX DIRECT2CORE_TXN_OVERRIDE, 0x13, 0x0, BBOX DIRECTORY_LAT_OPT, 0x41, 0x0, BBOX DIRECTORY_LOOKUP_SNP, 0xC, 0x1, BBOX DIRECTORY_LOOKUP_NO_SNP, 0xC, 0x2, BBOX DIRECTORY_UPDATE_SET, 0xD, 0x1, BBOX DIRECTORY_UPDATE_CLEAR, 0xD, 0x2, BBOX DIRECTORY_UPDATE_ANY, 0xD, 0x3, BBOX HITME_LOOKUP_READ_OR_INVITOE, 0x70, 0x1, BBOX HITME_LOOKUP_WBMTOI, 0x70, 0x2, BBOX HITME_LOOKUP_ACKCNFLTWBI, 0x70, 0x4, BBOX HITME_LOOKUP_WBMTOE_OR_S, 0x70, 0x8, BBOX HITME_LOOKUP_HOM, 0x70, 0xF, BBOX HITME_LOOKUP_RSPFWDI_REMOTE, 0x70, 0x10, BBOX HITME_LOOKUP_RSPFWDI_LOCAL, 0x70, 0x20, BBOX HITME_LOOKUP_INVALS, 0x70, 0x26, BBOX HITME_LOOKUP_RSPFWDS, 0x70, 0x40, BBOX HITME_LOOKUP_ALLOCS, 0x70, 0x70, BBOX HITME_LOOKUP_RSP, 0x70, 0x80, BBOX HITME_LOOKUP_ALL, 0x70, 0xFF, BBOX HITME_HIT_READ_OR_INVITOE, 0x71, 0x1, BBOX HITME_HIT_WBMTOI, 0x71, 0x2, BBOX HITME_HIT_ACKCNFLTWBI, 0x71, 0x4, BBOX HITME_HIT_WBMTOE_OR_S, 0x71, 0x8, BBOX HITME_HIT_HOM, 0x71, 0xF, BBOX HITME_HIT_RSPFWDI_REMOTE, 0x71, 0x10, BBOX HITME_HIT_RSPFWDI_LOCAL, 0x71, 0x20, BBOX HITME_HIT_INVALS, 0x71, 0x26, BBOX HITME_HIT_RSPFWDS, 0x71, 0x40, BBOX HITME_HIT_EVICTS, 0x71, 0x42, BBOX HITME_HIT_ALLOCS, 0x71, 0x70, BBOX 
HITME_HIT_RSP, 0x71, 0x80, BBOX HITME_HIT_ALL, 0x71, 0xFF, BBOX HITME_HIT_PV_BITS_SET_READ_OR_INVITOE, 0x72, 0x1, BBOX HITME_HIT_PV_BITS_SET_WBMTOI, 0x72, 0x2, BBOX HITME_HIT_PV_BITS_SET_ACKCNFLTWBI, 0x72, 0x4, BBOX HITME_HIT_PV_BITS_SET_WBMTOE_OR_S, 0x72, 0x8, BBOX HITME_HIT_PV_BITS_SET_HOM, 0x72, 0xF, BBOX HITME_HIT_PV_BITS_SET_RSPFWDI_REMOTE, 0x72, 0x10, BBOX HITME_HIT_PV_BITS_SET_RSPFWDI_LOCAL, 0x72, 0x20, BBOX HITME_HIT_PV_BITS_SET_RSPFWDS, 0x72, 0x40, BBOX HITME_HIT_PV_BITS_SET_RSP, 0x72, 0x80, BBOX HITME_HIT_PV_BITS_SET_ALL, 0x72, 0xFF, BBOX IGR_NO_CREDIT_CYCLES_AD_QPI0, 0x22, 0x1, BBOX IGR_NO_CREDIT_CYCLES_AD_QPI1, 0x22, 0x2, BBOX IGR_NO_CREDIT_CYCLES_AD_QPI2, 0x22, 0x10, BBOX IGR_NO_CREDIT_CYCLES_BL_QPI0, 0x22, 0x4, BBOX IGR_NO_CREDIT_CYCLES_BL_QPI1, 0x22, 0x8, BBOX IGR_NO_CREDIT_CYCLES_BL_QPI2, 0x22, 0x20, BBOX IMC_READS_NORMAL, 0x17, 0x1, BBOX IMC_RETRY, 0x1E, 0x0, BBOX IMC_WRITES_FULL, 0x1A, 0x1, BBOX IMC_WRITES_PARTIAL, 0x1A, 0x2, BBOX IMC_WRITES_FULL_ISOCH, 0x1A, 0x4, BBOX IMC_WRITES_PARTIAL_ISOCH, 0x1A, 0x8, BBOX IMC_WRITES_ALL, 0x1A, 0xF, BBOX OSB_READS_LOCAL, 0x53, 0x2, BBOX OSB_INVITOE_LOCAL, 0x53, 0x4, BBOX OSB_REMOTE, 0x53, 0x8, BBOX OSB_CANCELLED, 0x53, 0x10, BBOX OSB_READS_LOCAL_USEFUL, 0x53, 0x20, BBOX OSB_REMOTE_USEFUL, 0x53, 0x40, BBOX OSB_EDR_ALL, 0x54, 0x1, BBOX OSB_EDR_READS_LOCAL_I, 0x54, 0x2, BBOX OSB_EDR_READS_REMOTE_I, 0x54, 0x4, BBOX OSB_EDR_READS_LOCAL_S, 0x54, 0x8, BBOX OSB_EDR_READS_REMOTE_S, 0x54, 0x10, BBOX REQUESTS_READS_LOCAL, 0x1, 0x1, BBOX REQUESTS_READS_REMOTE, 0x1, 0x2, BBOX REQUESTS_READS, 0x1, 0x3, BBOX REQUESTS_WRITES_LOCAL, 0x1, 0x4, BBOX REQUESTS_WRITES_REMOTE, 0x1, 0x8, BBOX REQUESTS_WRITES, 0x1, 0xC, BBOX REQUESTS_INVITOE_LOCAL, 0x1, 0x10, BBOX REQUESTS_INVITOE_REMOTE, 0x1, 0x20, BBOX REQUESTS_ALL_LOCAL, 0x1, 0x15, BBOX REQUESTS_ALL_REMOTE, 0x1, 0x2A, BBOX REQUESTS_ALL, 0x1, 0x3F, BBOX RING_AD_USED_CW_EVEN, 0x3E, 0x1, BBOX RING_AD_USED_CW_ODD, 0x3E, 0x2, BBOX RING_AD_USED_CW, 0x3E, 0x3, BBOX RING_AD_USED_CCW_EVEN, 
0x3E, 0x4, BBOX RING_AD_USED_CCW_ODD, 0x3E, 0x8, BBOX RING_AD_USED_CCW, 0x3E, 0xC, BBOX RING_AK_USED_CW_EVEN, 0x3F, 0x1, BBOX RING_AK_USED_CW_ODD, 0x3F, 0x2, BBOX RING_AK_USED_CW, 0x3F, 0x3, BBOX RING_AK_USED_CCW_EVEN, 0x3F, 0x4, BBOX RING_AK_USED_CCW_ODD, 0x3F, 0x8, BBOX RING_AK_USED_CCW, 0x3F, 0xC, BBOX RING_BL_USED_CW_EVEN, 0x40, 0x1, BBOX RING_BL_USED_CW_ODD, 0x40, 0x2, BBOX RING_BL_USED_CW, 0x40, 0x3, BBOX RING_BL_USED_CCW_EVEN, 0x40, 0x4, BBOX RING_BL_USED_CCW_ODD, 0x40, 0x8, BBOX RING_BL_USED_CCW, 0x40, 0xC, BBOX RPQ_CYCLES_NO_REG_CREDITS_CHN0, 0x15, 0x1, BBOX RPQ_CYCLES_NO_REG_CREDITS_CHN1, 0x15, 0x2, BBOX RPQ_CYCLES_NO_REG_CREDITS_CHN2, 0x15, 0x4, BBOX RPQ_CYCLES_NO_REG_CREDITS_CHN3, 0x15, 0x8, BBOX RPQ_CYCLES_NO_REG_CREDITS_ALL, 0x15, 0xF, BBOX WPQ_CYCLES_NO_REG_CREDITS_CHN0, 0x18, 0x1, BBOX WPQ_CYCLES_NO_REG_CREDITS_CHN1, 0x18, 0x2, BBOX WPQ_CYCLES_NO_REG_CREDITS_CHN2, 0x18, 0x4, BBOX WPQ_CYCLES_NO_REG_CREDITS_CHN3, 0x18, 0x8, BBOX WPQ_CYCLES_NO_REG_CREDITS_ALL, 0x18, 0xF, BBOX SBO0_CREDITS_ACQUIRED_AD, 0x68, 0x1, BBOX SBO0_CREDITS_ACQUIRED_BL, 0x68, 0x2, BBOX SBO0_CREDIT_OCCUPANCY_AD, 0x6A, 0x1, BBOX SBO0_CREDIT_OCCUPANCY_BL, 0x6A, 0x2, BBOX SBO1_CREDITS_ACQUIRED_AD, 0x69, 0x1, BBOX SBO1_CREDITS_ACQUIRED_BL, 0x69, 0x2, BBOX SBO1_CREDIT_OCCUPANCY_AD, 0x6B, 0x1, BBOX SBO1_CREDIT_OCCUPANCY_BL, 0x6B, 0x2, BBOX SNOOPS_RSP_AFTER_DATA_LOCAL, 0xA, 0x1, BBOX SNOOPS_RSP_AFTER_DATA_REMOTE, 0xA, 0x2, BBOX SNOOP_CYCLES_NE_LOCAL, 0x8, 0x1, BBOX SNOOP_CYCLES_NE_REMOTE, 0x8, 0x2, BBOX SNOOP_CYCLES_NE_ALL, 0x8, 0x3, BBOX SNOOP_OCCUPANCY_LOCAL, 0x9, 0x1, BBOX SNOOP_OCCUPANCY_REMOTE, 0x9, 0x2, BBOX SNOOP_RESP_RSPI, 0x21, 0x1, BBOX SNOOP_RESP_RSPS, 0x21, 0x2, BBOX SNOOP_RESP_RSPIFWD, 0x21, 0x4, BBOX SNOOP_RESP_RSPSFWD, 0x21, 0x8, BBOX SNOOP_RESP_RSP_WB, 0x21, 0x10, BBOX SNOOP_RESP_RSP_FWD_WB, 0x21, 0x20, BBOX SNOOP_RESP_RSPCNFLCT, 0x21, 0x40, BBOX SNP_RESP_RECV_LOCAL_RSPI, 0x60, 0x1, BBOX SNP_RESP_RECV_LOCAL_RSPS, 0x60, 0x2, BBOX SNP_RESP_RECV_LOCAL_RSPIFWD, 0x60, 0x4, 
BBOX SNP_RESP_RECV_LOCAL_RSPSFWD, 0x60, 0x8, BBOX SNP_RESP_RECV_LOCAL_RSPXWB, 0x60, 0x10, BBOX SNP_RESP_RECV_LOCAL_RSPXFWDXWB, 0x60, 0x20, BBOX SNP_RESP_RECV_LOCAL_RSPCNFLCT, 0x60, 0x40, BBOX SNP_RESP_RECV_LOCAL_OTHER, 0x60, 0x80, BBOX STALL_NO_SBO_CREDIT_SBO0_AD, 0x6C, 0x1, BBOX STALL_NO_SBO_CREDIT_SBO1_AD, 0x6C, 0x2, BBOX STALL_NO_SBO_CREDIT_SBO0_BL, 0x6C, 0x4, BBOX STALL_NO_SBO_CREDIT_SBO0_BL, 0x6C, 0x8, BBOX TAD_REQUESTS_G0_REGION0, 0x1B, 0x1, BBOX TAD_REQUESTS_G0_REGION1, 0x1B, 0x2, BBOX TAD_REQUESTS_G0_REGION2, 0x1B, 0x4, BBOX TAD_REQUESTS_G0_REGION3, 0x1B, 0x8, BBOX TAD_REQUESTS_G0_REGION4, 0x1B, 0x10, BBOX TAD_REQUESTS_G0_REGION5, 0x1B, 0x20, BBOX TAD_REQUESTS_G0_REGION6, 0x1B, 0x40, BBOX TAD_REQUESTS_G0_REGION7, 0x1B, 0x80, BBOX TAD_REQUESTS_G1_REGION8, 0x1C, 0x1, BBOX TAD_REQUESTS_G1_REGION9, 0x1C, 0x2, BBOX TAD_REQUESTS_G1_REGION10, 0x1C, 0x4, BBOX TAD_REQUESTS_G1_REGION11, 0x1C, 0x8, BBOX TRACKER_CYCLES_FULL_GP, 0x2, 0x1, BBOX TRACKER_CYCLES_FULL_ALL, 0x2, 0x2, BBOX TRACKER_CYCLES_NE_LOCAL, 0x3, 0x1, BBOX TRACKER_CYCLES_NE_REMOTE, 0x3, 0x2, BBOX TRACKER_CYCLES_NE_ALL, 0x3, 0x3, BBOX TRACKER_OCCUPANCY_READS_LOCAL, 0x4, 0x4, BBOX TRACKER_OCCUPANCY_READS_REMOTE, 0x4, 0x8, BBOX TRACKER_OCCUPANCY_WRITES_LOCAL, 0x4, 0x10, BBOX TRACKER_OCCUPANCY_WRITES_REMOTE, 0x4, 0x20, BBOX TRACKER_OCCUPANCY_RW_LOCAL, 0x4, 0x14, BBOX TRACKER_OCCUPANCY_RW_REMOTE, 0x4, 0x28, BBOX TRACKER_OCCUPANCY_INVITOE_LOCAL, 0x4, 0x40, BBOX TRACKER_OCCUPANCY_INVITOE_REMOTE, 0x4, 0x80, BBOX TRACKER_OCCUPANCY_ALL_LOCAL, 0x4, 0x54, BBOX TRACKER_OCCUPANCY_ALL_REMOTE, 0x4, 0xA8, BBOX TRACKER_PENDING_OCCUPANCY_LOCAL, 0x5, 0x1, BBOX TRACKER_PENDING_OCCUPANCY_REMOTE, 0x5, 0x2, BBOX TRACKER_PENDING_OCCUPANCY_ALL, 0x5, 0x3, BBOX TXR_AD_CYCLES_FULL_SCHED0, 0x2A, 0x1, BBOX TXR_AD_CYCLES_FULL_SCHED1, 0x2A, 0x2, BBOX TXR_AD_CYCLES_FULL_ALL, 0x2A, 0x3, BBOX TXR_AK, 0xE, 0x0, BBOX TXR_AK_CYCLES_FULL_SCHED0, 0x32, 0x1, BBOX TXR_AK_CYCLES_FULL_SCHED1, 0x32, 0x2, BBOX TXR_AK_CYCLES_FULL_ALL, 0x32, 0x3, BBOX 
TXR_BL_DRS_CACHE, 0x10, 0x1, BBOX TXR_BL_DRS_CORE, 0x10, 0x2, BBOX TXR_BL_DRS_QPI, 0x10, 0x4, BBOX TXR_BL_CYCLES_FULL_SCHED0, 0x36, 0x1, BBOX TXR_BL_CYCLES_FULL_SCHED1, 0x36, 0x2, BBOX TXR_BL_CYCLES_FULL_ALL, 0x36, 0x3, BBOX TXR_BL_OCCUPANCY, 0x34, 0x0, BBOX TXR_STARVED_AK, 0x6D, 0x1, BBOX TXR_STARVED_BL, 0x6D, 0x2, BBOX DRAM_CLOCKTICKS, 0x0, 0x0, MBOX ACT_COUNT_RD, 0x1, 0x1, MBOX ACT_COUNT_WR, 0x1, 0x2, MBOX ACT_COUNT_BYP, 0x1, 0x8, MBOX BYP_CMDS_ACT, 0xA1, 0x1, MBOX BYP_CMDS_CAS, 0xA1, 0x2, MBOX BYP_CMDS_PRE, 0xA1, 0x4, MBOX CAS_COUNT_RD_REG, 0x4, 0x1, MBOX CAS_COUNT_RD_UNDERFILL, 0x4, 0x2, MBOX CAS_COUNT_RD, 0x4, 0x3, MBOX CAS_COUNT_RD_WMM, 0x4, 0x10, MBOX CAS_COUNT_RD_RMM, 0x4, 0x20, MBOX CAS_COUNT_WR_WMM, 0x4, 0x4, MBOX CAS_COUNT_WR_RMM, 0x4, 0x8, MBOX CAS_COUNT_WR, 0x4, 0xC, MBOX CAS_COUNT_ALL, 0x4, 0xF, MBOX DRAM_PRE_ALL, 0x6, 0x0, MBOX DRAM_REFRESH_PANIC, 0x5, 0x2, MBOX DRAM_REFRESH_HIGH, 0x5, 0x4, MBOX ECC_CORRECTABLE_ERRORS, 0x9, 0x0, MBOX MAJOR_MODES_READ, 0x7, 0x1, MBOX MAJOR_MODES_WRITE, 0x7, 0x2, MBOX MAJOR_MODES_PARTIAL, 0x7, 0x3, MBOX MAJOR_MODES_ISOCH, 0x7, 0x4, MBOX POWER_CHANNEL_DLLOFF, 0x84, 0x0, MBOX POWER_CHANNEL_PPD, 0x85, 0x0, MBOX POWER_CKE_CYCLES_RANK0, 0x83, 0x1, MBOX POWER_CKE_CYCLES_RANK1, 0x83, 0x2, MBOX POWER_CKE_CYCLES_RANK2, 0x83, 0x4, MBOX POWER_CKE_CYCLES_RANK3, 0x83, 0x8, MBOX POWER_CKE_CYCLES_RANK4, 0x83, 0x10, MBOX POWER_CKE_CYCLES_RANK5, 0x83, 0x20, MBOX POWER_CKE_CYCLES_RANK6, 0x83, 0x40, MBOX POWER_CKE_CYCLES_RANK7, 0x83, 0x80, MBOX POWER_CRITICAL_THROTTLE_CYCLES, 0x86, 0x0, MBOX POWER_PCU_THROTTLING, 0x42, 0x0, MBOX POWER_SELF_REFRESH, 0x43, 0x0, MBOX POWER_THROTTLE_CYCLES_RANK0, 0x41, 0x1, MBOX POWER_THROTTLE_CYCLES_RANK1, 0x41, 0x2, MBOX POWER_THROTTLE_CYCLES_RANK2, 0x41, 0x4, MBOX POWER_THROTTLE_CYCLES_RANK3, 0x41, 0x8, MBOX POWER_THROTTLE_CYCLES_RANK4, 0x41, 0x10, MBOX POWER_THROTTLE_CYCLES_RANK5, 0x41, 0x20, MBOX POWER_THROTTLE_CYCLES_RANK6, 0x41, 0x40, MBOX POWER_THROTTLE_CYCLES_RANK7, 0x41, 0x80, MBOX 
PREEMPTION_RD_PREEMPT_RD, 0x8, 0x1, MBOX PREEMPTION_RD_PREEMPT_WR, 0x8, 0x2, MBOX PRE_COUNT_PAGE_MISS, 0x2, 0x1, MBOX PRE_COUNT_PAGE_CLOSE, 0x2, 0x2, MBOX PRE_COUNT_RD, 0x2, 0x4, MBOX PRE_COUNT_WR, 0x2, 0x8, MBOX PRE_COUNT_BYP, 0x2, 0x10, MBOX RD_CAS_PRIO_LOW, 0xA0, 0x1, MBOX RD_CAS_PRIO_MED, 0xA0, 0x2, MBOX RD_CAS_PRIO_HIGH, 0xA0, 0x4, MBOX RD_CAS_PRIO_PANIC, 0xA0, 0x8, MBOX RD_CAS_RANK0_BANK0, 0xB0, 0x0, MBOX RD_CAS_RANK0_BANK1, 0xB0, 0x1, MBOX RD_CAS_RANK0_BANK2, 0xB0, 0x2, MBOX RD_CAS_RANK0_BANK3, 0xB0, 0x3, MBOX RD_CAS_RANK0_BANK4, 0xB0, 0x4, MBOX RD_CAS_RANK0_BANK5, 0xB0, 0x5, MBOX RD_CAS_RANK0_BANK6, 0xB0, 0x6, MBOX RD_CAS_RANK0_BANK7, 0xB0, 0x7, MBOX RD_CAS_RANK0_BANK8, 0xB0, 0x8, MBOX RD_CAS_RANK0_BANK9, 0xB0, 0x9, MBOX RD_CAS_RANK0_BANK10, 0xB0, 0xA, MBOX RD_CAS_RANK0_BANK11, 0xB0, 0xB, MBOX RD_CAS_RANK0_BANK12, 0xB0, 0xC, MBOX RD_CAS_RANK0_BANK13, 0xB0, 0xD, MBOX RD_CAS_RANK0_BANK14, 0xB0, 0xE, MBOX RD_CAS_RANK0_BANK15, 0xB0, 0xF, MBOX RD_CAS_RANK0_ALLBANKS, 0xB0, 0x10, MBOX RD_CAS_RANK0_BANKG0, 0xB0, 0x11, MBOX RD_CAS_RANK0_BANKG1, 0xB0, 0x12, MBOX RD_CAS_RANK0_BANKG2, 0xB0, 0x13, MBOX RD_CAS_RANK0_BANKG3, 0xB0, 0x14, MBOX RD_CAS_RANK1_BANK0, 0xB1, 0x0, MBOX RD_CAS_RANK1_BANK1, 0xB1, 0x1, MBOX RD_CAS_RANK1_BANK2, 0xB1, 0x2, MBOX RD_CAS_RANK1_BANK3, 0xB1, 0x3, MBOX RD_CAS_RANK1_BANK4, 0xB1, 0x4, MBOX RD_CAS_RANK1_BANK5, 0xB1, 0x5, MBOX RD_CAS_RANK1_BANK6, 0xB1, 0x6, MBOX RD_CAS_RANK1_BANK7, 0xB1, 0x7, MBOX RD_CAS_RANK1_BANK8, 0xB1, 0x8, MBOX RD_CAS_RANK1_BANK9, 0xB1, 0x9, MBOX RD_CAS_RANK1_BANK10, 0xB1, 0xA, MBOX RD_CAS_RANK1_BANK11, 0xB1, 0xB, MBOX RD_CAS_RANK1_BANK12, 0xB1, 0xC, MBOX RD_CAS_RANK1_BANK13, 0xB1, 0xD, MBOX RD_CAS_RANK1_BANK14, 0xB1, 0xE, MBOX RD_CAS_RANK1_BANK15, 0xB1, 0xF, MBOX RD_CAS_RANK1_ALLBANKS, 0xB1, 0x10, MBOX RD_CAS_RANK1_BANKG0, 0xB1, 0x11, MBOX RD_CAS_RANK1_BANKG1, 0xB1, 0x12, MBOX RD_CAS_RANK1_BANKG2, 0xB1, 0x13, MBOX RD_CAS_RANK1_BANKG3, 0xB1, 0x14, MBOX RD_CAS_RANK2_BANK0, 0xB2, 0x0, MBOX RD_CAS_RANK2_BANK1, 0xB2, 0x1, MBOX 
RD_CAS_RANK2_BANK2, 0xB2, 0x2, MBOX RD_CAS_RANK2_BANK3, 0xB2, 0x3, MBOX RD_CAS_RANK2_BANK4, 0xB2, 0x4, MBOX RD_CAS_RANK2_BANK5, 0xB2, 0x5, MBOX RD_CAS_RANK2_BANK6, 0xB2, 0x6, MBOX RD_CAS_RANK2_BANK7, 0xB2, 0x7, MBOX RD_CAS_RANK2_BANK8, 0xB2, 0x8, MBOX RD_CAS_RANK2_BANK9, 0xB2, 0x9, MBOX RD_CAS_RANK2_BANK10, 0xB2, 0xA, MBOX RD_CAS_RANK2_BANK11, 0xB2, 0xB, MBOX RD_CAS_RANK2_BANK12, 0xB2, 0xC, MBOX RD_CAS_RANK2_BANK13, 0xB2, 0xD, MBOX RD_CAS_RANK2_BANK14, 0xB2, 0xE, MBOX RD_CAS_RANK2_BANK15, 0xB2, 0xF, MBOX RD_CAS_RANK2_ALLBANKS, 0xB2, 0x10, MBOX RD_CAS_RANK2_BANKG0, 0xB2, 0x11, MBOX RD_CAS_RANK2_BANKG1, 0xB2, 0x12, MBOX RD_CAS_RANK2_BANKG2, 0xB2, 0x13, MBOX RD_CAS_RANK2_BANKG3, 0xB2, 0x14, MBOX RD_CAS_RANK3_BANK0, 0xB3, 0x0, MBOX RD_CAS_RANK3_BANK1, 0xB3, 0x1, MBOX RD_CAS_RANK3_BANK2, 0xB3, 0x2, MBOX RD_CAS_RANK3_BANK3, 0xB3, 0x3, MBOX RD_CAS_RANK3_BANK4, 0xB3, 0x4, MBOX RD_CAS_RANK3_BANK5, 0xB3, 0x5, MBOX RD_CAS_RANK3_BANK6, 0xB3, 0x6, MBOX RD_CAS_RANK3_BANK7, 0xB3, 0x7, MBOX RD_CAS_RANK3_BANK8, 0xB3, 0x8, MBOX RD_CAS_RANK3_BANK9, 0xB3, 0x9, MBOX RD_CAS_RANK3_BANK10, 0xB3, 0xA, MBOX RD_CAS_RANK3_BANK11, 0xB3, 0xB, MBOX RD_CAS_RANK3_BANK12, 0xB3, 0xC, MBOX RD_CAS_RANK3_BANK13, 0xB3, 0xD, MBOX RD_CAS_RANK3_BANK14, 0xB3, 0xE, MBOX RD_CAS_RANK3_BANK15, 0xB3, 0xF, MBOX RD_CAS_RANK3_ALLBANKS, 0xB3, 0x10, MBOX RD_CAS_RANK3_BANKG0, 0xB3, 0x11, MBOX RD_CAS_RANK3_BANKG1, 0xB3, 0x12, MBOX RD_CAS_RANK3_BANKG2, 0xB3, 0x13, MBOX RD_CAS_RANK3_BANKG3, 0xB3, 0x14, MBOX RD_CAS_RANK4_BANK0, 0xB4, 0x0, MBOX RD_CAS_RANK4_BANK1, 0xB4, 0x1, MBOX RD_CAS_RANK4_BANK2, 0xB4, 0x2, MBOX RD_CAS_RANK4_BANK3, 0xB4, 0x3, MBOX RD_CAS_RANK4_BANK4, 0xB4, 0x4, MBOX RD_CAS_RANK4_BANK5, 0xB4, 0x5, MBOX RD_CAS_RANK4_BANK6, 0xB4, 0x6, MBOX RD_CAS_RANK4_BANK7, 0xB4, 0x7, MBOX RD_CAS_RANK4_BANK8, 0xB4, 0x8, MBOX RD_CAS_RANK4_BANK9, 0xB4, 0x9, MBOX RD_CAS_RANK4_BANK10, 0xB4, 0xA, MBOX RD_CAS_RANK4_BANK11, 0xB4, 0xB, MBOX RD_CAS_RANK4_BANK12, 0xB4, 0xC, MBOX RD_CAS_RANK4_BANK13, 0xB4, 0xD, MBOX 
RD_CAS_RANK4_BANK14, 0xB4, 0xE, MBOX RD_CAS_RANK4_BANK15, 0xB4, 0xF, MBOX RD_CAS_RANK4_ALLBANKS, 0xB4, 0x10, MBOX RD_CAS_RANK4_BANKG0, 0xB4, 0x11, MBOX RD_CAS_RANK4_BANKG1, 0xB4, 0x12, MBOX RD_CAS_RANK4_BANKG2, 0xB4, 0x13, MBOX RD_CAS_RANK4_BANKG3, 0xB4, 0x14, MBOX RD_CAS_RANK5_BANK0, 0xB5, 0x0, MBOX RD_CAS_RANK5_BANK1, 0xB5, 0x1, MBOX RD_CAS_RANK5_BANK2, 0xB5, 0x2, MBOX RD_CAS_RANK5_BANK3, 0xB5, 0x3, MBOX RD_CAS_RANK5_BANK4, 0xB5, 0x4, MBOX RD_CAS_RANK5_BANK5, 0xB5, 0x5, MBOX RD_CAS_RANK5_BANK6, 0xB5, 0x6, MBOX RD_CAS_RANK5_BANK7, 0xB5, 0x7, MBOX RD_CAS_RANK5_BANK8, 0xB5, 0x8, MBOX RD_CAS_RANK5_BANK9, 0xB5, 0x9, MBOX RD_CAS_RANK5_BANK10, 0xB5, 0xA, MBOX RD_CAS_RANK5_BANK11, 0xB5, 0xB, MBOX RD_CAS_RANK5_BANK12, 0xB5, 0xC, MBOX RD_CAS_RANK5_BANK13, 0xB5, 0xD, MBOX RD_CAS_RANK5_BANK14, 0xB5, 0xE, MBOX RD_CAS_RANK5_BANK15, 0xB5, 0xF, MBOX RD_CAS_RANK5_ALLBANKS, 0xB5, 0x10, MBOX RD_CAS_RANK5_BANKG0, 0xB5, 0x11, MBOX RD_CAS_RANK5_BANKG1, 0xB5, 0x12, MBOX RD_CAS_RANK5_BANKG2, 0xB5, 0x13, MBOX RD_CAS_RANK5_BANKG3, 0xB5, 0x14, MBOX RD_CAS_RANK6_BANK0, 0xB6, 0x0, MBOX RD_CAS_RANK6_BANK1, 0xB6, 0x1, MBOX RD_CAS_RANK6_BANK2, 0xB6, 0x2, MBOX RD_CAS_RANK6_BANK3, 0xB6, 0x3, MBOX RD_CAS_RANK6_BANK4, 0xB6, 0x4, MBOX RD_CAS_RANK6_BANK5, 0xB6, 0x5, MBOX RD_CAS_RANK6_BANK6, 0xB6, 0x6, MBOX RD_CAS_RANK6_BANK7, 0xB6, 0x7, MBOX RD_CAS_RANK6_BANK8, 0xB6, 0x8, MBOX RD_CAS_RANK6_BANK9, 0xB6, 0x9, MBOX RD_CAS_RANK6_BANK10, 0xB6, 0xA, MBOX RD_CAS_RANK6_BANK11, 0xB6, 0xB, MBOX RD_CAS_RANK6_BANK12, 0xB6, 0xC, MBOX RD_CAS_RANK6_BANK13, 0xB6, 0xD, MBOX RD_CAS_RANK6_BANK14, 0xB6, 0xE, MBOX RD_CAS_RANK6_BANK15, 0xB6, 0xF, MBOX RD_CAS_RANK6_ALLBANKS, 0xB6, 0x10, MBOX RD_CAS_RANK6_BANKG0, 0xB6, 0x11, MBOX RD_CAS_RANK6_BANKG1, 0xB6, 0x12, MBOX RD_CAS_RANK6_BANKG2, 0xB6, 0x13, MBOX RD_CAS_RANK6_BANKG3, 0xB6, 0x14, MBOX RD_CAS_RANK7_BANK0, 0xB7, 0x0, MBOX RD_CAS_RANK7_BANK1, 0xB7, 0x1, MBOX RD_CAS_RANK7_BANK2, 0xB7, 0x2, MBOX RD_CAS_RANK7_BANK3, 0xB7, 0x3, MBOX RD_CAS_RANK7_BANK4, 0xB7, 0x4, MBOX 
RD_CAS_RANK7_BANK5, 0xB7, 0x5, MBOX RD_CAS_RANK7_BANK6, 0xB7, 0x6, MBOX RD_CAS_RANK7_BANK7, 0xB7, 0x7, MBOX RD_CAS_RANK7_BANK8, 0xB7, 0x8, MBOX RD_CAS_RANK7_BANK9, 0xB7, 0x9, MBOX RD_CAS_RANK7_BANK10, 0xB7, 0xA, MBOX RD_CAS_RANK7_BANK11, 0xB7, 0xB, MBOX RD_CAS_RANK7_BANK12, 0xB7, 0xC, MBOX RD_CAS_RANK7_BANK13, 0xB7, 0xD, MBOX RD_CAS_RANK7_BANK14, 0xB7, 0xE, MBOX RD_CAS_RANK7_BANK15, 0xB7, 0xF, MBOX RD_CAS_RANK7_ALLBANKS, 0xB7, 0x10, MBOX RD_CAS_RANK7_BANKG0, 0xB7, 0x11, MBOX RD_CAS_RANK7_BANKG1, 0xB7, 0x12, MBOX RD_CAS_RANK7_BANKG2, 0xB7, 0x13, MBOX RD_CAS_RANK7_BANKG3, 0xB7, 0x14, MBOX RPQ_CYCLES_NE, 0x11, 0x0, MBOX RPQ_INSERTS, 0x10, 0x0, MBOX RPQ_CYCLES_FULL, 0x12, 0x0, MBOX VMSE_MXB_WR_OCCUPANCY, 0x91, 0x0, MBOX VMSE_WR_PUSH_WMM, 0x90, 0x1, MBOX VMSE_WR_PUSH_RMM, 0x90, 0x2, MBOX WMM_TO_RMM_LOW_THRESH, 0xC0, 0x1, MBOX WMM_TO_RMM_STARVE, 0xC0, 0x2, MBOX WMM_TO_RMM_VMSE_RETRY, 0xC0, 0x4, MBOX WPQ_INSERTS, 0x20, 0x0, MBOX WPQ_CYCLES_FULL, 0x22, 0x0, MBOX WPQ_CYCLES_NE, 0x21, 0x0, MBOX WPQ_READ_HIT, 0x23, 0x0, MBOX WPQ_WRITE_HIT, 0x24, 0x0, MBOX WRONG_MM, 0xC1, 0x0, MBOX WR_CAS_RANK0_BANK0, 0xB8, 0x0, MBOX WR_CAS_RANK0_BANK1, 0xB8, 0x1, MBOX WR_CAS_RANK0_BANK2, 0xB8, 0x2, MBOX WR_CAS_RANK0_BANK3, 0xB8, 0x3, MBOX WR_CAS_RANK0_BANK4, 0xB8, 0x4, MBOX WR_CAS_RANK0_BANK5, 0xB8, 0x5, MBOX WR_CAS_RANK0_BANK6, 0xB8, 0x6, MBOX WR_CAS_RANK0_BANK7, 0xB8, 0x7, MBOX WR_CAS_RANK0_BANK8, 0xB8, 0x8, MBOX WR_CAS_RANK0_BANK9, 0xB8, 0x9, MBOX WR_CAS_RANK0_BANK10, 0xB8, 0xA, MBOX WR_CAS_RANK0_BANK11, 0xB8, 0xB, MBOX WR_CAS_RANK0_BANK12, 0xB8, 0xC, MBOX WR_CAS_RANK0_BANK13, 0xB8, 0xD, MBOX WR_CAS_RANK0_BANK14, 0xB8, 0xE, MBOX WR_CAS_RANK0_BANK15, 0xB8, 0xF, MBOX WR_CAS_RANK0_ALLBANKS, 0xB8, 0x10, MBOX WR_CAS_RANK0_BANKG0, 0xB8, 0x11, MBOX WR_CAS_RANK0_BANKG1, 0xB8, 0x12, MBOX WR_CAS_RANK0_BANKG2, 0xB8, 0x13, MBOX WR_CAS_RANK0_BANKG3, 0xB8, 0x14, MBOX WR_CAS_RANK1_BANK0, 0xB9, 0x0, MBOX WR_CAS_RANK1_BANK1, 0xB9, 0x1, MBOX WR_CAS_RANK1_BANK2, 0xB9, 0x2, MBOX WR_CAS_RANK1_BANK3, 0xB9, 0x3, 
MBOX WR_CAS_RANK1_BANK4, 0xB9, 0x4, MBOX WR_CAS_RANK1_BANK5, 0xB9, 0x5, MBOX WR_CAS_RANK1_BANK6, 0xB9, 0x6, MBOX WR_CAS_RANK1_BANK7, 0xB9, 0x7, MBOX WR_CAS_RANK1_BANK8, 0xB9, 0x8, MBOX WR_CAS_RANK1_BANK9, 0xB9, 0x9, MBOX WR_CAS_RANK1_BANK10, 0xB9, 0xA, MBOX WR_CAS_RANK1_BANK11, 0xB9, 0xB, MBOX WR_CAS_RANK1_BANK12, 0xB9, 0xC, MBOX WR_CAS_RANK1_BANK13, 0xB9, 0xD, MBOX WR_CAS_RANK1_BANK14, 0xB9, 0xE, MBOX WR_CAS_RANK1_BANK15, 0xB9, 0xF, MBOX WR_CAS_RANK1_ALLBANKS, 0xB9, 0x10, MBOX WR_CAS_RANK1_BANKG0, 0xB9, 0x11, MBOX WR_CAS_RANK1_BANKG1, 0xB9, 0x12, MBOX WR_CAS_RANK1_BANKG2, 0xB9, 0x13, MBOX WR_CAS_RANK1_BANKG3, 0xB9, 0x14, MBOX WR_CAS_RANK2_BANK0, 0xBA, 0x0, MBOX WR_CAS_RANK2_BANK1, 0xBA, 0x1, MBOX WR_CAS_RANK2_BANK2, 0xBA, 0x2, MBOX WR_CAS_RANK2_BANK3, 0xBA, 0x3, MBOX WR_CAS_RANK2_BANK4, 0xBA, 0x4, MBOX WR_CAS_RANK2_BANK5, 0xBA, 0x5, MBOX WR_CAS_RANK2_BANK6, 0xBA, 0x6, MBOX WR_CAS_RANK2_BANK7, 0xBA, 0x7, MBOX WR_CAS_RANK2_BANK8, 0xBA, 0x8, MBOX WR_CAS_RANK2_BANK9, 0xBA, 0x9, MBOX WR_CAS_RANK2_BANK10, 0xBA, 0xA, MBOX WR_CAS_RANK2_BANK11, 0xBA, 0xB, MBOX WR_CAS_RANK2_BANK12, 0xBA, 0xC, MBOX WR_CAS_RANK2_BANK13, 0xBA, 0xD, MBOX WR_CAS_RANK2_BANK14, 0xBA, 0xE, MBOX WR_CAS_RANK2_BANK15, 0xBA, 0xF, MBOX WR_CAS_RANK2_ALLBANKS, 0xBA, 0x10, MBOX WR_CAS_RANK2_BANKG0, 0xBA, 0x11, MBOX WR_CAS_RANK2_BANKG1, 0xBA, 0x12, MBOX WR_CAS_RANK2_BANKG2, 0xBA, 0x13, MBOX WR_CAS_RANK2_BANKG3, 0xBA, 0x14, MBOX WR_CAS_RANK3_BANK0, 0xBB, 0x0, MBOX WR_CAS_RANK3_BANK1, 0xBB, 0x1, MBOX WR_CAS_RANK3_BANK2, 0xBB, 0x2, MBOX WR_CAS_RANK3_BANK3, 0xBB, 0x3, MBOX WR_CAS_RANK3_BANK4, 0xBB, 0x4, MBOX WR_CAS_RANK3_BANK5, 0xBB, 0x5, MBOX WR_CAS_RANK3_BANK6, 0xBB, 0x6, MBOX WR_CAS_RANK3_BANK7, 0xBB, 0x7, MBOX WR_CAS_RANK3_BANK8, 0xBB, 0x8, MBOX WR_CAS_RANK3_BANK9, 0xBB, 0x9, MBOX WR_CAS_RANK3_BANK10, 0xBB, 0xA, MBOX WR_CAS_RANK3_BANK11, 0xBB, 0xB, MBOX WR_CAS_RANK3_BANK12, 0xBB, 0xC, MBOX WR_CAS_RANK3_BANK13, 0xBB, 0xD, MBOX WR_CAS_RANK3_BANK14, 0xBB, 0xE, MBOX WR_CAS_RANK3_BANK15, 0xBB, 0xF, MBOX 
WR_CAS_RANK3_ALLBANKS, 0xBB, 0x10, MBOX WR_CAS_RANK3_BANKG0, 0xBB, 0x11, MBOX WR_CAS_RANK3_BANKG1, 0xBB, 0x12, MBOX WR_CAS_RANK3_BANKG2, 0xBB, 0x13, MBOX WR_CAS_RANK3_BANKG3, 0xBB, 0x14, MBOX WR_CAS_RANK4_BANK0, 0xBC, 0x0, MBOX WR_CAS_RANK4_BANK1, 0xBC, 0x1, MBOX WR_CAS_RANK4_BANK2, 0xBC, 0x2, MBOX WR_CAS_RANK4_BANK3, 0xBC, 0x3, MBOX WR_CAS_RANK4_BANK4, 0xBC, 0x4, MBOX WR_CAS_RANK4_BANK5, 0xBC, 0x5, MBOX WR_CAS_RANK4_BANK6, 0xBC, 0x6, MBOX WR_CAS_RANK4_BANK7, 0xBC, 0x7, MBOX WR_CAS_RANK4_BANK8, 0xBC, 0x8, MBOX WR_CAS_RANK4_BANK9, 0xBC, 0x9, MBOX WR_CAS_RANK4_BANK10, 0xBC, 0xA, MBOX WR_CAS_RANK4_BANK11, 0xBC, 0xB, MBOX WR_CAS_RANK4_BANK12, 0xBC, 0xC, MBOX WR_CAS_RANK4_BANK13, 0xBC, 0xD, MBOX WR_CAS_RANK4_BANK14, 0xBC, 0xE, MBOX WR_CAS_RANK4_BANK15, 0xBC, 0xF, MBOX WR_CAS_RANK4_ALLBANKS, 0xBC, 0x10, MBOX WR_CAS_RANK4_BANKG0, 0xBC, 0x11, MBOX WR_CAS_RANK4_BANKG1, 0xBC, 0x12, MBOX WR_CAS_RANK4_BANKG2, 0xBC, 0x13, MBOX WR_CAS_RANK4_BANKG3, 0xBC, 0x14, MBOX WR_CAS_RANK5_BANK0, 0xBD, 0x0, MBOX WR_CAS_RANK5_BANK1, 0xBD, 0x1, MBOX WR_CAS_RANK5_BANK2, 0xBD, 0x2, MBOX WR_CAS_RANK5_BANK3, 0xBD, 0x3, MBOX WR_CAS_RANK5_BANK4, 0xBD, 0x4, MBOX WR_CAS_RANK5_BANK5, 0xBD, 0x5, MBOX WR_CAS_RANK5_BANK6, 0xBD, 0x6, MBOX WR_CAS_RANK5_BANK7, 0xBD, 0x7, MBOX WR_CAS_RANK5_BANK8, 0xBD, 0x8, MBOX WR_CAS_RANK5_BANK9, 0xBD, 0x9, MBOX WR_CAS_RANK5_BANK10, 0xBD, 0xA, MBOX WR_CAS_RANK5_BANK11, 0xBD, 0xB, MBOX WR_CAS_RANK5_BANK12, 0xBD, 0xC, MBOX WR_CAS_RANK5_BANK13, 0xBD, 0xD, MBOX WR_CAS_RANK5_BANK14, 0xBD, 0xE, MBOX WR_CAS_RANK5_BANK15, 0xBD, 0xF, MBOX WR_CAS_RANK5_ALLBANKS, 0xBD, 0x10, MBOX WR_CAS_RANK5_BANKG0, 0xBD, 0x11, MBOX WR_CAS_RANK5_BANKG1, 0xBD, 0x12, MBOX WR_CAS_RANK5_BANKG2, 0xBD, 0x13, MBOX WR_CAS_RANK5_BANKG3, 0xBD, 0x14, MBOX WR_CAS_RANK6_BANK0, 0xBE, 0x0, MBOX WR_CAS_RANK6_BANK1, 0xBE, 0x1, MBOX WR_CAS_RANK6_BANK2, 0xBE, 0x2, MBOX WR_CAS_RANK6_BANK3, 0xBE, 0x3, MBOX WR_CAS_RANK6_BANK4, 0xBE, 0x4, MBOX WR_CAS_RANK6_BANK5, 0xBE, 0x5, MBOX WR_CAS_RANK6_BANK6, 0xBE, 0x6, MBOX 
WR_CAS_RANK6_BANK7, 0xBE, 0x7, MBOX WR_CAS_RANK6_BANK8, 0xBE, 0x8, MBOX WR_CAS_RANK6_BANK9, 0xBE, 0x9, MBOX WR_CAS_RANK6_BANK10, 0xBE, 0xA, MBOX WR_CAS_RANK6_BANK11, 0xBE, 0xB, MBOX WR_CAS_RANK6_BANK12, 0xBE, 0xC, MBOX WR_CAS_RANK6_BANK13, 0xBE, 0xD, MBOX WR_CAS_RANK6_BANK14, 0xBE, 0xE, MBOX WR_CAS_RANK6_BANK15, 0xBE, 0xF, MBOX WR_CAS_RANK6_ALLBANKS, 0xBE, 0x10, MBOX WR_CAS_RANK6_BANKG0, 0xBE, 0x11, MBOX WR_CAS_RANK6_BANKG1, 0xBE, 0x12, MBOX WR_CAS_RANK6_BANKG2, 0xBE, 0x13, MBOX WR_CAS_RANK6_BANKG3, 0xBE, 0x14, MBOX WR_CAS_RANK7_BANK0, 0xBF, 0x0, MBOX WR_CAS_RANK7_BANK1, 0xBF, 0x1, MBOX WR_CAS_RANK7_BANK2, 0xBF, 0x2, MBOX WR_CAS_RANK7_BANK3, 0xBF, 0x3, MBOX WR_CAS_RANK7_BANK4, 0xBF, 0x4, MBOX WR_CAS_RANK7_BANK5, 0xBF, 0x5, MBOX WR_CAS_RANK7_BANK6, 0xBF, 0x6, MBOX WR_CAS_RANK7_BANK7, 0xBF, 0x7, MBOX WR_CAS_RANK7_BANK8, 0xBF, 0x8, MBOX WR_CAS_RANK7_BANK9, 0xBF, 0x9, MBOX WR_CAS_RANK7_BANK10, 0xBF, 0xA, MBOX WR_CAS_RANK7_BANK11, 0xBF, 0xB, MBOX WR_CAS_RANK7_BANK12, 0xBF, 0xC, MBOX WR_CAS_RANK7_BANK13, 0xBF, 0xD, MBOX WR_CAS_RANK7_BANK14, 0xBF, 0xE, MBOX WR_CAS_RANK7_BANK15, 0xBF, 0xF, MBOX WR_CAS_RANK7_ALLBANKS, 0xBF, 0x10, MBOX WR_CAS_RANK7_BANKG0, 0xBF, 0x11, MBOX WR_CAS_RANK7_BANKG1, 0xBF, 0x12, MBOX WR_CAS_RANK7_BANKG2, 0xBF, 0x13, MBOX WR_CAS_RANK7_BANKG3, 0xBF, 0x14, MBOX PBOX_CLOCKTICKS, 0x1, 0x0, PBOX IIO_CREDIT_PRQ_QPI0, 0x2D, 0x1, PBOX0|PBOX1 IIO_CREDIT_PRQ_QPI1, 0x2D, 0x2, PBOX0|PBOX1 IIO_CREDIT_ISOCH_QPI0, 0x2D, 0x4, PBOX0|PBOX1 IIO_CREDIT_ISOCH_QPI1, 0x2D, 0x8, PBOX0|PBOX1 RING_AD_USED_CW_EVEN, 0x7, 0x1, PBOX RING_AD_USED_CW_ODD, 0x7, 0x2, PBOX RING_AD_USED_CW, 0x7, 0x3, PBOX RING_AD_USED_CCW_EVEN, 0x7, 0x4, PBOX RING_AD_USED_CCW_ODD, 0x7, 0x8, PBOX RING_AD_USED_CCW, 0x7, 0xC, PBOX RING_AK_BOUNCES_UP, 0x12, 0x1, PBOX RING_AK_BOUNCES_DN, 0x12, 0x2, PBOX RING_AK_USED_CW_EVEN, 0x8, 0x1, PBOX RING_AK_USED_CW_ODD, 0x8, 0x2, PBOX RING_AK_USED_CW, 0x8, 0x3, PBOX RING_AK_USED_CCW_EVEN, 0x8, 0x4, PBOX RING_AK_USED_CCW_ODD, 0x8, 0x8, PBOX RING_AK_USED_CCW, 0x8, 
0xC, PBOX RING_BL_USED_CW_EVEN, 0x9, 0x1, PBOX RING_BL_USED_CW_ODD, 0x9, 0x2, PBOX RING_BL_USED_CW, 0x9, 0x3, PBOX RING_BL_USED_CCW_EVEN, 0x9, 0x4, PBOX RING_BL_USED_CCW_ODD, 0x9, 0x8, PBOX RING_BL_USED_CCW, 0x9, 0xC, PBOX RING_IV_USED_CW, 0xA, 0x3, PBOX RING_IV_USED_CCW, 0xA, 0xC, PBOX RING_IV_USED_ANY, 0xA, 0xF, PBOX RXR_CYCLES_NE_NCB, 0x10, 0x10, PBOX0|PBOX1 RXR_CYCLES_NE_NCS, 0x10, 0x20, PBOX0|PBOX1 RXR_INSERTS_NCB, 0x11, 0x10, PBOX0|PBOX1 RXR_INSERTS_NCS, 0x11, 0x20, PBOX0|PBOX1 RXR_OCCUPANCY_DRS, 0x13, 0x8, PBOX0 TXR_CYCLES_FULL_AD, 0x25, 0x1, PBOX0 TXR_CYCLES_FULL_AK, 0x25, 0x2, PBOX0 TXR_CYCLES_FULL_BL, 0x25, 0x4, PBOX0 TXR_CYCLES_NE_AD, 0x23, 0x1, PBOX0 TXR_CYCLES_NE_AK, 0x23, 0x2, PBOX0 TXR_CYCLES_NE_BL, 0x23, 0x4, PBOX0 TXR_NACK_CW_DN_AD, 0x26, 0x1, PBOX0|PBOX1 TXR_NACK_CW_DN_BL, 0x26, 0x2, PBOX0|PBOX1 TXR_NACK_CW_DN_AK, 0x26, 0x4, PBOX0|PBOX1 TXR_NACK_CW_UP_AD, 0x26, 0x8, PBOX0|PBOX1 TXR_NACK_CW_UP_BL, 0x26, 0x10, PBOX0|PBOX1 TXR_NACK_CW_UP_AK, 0x26, 0x20, PBOX0|PBOX1 SBO0_CREDITS_ACQUIRED_AD, 0x28, 0x1, PBOX0|PBOX1 SBO0_CREDITS_ACQUIRED_BL, 0x28, 0x2, PBOX0|PBOX1 STALL_NO_SBO_CREDIT_SBO0_AD, 0x2C, 0x1, PBOX0|PBOX1 STALL_NO_SBO_CREDIT_SBO1_AD, 0x2C, 0x2, PBOX0|PBOX1 STALL_NO_SBO_CREDIT_SBO0_BL, 0x2C, 0x4, PBOX0|PBOX1 STALL_NO_SBO_CREDIT_SBO0_BL, 0x2C, 0x8, PBOX0|PBOX1 CACHE_TOTAL_OCCUPANCY_ANY, 0x12, 0x1, IBOX CACHE_TOTAL_OCCUPANCY_SOURCE, 0x12, 0x2, IBOX COHERENT_OPS_PCIRDCUR, 0x13, 0x1, IBOX COHERENT_OPS_CRD, 0x13, 0x2, IBOX COHERENT_OPS_DRD, 0x13, 0x4, IBOX COHERENT_OPS_RFO, 0x13, 0x8, IBOX COHERENT_OPS_PCITOM, 0x13, 0x10, IBOX COHERENT_OPS_PCIDCAHINT, 0x13, 0x20, IBOX COHERENT_OPS_WBMTOI, 0x13, 0x40, IBOX COHERENT_OPS_CLFLUSH, 0x13, 0x80, IBOX MISC0_FAST_REQ, 0x14, 0x1, IBOX MISC0_FAST_REJ, 0x14, 0x2, IBOX MISC0_2ND_RD_INSERT, 0x14, 0x4, IBOX MISC0_2ND_WR_INSERT, 0x14, 0x8, IBOX MISC0_2ND_ATOMIC_INSERT, 0x14, 0x10, IBOX MISC0_FAST_XFER, 0x14, 0x20, IBOX MISC0_PF_ACK_HINT, 0x14, 0x40, IBOX MISC0_PF_TIMEOUT, 0x14, 0x80, IBOX MISC1_SLOW_I, 0x15, 0x1, 
IBOX MISC1_SLOW_S, 0x15, 0x2, IBOX MISC1_SLOW_E, 0x15, 0x4, IBOX MISC1_SLOW_M, 0x15, 0x8, IBOX MISC1_LOST_FWD, 0x15, 0x10, IBOX MISC1_SEC_RCVD_INVLD, 0x15, 0x20, IBOX MISC1_SEC_RCVD_VLD, 0x15, 0x40, IBOX MISC1_DATA_THROTTLE, 0x15, 0x80, IBOX SNOOP_RESP_MISS, 0x17, 0x1, IBOX SNOOP_RESP_HIT_I, 0x17, 0x2, IBOX SNOOP_RESP_HIT_ES, 0x17, 0x4, IBOX SNOOP_RESP_HIT_M, 0x17, 0x8, IBOX SNOOP_RESP_SNPCODE, 0x17, 0x10, IBOX SNOOP_RESP_SNPDATA, 0x17, 0x20, IBOX SNOOP_RESP_SNPINV, 0x17, 0x40, IBOX TRANSACTIONS_READS, 0x16, 0x1, IBOX TRANSACTIONS_WRITES, 0x16, 0x2, IBOX TRANSACTIONS_RD_PREF, 0x16, 0x4, IBOX TRANSACTIONS_WR_PREF, 0x16, 0x8, IBOX TRANSACTIONS_ALL_READS, 0x16, 0x5, IBOX TRANSACTIONS_ALL_WRITES, 0x16, 0xA, IBOX TRANSACTIONS_ATOMIC, 0x16, 0x10, IBOX TRANSACTIONS_OTHER, 0x16, 0x20, IBOX TRANSACTIONS_ORDERINGQ, 0x16, 0x40, IBOX RXR_AK_INSERTS, 0xA, 0x0, IBOX RXR_BL_DRS_CYCLES_FULL, 0x4, 0x0, IBOX RXR_BL_DRS_INSERTS, 0x1, 0x0, IBOX RXR_BL_DRS_OCCUPANCY, 0x7, 0x0, IBOX RXR_BL_NCB_CYCLES_FULL, 0x5, 0x0, IBOX RXR_BL_NCB_INSERTS, 0x2, 0x0, IBOX RXR_BL_NCB_OCCUPANCY, 0x8, 0x0, IBOX RXR_BL_NCS_CYCLES_FULL, 0x6, 0x0, IBOX RXR_BL_NCS_INSERTS, 0x3, 0x0, IBOX RXR_BL_NCS_OCCUPANCY, 0x9, 0x0, IBOX TXR_AD_STALL_CREDIT_CYCLES, 0x18, 0x0, IBOX TXR_BL_STALL_CREDIT_CYCLES, 0x19, 0x0, IBOX TXR_DATA_INSERTS_NCB, 0xE, 0x0, IBOX TXR_DATA_INSERTS_NCS, 0xF, 0x0, IBOX TXR_REQUEST_OCCUPANCY, 0xD, 0x0, IBOX RBOX_CLOCKTICK, 0x1, 0x0, RBOX C_HI_AD_CREDITS_EMPTY_CBO8, 0x1F, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO9, 0x1F, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO10, 0x1F, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO11, 0x1F, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO12, 0x1F, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO13, 0x1F, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO14_16, 0x1F, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_HI_AD_CREDITS_EMPTY_CBO15_17, 0x1F, 0x80, 
RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO0, 0x22, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO1, 0x22, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO2, 0x22, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO3, 0x22, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO4, 0x22, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO5, 0x22, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO6, 0x22, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 C_LO_AD_CREDITS_EMPTY_CBO7, 0x22, 0x80, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 HA_R2_BL_CREDITS_EMPTY_HA0, 0x2D, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 HA_R2_BL_CREDITS_EMPTY_HA1, 0x2D, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 HA_R2_BL_CREDITS_EMPTY_R2_NCB, 0x2D, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 HA_R2_BL_CREDITS_EMPTY_R2_NCS, 0x2D, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VNA, 0x20, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VN0_HOM, 0x20, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VN0_SNP, 0x20, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VN0_NDR, 0x20, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VN1_HOM, 0x20, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VN1_SNP, 0x20, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_AD_CREDITS_EMPTY_VN1_NDR, 0x20, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_AD_CREDITS_EMPTY_VNA, 0x2E, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_AD_CREDITS_EMPTY_VN1_HOM, 0x2E, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_AD_CREDITS_EMPTY_VN1_SNP, 0x2E, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_AD_CREDITS_EMPTY_VN1_NDR, 0x2E, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_BL_CREDITS_EMPTY_VNA, 0x21, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_BL_CREDITS_EMPTY_VN1_HOM, 0x21, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_BL_CREDITS_EMPTY_VN1_SNP, 0x21, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI0_BL_CREDITS_EMPTY_VN1_NDR, 0x21, 0x40, 
RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VNA, 0x2F, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VN0_HOM, 0x2F, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VN0_SNP, 0x2F, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VN0_NDR, 0x2F, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VN1_HOM, 0x2F, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VN1_SNP, 0x2F, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 QPI1_BL_CREDITS_EMPTY_VN1_NDR, 0x2F, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RING_AD_USED_CW_EVEN, 0x7, 0x1, RBOX RING_AD_USED_CW_ODD, 0x7, 0x2, RBOX RING_AD_USED_CW, 0x7, 0x3, RBOX RING_AD_USED_CCW_EVEN, 0x7, 0x4, RBOX RING_AD_USED_CCW_ODD, 0x7, 0x8, RBOX RING_AD_USED_CCW, 0x7, 0xC, RBOX RING_AK_USED_CW_EVEN, 0x8, 0x1, RBOX RING_AK_USED_CW_ODD, 0x8, 0x2, RBOX RING_AK_USED_CW, 0x8, 0x3, RBOX RING_AK_USED_CCW_EVEN, 0x8, 0x4, RBOX RING_AK_USED_CCW_ODD, 0x8, 0x8, RBOX RING_AK_USED_CCW, 0x8, 0xC, RBOX RING_BL_USED_CW_EVEN, 0x9, 0x1, RBOX RING_BL_USED_CW_ODD, 0x9, 0x2, RBOX RING_BL_USED_CW, 0x9, 0x3, RBOX RING_BL_USED_CCW_EVEN, 0x9, 0x4, RBOX RING_BL_USED_CCW_ODD, 0x9, 0x8, RBOX RING_BL_USED_CCW, 0x9, 0xC, RBOX RING_IV_USED_CW, 0xA, 0x3, RBOX RING_IV_USED_CCW, 0xA, 0xC, RBOX RING_IV_USED_ANY, 0xA, 0xF, RBOX RING_SINK_STARVED_AK, 0xE, 0x2, RBOX RXR_CYCLES_NE_HOM, 0x10, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_SNP, 0x10, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_NDR, 0x10, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_VN1_HOM, 0x14, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_VN1_SNP, 0x14, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_VN1_NDR, 0x14, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_VN1_DRS, 0x14, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_VN1_NCB, 0x14, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_CYCLES_NE_VN1_NCS, 0x14, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_HOM, 0x11, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_SNP, 0x11, 0x2, 
RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_NDR, 0x11, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_DRS, 0x11, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_NCB, 0x11, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_NCS, 0x11, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_VN1_HOM, 0x15, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_VN1_SNP, 0x15, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_VN1_NDR, 0x15, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_VN1_DRS, 0x15, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_VN1_NCB, 0x15, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_INSERTS_VN1_NCS, 0x15, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 RXR_OCCUPANCY_VN1_HOM, 0x13, 0x1, RBOX0C0|RBOX1C0 RXR_OCCUPANCY_VN1_SNP, 0x13, 0x2, RBOX0C0|RBOX1C0 RXR_OCCUPANCY_VN1_NDR, 0x13, 0x4, RBOX0C0|RBOX1C0 RXR_OCCUPANCY_VN1_DRS, 0x13, 0x8, RBOX0C0|RBOX1C0 RXR_OCCUPANCY_VN1_NCB, 0x13, 0x10, RBOX0C0|RBOX1C0 RXR_OCCUPANCY_VN1_NCS, 0x13, 0x20, RBOX0C0|RBOX1C0 TXR_CYCLES_FULL, 0x25, 0x0, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_CYCLES_NE, 0x23, 0x0, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_NACK_DN_AD, 0x26, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_NACK_DN_BL, 0x26, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_NACK_DN_AK, 0x26, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_NACK_UP_AD, 0x26, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_NACK_UP_BL, 0x26, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 TXR_NACK_UP_AK, 0x26, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 SBO0_CREDITS_ACQUIRED_AD, 0x28, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 SBO0_CREDITS_ACQUIRED_BL, 0x28, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 SBO1_CREDITS_ACQUIRED_AD, 0x29, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 SBO1_CREDITS_ACQUIRED_BL, 0x29, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 STALL_NO_SBO_CREDIT_SBO0_AD, 0x2C, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 STALL_NO_SBO_CREDIT_SBO1_AD, 0x2C, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 STALL_NO_SBO_CREDIT_SBO0_BL, 0x2C, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 STALL_NO_SBO_CREDIT_SBO1_BL, 0x2C, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 
VN0_CREDITS_REJECT_HOM, 0x37, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_REJECT_SNP, 0x37, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_REJECT_NDR, 0x37, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_REJECT_DRS, 0x37, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_REJECT_NCB, 0x37, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_REJECT_NCS, 0x37, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_REJECT_HOM, 0x39, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_REJECT_SNP, 0x39, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_REJECT_NDR, 0x39, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_REJECT_DRS, 0x39, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_REJECT_NCB, 0x39, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_REJECT_NCS, 0x39, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VNA_CREDITS_REJECT_HOM, 0x34, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VNA_CREDITS_REJECT_SNP, 0x34, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VNA_CREDITS_REJECT_NDR, 0x34, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VNA_CREDITS_REJECT_DRS, 0x34, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VNA_CREDITS_REJECT_NCB, 0x34, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VNA_CREDITS_REJECT_NCS, 0x34, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_USED_HOM, 0x36, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_USED_SNP, 0x36, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_USED_NDR, 0x36, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_USED_DRS, 0x36, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_USED_NCB, 0x36, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN0_CREDITS_USED_NCS, 0x36, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_USED_HOM, 0x38, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_USED_SNP, 0x38, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_USED_NDR, 0x38, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_USED_DRS, 0x38, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_USED_NCB, 0x38, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 VN1_CREDITS_USED_NCS, 0x38, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1 
BOUNCE_CONTROL, 0xA, 0x0, SBOX SBOX_CLOCKTICKS, 0x0, 0x0, SBOX FAST_ASSERTED, 0x9, 0x0, SBOX RING_AD_USED_ANY, 0x1B, 0xF, SBOX RING_AD_USED_UP_EVEN, 0x1B, 0x1, SBOX RING_AD_USED_UP_ODD, 0x1B, 0x2, SBOX RING_AD_USED_UP, 0x1B, 0x3, SBOX RING_AD_USED_DOWN_EVEN, 0x1B, 0x4, SBOX RING_AD_USED_DOWN_ODD, 0x1B, 0x8, SBOX RING_AD_USED_DOWN, 0x1B, 0xC, SBOX RING_AK_USED_ANY, 0x1C, 0xF, SBOX RING_AK_USED_UP_EVEN, 0x1C, 0x1, SBOX RING_AK_USED_UP_ODD, 0x1C, 0x2, SBOX RING_AK_USED_UP, 0x1C, 0x3, SBOX RING_AK_USED_DOWN_EVEN, 0x1C, 0x4, SBOX RING_AK_USED_DOWN_ODD, 0x1C, 0x8, SBOX RING_AK_USED_DOWN, 0x1C, 0xC, SBOX RING_BL_USED_ANY, 0x1D, 0xF, SBOX RING_BL_USED_UP_EVEN, 0x1D, 0x1, SBOX RING_BL_USED_UP_ODD, 0x1D, 0x2, SBOX RING_BL_USED_UP, 0x1D, 0x3, SBOX RING_BL_USED_DOWN_EVEN, 0x1D, 0x4, SBOX RING_BL_USED_DOWN_ODD, 0x1D, 0x8, SBOX RING_BL_USED_DOWN, 0x1D, 0xC, SBOX RING_BOUNCES_AD_CACHE, 0x5, 0x1, SBOX RING_BOUNCES_AK_CORE, 0x5, 0x2, SBOX RING_BOUNCES_BL_CORE, 0x5, 0x4, SBOX RING_BOUNCES_IV_CORE, 0x5, 0x8, SBOX RING_IV_USED_ANY, 0x1E, 0xF, SBOX RING_IV_USED_UP, 0x1E, 0x3, SBOX RING_IV_USED_DOWN, 0x1E, 0xC, SBOX RXR_BYPASS_AD_CRD, 0x12, 0x1, SBOX RXR_BYPASS_AD_BNC, 0x12, 0x2, SBOX RXR_BYPASS_BL_CRD, 0x12, 0x4, SBOX RXR_BYPASS_BL_BNC, 0x12, 0x8, SBOX RXR_BYPASS_AK, 0x12, 0x10, SBOX RXR_BYPASS_IV, 0x12, 0x20, SBOX RXR_INSERTS_AD_CRD, 0x12, 0x1, SBOX RXR_INSERTS_AD_BNC, 0x12, 0x2, SBOX RXR_INSERTS_BL_CRD, 0x12, 0x4, SBOX RXR_INSERTS_BL_BNC, 0x12, 0x8, SBOX RXR_INSERTS_AK, 0x12, 0x10, SBOX RXR_INSERTS_IV, 0x12, 0x20, SBOX RXR_OCCUPANCY_AD_CRD, 0x11, 0x1, SBOX RXR_OCCUPANCY_AD_BNC, 0x11, 0x2, SBOX RXR_OCCUPANCY_BL_CRD, 0x11, 0x4, SBOX RXR_OCCUPANCY_BL_BNC, 0x11, 0x8, SBOX RXR_OCCUPANCY_AK, 0x11, 0x10, SBOX RXR_OCCUPANCY_IV, 0x11, 0x20, SBOX TXR_ADS_USED_AD, 0x4, 0x1, SBOX TXR_ADS_USED_AK, 0x4, 0x2, SBOX TXR_ADS_USED_BL, 0x4, 0x4, SBOX TXR_INSERTS_AD_CRD, 0x2, 0x1, SBOX TXR_INSERTS_AD_BNC, 0x2, 0x2, SBOX TXR_INSERTS_BL_CRD, 0x2, 0x4, SBOX TXR_INSERTS_BL_BNC, 0x2, 0x8, SBOX TXR_INSERTS_AK, 
0x2, 0x10, SBOX TXR_INSERTS_IV, 0x2, 0x20, SBOX TXR_OCCUPANCY_AD_CRD, 0x1, 0x1, SBOX TXR_OCCUPANCY_AD_BNC, 0x1, 0x2, SBOX TXR_OCCUPANCY_BL_CRD, 0x1, 0x4, SBOX TXR_OCCUPANCY_BL_BNC, 0x1, 0x8, SBOX TXR_OCCUPANCY_AK, 0x1, 0x10, SBOX TXR_OCCUPANCY_IV, 0x1, 0x20, SBOX TXR_ORDERING_IV_SNOOPGO_UP, 0x7, 0x1, SBOX TXR_ORDERING_IV_SNOOPGO_DN, 0x7, 0x2, SBOX TXR_ORDERING_AK_U2C_UP_EVEN, 0x7, 0x4, SBOX TXR_ORDERING_AK_U2C_UP_ODD, 0x7, 0x8, SBOX TXR_ORDERING_AK_U2C_DN_EVEN, 0x7, 0x10, SBOX TXR_ORDERING_AK_U2C_DN_ODD, 0x7, 0x20, SBOX QBOX_CLOCKTICKS, 0x14, 0x0, QBOX CTO_COUNT, 0x38, 0x0, QBOX, MATCH0|MATCH1|MATCH2|MATCH3|MASK0|MASK1|MASK2|MASK3 DIRECT2CORE_SUCCESS_RBT_HIT, 0x13, 0x1, QBOX DIRECT2CORE_FAILURE_CREDITS, 0x13, 0x2, QBOX DIRECT2CORE_FAILURE_RBT_HIT, 0x13, 0x4, QBOX DIRECT2CORE_FAILURE_CREDITS_RBT, 0x13, 0x8, QBOX DIRECT2CORE_FAILURE_MISS, 0x13, 0x10, QBOX DIRECT2CORE_FAILURE_CREDITS_MISS, 0x13, 0x20, QBOX DIRECT2CORE_FAILURE_RBT_MISS, 0x13, 0x40, QBOX DIRECT2CORE_FAILURE_CREDITS_RBT_MISS, 0x13, 0x80, QBOX L1_POWER_CYCLES, 0x12, 0x0, QBOX RXL0P_POWER_CYCLES, 0x10, 0x0, QBOX RXL0_POWER_CYCLES, 0xF, 0x0, QBOX RXL_BYPASSED, 0x9, 0x0, QBOX RXL_CREDITS_CONSUMED_VN0_DRS, 0x1E, 0x1, QBOX RXL_CREDITS_CONSUMED_VN0_NCB, 0x1E, 0x2, QBOX RXL_CREDITS_CONSUMED_VN0_NCS, 0x1E, 0x4, QBOX RXL_CREDITS_CONSUMED_VN0_HOM, 0x1E, 0x8, QBOX RXL_CREDITS_CONSUMED_VN0_SNP, 0x1E, 0x10, QBOX RXL_CREDITS_CONSUMED_VN0_NDR, 0x1E, 0x20, QBOX RXL_CREDITS_CONSUMED_VN1_DRS, 0x39, 0x1, QBOX RXL_CREDITS_CONSUMED_VN1_NCB, 0x39, 0x2, QBOX RXL_CREDITS_CONSUMED_VN1_NCS, 0x39, 0x4, QBOX RXL_CREDITS_CONSUMED_VN1_HOM, 0x39, 0x8, QBOX RXL_CREDITS_CONSUMED_VN1_SNP, 0x39, 0x10, QBOX RXL_CREDITS_CONSUMED_VN1_NDR, 0x39, 0x20, QBOX RXL_CREDITS_CONSUMED_VNA, 0x1D, 0x0, QBOX RXL_CYCLES_NE, 0xA, 0x0, QBOX RXL_FLITS_G0_IDLE, 0x1, 0x1, QBOX RXL_FLITS_G0_DATA, 0x1, 0x2, QBOX RXL_FLITS_G0_NON_DATA, 0x1, 0x4, QBOX RXL_FLITS_G1_SNP, 0x2, 0x1, QBOX RXL_FLITS_G1_HOM_REQ, 0x2, 0x2, QBOX RXL_FLITS_G1_HOM_NONREQ, 0x2, 0x4, QBOX 
RXL_FLITS_G1_HOM, 0x2, 0x6, QBOX RXL_FLITS_G1_DRS_DATA, 0x2, 0x8, QBOX RXL_FLITS_G1_DRS_NONDATA, 0x2, 0x10, QBOX RXL_FLITS_G1_DRS, 0x2, 0x18, QBOX RXL_FLITS_G2_NDR_AD, 0x3, 0x1, QBOX RXL_FLITS_G2_NDR_AK, 0x3, 0x2, QBOX RXL_FLITS_G2_NCB_DATA, 0x3, 0x4, QBOX RXL_FLITS_G2_NCB_NONDATA, 0x3, 0x8, QBOX RXL_FLITS_G2_NCB, 0x3, 0xC, QBOX RXL_FLITS_G2_NCS, 0x3, 0x10, QBOX RXL_INSERTS, 0x8, 0x0, QBOX RXL_INSERTS_DRS_VN0, 0x9, 0x1, QBOX RXL_INSERTS_DRS_VN1, 0x9, 0x2, QBOX RXL_INSERTS_HOM_VN0, 0xC, 0x1, QBOX RXL_INSERTS_HOM_VN1, 0xC, 0x2, QBOX RXL_INSERTS_NCB_VN0, 0xA, 0x1, QBOX RXL_INSERTS_NCB_VN1, 0xA, 0x2, QBOX RXL_INSERTS_NCS_VN0, 0xB, 0x1, QBOX RXL_INSERTS_NCS_VN1, 0xB, 0x2, QBOX RXL_INSERTS_NDR_VN0, 0xE, 0x1, QBOX RXL_INSERTS_NDR_VN1, 0xE, 0x2, QBOX RXL_INSERTS_SNP_VN0, 0xD, 0x1, QBOX RXL_INSERTS_SNP_VN1, 0xD, 0x2, QBOX RXL_OCCUPANCY, 0xB, 0x0, QBOX RXL_OCCUPANCY_DRS_VN0, 0x15, 0x1, QBOX RXL_OCCUPANCY_DRS_VN1, 0x15, 0x2, QBOX RXL_OCCUPANCY_HOM_VN0, 0x18, 0x1, QBOX RXL_OCCUPANCY_HOM_VN1, 0x18, 0x2, QBOX RXL_OCCUPANCY_NCB_VN0, 0x16, 0x1, QBOX RXL_OCCUPANCY_NCB_VN1, 0x16, 0x2, QBOX RXL_OCCUPANCY_NCS_VN0, 0x17, 0x1, QBOX RXL_OCCUPANCY_NCS_VN1, 0x17, 0x2, QBOX RXL_OCCUPANCY_NDR_VN0, 0x1A, 0x1, QBOX RXL_OCCUPANCY_NDR_VN1, 0x1A, 0x2, QBOX RXL_OCCUPANCY_SNP_VN0, 0x19, 0x1, QBOX RXL_OCCUPANCY_SNP_VN1, 0x19, 0x2, QBOX TXL0P_POWER_CYCLES, 0xD, 0x0, QBOX TXL0_POWER_CYCLES, 0xC, 0x0, QBOX TXL_BYPASSED, 0x5, 0x0, QBOX TXL_CYCLES_NE, 0x6, 0x0, QBOX TXL_FLITS_G0_IDLE, 0x0, 0x1, QBOX TXL_FLITS_G0_DATA, 0x0, 0x2, QBOX TXL_FLITS_G0_NON_DATA, 0x0, 0x4, QBOX TXL_FLITS_G1_SNP, 0x0, 0x1, QBOX TXL_FLITS_G1_HOM_REQ, 0x0, 0x2, QBOX TXL_FLITS_G1_HOM_NONREQ, 0x0, 0x4, QBOX TXL_FLITS_G1_HOM, 0x0, 0x6, QBOX TXL_FLITS_G1_DRS_DATA, 0x0, 0x8, QBOX TXL_FLITS_G1_DRS_NONDATA, 0x0, 0x10, QBOX TXL_FLITS_G1_DRS, 0x0, 0x18, QBOX TXL_FLITS_G2_NDR_AD, 0x1, 0x1, QBOX TXL_FLITS_G2_NDR_AK, 0x1, 0x2, QBOX TXL_FLITS_G2_NCB_DATA, 0x1, 0x4, QBOX TXL_FLITS_G2_NCB_NONDATA, 0x1, 0x8, QBOX TXL_FLITS_G2_NCB, 0x1, 0xC, QBOX 
TXL_FLITS_G2_NCS, 0x1, 0x10, QBOX TXL_INSERTS, 0x4, 0x0, QBOX TXL_OCCUPANCY, 0x7, 0x0, QBOX TXR_AD_HOM_CREDIT_ACQUIRED_VN0, 0x26, 0x1, QBOX TXR_AD_HOM_CREDIT_ACQUIRED_VN1, 0x26, 0x2, QBOX TXR_AD_HOM_CREDIT_OCCUPANCY_VN0, 0x22, 0x1, QBOX TXR_AD_HOM_CREDIT_OCCUPANCY_VN1, 0x22, 0x2, QBOX TXR_AD_NDR_CREDIT_ACQUIRED_VN0, 0x28, 0x1, QBOX TXR_AD_NDR_CREDIT_ACQUIRED_VN1, 0x28, 0x2, QBOX TXR_AD_NDR_CREDIT_OCCUPANCY_VN0, 0x24, 0x1, QBOX TXR_AD_NDR_CREDIT_OCCUPANCY_VN1, 0x24, 0x2, QBOX TXR_AD_SNP_CREDIT_ACQUIRED_VN0, 0x27, 0x1, QBOX TXR_AD_SNP_CREDIT_ACQUIRED_VN1, 0x27, 0x2, QBOX TXR_AD_SNP_CREDIT_OCCUPANCY_VN0, 0x23, 0x1, QBOX TXR_AD_SNP_CREDIT_OCCUPANCY_VN1, 0x23, 0x2, QBOX TXR_AK_NDR_CREDIT_ACQUIRED, 0x29, 0x0, QBOX TXR_AK_NDR_CREDIT_OCCUPANCY, 0x25, 0x0, QBOX TXR_BL_DRS_CREDIT_ACQUIRED_VN0, 0x2A, 0x1, QBOX TXR_BL_DRS_CREDIT_ACQUIRED_VN1, 0x2A, 0x2, QBOX TXR_BL_DRS_CREDIT_ACQUIRED_VN_SHR, 0x2A, 0x4, QBOX TXR_BL_DRS_CREDIT_OCCUPANCY_VN0, 0x1F, 0x1, QBOX TXR_BL_DRS_CREDIT_OCCUPANCY_VN1, 0x1F, 0x2, QBOX TXR_BL_DRS_CREDIT_OCCUPANCY_VN_SHR, 0x1F, 0x4, QBOX TXR_BL_NCB_CREDIT_ACQUIRED_VN0, 0x2B, 0x1, QBOX TXR_BL_NCB_CREDIT_ACQUIRED_VN1, 0x2B, 0x2, QBOX TXR_BL_NCB_CREDIT_OCCUPANCY_VN0, 0x20, 0x1, QBOX TXR_BL_NCB_CREDIT_OCCUPANCY_VN1, 0x20, 0x2, QBOX TXR_BL_NCS_CREDIT_ACQUIRED_VN0, 0x2C, 0x1, QBOX TXR_BL_NCS_CREDIT_ACQUIRED_VN1, 0x2C, 0x2, QBOX TXR_BL_NCS_CREDIT_OCCUPANCY_VN0, 0x21, 0x1, QBOX TXR_BL_NCS_CREDIT_OCCUPANCY_VN1, 0x21, 0x2, QBOX VNA_CREDIT_RETURNS, 0x1C, 0x0, QBOX VNA_CREDIT_RETURN_OCCUPANCY, 0x1B, 0x0, QBOX QPI_RATE, 0x0, 0x0, QBOX0FIX0|QBOX1FIX0|QBOX2FIX0 QPI_RX_IDLE, 0x1, 0x0, QBOX0FIX1|QBOX1FIX1|QBOX2FIX1 QPI_RX_LLR, 0x2, 0x0, QBOX0FIX2|QBOX1FIX2|QBOX2FIX2
In [3]:
!likwid-perfctr -a
Group name      Description
--------------------------------------------------------------------------------
FLOPS_AVX       Packed AVX MFLOP/s
TLB_INSTR       L1 Instruction TLB miss rate/ratio
NUMA            Local and remote memory accesses
ENERGY          Power and Energy consumption
TLB_DATA        L2 data TLB miss rate/ratio
CLOCK           Power and Energy consumption
PORT_USAGE      Execution port utilization
CYCLE_ACTIVITY  Cycle Activities
UOPS            UOPs execution info
QPI             QPI Link Layer data
L2              L2 cache bandwidth in MBytes/s
CACHES          Cache bandwidth in MBytes/s
BRANCH          Branch prediction miss rate/ratio
DATA            Load to store ratio
RECOVERY        Recovery duration
UOPS_EXEC       UOPs execution
MEM             Main memory bandwidth in MBytes/s
UOPS_ISSUE      UOPs issueing
ICACHE          Instruction cache miss rate/ratio
L3CACHE         L3 cache miss rate/ratio
L2CACHE         L2 cache miss rate/ratio
SBOX            Ring Transfer bandwidth
HA              Main memory bandwidth in MBytes/s seen from Home agent
FALSE_SHARE     False sharing
UOPS_RETIRE     UOPs retirement
L3              L3 cache bandwidth in MBytes/s
CBOX            CBOX related data and metrics
In [28]:
!likwid-perfctr -H -g MEM
Group MEM:
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC1))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MBOXxC1))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0
-
Profiling group to measure memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events it is only possible to measure on a
per socket base. Some of the counters may not be available on your system.
Also outputs total data volume transferred from main memory.
The same metrics are provided by the HA group.
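As a sanity check, the formulas above can be applied by hand. The sketch below does so in plain Python; the function name `mem_metrics` is hypothetical, and the counter values are copied from the Core 0 column (socket 0) of the MEM run shown later in this notebook. The runtime is the rounded value printed there, so the bandwidths should agree with the tool's output only to roughly the displayed precision.

```python
BYTES_PER_CAS = 64.0  # the factor 64.0 in the formulas: one 64-byte cache line per CAS

def mem_metrics(read_cas, write_cas, runtime):
    """Apply the MEM group formulas to per-channel CAS counts (hypothetical helper)."""
    rd, wr = sum(read_cas), sum(write_cas)
    return {
        "read_bw_MBs": 1.0e-6 * rd * BYTES_PER_CAS / runtime,
        "read_vol_GB": 1.0e-9 * rd * BYTES_PER_CAS,
        "write_bw_MBs": 1.0e-6 * wr * BYTES_PER_CAS / runtime,
        "write_vol_GB": 1.0e-9 * wr * BYTES_PER_CAS,
        "total_bw_MBs": 1.0e-6 * (rd + wr) * BYTES_PER_CAS / runtime,
        "total_vol_GB": 1.0e-9 * (rd + wr) * BYTES_PER_CAS,
    }

# CAS_COUNT_RD (MBOXxC0) and CAS_COUNT_WR (MBOXxC1) for the active channels 2, 3, 6, 7
read_cas = [1841226, 1719502, 1837196, 1715048]
write_cas = [1764762, 1643243, 1762917, 1641179]

m = mem_metrics(read_cas, write_cas, runtime=1.0019)
for name, value in m.items():
    print(f"{name:>14s} = {value:.4f}")
```

The data volumes reproduce the reported 0.4552, 0.4360 and 0.8912 GBytes; the bandwidths differ in the last digits because the runtime used here is rounded.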
In [15]:
%%writefile tmp/perfctr.py
import numpy as np
import likwid

likwid.init_thread()
likwid.init_openmp_threads()

n = 2048

# Named regions; with the Marker API active (-m) each is measured on its own.
with likwid.Region("generation"):
    A = np.random.randn(n, n)
    b = np.random.randn(n)

with likwid.Region("matmul"):
    A @ A
Overwriting tmp/perfctr.py
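To interpret the memory traffic measured for this script, it helps to know how large the operands are. The following back-of-envelope sketch assumes 8-byte float64 entries (what `np.random.randn` returns) and the classical 2n^3 operation count for a dense matrix product; the variable names are illustrative only.

```python
n = 2048                 # matrix dimension used in tmp/perfctr.py
BYTES_PER_FLOAT64 = 8    # np.random.randn produces float64 arrays

matrix_bytes = n * n * BYTES_PER_FLOAT64   # storage for A (and for the result of A @ A)
vector_bytes = n * BYTES_PER_FLOAT64       # storage for b
matmul_flops = 2 * n**3                    # classical multiply-add count for A @ A

print(f"A:      {matrix_bytes / 2**20:.0f} MiB")
print(f"b:      {vector_bytes / 2**10:.0f} KiB")
print(f"A @ A:  {matmul_flops / 1e9:.1f} GFLOP")
```

Each matrix is far larger than a typical last-level cache slice per core, so both regions should generate main-memory traffic that the MEM group can observe.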
Also try adding the -m option (which activates the Marker API) to the command below.
- Advantages?
- Disadvantages?
Make sure the MSR access daemon is SUID root:
chmod u+s /usr/sbin/likwid-accessD
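Whether the `chmod` took effect can be checked from Python with the standard library alone. This is a minimal sketch; the helper names `mode_has_suid` and `is_suid_root` are hypothetical, and the path is the one used in the command above, which may differ on other installations.

```python
import os
import stat

def mode_has_suid(mode):
    """Return True if the set-user-ID bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)

def is_suid_root(path):
    """Return True if path exists, is owned by root, and has the SUID bit set."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing file or inaccessible path: treat as "not configured".
        return False
    return st.st_uid == 0 and mode_has_suid(st.st_mode)

print(is_suid_root("/usr/sbin/likwid-accessD"))
```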
In [18]:
!likwid-perfctr -C S0:0-7@S1:0-7 -M 1 -g MEM python3 ./tmp/perfctr.py
-------------------------------------------------------------------------------- CPU name: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz CPU type: Intel Xeon Broadwell EN/EP/EX processor CPU clock: 2.19 GHz -------------------------------------------------------------------------------- Running without Marker API. Activate Marker API with -m on commandline. -------------------------------------------------------------------------------- Group 1: MEM +-----------------------+---------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+-----------+ | Event | Counter | Core 0 | Core 1 | Core 2 | Core 3 | Core 4 | Core 5 | Core 6 | Core 7 | Core 12 | Core 13 | Core 14 | Core 15 | Core 16 | Core 17 | Core 18 | Core 19 | +-----------------------+---------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+-----------+ | INSTR_RETIRED_ANY | FIXC0 | 1949520384 | 356458518 | 364873900 | 362027043 | 352323878 | 340961083 | 339418061 | 340180266 | 329162139 | 438126876 | 323908544 | 320824436 | 315188456 | 309798621 | 25306192 | 339338862 | | CPU_CLK_UNHALTED_CORE | FIXC1 | 1116681108 | 256556129 | 272782533 | 268002653 | 250929834 | 233567786 | 231861324 | 227811152 | 216251858 | 294065131 | 206608401 | 200982634 | 191193868 | 182247701 | 40194541 | 230682461 | | CPU_CLK_UNHALTED_REF | FIXC2 | 1133365728 | 336271012 | 303399030 | 298917234 | 291128508 | 278965060 | 277807860 | 284598006 | 275326590 | 392295904 | 271997726 | 270141696 | 264090640 | 259688220 | 54866878 | 326025546 | | CAS_COUNT_RD | MBOX0C0 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_WR | MBOX0C1 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_RD | MBOX1C0 | - | - | - | - | - | - | - | - | - | - | 
- | - | - | - | - | - | | CAS_COUNT_WR | MBOX1C1 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_RD | MBOX2C0 | 1841226 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 297400 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR | MBOX2C1 | 1764762 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 169316 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD | MBOX3C0 | 1719502 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 395703 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR | MBOX3C1 | 1643243 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 268513 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD | MBOX4C0 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_WR | MBOX4C1 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_RD | MBOX5C0 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_WR | MBOX5C1 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | CAS_COUNT_RD | MBOX6C0 | 1837196 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 293754 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR | MBOX6C1 | 1762917 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 167864 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD | MBOX7C0 | 1715048 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 392383 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR | MBOX7C1 | 1641179 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 266912 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +-----------------------+---------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+-----------+ +----------------------------+---------+------------+----------+------------+--------------+ | Event | Counter | Sum | Min | Max | Avg | +----------------------------+---------+------------+----------+------------+--------------+ | INSTR_RETIRED_ANY STAT | FIXC0 | 6807417259 | 25306192 | 1949520384 | 4.254636e+08 | | CPU_CLK_UNHALTED_CORE STAT | FIXC1 | 4420419114 | 40194541 | 1116681108 | 2.762762e+08 | | CPU_CLK_UNHALTED_REF STAT | FIXC2 | 5318885638 | 54866878 | 
1133365728 | 3.324304e+08 | | CAS_COUNT_RD STAT | MBOX0C0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR STAT | MBOX0C1 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD STAT | MBOX1C0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR STAT | MBOX1C1 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD STAT | MBOX2C0 | 2138626 | 0 | 1841226 | 133664.1250 | | CAS_COUNT_WR STAT | MBOX2C1 | 1934078 | 0 | 1764762 | 120879.8750 | | CAS_COUNT_RD STAT | MBOX3C0 | 2115205 | 0 | 1719502 | 132200.3125 | | CAS_COUNT_WR STAT | MBOX3C1 | 1911756 | 0 | 1643243 | 119484.7500 | | CAS_COUNT_RD STAT | MBOX4C0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR STAT | MBOX4C1 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD STAT | MBOX5C0 | 0 | 0 | 0 | 0 | | CAS_COUNT_WR STAT | MBOX5C1 | 0 | 0 | 0 | 0 | | CAS_COUNT_RD STAT | MBOX6C0 | 2130950 | 0 | 1837196 | 133184.3750 | | CAS_COUNT_WR STAT | MBOX6C1 | 1930781 | 0 | 1762917 | 120673.8125 | | CAS_COUNT_RD STAT | MBOX7C0 | 2107431 | 0 | 1715048 | 131714.4375 | | CAS_COUNT_WR STAT | MBOX7C1 | 1908091 | 0 | 1641179 | 119255.6875 | +----------------------------+---------+------------+----------+------------+--------------+ +-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ | Metric | Core 0 | Core 1 | Core 2 | Core 3 | Core 4 | Core 5 | Core 6 | Core 7 | Core 12 | Core 13 | Core 14 | Core 15 | Core 16 | Core 17 | Core 18 | Core 19 | +-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ | Runtime (RDTSC) [s] | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | 1.0019 | | Runtime unhalted [s] | 0.5088 | 0.1169 | 0.1243 | 0.1221 | 0.1143 | 0.1064 | 0.1056 | 0.1038 | 0.0985 | 0.1340 | 0.0941 | 0.0916 | 0.0871 | 
0.0830 | 0.0183 | 0.1051 | | Clock [MHz] | 2162.4963 | 1674.5158 | 1973.3251 | 1967.8157 | 1891.7504 | 1837.6356 | 1831.8084 | 1756.8691 | 1723.8837 | 1645.2278 | 1667.1665 | 1632.9135 | 1588.9756 | 1540.3027 | 1607.8780 | 1552.9562 | | CPI | 0.5728 | 0.7197 | 0.7476 | 0.7403 | 0.7122 | 0.6850 | 0.6831 | 0.6697 | 0.6570 | 0.6712 | 0.6379 | 0.6265 | 0.6066 | 0.5883 | 1.5883 | 0.6798 | | Memory read bandwidth [MBytes/s] | 454.3715 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 88.1049 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Memory read data volume [GBytes] | 0.4552 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0883 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Memory write bandwidth [MBytes/s] | 435.1521 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55.7414 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Memory write data volume [GBytes] | 0.4360 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0558 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Memory bandwidth [MBytes/s] | 889.5237 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 143.8462 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Memory data volume [GBytes] | 0.8912 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1441 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ +----------------------------------------+------------+-----------+-----------+-----------+ | Metric | Sum | Min | Max | Avg | +----------------------------------------+------------+-----------+-----------+-----------+ | Runtime (RDTSC) [s] STAT | 16.0304 | 1.0019 | 1.0019 | 1.0019 | | Runtime unhalted [s] STAT | 2.0139 | 0.0183 | 0.5088 | 0.1259 | | Clock [MHz] STAT | 28055.5204 | 1540.3027 | 2162.4963 | 1753.4700 | | CPI STAT | 11.5860 | 0.5728 | 1.5883 | 0.7241 | | Memory read bandwidth [MBytes/s] STAT | 542.4764 | 0 | 454.3715 | 33.9048 | | Memory read data volume [GBytes] STAT | 0.5435 | 0 | 0.4552 | 0.0340 | | Memory write bandwidth [MBytes/s] STAT | 490.8935 | 0 | 435.1521 | 30.6808 | | Memory 
write data volume [GBytes] STAT | 0.4918 | 0 | 0.4360 | 0.0307 | | Memory bandwidth [MBytes/s] STAT | 1033.3699 | 0 | 889.5237 | 64.5856 | | Memory data volume [GBytes] STAT | 1.0353 | 0 | 0.8912 | 0.0647 | +----------------------------------------+------------+-----------+-----------+-----------+
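A few of the derived numbers in this output can be reproduced by hand, which also shows how to read the STAT tables. The sketch below assumes the standard definition CPI = unhalted core cycles / retired instructions (consistent with the values printed above) and copies the inputs from the tables; the variable names are illustrative.

```python
# INSTR_RETIRED_ANY and CPU_CLK_UNHALTED_CORE for two of the cores in the table
instr = {0: 1949520384, 18: 25306192}
cycles = {0: 1116681108, 18: 40194541}
cpi = {core: cycles[core] / instr[core] for core in instr}
for core, value in cpi.items():
    print(f"Core {core:2d}: CPI = {value:.4f}")

# Memory read bandwidth per column: the uncore counts appear only on the
# first core of each socket (Core 0 and Core 12); all other columns are 0.
read_bw = [454.3715] + [0.0] * 7 + [88.1049] + [0.0] * 7
total = sum(read_bw)
avg = total / len(read_bw)
print(f"Sum = {total:.4f} MBytes/s, Avg = {avg:.4f} MBytes/s")
```

The Sum column is therefore the machine-wide bandwidth (socket 0 plus socket 1), while the Avg column divides by all 16 measured cores and is not a meaningful per-socket figure for uncore metrics.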