Using Performance Counters

In [1]:
!mkdir -p tmp

Using perf

perf is a Linux command-line tool for accessing performance counters.

See also the Wiki documentation for perf.
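The event list that perf list prints can run to hundreds of lines, so it may help to filter it programmatically. The following is a minimal sketch, not part of perf itself: the helper name parse_perf_list and the sample text are hypothetical, and it assumes the one-line "name OR alias [Kind event]" layout seen in the summary part of the listing. The longer multi-line vendor event descriptions are deliberately skipped.

```python
import re

# Assumed layout: "  name OR alias      [Hardware event]".
# Lines that do not end in "[... event]" (section headers, long descriptions) are ignored.
EVENT_LINE = re.compile(r"^\s+(?P<names>\S.*?)\s*\[(?P<kind>[^\]]+ event)\]\s*$")


def parse_perf_list(text):
    """Return (primary_name, aliases, kind) tuples for one-line event entries."""
    events = []
    for line in text.splitlines():
        match = EVENT_LINE.match(line)
        if match is None:
            continue
        names = [name.strip() for name in match.group("names").split(" OR ")]
        events.append((names[0], names[1:], match.group("kind")))
    return events


# Hypothetical sample, copied in the style of a `perf list` summary.
sample = """\
  branch-instructions OR branches                    [Hardware event]
  cache-misses                                       [Hardware event]
  context-switches OR cs                             [Software event]
  branch-instructions OR cpu_core/branch-instructions/[Kernel PMU event]

cache:
  longest_lat_cache.miss
       [Counts the number of cacheable memory requests that miss in the LLC.
        Counts on a per core basis. Unit: cpu_atom]
"""

events = parse_perf_list(sample)
hardware = [name for name, _aliases, kind in events if kind == "Hardware event"]
print(hardware)
```

On a machine with perf installed, the text could presumably be obtained with subprocess.run(["perf", "list"], capture_output=True, text=True).stdout instead of the hard-coded sample. The exact layout may differ between perf versions, so the regular expression should be treated as a starting point.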

In [2]:
!perf list
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  cgroup-switches                                    [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

tool:
  duration_time
  user_time
  system_time

cache:
  L1-dcache-loads OR cpu_atom/L1-dcache-loads/
  L1-dcache-stores OR cpu_atom/L1-dcache-stores/
  L1-icache-loads OR cpu_atom/L1-icache-loads/
  L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
  LLC-loads OR cpu_atom/LLC-loads/
  LLC-load-misses OR cpu_atom/LLC-load-misses/
  LLC-stores OR cpu_atom/LLC-stores/
  LLC-store-misses OR cpu_atom/LLC-store-misses/
  dTLB-loads OR cpu_atom/dTLB-loads/
  dTLB-load-misses OR cpu_atom/dTLB-load-misses/
  dTLB-stores OR cpu_atom/dTLB-stores/
  dTLB-store-misses OR cpu_atom/dTLB-store-misses/
  iTLB-load-misses OR cpu_atom/iTLB-load-misses/
  branch-loads OR cpu_atom/branch-loads/
  branch-load-misses OR cpu_atom/branch-load-misses/
  L1-dcache-loads OR cpu_core/L1-dcache-loads/
  L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
  L1-dcache-stores OR cpu_core/L1-dcache-stores/
  L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
  LLC-loads OR cpu_core/LLC-loads/
  LLC-load-misses OR cpu_core/LLC-load-misses/
  LLC-stores OR cpu_core/LLC-stores/
  LLC-store-misses OR cpu_core/LLC-store-misses/
  dTLB-loads OR cpu_core/dTLB-loads/
  dTLB-load-misses OR cpu_core/dTLB-load-misses/
  dTLB-stores OR cpu_core/dTLB-stores/
  dTLB-store-misses OR cpu_core/dTLB-store-misses/
  iTLB-load-misses OR cpu_core/iTLB-load-misses/
  branch-loads OR cpu_core/branch-loads/
  branch-load-misses OR cpu_core/branch-load-misses/
  node-loads OR cpu_core/node-loads/
  node-load-misses OR cpu_core/node-load-misses/
  branch-instructions OR cpu_atom/branch-instructions/[Kernel PMU event]
  branch-misses OR cpu_atom/branch-misses/           [Kernel PMU event]
  bus-cycles OR cpu_atom/bus-cycles/                 [Kernel PMU event]
  cache-misses OR cpu_atom/cache-misses/             [Kernel PMU event]
  cache-references OR cpu_atom/cache-references/     [Kernel PMU event]
  cpu-cycles OR cpu_atom/cpu-cycles/                 [Kernel PMU event]
  instructions OR cpu_atom/instructions/             [Kernel PMU event]
  mem-loads OR cpu_atom/mem-loads/                   [Kernel PMU event]
  mem-stores OR cpu_atom/mem-stores/                 [Kernel PMU event]
  ref-cycles OR cpu_atom/ref-cycles/                 [Kernel PMU event]
  topdown-bad-spec OR cpu_atom/topdown-bad-spec/     [Kernel PMU event]
  topdown-be-bound OR cpu_atom/topdown-be-bound/     [Kernel PMU event]
  topdown-fe-bound OR cpu_atom/topdown-fe-bound/     [Kernel PMU event]
  topdown-retiring OR cpu_atom/topdown-retiring/     [Kernel PMU event]
  branch-instructions OR cpu_core/branch-instructions/[Kernel PMU event]
  branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
  bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
  cache-misses OR cpu_core/cache-misses/             [Kernel PMU event]
  cache-references OR cpu_core/cache-references/     [Kernel PMU event]
  cpu-cycles OR cpu_core/cpu-cycles/                 [Kernel PMU event]
  instructions OR cpu_core/instructions/             [Kernel PMU event]
  mem-loads OR cpu_core/mem-loads/                   [Kernel PMU event]
  mem-loads-aux OR cpu_core/mem-loads-aux/           [Kernel PMU event]
  mem-stores OR cpu_core/mem-stores/                 [Kernel PMU event]
  ref-cycles OR cpu_core/ref-cycles/                 [Kernel PMU event]
  slots OR cpu_core/slots/                           [Kernel PMU event]
  topdown-bad-spec OR cpu_core/topdown-bad-spec/     [Kernel PMU event]
  topdown-be-bound OR cpu_core/topdown-be-bound/     [Kernel PMU event]
  topdown-br-mispredict OR cpu_core/topdown-br-mispredict/[Kernel PMU event]
  topdown-fe-bound OR cpu_core/topdown-fe-bound/     [Kernel PMU event]
  topdown-fetch-lat OR cpu_core/topdown-fetch-lat/   [Kernel PMU event]
  topdown-heavy-ops OR cpu_core/topdown-heavy-ops/   [Kernel PMU event]
  topdown-mem-bound OR cpu_core/topdown-mem-bound/   [Kernel PMU event]
  topdown-retiring OR cpu_core/topdown-retiring/     [Kernel PMU event]
  cstate_core/c1-residency/                          [Kernel PMU event]
  cstate_core/c6-residency/                          [Kernel PMU event]
  cstate_core/c7-residency/                          [Kernel PMU event]
  cstate_pkg/c10-residency/                          [Kernel PMU event]
  cstate_pkg/c2-residency/                           [Kernel PMU event]
  cstate_pkg/c3-residency/                           [Kernel PMU event]
  cstate_pkg/c6-residency/                           [Kernel PMU event]
  cstate_pkg/c8-residency/                           [Kernel PMU event]
  i915/actual-frequency/                             [Kernel PMU event]
  i915/bcs0-busy/                                    [Kernel PMU event]
  i915/bcs0-sema/                                    [Kernel PMU event]
  i915/bcs0-wait/                                    [Kernel PMU event]
  i915/interrupts/                                   [Kernel PMU event]
  i915/rc6-residency/                                [Kernel PMU event]
  i915/rcs0-busy/                                    [Kernel PMU event]
  i915/rcs0-sema/                                    [Kernel PMU event]
  i915/rcs0-wait/                                    [Kernel PMU event]
  i915/requested-frequency/                          [Kernel PMU event]
  i915/software-gt-awake-time/                       [Kernel PMU event]
  i915/vcs0-busy/                                    [Kernel PMU event]
  i915/vcs0-sema/                                    [Kernel PMU event]
  i915/vcs0-wait/                                    [Kernel PMU event]
  i915/vcs1-busy/                                    [Kernel PMU event]
  i915/vcs1-sema/                                    [Kernel PMU event]
  i915/vcs1-wait/                                    [Kernel PMU event]
  i915/vecs0-busy/                                   [Kernel PMU event]
  i915/vecs0-sema/                                   [Kernel PMU event]
  i915/vecs0-wait/                                   [Kernel PMU event]
  intel_bts//                                        [Kernel PMU event]
  intel_pt//                                         [Kernel PMU event]
  msr/aperf/                                         [Kernel PMU event]
  msr/cpu_thermal_margin/                            [Kernel PMU event]
  msr/mperf/                                         [Kernel PMU event]
  msr/pperf/                                         [Kernel PMU event]
  msr/smi/                                           [Kernel PMU event]
  msr/tsc/                                           [Kernel PMU event]
  power/energy-cores/                                [Kernel PMU event]
  power/energy-gpu/                                  [Kernel PMU event]
  power/energy-pkg/                                  [Kernel PMU event]
  power/energy-psys/                                 [Kernel PMU event]
  uncore_clock/clockticks/                           [Kernel PMU event]
  uncore_imc_free_running/data_read/                 [Kernel PMU event]
  uncore_imc_free_running/data_total/                [Kernel PMU event]
  uncore_imc_free_running/data_write/                [Kernel PMU event]

cache:
  longest_lat_cache.miss
       [Counts the number of cacheable memory requests that miss in the LLC.
        Counts on a per core basis. Unit: cpu_atom]
  longest_lat_cache.reference
       [Counts the number of cacheable memory requests that access the LLC.
        Counts on a per core basis. Unit: cpu_atom]
  mem_bound_stalls.ifetch
       [Counts the number of cycles the core is stalled due to an instruction
        cache or TLB miss which hit in the L2,LLC,DRAM or MMIO (Non-DRAM). Unit:
        cpu_atom]
  mem_bound_stalls.ifetch_dram_hit
       [Counts the number of cycles the core is stalled due to an instruction
        cache or TLB miss which hit in DRAM or MMIO (Non-DRAM). Unit: cpu_atom]
  mem_bound_stalls.ifetch_l2_hit
       [Counts the number of cycles the core is stalled due to an instruction
        cache or TLB miss which hit in the L2 cache. Unit: cpu_atom]
  mem_bound_stalls.ifetch_llc_hit
       [Counts the number of cycles the core is stalled due to an instruction
        cache or TLB miss which hit in the LLC or other core with HITE/F/M.
        Unit: cpu_atom]
  mem_bound_stalls.load
       [Counts the number of cycles the core is stalled due to a demand load
        miss which hit in the L2,LLC,DRAM or MMIO (Non-DRAM). Unit: cpu_atom]
  mem_bound_stalls.load_dram_hit
       [Counts the number of cycles the core is stalled due to a demand load
        miss which hit in DRAM or MMIO (Non-DRAM). Unit: cpu_atom]
  mem_bound_stalls.load_l2_hit
       [Counts the number of cycles the core is stalled due to a demand load
        which hit in the L2 cache. Unit: cpu_atom]
  mem_bound_stalls.load_llc_hit
       [Counts the number of cycles the core is stalled due to a demand load
        which hit in the LLC or other core with HITE/F/M. Unit: cpu_atom]
  mem_load_uops_retired.dram_hit
       [Counts the number of load uops retired that hit in DRAM Supports address
        when precise (Precise event). Unit: cpu_atom]
  mem_load_uops_retired.l2_hit
       [Counts the number of load uops retired that hit in the L2 cache Supports
        address when precise (Precise event). Unit: cpu_atom]
  mem_load_uops_retired.l3_hit
       [Counts the number of load uops retired that hit in the L3 cache Supports
        address when precise (Precise event). Unit: cpu_atom]
  mem_scheduler_block.all
       [Counts the number of cycles that uops are blocked for any of the
        following reasons: load buffer,store buffer or RSV full. Unit: cpu_atom]
  mem_scheduler_block.ld_buf
       [Counts the number of cycles that uops are blocked due to a load buffer
        full condition. Unit: cpu_atom]
  mem_scheduler_block.rsv
       [Counts the number of cycles that uops are blocked due to an RSV full
        condition. Unit: cpu_atom]
  mem_scheduler_block.st_buf
       [Counts the number of cycles that uops are blocked due to a store buffer
        full condition. Unit: cpu_atom]
  mem_uops_retired.all_loads
       [Counts the number of load uops retired Supports address when precise
        (Precise event). Unit: cpu_atom]
  mem_uops_retired.all_stores
       [Counts the number of store uops retired Supports address when precise
        (Precise event). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_128
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 128 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_16
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 16 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_256
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 256 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_32
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 32 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_4
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 4 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_512
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 512 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_64
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 64 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.load_latency_gt_8
       [Counts the number of tagged loads with an instruction latency that
        exceeds or equals the threshold of 8 cycles as defined in
        MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled
        Supports address when precise (Must be precise). Unit: cpu_atom]
  mem_uops_retired.lock_loads
       [Counts the number of load uops retired that performed one or more locks
        Supports address when precise (Precise event). Unit: cpu_atom]
  mem_uops_retired.split_loads
       [Counts the number of retired split load uops Supports address when
        precise (Precise event). Unit: cpu_atom]
  mem_uops_retired.store_latency
       [Counts the number of stores uops retired. Counts with or without PEBS
        enabled Supports address when precise (Must be precise). Unit: cpu_atom]
  ocr.demand_data_rd.l3_hit
       [Counts demand data reads that were supplied by the L3 cache. Unit:
        cpu_atom]
  ocr.demand_data_rd.l3_hit.snoop_hit_no_fwd
       [Counts demand data reads that were supplied by the L3 cache where a
        snoop was sent,the snoop hit,but no data was forwarded. Unit: cpu_atom]
  ocr.demand_data_rd.l3_hit.snoop_hit_with_fwd
       [Counts demand data reads that were supplied by the L3 cache where a
        snoop was sent,the snoop hit,and non-modified data was forwarded. Unit:
        cpu_atom]
  ocr.demand_data_rd.l3_hit.snoop_hitm
       [Counts demand data reads that were supplied by the L3 cache where a
        snoop was sent,the snoop hit,and modified data was forwarded. Unit:
        cpu_atom]
  ocr.demand_rfo.l3_hit
       [Counts demand reads for ownership (RFO) and software prefetches for
        exclusive ownership (PREFETCHW) that were supplied by the L3 cache.
        Unit: cpu_atom]
  ocr.demand_rfo.l3_hit.snoop_hitm
       [Counts demand reads for ownership (RFO) and software prefetches for
        exclusive ownership (PREFETCHW) that were supplied by the L3 cache where
        a snoop was sent,the snoop hit,and modified data was forwarded. Unit:
        cpu_atom]
  topdown_fe_bound.icache
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to instruction cache misses. Unit: cpu_atom]
  l1d.hwpf_miss
       [L1D.HWPF_MISS. Unit: cpu_core]
  l1d.replacement
       [Counts the number of cache lines replaced in L1 data cache. Unit:
        cpu_core]
  l1d_pend_miss.fb_full
       [Number of cycles a demand request has waited due to L1D Fill Buffer (FB)
        unavailability. Unit: cpu_core]
  l1d_pend_miss.fb_full_periods
       [Number of phases a demand request has waited due to L1D Fill Buffer (FB)
        unavailability. Unit: cpu_core]
  l1d_pend_miss.l2_stalls
       [Number of cycles a demand request has waited due to L1D due to lack of
        L2 resources. Unit: cpu_core]
  l1d_pend_miss.pending
       [Number of L1D misses that are outstanding. Unit: cpu_core]
  l1d_pend_miss.pending_cycles
       [Cycles with L1D load Misses outstanding. Unit: cpu_core]
  l2_lines_in.all
       [L2 cache lines filling L2. Unit: cpu_core]
  l2_lines_out.useless_hwpf
       [Cache lines that have been L2 hardware prefetched but not used by demand
        accesses. Unit: cpu_core]
  l2_request.all
       [All accesses to L2 cache [This event is alias to L2_RQSTS.REFERENCES].
        Unit: cpu_core]
  l2_request.miss
       [Read requests with true-miss in L2 cache. [This event is alias to
        L2_RQSTS.MISS]. Unit: cpu_core]
  l2_rqsts.all_code_rd
       [L2 code requests. Unit: cpu_core]
  l2_rqsts.all_demand_data_rd
       [Demand Data Read access L2 cache. Unit: cpu_core]
  l2_rqsts.all_demand_miss
       [Demand requests that miss L2 cache. Unit: cpu_core]
  l2_rqsts.all_hwpf
       [L2_RQSTS.ALL_HWPF. Unit: cpu_core]
  l2_rqsts.all_rfo
       [RFO requests to L2 cache. Unit: cpu_core]
  l2_rqsts.code_rd_hit
       [L2 cache hits when fetching instructions,code reads. Unit: cpu_core]
  l2_rqsts.code_rd_miss
       [L2 cache misses when fetching instructions. Unit: cpu_core]
  l2_rqsts.demand_data_rd_hit
       [Demand Data Read requests that hit L2 cache. Unit: cpu_core]
  l2_rqsts.demand_data_rd_miss
       [Demand Data Read miss L2 cache. Unit: cpu_core]
  l2_rqsts.hwpf_miss
       [L2_RQSTS.HWPF_MISS. Unit: cpu_core]
  l2_rqsts.miss
       [Read requests with true-miss in L2 cache. [This event is alias to
        L2_REQUEST.MISS]. Unit: cpu_core]
  l2_rqsts.references
       [All accesses to L2 cache [This event is alias to L2_REQUEST.ALL]. Unit:
        cpu_core]
  l2_rqsts.rfo_hit
       [RFO requests that hit L2 cache. Unit: cpu_core]
  l2_rqsts.rfo_miss
       [RFO requests that miss L2 cache. Unit: cpu_core]
  l2_rqsts.swpf_hit
       [SW prefetch requests that hit L2 cache. Unit: cpu_core]
  l2_rqsts.swpf_miss
       [SW prefetch requests that miss L2 cache. Unit: cpu_core]
  l2_trans.l2_wb
       [L2 writebacks that access L2 cache. Unit: cpu_core]
  longest_lat_cache.miss
       [Core-originated cacheable requests that missed L3 (Except hardware
        prefetches to the L3). Unit: cpu_core]
  longest_lat_cache.reference
       [Core-originated cacheable requests that refer to L3 (Except hardware
        prefetches to the L3). Unit: cpu_core]
  mem_inst_retired.all_loads
       [Retired load instructions Supports address when precise (Precise event).
        Unit: cpu_core]
  mem_inst_retired.all_stores
       [Retired store instructions Supports address when precise (Precise
        event). Unit: cpu_core]
  mem_inst_retired.any
       [All retired memory instructions Supports address when precise (Precise
        event). Unit: cpu_core]
  mem_inst_retired.lock_loads
       [Retired load instructions with locked access Supports address when
        precise (Precise event). Unit: cpu_core]
  mem_inst_retired.split_loads
       [Retired load instructions that split across a cacheline boundary
        Supports address when precise (Precise event). Unit: cpu_core]
  mem_inst_retired.split_stores
       [Retired store instructions that split across a cacheline boundary
        Supports address when precise (Precise event). Unit: cpu_core]
  mem_inst_retired.stlb_miss_loads
       [Retired load instructions that miss the STLB Supports address when
        precise (Precise event). Unit: cpu_core]
  mem_inst_retired.stlb_miss_stores
       [Retired store instructions that miss the STLB Supports address when
        precise (Precise event). Unit: cpu_core]
  mem_load_completed.l1_miss_any
       [Completed demand load uops that miss the L1 d-cache. Unit: cpu_core]
  mem_load_l3_hit_retired.xsnp_fwd
       [Retired load instructions whose data sources were HitM responses from
        shared L3 Supports address when precise (Precise event). Unit: cpu_core]
  mem_load_l3_hit_retired.xsnp_hit
       [Retired load instructions whose data sources were L3 and cross-core
        snoop hits in on-pkg core cache Supports address when precise (Precise
        event). Unit: cpu_core]
  mem_load_l3_hit_retired.xsnp_hitm
       [Retired load instructions whose data sources were HitM responses from
        shared L3 Supports address when precise (Precise event). Unit: cpu_core]
  mem_load_l3_hit_retired.xsnp_miss
       [Retired load instructions whose data sources were L3 hit and cross-core
        snoop missed in on-pkg core cache Supports address when precise (Precise
        event). Unit: cpu_core]
  mem_load_l3_hit_retired.xsnp_no_fwd
       [Retired load instructions whose data sources were L3 and cross-core
        snoop hits in on-pkg core cache Supports address when precise (Precise
        event). Unit: cpu_core]
  mem_load_l3_hit_retired.xsnp_none
       [Retired load instructions whose data sources were hits in L3 without
        snoops required Supports address when precise (Precise event). Unit:
        cpu_core]
  mem_load_l3_miss_retired.local_dram
       [Retired load instructions which data sources missed L3 but serviced from
        local dram Supports address when precise (Precise event). Unit: cpu_core]
  mem_load_misc_retired.uc
       [Retired instructions with at least 1 uncacheable load or lock Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.fb_hit
       [Number of completed demand load requests that missed the L1,but hit the
        FB(fill buffer),because a preceding miss to the same cacheline initiated
        the line to be brought into L1,but data is not yet ready in L1 Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.l1_hit
       [Retired load instructions with L1 cache hits as data sources Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.l1_miss
       [Retired load instructions missed L1 cache as data sources Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.l2_hit
       [Retired load instructions with L2 cache hits as data sources Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.l2_miss
       [Retired load instructions missed L2 cache as data sources Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.l3_hit
       [Retired load instructions with L3 cache hits as data sources Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_load_retired.l3_miss
       [Retired load instructions missed L3 cache as data sources Supports
        address when precise (Precise event). Unit: cpu_core]
  mem_store_retired.l2_hit
       [MEM_STORE_RETIRED.L2_HIT. Unit: cpu_core]
  mem_uop_retired.any
       [Retired memory uops for any access. Unit: cpu_core]
  ocr.demand_data_rd.l3_hit.snoop_hit_with_fwd
       [Counts demand data reads that resulted in a snoop hit in another cores
        caches which forwarded the unmodified data to the requesting core. Unit:
        cpu_core]
  ocr.demand_data_rd.l3_hit.snoop_hitm
       [Counts demand data reads that resulted in a snoop hit in another cores
        caches,data forwarding is required as the data is modified. Unit:
        cpu_core]
  ocr.demand_rfo.l3_hit.snoop_hitm
       [Counts demand read for ownership (RFO) requests and software prefetches
        for exclusive ownership (PREFETCHW) that resulted in a snoop hit in
        another cores caches,data forwarding is required as the data is
        modified. Unit: cpu_core]
  offcore_requests.all_requests
       [OFFCORE_REQUESTS.ALL_REQUESTS. Unit: cpu_core]
  offcore_requests.data_rd
       [Demand and prefetch data reads. Unit: cpu_core]
  offcore_requests.demand_code_rd
       [Cacheable and noncacheable code read requests. Unit: cpu_core]
  offcore_requests.demand_data_rd
       [Demand Data Read requests sent to uncore. Unit: cpu_core]
  offcore_requests.demand_rfo
       [Demand RFO requests including regular RFOs,locks,ItoM. Unit: cpu_core]
  offcore_requests_outstanding.cycles_with_data_rd
       [OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD Spec update: ADL038.
        Unit: cpu_core]
  offcore_requests_outstanding.cycles_with_demand_code_rd
       [Cycles with offcore outstanding Code Reads transactions in the
        SuperQueue (SQ),queue to uncore. Unit: cpu_core]
  offcore_requests_outstanding.cycles_with_demand_data_rd
       [Cycles where at least 1 outstanding demand data read request is pending.
        Unit: cpu_core]
  offcore_requests_outstanding.cycles_with_demand_rfo
       [For every cycle where the core is waiting on at least 1 outstanding
        Demand RFO request,increments by 1. Unit: cpu_core]
  offcore_requests_outstanding.data_rd
       [OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Spec update: ADL038. Unit: cpu_core]
  offcore_requests_outstanding.demand_code_rd
       [Offcore outstanding Code Reads transactions in the SuperQueue (SQ),queue
        to uncore,every cycle. Unit: cpu_core]
  offcore_requests_outstanding.demand_data_rd
       [For every cycle,increments by the number of outstanding demand data read
        requests pending. Unit: cpu_core]
  sq_misc.bus_lock
       [Counts bus locks,accounts for cache line split locks and UC locks. Unit:
        cpu_core]
  sw_prefetch_access.any
       [Counts the number of PREFETCHNTA,PREFETCHW,PREFETCHT0,PREFETCHT1 or
        PREFETCHT2 instructions executed. Unit: cpu_core]
  sw_prefetch_access.nta
       [Number of PREFETCHNTA instructions executed. Unit: cpu_core]
  sw_prefetch_access.prefetchw
       [Number of PREFETCHW instructions executed. Unit: cpu_core]
  sw_prefetch_access.t0
       [Number of PREFETCHT0 instructions executed. Unit: cpu_core]
  sw_prefetch_access.t1_t2
       [Number of PREFETCHT1 or PREFETCHT2 instructions executed. Unit: cpu_core]

floating point:
  machine_clears.fp_assist
       [Counts the number of floating point operations retired that required
        microcode assist. Unit: cpu_atom]
  uops_retired.fpdiv
       [Counts the number of floating point divide uops retired (x87 and SSE,
        including x87 sqrt) (Precise event). Unit: cpu_atom]
  arith.fpdiv_active
       [ARITH.FPDIV_ACTIVE. Unit: cpu_core]
  assists.fp
       [Counts all microcode FP assists. Unit: cpu_core]
  assists.sse_avx_mix
       [ASSISTS.SSE_AVX_MIX. Unit: cpu_core]
  fp_arith_dispatched.port_0
       [FP_ARITH_DISPATCHED.PORT_0 [This event is alias to
        FP_ARITH_DISPATCHED.V0]. Unit: cpu_core]
  fp_arith_dispatched.port_1
       [FP_ARITH_DISPATCHED.PORT_1 [This event is alias to
        FP_ARITH_DISPATCHED.V1]. Unit: cpu_core]
  fp_arith_dispatched.port_5
       [FP_ARITH_DISPATCHED.PORT_5 [This event is alias to
        FP_ARITH_DISPATCHED.V2]. Unit: cpu_core]
  fp_arith_dispatched.v0
       [FP_ARITH_DISPATCHED.V0 [This event is alias to
        FP_ARITH_DISPATCHED.PORT_0]. Unit: cpu_core]
  fp_arith_dispatched.v1
       [FP_ARITH_DISPATCHED.V1 [This event is alias to
        FP_ARITH_DISPATCHED.PORT_1]. Unit: cpu_core]
  fp_arith_dispatched.v2
       [FP_ARITH_DISPATCHED.V2 [This event is alias to
        FP_ARITH_DISPATCHED.PORT_5]. Unit: cpu_core]
  fp_arith_inst_retired.128b_packed_double
       [Counts number of SSE/AVX computational 128-bit packed double precision
        floating-point instructions retired; some instructions will count twice
        as noted below. Each count represents 2 computation operations,one for
        each element. Applies to SSE* and AVX* packed double precision
        floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX
        SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as
        they perform 2 calculations per element. Unit: cpu_core]
  fp_arith_inst_retired.128b_packed_single
       [Number of SSE/AVX computational 128-bit packed single precision
        floating-point instructions retired; some instructions will count twice
        as noted below. Each count represents 4 computation operations,one for
        each element. Applies to SSE* and AVX* packed single precision
        floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT
        DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they
        perform 2 calculations per element. Unit: cpu_core]
  fp_arith_inst_retired.256b_packed_double
       [Counts number of SSE/AVX computational 256-bit packed double precision
        floating-point instructions retired; some instructions will count twice
        as noted below. Each count represents 4 computation operations,one for
        each element. Applies to SSE* and AVX* packed double precision
        floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX
        SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform
        2 calculations per element. Unit: cpu_core]
  fp_arith_inst_retired.256b_packed_single
       [Counts number of SSE/AVX computational 256-bit packed single precision
        floating-point instructions retired; some instructions will count twice
        as noted below. Each count represents 8 computation operations,one for
        each element. Applies to SSE* and AVX* packed single precision
        floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX
        SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count
        twice as they perform 2 calculations per element. Unit: cpu_core]
  fp_arith_inst_retired.4_flops
       [Number of SSE/AVX computational 128-bit packed single and 256-bit packed
        double precision FP instructions retired; some instructions will count
        twice as noted below. Each count represents 2 or/and 4 computation
        operations,1 for each element. Applies to SSE* and AVX* packed single
        precision and packed double precision FP instructions: ADD SUB HADD HSUB
        SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and
        FM(N)ADD/SUB count twice as they perform 2 calculations per element.
        Unit: cpu_core]
  fp_arith_inst_retired.scalar
       [Number of SSE/AVX computational scalar floating-point instructions
        retired; some instructions will count twice as noted below. Applies to
        SSE* and AVX* scalar,double and single precision floating-point: ADD SUB
        MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB. DPP and
        FM(N)ADD/SUB instructions count twice as they perform multiple
        calculations per element. Unit: cpu_core]
  fp_arith_inst_retired.scalar_double
       [Counts number of SSE/AVX computational scalar double precision
        floating-point instructions retired; some instructions will count twice
        as noted below. Each count represents 1 computational operation. Applies
        to SSE* and AVX* scalar double precision floating-point instructions:
        ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions
        count twice as they perform 2 calculations per element. Unit: cpu_core]
  fp_arith_inst_retired.scalar_single
       [Counts number of SSE/AVX computational scalar single precision
        floating-point instructions retired; some instructions will count twice
        as noted below. Each count represents 1 computational operation. Applies
        to SSE* and AVX* scalar single precision floating-point instructions:
        ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB
        instructions count twice as they perform 2 calculations per element.
        Unit: cpu_core]
  fp_arith_inst_retired.vector
       [Number of any Vector retired FP arithmetic instructions. Unit: cpu_core]

frontend:
  baclears.any
       [Counts the total number of BACLEARS due to all branch types including
        conditional and unconditional jumps,returns,and indirect branches. Unit:
        cpu_atom]
  icache.accesses
       [Counts the number of requests to the instruction cache for one or more
        bytes of a cache line. Unit: cpu_atom]
  icache.misses
       [Counts the number of instruction cache misses. Unit: cpu_atom]
  baclears.any
       [Clears due to Unknown Branches. Unit: cpu_core]
  decode.lcp
       [Stalls caused by changing prefix length of the instruction. Unit:
        cpu_core]
  decode.ms_busy
       [Cycles the Microcode Sequencer is busy. Unit: cpu_core]
  dsb2mite_switches.penalty_cycles
       [DSB-to-MITE switch true penalty cycles. Unit: cpu_core]
  frontend_retired.any_dsb_miss
       [Retired Instructions who experienced DSB miss (Precise event). Unit:
        cpu_core]
  frontend_retired.dsb_miss
       [Retired Instructions who experienced a critical DSB miss (Precise
        event). Unit: cpu_core]
  frontend_retired.itlb_miss
       [Retired Instructions who experienced iTLB true miss (Precise event).
        Unit: cpu_core]
  frontend_retired.l1i_miss
       [Retired Instructions who experienced Instruction L1 Cache true miss
        (Precise event). Unit: cpu_core]
  frontend_retired.l2_miss
       [Retired Instructions who experienced Instruction L2 Cache true miss
        (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_1
       [Retired instructions after front-end starvation of at least 1 cycle
        (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_128
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 128 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_16
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 16 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_2
       [Retired instructions after front-end starvation of at least 2 cycles
        (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_256
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 256 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_2_bubbles_ge_1
       [Retired instructions that are fetched after an interval where the
        front-end had at least 1 bubble-slot for a period of 2 cycles which was
        not interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_32
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 32 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_4
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 4 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_512
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 512 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_64
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 64 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.latency_ge_8
       [Retired instructions that are fetched after an interval where the
        front-end delivered no uops for a period of 8 cycles which was not
        interrupted by a back-end stall (Precise event). Unit: cpu_core]
  frontend_retired.ms_flows
       [FRONTEND_RETIRED.MS_FLOWS (Precise event). Unit: cpu_core]
  frontend_retired.stlb_miss
       [Retired Instructions who experienced STLB (2nd level TLB) true miss
        (Precise event). Unit: cpu_core]
  frontend_retired.unknown_branch
       [FRONTEND_RETIRED.UNKNOWN_BRANCH (Precise event). Unit: cpu_core]
  icache_data.stall_periods
       [ICACHE_DATA.STALL_PERIODS. Unit: cpu_core]
  icache_data.stalls
       [Cycles where a code fetch is stalled due to L1 instruction cache miss.
        Unit: cpu_core]
  icache_tag.stalls
       [Cycles where a code fetch is stalled due to L1 instruction cache tag
        miss. Unit: cpu_core]
  idq.dsb_cycles_any
       [Cycles Decode Stream Buffer (DSB) is delivering any Uop. Unit: cpu_core]
  idq.dsb_cycles_ok
       [Cycles DSB is delivering optimal number of Uops. Unit: cpu_core]
  idq.dsb_uops
       [Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream
        Buffer (DSB) path. Unit: cpu_core]
  idq.mite_cycles_any
       [Cycles MITE is delivering any Uop. Unit: cpu_core]
  idq.mite_cycles_ok
       [Cycles MITE is delivering optimal number of Uops. Unit: cpu_core]
  idq.mite_uops
       [Uops delivered to Instruction Decode Queue (IDQ) from MITE path. Unit:
        cpu_core]
  idq.ms_cycles_any
       [Cycles when uops are being delivered to IDQ while MS is busy. Unit:
        cpu_core]
  idq.ms_switches
       [Number of switches from DSB or MITE to the MS. Unit: cpu_core]
  idq.ms_uops
       [Uops delivered to IDQ while MS is busy. Unit: cpu_core]
  idq_bubbles.core
       [Uops not delivered by IDQ when backend of the machine is not stalled
        [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE]. Unit: cpu_core]
  idq_bubbles.cycles_0_uops_deliv.core
       [Cycles when no uops are not delivered by the IDQ when backend of the
        machine is not stalled [This event is alias to
        IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE]. Unit: cpu_core]
  idq_bubbles.cycles_fe_was_ok
       [Cycles when optimal number of uops was delivered to the back-end when
        the back-end is not stalled [This event is alias to
        IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK]. Unit: cpu_core]
  idq_uops_not_delivered.core
       [Uops not delivered by IDQ when backend of the machine is not stalled
        [This event is alias to IDQ_BUBBLES.CORE]. Unit: cpu_core]
  idq_uops_not_delivered.cycles_0_uops_deliv.core
       [Cycles when no uops are not delivered by the IDQ when backend of the
        machine is not stalled [This event is alias to
        IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE]. Unit: cpu_core]
  idq_uops_not_delivered.cycles_fe_was_ok
       [Cycles when optimal number of uops was delivered to the back-end when
        the back-end is not stalled [This event is alias to
        IDQ_BUBBLES.CYCLES_FE_WAS_OK]. Unit: cpu_core]

memory:
  ld_head.any_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer is stalled due to any number of reasons,including an L1 miss,WCB
        full,pagewalk,store address block or store data block,on a load that
        retires. Unit: cpu_atom]
  ld_head.l1_bound_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer is stalled due to a core bound stall including a store address
        match,a DTLB miss or a page walk that detains the load from retiring.
        Unit: cpu_atom]
  ld_head.l1_miss_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer and retirement are both stalled due to a DL1 miss. Unit: cpu_atom]
  ld_head.other_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer and retirement are both stalled due to other block cases. Unit:
        cpu_atom]
  ld_head.pgwalk_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer and retirement are both stalled due to a pagewalk. Unit: cpu_atom]
  ld_head.st_addr_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer and retirement are both stalled due to a store address match.
        Unit: cpu_atom]
  machine_clears.memory_ordering
       [Counts the number of machine clears due to memory ordering caused by a
        snoop from an external agent. Does not count internally generated
        machine clears such as those due to memory disambiguation. Unit:
        cpu_atom]
  ocr.demand_data_rd.l3_miss
       [Counts demand data reads that were not supplied by the L3 cache. Unit:
        cpu_atom]
  ocr.demand_data_rd.l3_miss_local
       [Counts demand data reads that were not supplied by the L3 cache.
        [L3_MISS_LOCAL is alias to L3_MISS]. Unit: cpu_atom]
  ocr.demand_rfo.l3_miss
       [Counts demand reads for ownership (RFO) and software prefetches for
        exclusive ownership (PREFETCHW) that were not supplied by the L3 cache.
        Unit: cpu_atom]
  ocr.demand_rfo.l3_miss_local
       [Counts demand reads for ownership (RFO) and software prefetches for
        exclusive ownership (PREFETCHW) that were not supplied by the L3 cache.
        [L3_MISS_LOCAL is alias to L3_MISS]. Unit: cpu_atom]
  cycle_activity.stalls_l3_miss
       [Execution stalls while L3 cache miss demand load is outstanding. Unit:
        cpu_core]
  machine_clears.memory_ordering
       [Number of machine clears due to memory ordering conflicts. Unit:
        cpu_core]
  mem_trans_retired.load_latency_gt_1024
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 1024 cycles Supports address when precise
        (Must be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_128
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 128 cycles Supports address when precise
        (Must be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_16
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 16 cycles Supports address when precise (Must
        be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_256
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 256 cycles Supports address when precise
        (Must be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_32
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 32 cycles Supports address when precise (Must
        be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_4
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 4 cycles Supports address when precise (Must
        be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_512
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 512 cycles Supports address when precise
        (Must be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_64
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 64 cycles Supports address when precise (Must
        be precise). Unit: cpu_core]
  mem_trans_retired.load_latency_gt_8
       [Counts randomly selected loads when the latency from first dispatch to
        completion is greater than 8 cycles Supports address when precise (Must
        be precise). Unit: cpu_core]
  mem_trans_retired.store_sample
       [Retired memory store access operations. A PDist event for PEBS Store
        Latency Facility Supports address when precise (Must be precise). Unit:
        cpu_core]
  memory_activity.cycles_l1d_miss
       [Cycles while L1 cache miss demand load is outstanding. Unit: cpu_core]
  memory_activity.stalls_l1d_miss
       [Execution stalls while L1 cache miss demand load is outstanding. Unit:
        cpu_core]
  memory_activity.stalls_l2_miss
       [Execution stalls while L2 cache miss demand cacheable load request is
        outstanding. Unit: cpu_core]
  memory_activity.stalls_l3_miss
       [Execution stalls while L3 cache miss demand cacheable load request is
        outstanding. Unit: cpu_core]
  ocr.demand_data_rd.l3_miss
       [Counts demand data reads that were not supplied by the L3 cache. Unit:
        cpu_core]
  ocr.demand_rfo.l3_miss
       [Counts demand read for ownership (RFO) requests and software prefetches
        for exclusive ownership (PREFETCHW) that were not supplied by the L3
        cache. Unit: cpu_core]
  offcore_requests.l3_miss_demand_data_rd
       [Counts demand data read requests that miss the L3 cache. Unit: cpu_core]
  offcore_requests_outstanding.l3_miss_demand_data_rd
       [For every cycle,increments by the number of demand data read requests
        pending that are known to have missed the L3 cache. Unit: cpu_core]

other:
  ocr.corewb_m.any_response
       [Counts modified writebacks from L1 cache and L2 cache that have any type
        of response. Unit: cpu_atom]
  ocr.demand_data_rd.any_response
       [Counts demand data reads that have any type of response. Unit: cpu_atom]
  ocr.demand_rfo.any_response
       [Counts demand reads for ownership (RFO) and software prefetches for
        exclusive ownership (PREFETCHW) that have any type of response. Unit:
        cpu_atom]
  ocr.streaming_wr.any_response
       [Counts streaming stores that have any type of response. Unit: cpu_atom]
  serialization.c01_ms_scb
       [Counts the number of issue slots in a UMWAIT or TPAUSE instruction where
        no uop issues due to the instruction putting the CPU into the C0.1
        activity state. For Tremont,UMWAIT and TPAUSE will only put the CPU into
        C0.1 activity state (not C0.2 activity state). Unit: cpu_atom]
  assists.hardware
       [ASSISTS.HARDWARE. Unit: cpu_core]
  assists.page_fault
       [ASSISTS.PAGE_FAULT. Unit: cpu_core]
  core_power.license_1
       [CORE_POWER.LICENSE_1. Unit: cpu_core]
  core_power.license_2
       [CORE_POWER.LICENSE_2. Unit: cpu_core]
  core_power.license_3
       [CORE_POWER.LICENSE_3. Unit: cpu_core]
  ocr.demand_data_rd.any_response
       [Counts demand data reads that have any type of response. Unit: cpu_core]
  ocr.demand_data_rd.dram
       [Counts demand data reads that were supplied by DRAM. Unit: cpu_core]
  ocr.demand_rfo.any_response
       [Counts demand read for ownership (RFO) requests and software prefetches
        for exclusive ownership (PREFETCHW) that have any type of response.
        Unit: cpu_core]
  ocr.streaming_wr.any_response
       [Counts streaming stores that have any type of response. Unit: cpu_core]
  rs.empty
       [Cycles when Reservation Station (RS) is empty for the thread. Unit:
        cpu_core]
  rs.empty_count
       [Counts end of periods where the Reservation Station (RS) was empty.
        Unit: cpu_core]
  rs.empty_resource
       [Cycles when Reservation Station (RS) is empty due to a resource in the
        back-end. Unit: cpu_core]
  xq.full_cycles
       [Cycles the uncore cannot take further requests. Unit: cpu_core]

pipeline:
  br_inst_retired.all_branches
       [Counts the total number of branch instructions retired for all branch
        types (Precise event). Unit: cpu_atom]
  br_inst_retired.cond
       [Counts the number of retired JCC (Jump on Conditional Code) branch
        instructions retired,includes both taken and not taken branches (Precise
        event). Unit: cpu_atom]
  br_inst_retired.cond_taken
       [Counts the number of taken JCC (Jump on Conditional Code) branch
        instructions retired (Precise event). Unit: cpu_atom]
  br_inst_retired.far_branch
       [Counts the number of far branch instructions retired,includes far jump,
        far call and return,and interrupt call and return (Precise event). Unit:
        cpu_atom]
  br_inst_retired.indirect
       [Counts the number of near indirect JMP and near indirect CALL branch
        instructions retired (Precise event). Unit: cpu_atom]
  br_inst_retired.indirect_call
       [Counts the number of near indirect CALL branch instructions retired
        (Precise event). Unit: cpu_atom]
  br_inst_retired.near_call
       [Counts the number of near CALL branch instructions retired (Precise
        event). Unit: cpu_atom]
  br_inst_retired.near_return
       [Counts the number of near RET branch instructions retired (Precise
        event). Unit: cpu_atom]
  br_inst_retired.near_taken
       [Counts the number of near taken branch instructions retired (Precise
        event). Unit: cpu_atom]
  br_inst_retired.rel_call
       [Counts the number of near relative CALL branch instructions retired
        (Precise event). Unit: cpu_atom]
  br_misp_retired.all_branches
       [Counts the total number of mispredicted branch instructions retired for
        all branch types (Precise event). Unit: cpu_atom]
  br_misp_retired.cond
       [Counts the number of mispredicted JCC (Jump on Conditional Code) branch
        instructions retired (Precise event). Unit: cpu_atom]
  br_misp_retired.cond_taken
       [Counts the number of mispredicted taken JCC (Jump on Conditional Code)
        branch instructions retired (Precise event). Unit: cpu_atom]
  br_misp_retired.indirect
       [Counts the number of mispredicted near indirect JMP and near indirect
        CALL branch instructions retired (Precise event). Unit: cpu_atom]
  br_misp_retired.indirect_call
       [Counts the number of mispredicted near indirect CALL branch instructions
        retired (Precise event). Unit: cpu_atom]
  br_misp_retired.near_taken
       [Counts the number of mispredicted near taken branch instructions retired
        (Precise event). Unit: cpu_atom]
  br_misp_retired.return
       [Counts the number of mispredicted near RET branch instructions retired
        (Precise event). Unit: cpu_atom]
  cpu_clk_unhalted.core
       [Counts the number of unhalted core clock cycles. (Fixed event). Unit:
        cpu_atom]
  cpu_clk_unhalted.core_p
       [Counts the number of unhalted core clock cycles. Unit: cpu_atom]
  cpu_clk_unhalted.ref_tsc
       [Counts the number of unhalted reference clock cycles at TSC frequency.
        (Fixed event). Unit: cpu_atom]
  cpu_clk_unhalted.ref_tsc_p
       [Counts the number of unhalted reference clock cycles at TSC frequency.
        Unit: cpu_atom]
  cpu_clk_unhalted.thread
       [Counts the number of unhalted core clock cycles. (Fixed event). Unit:
        cpu_atom]
  cpu_clk_unhalted.thread_p
       [Counts the number of unhalted core clock cycles. Unit: cpu_atom]
  inst_retired.any
       [Counts the total number of instructions retired. (Fixed event) (Precise
        event). Unit: cpu_atom]
  inst_retired.any_p
       [Counts the total number of instructions retired (Precise event). Unit:
        cpu_atom]
  ld_blocks.address_alias
       [Counts the number of retired loads that are blocked because it initially
        appears to be store forward blocked,but subsequently is shown not to be
        blocked based on 4K alias check (Precise event). Unit: cpu_atom]
  ld_blocks.data_unknown
       [Counts the number of retired loads that are blocked because its address
        exactly matches an older store whose data is not ready (Precise event).
        Unit: cpu_atom]
  machine_clears.disambiguation
       [Counts the number of machine clears due to memory ordering in which an
        internal load passes an older store within the same CPU. Unit: cpu_atom]
  machine_clears.mrn_nuke
       [Counts the number of machines clears due to memory renaming. Unit:
        cpu_atom]
  machine_clears.page_fault
       [Counts the number of machine clears due to a page fault. Counts both
        I-Side and D-Side (Loads/Stores) page faults. A page fault occurs when
        either the page is not present,or an access violation occurs. Unit:
        cpu_atom]
  machine_clears.slow
       [Counts the number of machine clears that flush the pipeline and restart
        the machine with the use of microcode due to SMC,MEMORY_ORDERING,
        FP_ASSISTS,PAGE_FAULT,DISAMBIGUATION,and FPC_VIRTUAL_TRAP. Unit:
        cpu_atom]
  machine_clears.smc
       [Counts the number of machine clears due to program modifying data (self
        modifying code) within 1K of a recently fetched code page. Unit:
        cpu_atom]
  misc_retired.lbr_inserts
       [Counts the number of LBR entries recorded. Requires LBRs to be enabled
        in IA32_LBR_CTL. [This event is alias to LBR_INSERTS.ANY] (Precise
        event). Unit: cpu_atom]
  serialization.non_c01_ms_scb
       [Counts the number of issue slots not consumed by the backend due to a
        micro-sequencer (MS) scoreboard,which stalls the front-end from issuing
        from the UROM until a specified older uop retires. Unit: cpu_atom]
  topdown_bad_speculation.all
       [Counts the total number of issue slots that were not consumed by the
        backend because allocation is stalled due to a mispredicted jump or a
        machine clear. Unit: cpu_atom]
  topdown_bad_speculation.fastnuke
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to fast nukes such as memory ordering and memory
        disambiguation machine clears. Unit: cpu_atom]
  topdown_bad_speculation.machine_clears
       [Counts the total number of issue slots that were not consumed by the
        backend because allocation is stalled due to a machine clear (nuke) of
        any kind including memory ordering and memory disambiguation. Unit:
        cpu_atom]
  topdown_bad_speculation.mispredict
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to branch mispredicts. Unit: cpu_atom]
  topdown_bad_speculation.nuke
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to a machine clear (nuke). Unit: cpu_atom]
  topdown_be_bound.all
       [Counts the total number of issue slots every cycle that were not
        consumed by the backend due to backend stalls. Unit: cpu_atom]
  topdown_be_bound.alloc_restrictions
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to certain allocation restrictions. Unit: cpu_atom]
  topdown_be_bound.mem_scheduler
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to memory reservation stalls in which a scheduler is not
        able to accept uops. Unit: cpu_atom]
  topdown_be_bound.non_mem_scheduler
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to IEC or FPC RAT stalls,which can be due to FIQ or IEC
        reservation stalls in which the integer,floating point or SIMD scheduler
        is not able to accept uops. Unit: cpu_atom]
  topdown_be_bound.register
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to the physical register file unable to accept an entry
        (marble stalls). Unit: cpu_atom]
  topdown_be_bound.reorder_buffer
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to the reorder buffer being full (ROB stalls). Unit:
        cpu_atom]
  topdown_be_bound.serialization
       [Counts the number of issue slots every cycle that were not consumed by
        the backend due to scoreboards from the instruction queue (IQ),jump
        execution unit (JEU),or microcode sequencer (MS). Unit: cpu_atom]
  topdown_fe_bound.all
       [Counts the total number of issue slots every cycle that were not
        consumed by the backend due to frontend stalls. Unit: cpu_atom]
  topdown_fe_bound.branch_detect
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to BACLEARS. Unit: cpu_atom]
  topdown_fe_bound.branch_resteer
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to BTCLEARS. Unit: cpu_atom]
  topdown_fe_bound.cisc
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to the microcode sequencer (MS). Unit: cpu_atom]
  topdown_fe_bound.decode
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to decode stalls. Unit: cpu_atom]
  topdown_fe_bound.frontend_bandwidth
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to frontend bandwidth restrictions due to decode,
        predecode,cisc,and other limitations. Unit: cpu_atom]
  topdown_fe_bound.frontend_latency
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to a latency related stalls including BACLEARs,BTCLEARs,
        ITLB misses,and ICache misses. Unit: cpu_atom]
  topdown_fe_bound.itlb
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to ITLB misses. Unit: cpu_atom]
  topdown_fe_bound.other
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to other common frontend stalls not categorized. Unit:
        cpu_atom]
  topdown_fe_bound.predecode
       [Counts the number of issue slots every cycle that were not delivered by
        the frontend due to wrong predecodes. Unit: cpu_atom]
  topdown_retiring.all
       [Counts the total number of consumed retirement slots (Precise event).
        Unit: cpu_atom]
  uops_issued.any
       [Counts the number of uops issued by the front end every cycle. Unit:
        cpu_atom]
  uops_retired.all
       [Counts the total number of uops retired (Precise event). Unit: cpu_atom]
  uops_retired.idiv
       [Counts the number of integer divide uops retired (Precise event). Unit:
        cpu_atom]
  uops_retired.ms
       [Counts the number of uops that are from complex flows issued by the
        micro-sequencer (MS) (Precise event). Unit: cpu_atom]
  uops_retired.x87
       [Counts the number of x87 uops retired,includes those in MS flows
        (Precise event). Unit: cpu_atom]
  arith.div_active
       [Cycles when divide unit is busy executing divide or square root
        operations. Unit: cpu_core]
  arith.idiv_active
       [This event counts the cycles the integer divider is busy. Unit: cpu_core]
  assists.any
       [Number of occurrences where a microcode assist is invoked by hardware.
        Unit: cpu_core]
  br_inst_retired.all_branches
       [All branch instructions retired (Precise event). Unit: cpu_core]
  br_inst_retired.cond
       [Conditional branch instructions retired (Precise event). Unit: cpu_core]
  br_inst_retired.cond_ntaken
       [Not taken branch instructions retired (Precise event). Unit: cpu_core]
  br_inst_retired.cond_taken
       [Taken conditional branch instructions retired (Precise event). Unit:
        cpu_core]
  br_inst_retired.far_branch
       [Far branch instructions retired (Precise event). Unit: cpu_core]
  br_inst_retired.indirect
       [Indirect near branch instructions retired (excluding returns) (Precise
        event). Unit: cpu_core]
  br_inst_retired.near_call
       [Direct and indirect near call instructions retired (Precise event).
        Unit: cpu_core]
  br_inst_retired.near_return
       [Return instructions retired (Precise event). Unit: cpu_core]
  br_inst_retired.near_taken
       [Taken branch instructions retired (Precise event). Unit: cpu_core]
  br_misp_retired.all_branches
       [All mispredicted branch instructions retired (Precise event). Unit:
        cpu_core]
  br_misp_retired.cond
       [Mispredicted conditional branch instructions retired (Precise event).
        Unit: cpu_core]
  br_misp_retired.cond_ntaken
       [Mispredicted non-taken conditional branch instructions retired (Precise
        event). Unit: cpu_core]
  br_misp_retired.cond_taken
       [number of branch instructions retired that were mispredicted and taken
        (Precise event). Unit: cpu_core]
  br_misp_retired.indirect
       [Miss-predicted near indirect branch instructions retired (excluding
        returns) (Precise event). Unit: cpu_core]
  br_misp_retired.indirect_call
       [Mispredicted indirect CALL retired (Precise event). Unit: cpu_core]
  br_misp_retired.near_taken
       [Number of near branch instructions retired that were mispredicted and
        taken (Precise event). Unit: cpu_core]
  br_misp_retired.ret
       [This event counts the number of mispredicted ret instructions retired.
        Non PEBS (Precise event). Unit: cpu_core]
  cpu_clk_unhalted.c01
       [Core clocks when the thread is in the C0.1 light-weight slower wakeup
        time but more power saving optimized state. Unit: cpu_core]
  cpu_clk_unhalted.c02
       [Core clocks when the thread is in the C0.2 light-weight faster wakeup
        time but less power saving optimized state. Unit: cpu_core]
  cpu_clk_unhalted.c0_wait
       [Core clocks when the thread is in the C0.1 or C0.2 or running a PAUSE in
        C0 ACPI state. Unit: cpu_core]
  cpu_clk_unhalted.distributed
       [Cycle counts are evenly distributed between active threads in the Core.
        Unit: cpu_core]
  cpu_clk_unhalted.one_thread_active
       [Core crystal clock cycles when this thread is unhalted and the other
        thread is halted. Unit: cpu_core]
  cpu_clk_unhalted.pause
       [CPU_CLK_UNHALTED.PAUSE. Unit: cpu_core]
  cpu_clk_unhalted.pause_inst
       [CPU_CLK_UNHALTED.PAUSE_INST. Unit: cpu_core]
  cpu_clk_unhalted.ref_distributed
       [Core crystal clock cycles. Cycle counts are evenly distributed between
        active threads in the Core. Unit: cpu_core]
  cpu_clk_unhalted.ref_tsc
       [Reference cycles when the core is not in halt state. Unit: cpu_core]
  cpu_clk_unhalted.ref_tsc_p
       [Reference cycles when the core is not in halt state. Unit: cpu_core]
  cpu_clk_unhalted.thread
       [Core cycles when the thread is not in halt state. Unit: cpu_core]
  cpu_clk_unhalted.thread_p
       [Thread cycles when thread is not in halt state. Unit: cpu_core]
  cycle_activity.cycles_l1d_miss
       [Cycles while L1 cache miss demand load is outstanding. Unit: cpu_core]
  cycle_activity.cycles_l2_miss
       [Cycles while L2 cache miss demand load is outstanding. Unit: cpu_core]
  cycle_activity.cycles_mem_any
       [Cycles while memory subsystem has an outstanding load. Unit: cpu_core]
  cycle_activity.stalls_l1d_miss
       [Execution stalls while L1 cache miss demand load is outstanding. Unit:
        cpu_core]
  cycle_activity.stalls_l2_miss
       [Execution stalls while L2 cache miss demand load is outstanding. Unit:
        cpu_core]
  cycle_activity.stalls_total
       [Total execution stalls. Unit: cpu_core]
  exe_activity.1_ports_util
       [Cycles total of 1 uop is executed on all ports and Reservation Station
        was not empty. Unit: cpu_core]
  exe_activity.2_3_ports_util
       [Cycles total of 2 or 3 uops are executed on all ports and Reservation
        Station (RS) was not empty. Unit: cpu_core]
  exe_activity.2_ports_util
       [Cycles total of 2 uops are executed on all ports and Reservation Station
        was not empty. Unit: cpu_core]
  exe_activity.3_ports_util
       [Cycles total of 3 uops are executed on all ports and Reservation Station
        was not empty. Unit: cpu_core]
  exe_activity.4_ports_util
       [Cycles total of 4 uops are executed on all ports and Reservation Station
        was not empty. Unit: cpu_core]
  exe_activity.bound_on_loads
       [Execution stalls while memory subsystem has an outstanding load. Unit:
        cpu_core]
  exe_activity.bound_on_stores
       [Cycles where the Store Buffer was full and no loads caused an execution
        stall. Unit: cpu_core]
  exe_activity.exe_bound_0_ports
       [Cycles no uop executed while RS was not empty, the SB was not full and
        there was no outstanding load. Unit: cpu_core]
  inst_decoded.decoders
       [Instruction decoders utilized in a cycle. Unit: cpu_core]
  inst_retired.any
       [Number of instructions retired. Fixed Counter - architectural event
        (Precise event). Unit: cpu_core]
  inst_retired.any_p
       [Number of instructions retired. General Counter - architectural event
        (Precise event). Unit: cpu_core]
  inst_retired.macro_fused
       [INST_RETIRED.MACRO_FUSED (Precise event). Unit: cpu_core]
  inst_retired.nop
       [Retired NOP instructions (Precise event). Unit: cpu_core]
  inst_retired.prec_dist
       [Precise instruction retired with PEBS precise-distribution (Precise
        event). Unit: cpu_core]
  inst_retired.rep_iteration
       [Iterations of Repeat string retired instructions (Precise event). Unit:
        cpu_core]
  int_misc.clear_resteer_cycles
       [Counts cycles after recovery from a branch misprediction or machine
        clear till the first uop is issued from the resteered path. Unit:
        cpu_core]
  int_misc.clears_count
       [Clears speculative count. Unit: cpu_core]
  int_misc.recovery_cycles
       [Core cycles the allocator was stalled due to recovery from earlier clear
        event for this thread. Unit: cpu_core]
  int_misc.unknown_branch_cycles
       [Bubble cycles of BAClear (Unknown Branch). Unit: cpu_core]
  int_misc.uop_dropping
       [TMA slots where uops got dropped. Unit: cpu_core]
  int_vec_retired.128bit
       [INT_VEC_RETIRED.128BIT. Unit: cpu_core]
  int_vec_retired.256bit
       [INT_VEC_RETIRED.256BIT. Unit: cpu_core]
  int_vec_retired.add_128
       [Integer ADD, SUB, SAD 128-bit vector instructions. Unit: cpu_core]
  int_vec_retired.add_256
       [Integer ADD, SUB, SAD 256-bit vector instructions. Unit: cpu_core]
  int_vec_retired.mul_256
       [INT_VEC_RETIRED.MUL_256. Unit: cpu_core]
  int_vec_retired.shuffles
       [INT_VEC_RETIRED.SHUFFLES. Unit: cpu_core]
  int_vec_retired.vnni_128
       [INT_VEC_RETIRED.VNNI_128. Unit: cpu_core]
  int_vec_retired.vnni_256
       [INT_VEC_RETIRED.VNNI_256. Unit: cpu_core]
  ld_blocks.address_alias
       [False dependencies in MOB due to partial compare on address. Unit:
        cpu_core]
  ld_blocks.no_sr
       [The number of times that split load operations are temporarily blocked
        because all resources for handling the split accesses are in use. Unit:
        cpu_core]
  ld_blocks.store_forward
       [Loads blocked due to overlapping with a preceding store that cannot be
        forwarded. Unit: cpu_core]
  load_hit_prefetch.swpf
       [Counts the number of demand load dispatches that hit L1D fill buffer
        (FB) allocated for software prefetch. Unit: cpu_core]
  lsd.cycles_active
       [Cycles Uops delivered by the LSD, but didn't come from the decoder. Unit:
        cpu_core]
  lsd.cycles_ok
       [Cycles optimal number of Uops delivered by the LSD, but did not come from
        the decoder. Unit: cpu_core]
  lsd.uops
       [Number of Uops delivered by the LSD. Unit: cpu_core]
  machine_clears.count
       [Number of machine clears (nukes) of any type. Unit: cpu_core]
  machine_clears.smc
       [Self-modifying code (SMC) detected. Unit: cpu_core]
  misc2_retired.lfence
       [LFENCE instructions retired. Unit: cpu_core]
  misc_retired.lbr_inserts
       [Increments whenever there is an update to the LBR array. Unit: cpu_core]
  resource_stalls.sb
       [Cycles stalled due to no store buffers available. (not including
        draining from sync). Unit: cpu_core]
  resource_stalls.scoreboard
       [Counts cycles where the pipeline is stalled due to serializing
        operations. Unit: cpu_core]
  topdown.backend_bound_slots
       [TMA slots where no uops were being issued due to lack of back-end
        resources. Unit: cpu_core]
  topdown.bad_spec_slots
       [TMA slots wasted due to incorrect speculations. Unit: cpu_core]
  topdown.br_mispredict_slots
       [TMA slots wasted due to incorrect speculation by branch mispredictions.
        Unit: cpu_core]
  topdown.memory_bound_slots
       [TOPDOWN.MEMORY_BOUND_SLOTS. Unit: cpu_core]
  topdown.slots
       [TMA slots available for an unhalted logical processor. Fixed counter -
        architectural event. Unit: cpu_core]
  topdown.slots_p
       [TMA slots available for an unhalted logical processor. General counter -
        architectural event. Unit: cpu_core]
  uops_decoded.dec0_uops
       [UOPS_DECODED.DEC0_UOPS. Unit: cpu_core]
  uops_dispatched.port_0
       [Uops executed on port 0. Unit: cpu_core]
  uops_dispatched.port_1
       [Uops executed on port 1. Unit: cpu_core]
  uops_dispatched.port_2_3_10
       [Uops executed on ports 2, 3 and 10. Unit: cpu_core]
  uops_dispatched.port_4_9
       [Uops executed on ports 4 and 9. Unit: cpu_core]
  uops_dispatched.port_5_11
       [Uops executed on ports 5 and 11. Unit: cpu_core]
  uops_dispatched.port_6
       [Uops executed on port 6. Unit: cpu_core]
  uops_dispatched.port_7_8
       [Uops executed on ports 7 and 8. Unit: cpu_core]
  uops_executed.core_cycles_ge_1
       [Cycles at least 1 micro-op is executed from any thread on physical core.
        Unit: cpu_core]
  uops_executed.core_cycles_ge_2
       [Cycles at least 2 micro-ops are executed from any thread on physical core.
        Unit: cpu_core]
  uops_executed.core_cycles_ge_3
       [Cycles at least 3 micro-ops are executed from any thread on physical core.
        Unit: cpu_core]
  uops_executed.core_cycles_ge_4
       [Cycles at least 4 micro-ops are executed from any thread on physical core.
        Unit: cpu_core]
  uops_executed.cycles_ge_1
       [Cycles where at least 1 uop was executed per-thread. Unit: cpu_core]
  uops_executed.cycles_ge_2
       [Cycles where at least 2 uops were executed per-thread. Unit: cpu_core]
  uops_executed.cycles_ge_3
       [Cycles where at least 3 uops were executed per-thread. Unit: cpu_core]
  uops_executed.cycles_ge_4
       [Cycles where at least 4 uops were executed per-thread. Unit: cpu_core]
  uops_executed.stalls
       [Counts number of cycles no uops were dispatched to be executed on this
        thread. Unit: cpu_core]
  uops_executed.thread
       [Counts the number of uops to be executed per-thread each cycle. Unit:
        cpu_core]
  uops_executed.x87
       [Counts the number of x87 uops dispatched. Unit: cpu_core]
  uops_issued.any
       [Uops that RAT issues to RS. Unit: cpu_core]
  uops_issued.cycles
       [UOPS_ISSUED.CYCLES. Unit: cpu_core]
  uops_retired.cycles
       [Cycles with retired uop(s). Unit: cpu_core]
  uops_retired.heavy
       [Retired uops except the last uop of each instruction. Unit: cpu_core]
  uops_retired.ms
       [UOPS_RETIRED.MS. Unit: cpu_core]
  uops_retired.slots
       [Retirement slots used. Unit: cpu_core]
  uops_retired.stalls
       [Cycles without actually retired uops. Unit: cpu_core]

uncore interconnect:
  unc_arb_coh_trk_requests.all
       [Number of requests allocated in Coherency Tracker. Unit: uncore_arb]
  unc_arb_dat_occupancy.all
       [Each cycle counts number of any coherent request at memory controller
        that were issued by any core. Unit: uncore_arb]
  unc_arb_dat_occupancy.rd
       [Each cycle counts number of coherent reads pending on data return from
        memory controller that were issued by any core. Unit: uncore_arb]
  unc_arb_req_trk_occupancy.drd
       [Each cycle counts number of 'valid' coherent Data Read entries. Such
        entry is defined as valid when it is allocated till deallocation.
        Doesn't include prefetches [This event is alias to
        UNC_ARB_TRK_OCCUPANCY.RD]. Unit: uncore_arb]
  unc_arb_req_trk_request.drd
       [Number of all coherent Data Read entries. Doesn't include prefetches
        [This event is alias to UNC_ARB_TRK_REQUESTS.RD]. Unit: uncore_arb]
  unc_arb_trk_occupancy.all
       [Each cycle counts number of all outgoing valid entries in ReqTrk. Such
        entry is defined as valid from its allocation in ReqTrk till
        deallocation. Accounts for Coherent and non-coherent traffic. Unit:
        uncore_arb]
  unc_arb_trk_occupancy.rd
       [Each cycle counts number of 'valid' coherent Data Read entries. Such
        entry is defined as valid when it is allocated till deallocation.
        Doesn't include prefetches [This event is alias to
        UNC_ARB_REQ_TRK_OCCUPANCY.DRD]. Unit: uncore_arb]
  unc_arb_trk_requests.all
       [Counts the number of coherent and in-coherent requests initiated by IA
        cores, processor graphic units, or LLC. Unit: uncore_arb]
  unc_arb_trk_requests.rd
       [Number of all coherent Data Read entries. Doesn't include prefetches
        [This event is alias to UNC_ARB_REQ_TRK_REQUEST.DRD]. Unit: uncore_arb]

uncore memory:
  unc_m_act_count_rd
       [ACT command for a read request sent to DRAM. Unit: uncore_imc]
  unc_m_act_count_total
       [ACT command sent to DRAM. Unit: uncore_imc]
  unc_m_act_count_wr
       [ACT command for a write request sent to DRAM. Unit: uncore_imc]
  unc_m_cas_count_rd
       [Read CAS command sent to DRAM. Unit: uncore_imc]
  unc_m_cas_count_wr
       [Write CAS command sent to DRAM. Unit: uncore_imc]
  unc_m_clockticks
       [Number of clocks. Unit: uncore_imc]
  unc_m_dram_page_empty_rd
       [incoming read request page status is Page Empty. Unit: uncore_imc]
  unc_m_dram_page_empty_wr
       [incoming write request page status is Page Empty. Unit: uncore_imc]
  unc_m_dram_page_hit_rd
       [incoming read request page status is Page Hit. Unit: uncore_imc]
  unc_m_dram_page_hit_wr
       [incoming write request page status is Page Hit. Unit: uncore_imc]
  unc_m_dram_page_miss_rd
       [incoming read request page status is Page Miss. Unit: uncore_imc]
  unc_m_dram_page_miss_wr
       [incoming write request page status is Page Miss. Unit: uncore_imc]
  unc_m_dram_thermal_hot
       [Any Rank at Hot state. Unit: uncore_imc]
  unc_m_dram_thermal_warm
       [Any Rank at Warm state. Unit: uncore_imc]
  unc_m_pre_count_idle
       [PRE command sent to DRAM due to page table idle timer expiration. Unit:
        uncore_imc]
  unc_m_pre_count_page_miss
       [PRE command sent to DRAM for a read/write request. Unit: uncore_imc]
  unc_m_prefetch_rd
       [Incoming read prefetch request from IA. Unit: uncore_imc]
  unc_m_vc0_requests_rd
       [Incoming VC0 read request. Unit: uncore_imc]
  unc_m_vc0_requests_wr
       [Incoming VC0 write request. Unit: uncore_imc]
  unc_m_vc1_requests_rd
       [Incoming VC1 read request. Unit: uncore_imc]
  unc_m_vc1_requests_wr
       [Incoming VC1 write request. Unit: uncore_imc]
  unc_mc0_rdcas_count_freerun
       [Counts every 64B read request entering the Memory Controller 0 to DRAM
        (sum of all channels). Unit: uncore_imc_free_running_0]
  unc_mc0_wrcas_count_freerun
       [Counts every 64B write request entering the Memory Controller 0 to DRAM
        (sum of all channels). Each write request counts as a new request
        incrementing this counter. However, same cache line write requests (both
        full and partial) are combined to a single 64 byte data transfer to
        DRAM. Unit: uncore_imc_free_running_0]

uncore other:
  unc_clock.socket
       [This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_clock]

virtual memory:
  dtlb_load_misses.walk_completed
       [Counts the number of page walks completed due to load DTLB misses to any
        page size. Unit: cpu_atom]
  dtlb_store_misses.walk_completed
       [Counts the number of page walks completed due to store DTLB misses to
        any page size. Unit: cpu_atom]
  itlb_misses.miss_caused_walk
       [Counts the number of page walks initiated by an instruction fetch that
        missed the first and second level TLBs. Unit: cpu_atom]
  itlb_misses.pde_cache_miss
       [Counts the number of page walks due to an instruction fetch that miss
        the PDE (Page Directory Entry) cache. Unit: cpu_atom]
  itlb_misses.walk_completed
       [Counts the number of page walks completed due to instruction fetch
        misses to any page size. Unit: cpu_atom]
  ld_head.dtlb_miss_at_ret
       [Counts the number of cycles that the head (oldest load) of the load
        buffer and retirement are both stalled due to a DTLB miss. Unit:
        cpu_atom]
  dtlb_load_misses.stlb_hit
       [Loads that miss the DTLB and hit the STLB. Unit: cpu_core]
  dtlb_load_misses.walk_active
       [Cycles when at least one PMH is busy with a page walk for a demand load.
        Unit: cpu_core]
  dtlb_load_misses.walk_completed
       [Load miss in all TLB levels causes a page walk that completes. (All page
        sizes). Unit: cpu_core]
  dtlb_load_misses.walk_completed_1g
       [Page walks completed due to a demand data load to a 1G page. Unit:
        cpu_core]
  dtlb_load_misses.walk_completed_2m_4m
       [Page walks completed due to a demand data load to a 2M/4M page. Unit:
        cpu_core]
  dtlb_load_misses.walk_completed_4k
       [Page walks completed due to a demand data load to a 4K page. Unit:
        cpu_core]
  dtlb_load_misses.walk_pending
       [Number of page walks outstanding for a demand load in the PMH each
        cycle. Unit: cpu_core]
  dtlb_store_misses.stlb_hit
       [Stores that miss the DTLB and hit the STLB. Unit: cpu_core]
  dtlb_store_misses.walk_active
       [Cycles when at least one PMH is busy with a page walk for a store. Unit:
        cpu_core]
  dtlb_store_misses.walk_completed
       [Store misses in all TLB levels causes a page walk that completes. (All
        page sizes). Unit: cpu_core]
  dtlb_store_misses.walk_completed_1g
       [Page walks completed due to a demand data store to a 1G page. Unit:
        cpu_core]
  dtlb_store_misses.walk_completed_2m_4m
       [Page walks completed due to a demand data store to a 2M/4M page. Unit:
        cpu_core]
  dtlb_store_misses.walk_completed_4k
       [Page walks completed due to a demand data store to a 4K page. Unit:
        cpu_core]
  dtlb_store_misses.walk_pending
       [Number of page walks outstanding for a store in the PMH each cycle.
        Unit: cpu_core]
  itlb_misses.stlb_hit
       [Instruction fetch requests that miss the ITLB and hit the STLB. Unit:
        cpu_core]
  itlb_misses.walk_active
       [Cycles when at least one PMH is busy with a page walk for code
        (instruction fetch) request. Unit: cpu_core]
  itlb_misses.walk_completed
       [Code miss in all TLB levels causes a page walk that completes. (All page
        sizes). Unit: cpu_core]
  itlb_misses.walk_completed_2m_4m
       [Code miss in all TLB levels causes a page walk that completes. (2M/4M).
        Unit: cpu_core]
  itlb_misses.walk_completed_4k
       [Code miss in all TLB levels causes a page walk that completes. (4K).
        Unit: cpu_core]
  itlb_misses.walk_pending
       [Number of page walks outstanding for an outstanding code request in the
        PMH each cycle. Unit: cpu_core]
  rNNN                                               [Raw event descriptor]
  cpu_atom/event=0..255,pc,edge,.../modifier         [Raw event descriptor]
       [(see 'man perf-list' or 'man perf-record' on how to encode it)]
  cpu_core/event=0..255,pc,edge,.../modifier         [Raw event descriptor]
       [(see 'man perf-list' or 'man perf-record' on how to encode it)]
  breakpoint//modifier                               [Raw event descriptor]
  cstate_core/event=0..0xffffffffffffffff/modifier   [Raw event descriptor]
  cstate_pkg/event=0..0xffffffffffffffff/modifier    [Raw event descriptor]
  i915/i915_eventid=0..0x1fffff/modifier             [Raw event descriptor]
  intel_bts//modifier                                [Raw event descriptor]
  intel_pt/ptw,event,cyc_thresh=0..15,.../modifier   [Raw event descriptor]
  kprobe/retprobe/modifier                           [Raw event descriptor]
  msr/event=0..0xffffffffffffffff/modifier           [Raw event descriptor]
  power/event=0..255/modifier                        [Raw event descriptor]
  software//modifier                                 [Raw event descriptor]
  tracepoint//modifier                               [Raw event descriptor]
  uncore_arb/event=0..255,edge,inv,.../modifier      [Raw event descriptor]
  uncore_cbox/event=0..255,edge,threshold=0..63,.../modifier[Raw event descriptor]
  uncore_clock/event=0..255/modifier                 [Raw event descriptor]
  uncore_imc_free_running/event=0..255,umask=0..255/modifier[Raw event descriptor]
  uncore_imc/event=0..255,edge,chmask=0..15/modifier [Raw event descriptor]
  uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier[Raw event descriptor]
  mem:<addr>[/len][:access]                          [Hardware breakpoint]

Metric Groups:

Backend: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_core_bound
       [This metric represents fraction of slots where Core non-memory issues
        were of a bottleneck]
  tma_info_core_ilp
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per thread (logical-processor)]
  tma_info_memory_l2mpki
       [L2 cache true misses per kilo instruction for retired demand loads]
  tma_memory_bound
       [This metric represents fraction of slots the Memory subsystem within the
        Backend was a bottleneck]

Bad:
  tma_info_bad_spec_branch_misprediction_cost
       [Branch Misprediction Cost: Fraction of TMA slots wasted per
        non-speculative branch misprediction (retired JEClear)]
  tma_info_bad_spec_ipmisp_cond_ntaken
       [Instructions per retired mispredicts for conditional non-taken branches
        (lower number means higher occurrence rate)]
  tma_info_bad_spec_ipmisp_cond_taken
       [Instructions per retired mispredicts for conditional taken branches
        (lower number means higher occurrence rate)]
  tma_info_bad_spec_ipmisp_indirect
       [Instructions per retired mispredicts for indirect CALL or JMP branches
        (lower number means higher occurrence rate)]
  tma_info_bad_spec_ipmisp_ret
       [Instructions per retired mispredicts for return branches (lower number
        means higher occurrence rate)]
  tma_info_bad_spec_ipmispredict
       [Number of Instructions per non-speculative Branch Misprediction
        (JEClear) (lower number means higher occurrence rate)]
  tma_info_bottleneck_irregular_overhead
       [Total pipeline cost of irregular execution (e.g]
  tma_info_bottleneck_mispredictions
       [Total pipeline cost of Branch Misprediction related bottlenecks]
  tma_info_branches_callret
       [Fraction of branches that are CALL or RET]
  tma_info_branches_cond_nt
       [Fraction of branches that are non-taken conditionals]
  tma_info_branches_cond_tk
       [Fraction of branches that are taken conditionals]
  tma_info_branches_jump
       [Fraction of branches that are unconditional (direct or indirect) jumps]
  tma_info_branches_other_branches
       [Fraction of branches of other types (not individually covered by other
        metrics in Info.Branches group)]

BadSpec: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_clears_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Machine Clears]
  tma_info_bad_spec_ipmispredict
       [Number of Instructions per non-speculative Branch Misprediction
        (JEClear) (lower number means higher occurrence rate)]
  tma_info_bottleneck_mispredictions
       [Total pipeline cost of Branch Misprediction related bottlenecks]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]

BigFootprint: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_info_bottleneck_big_code
       [Total pipeline cost of instruction fetch related bottlenecks by large
        code footprint programs (i-side cache; TLB and BTB misses)]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_unknown_branches
       [This metric represents fraction of cycles the CPU was stalled due to new
        branch address clears]

BrMispredicts: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_info_bad_spec_branch_misprediction_cost
       [Branch Misprediction Cost: Fraction of TMA slots wasted per
        non-speculative branch misprediction (retired JEClear)]
  tma_info_bad_spec_ipmisp_cond_ntaken
       [Instructions per retired mispredicts for conditional non-taken branches
        (lower number means higher occurrence rate)]
  tma_info_bad_spec_ipmisp_cond_taken
       [Instructions per retired mispredicts for conditional taken branches
        (lower number means higher occurrence rate)]
  tma_info_bad_spec_ipmisp_indirect
       [Instructions per retired mispredicts for indirect CALL or JMP branches
        (lower number means higher occurrence rate)]
  tma_info_bad_spec_ipmisp_ret
       [Instructions per retired mispredicts for return branches (lower number
        means higher occurrence rate)]
  tma_info_bad_spec_ipmispredict
       [Number of Instructions per non-speculative Branch Misprediction
        (JEClear) (lower number means higher occurrence rate)]
  tma_info_bad_spec_spec_clears_ratio
       [Speculative to Retired ratio of all clears (covering mispredicts and
        nukes)]
  tma_info_bottleneck_mispredictions
       [Total pipeline cost of Branch Misprediction related bottlenecks]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]
  tma_other_mispredicts
       [This metric estimates fraction of slots the CPU was stalled due to other
        cases of misprediction (non-retired x86 branches or other types)]

Branches: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fused_instructions
       [This metric represents fraction of slots where the CPU was retiring
        fused instructions -- where one uop can represent multiple contiguous
        instructions]
  tma_info_branches_callret
       [Fraction of branches that are CALL or RET]
  tma_info_branches_cond_nt
       [Fraction of branches that are non-taken conditionals]
  tma_info_branches_cond_tk
       [Fraction of branches that are taken conditionals]
  tma_info_branches_jump
       [Fraction of branches that are unconditional (direct or indirect) jumps]
  tma_info_branches_other_branches
       [Fraction of branches of other types (not individually covered by other
        metrics in Info.Branches group)]
  tma_info_inst_mix_bptkbranch
       [Branch instructions per taken branch]
  tma_info_inst_mix_ipbranch
       [Instructions per Branch (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipcall
       [Instructions per (near) call (lower number means higher occurrence rate)]
  tma_info_inst_mix_iptb
       [Instructions per taken branch]
  tma_info_system_ipfarbranch
       [Instructions per Far Branch ( Far Branches apply upon transition from
        application to operating system, handling interrupts, exceptions) [lower
        number means higher occurrence rate]]
  tma_info_thread_uptb
       [Uops per taken branch]
  tma_non_fused_branches
       [This metric represents fraction of slots where the CPU was retiring
        branch instructions that were not fused]

BvBC: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_info_bottleneck_big_code
       [Total pipeline cost of instruction fetch related bottlenecks by large
        code footprint programs (i-side cache; TLB and BTB misses)]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_unknown_branches
       [This metric represents fraction of cycles the CPU was stalled due to new
        branch address clears]

BvBO: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fused_instructions
       [This metric represents fraction of slots where the CPU was retiring
        fused instructions -- where one uop can represent multiple contiguous
        instructions]
  tma_info_bottleneck_branching_overhead
       [Total pipeline cost of instructions used for program control-flow - a
        subset of the Retiring category in TMA]
  tma_non_fused_branches
       [This metric represents fraction of slots where the CPU was retiring
        branch instructions that were not fused]
  tma_nop_instructions
       [This metric represents fraction of slots where the CPU was retiring NOP
        (no op) instructions]

BvCB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_divider
       [This metric represents fraction of cycles where the Divider unit was
        active]
  tma_info_bottleneck_compute_bound_est
       [Total pipeline cost when the execution is compute-bound - an estimation]
  tma_ports_utilized_3m
       [This metric represents fraction of cycles CPU executed total of 3 or
        more uops per cycle on all execution ports (Logical Processor cycles
        since ICL, Physical Core cycles otherwise)]

BvFB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_info_bottleneck_instruction_fetch_bw
       [Total pipeline cost of instruction fetch bandwidth related bottlenecks
        (when the front-end could not sustain operations delivery to the
        back-end)]

BvIO: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_assists
       [This metric estimates fraction of slots the CPU retired uops delivered
        by the Microcode_Sequencer as a result of Assists]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_info_bottleneck_irregular_overhead
       [Total pipeline cost of irregular execution (e.g]
  tma_other_mispredicts
       [This metric estimates fraction of slots the CPU was stalled due to other
        cases of misprediction (non-retired x86 branches or other types)]
  tma_other_nukes
       [This metric represents fraction of slots the CPU has wasted due to Nukes
        (Machine Clears) not related to memory ordering]
  tma_serializing_operation
       [This metric represents fraction of cycles the CPU issue-pipeline was
        stalled due to serializing operations]

BvMB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_cache_memory_bandwidth
       [Total pipeline cost of external Memory- or Cache-Bandwidth related
        bottlenecks]

BvML: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_cache_memory_latency
       [Total pipeline cost of external Memory- or Cache-Latency related
        bottlenecks]
  tma_l1_hit_latency
       [This metric roughly estimates fraction of cycles with demand load
        accesses that hit the L1 cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_hit_latency
       [This metric estimates fraction of cycles with demand load accesses that
        hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]

BvMP: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_info_bottleneck_mispredictions
       [Total pipeline cost of Branch Misprediction related bottlenecks]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]

BvMS: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_info_bottleneck_memory_synchronization
       [Total pipeline cost of Memory Synchronization related bottlenecks (data
        transfers and coherency updates across processors)]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]

BvMT: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dtlb_load
       [This metric roughly estimates the fraction of cycles where the Data TLB
        (DTLB) was missed by load accesses]
  tma_dtlb_store
       [This metric roughly estimates the fraction of cycles spent handling
        first-level data TLB store misses]
  tma_info_bottleneck_memory_data_tlbs
       [Total pipeline cost of Memory Address Translation related bottlenecks
        (data-side TLBs)]

BvOB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_backend_bound
       [This category represents fraction of slots where no uops are being
        delivered due to a lack of required resources for accepting new uops in
        the Backend]
  tma_info_bottleneck_other_bottlenecks
       [Total pipeline cost of remaining bottlenecks in the back-end]

BvUW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_useful_work
       [Total pipeline cost of "useful operations" - the portion of Retiring
        category not covered by Branching_Overhead nor Irregular_Overhead]
  tma_retiring
       [This category represents fraction of slots utilized by useful work i.e.
        issued uops that eventually get retired]

C0Wait: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_c01_wait
       [This metric represents fraction of cycles the CPU was stalled due to
        staying in C0.1 power-performance optimized state (Faster wakeup time;
        Smaller power savings)]
  tma_c02_wait
       [This metric represents fraction of cycles the CPU was stalled due to
        staying in C0.2 power-performance optimized state (Slower wakeup time;
        Larger power savings)]
  tma_info_system_c0_wait
       [Fraction of cycles the processor is waiting yet unhalted; covering
        legacy PAUSE instruction, as well as C0.1 / C0.2 power-performance
        optimized states]

CacheHits: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_memory_fb_hpki
       [Fill Buffer (FB) hits per kilo instructions for retired demand loads
        (L1D misses that merge into ongoing miss-handling entries)]
  tma_info_memory_l1mpki
       [L1 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_l1mpki_load
       [L1 cache true misses per kilo instruction for all demand loads
        (including speculative)]
  tma_info_memory_l2hpki_all
       [L2 cache hits per kilo instruction for all request types (including
        speculative)]
  tma_info_memory_l2hpki_load
       [L2 cache hits per kilo instruction for all demand loads (including
        speculative)]
  tma_info_memory_l2mpki
       [L2 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_l2mpki_all
       [L2 cache ([RKL+] true) misses per kilo instruction for all request types
        (including speculative)]
  tma_info_memory_l2mpki_load
       [L2 cache ([RKL+] true) misses per kilo instruction for all demand loads
        (including speculative)]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_bound
       [This metric estimates how often the CPU was stalled due to load
        accesses to L3 cache or contended with a sibling Core]

CacheMisses: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_memory_l2mpki_rfo
       [Offcore requests (L2 cache miss) per kilo instruction for demand RFOs]

CodeGen: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_branches_cond_nt
       [Fraction of branches that are non-taken conditionals]
  tma_info_branches_cond_tk
       [Fraction of branches that are taken conditionals]

Compute: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_core_bound
       [This metric represents fraction of slots where Core non-memory issues
        were a bottleneck]
  tma_fp_scalar
       [This metric approximates arithmetic floating-point (FP) scalar uops
        fraction the CPU has retired]
  tma_fp_vector
       [This metric approximates arithmetic floating-point (FP) vector uops
        fraction the CPU has retired aggregated across all vector widths]
  tma_fp_vector_128b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 128-bit wide vectors]
  tma_fp_vector_256b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 256-bit wide vectors]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_port_0
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)]
  tma_x87_use
       [This metric serves as an approximation of legacy x87 usage]

Cor: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_botlnk_l0_core_bound_likely
       [Probability of Core Bound bottleneck hidden by SMT-profiling artifacts]
  tma_info_bottleneck_compute_bound_est
       [Total pipeline cost when the execution is compute-bound - an estimation]
  tma_info_bottleneck_irregular_overhead
       [Total pipeline cost of irregular execution (e.g]
  tma_info_bottleneck_other_bottlenecks
       [Total pipeline cost of remaining bottlenecks in the back-end]
  tma_info_core_fp_arith_utilization
       [Actual per-core usage of the Floating Point non-X87 execution units
        (regardless of precision or vector-width)]
  tma_info_core_ilp
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per thread (logical-processor)]
  tma_info_pipeline_execute
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per core]
  tma_info_system_gflops
       [Giga Floating Point Operations Per Second]
  tma_info_thread_execute_per_issue
       [The ratio of Executed- by Issued-Uops]

DSB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dsb
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to DSB (decoded uop cache) fetch pipeline]
  tma_info_botlnk_l2_dsb_bandwidth
       [Total pipeline cost of DSB (uop cache) hits - subset of the
        Instruction_Fetch_BW Bottleneck]
  tma_info_frontend_dsb_coverage
       [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)]

DSBmiss: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_decoder0_alone
       [This metric represents fraction of cycles where decoder-0 was the only
        active decoder]
  tma_dsb_switches
       [This metric represents fraction of cycles the CPU was stalled due to
        switches from DSB to MITE pipelines]
  tma_info_botlnk_l2_dsb_misses
       [Total pipeline cost of DSB (uop cache) misses - subset of the
        Instruction_Fetch_BW Bottleneck]
  tma_info_frontend_dsb_switch_cost
       [Average number of cycles of a switch from the DSB fetch-unit to MITE
        fetch unit - see DSB_Switches tree node for details]
  tma_info_frontend_ipdsb_miss_ret
       [Instructions per non-speculative DSB miss (lower number means higher
        occurrence rate)]
  tma_mite
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to the MITE pipeline (the legacy decode pipeline)]

DataSharing: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]

Default:
  tma_backend_bound
       [This category represents fraction of slots where no uops are being
        delivered due to a lack of required resources for accepting new uops in
        the Backend]
  tma_bad_speculation
       [This category represents fraction of slots wasted due to incorrect
        speculations]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_retiring
       [This category represents fraction of slots utilized by useful work i.e.
        issued uops that eventually get retired]

Fed: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_botlnk_l2_dsb_misses
       [Total pipeline cost of DSB (uop cache) misses - subset of the
        Instruction_Fetch_BW Bottleneck]
  tma_info_botlnk_l2_ic_misses
       [Total pipeline cost of Instruction Cache misses - subset of the Big_Code
        Bottleneck]
  tma_info_bottleneck_big_code
       [Total pipeline cost of instruction fetch related bottlenecks by large
        code footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_bottleneck_instruction_fetch_bw
       [Total pipeline cost of instruction fetch bandwidth related bottlenecks
        (when the front-end could not sustain operations delivery to the
        back-end)]
  tma_info_frontend_dsb_coverage
       [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)]
  tma_info_frontend_fetch_upc
       [Average number of Uops issued by front-end when it issued something]
  tma_info_frontend_icache_miss_latency
       [Average Latency for L1 instruction cache misses]
  tma_info_frontend_ipdsb_miss_ret
       [Instructions per non-speculative DSB miss (lower number means higher
        occurrence rate)]
  tma_info_frontend_ipunknown_branch
       [Instructions per speculative Unknown Branch Misprediction (BAClear)
        (lower number means higher occurrence rate)]
  tma_info_frontend_lsd_coverage
       [Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop
        Cache)]
  tma_info_frontend_unknown_branch_cost
       [Average number of cycles the front-end was delayed due to an Unknown
        Branch detection]
  tma_info_inst_mix_bptkbranch
       [Branch instructions per taken branch]
  tma_info_inst_mix_ipbranch
       [Instructions per Branch (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipcall
       [Instructions per (near) call (lower number means higher occurrence rate)]
  tma_info_inst_mix_iptb
       [Instructions per taken branch]
  tma_info_memory_tlb_code_stlb_mpki
       [STLB (2nd level TLB) code speculative misses per kilo instruction
        (misses of any page-size that complete the page walk)]
  tma_info_pipeline_fetch_dsb
       [Average number of uops fetched from DSB per cycle]
  tma_info_pipeline_fetch_lsd
       [Average number of uops fetched from LSD per cycle]
  tma_info_pipeline_fetch_mite
       [Average number of uops fetched from MITE per cycle]
  tma_info_thread_uptb
       [Uops per taken branch]

FetchBW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_decoder0_alone
       [This metric represents fraction of cycles where decoder-0 was the only
        active decoder]
  tma_dsb
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to DSB (decoded uop cache) fetch pipeline]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_info_botlnk_l2_dsb_bandwidth
       [Total pipeline cost of DSB (uop cache) hits - subset of the
        Instruction_Fetch_BW Bottleneck]
  tma_info_bottleneck_instruction_fetch_bw
       [Total pipeline cost of instruction fetch bandwidth related bottlenecks
        (when the front-end could not sustain operations delivery to the
        back-end)]
  tma_info_frontend_dsb_coverage
       [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)]
  tma_info_frontend_fetch_upc
       [Average number of Uops issued by front-end when it issued something]
  tma_info_inst_mix_iptb
       [Instructions per taken branch]
  tma_info_pipeline_fetch_dsb
       [Average number of uops fetched from DSB per cycle]
  tma_info_pipeline_fetch_lsd
       [Average number of uops fetched from LSD per cycle]
  tma_info_pipeline_fetch_mite
       [Average number of uops fetched from MITE per cycle]
  tma_info_thread_uptb
       [Uops per taken branch]
  tma_lsd
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to LSD (Loop Stream Detector) unit]
  tma_mite
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to the MITE pipeline (the legacy decode pipeline)]

FetchLat: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_branch_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers]
  tma_dsb_switches
       [This metric represents fraction of cycles the CPU was stalled due to
        switches from DSB to MITE pipelines]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_info_botlnk_l2_ic_misses
       [Total pipeline cost of Instruction Cache misses - subset of the Big_Code
        Bottleneck]
  tma_info_frontend_icache_miss_latency
       [Average Latency for L1 instruction cache misses]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_lcp
       [This metric represents fraction of cycles CPU was stalled due to Length
        Changing Prefixes (LCPs)]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]
  tma_unknown_branches
       [This metric represents fraction of cycles the CPU was stalled due to new
        branch address clears]

Flops: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fp_scalar
       [This metric approximates arithmetic floating-point (FP) scalar uops
        fraction the CPU has retired]
  tma_fp_vector
       [This metric approximates arithmetic floating-point (FP) vector uops
        fraction the CPU has retired aggregated across all vector widths]
  tma_fp_vector_128b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 128-bit wide vectors]
  tma_fp_vector_256b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 256-bit wide vectors]
  tma_info_core_flopc
       [Floating Point Operations Per Cycle]
  tma_info_core_fp_arith_utilization
       [Actual per-core usage of the Floating Point non-X87 execution units
        (regardless of precision or vector-width)]
  tma_info_inst_mix_iparith
       [Instructions per FP Arithmetic instruction (lower number means higher
        occurrence rate)]
  tma_info_inst_mix_iparith_avx128
       [Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number
        means higher occurrence rate)]
  tma_info_inst_mix_iparith_avx256
       [Instructions per FP Arithmetic AVX* 256-bit instruction (lower number
        means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_dp
       [Instructions per FP Arithmetic Scalar Double-Precision instruction
        (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_sp
       [Instructions per FP Arithmetic Scalar Single-Precision instruction
        (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipflop
       [Instructions per Floating Point (FP) Operation (lower number means
        higher occurrence rate)]
  tma_info_inst_mix_ippause
       [Instructions per PAUSE (lower number means higher occurrence rate)]
  tma_info_system_gflops
       [Giga Floating Point Operations Per Second]

FpScalar: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_iparith_scalar_dp
       [Instructions per FP Arithmetic Scalar Double-Precision instruction
        (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_sp
       [Instructions per FP Arithmetic Scalar Single-Precision instruction
        (lower number means higher occurrence rate)]

FpVector: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_iparith_avx128
       [Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number
        means higher occurrence rate)]
  tma_info_inst_mix_iparith_avx256
       [Instructions per FP Arithmetic AVX* 256-bit instruction (lower number
        means higher occurrence rate)]
  tma_info_inst_mix_ippause
       [Instructions per PAUSE (lower number means higher occurrence rate)]

Frontend: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_fetch_latency
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend latency issues]
  tma_info_bottleneck_big_code
       [Total pipeline cost of instruction fetch related bottlenecks by large
        code footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_bottleneck_instruction_fetch_bw
       [Total pipeline cost of instruction fetch bandwidth related bottlenecks
        (when the front-end could not sustain operations delivery to the
        back-end)]
  tma_info_inst_mix_iptb
       [Instructions per taken branch]

HPC: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_avx_assists
       [This metric estimates fraction of slots the CPU retired uops as a result
        of handling SSE to AVX* or AVX* to SSE transition Assists]
  tma_fp_arith
       [This metric represents overall arithmetic floating-point (FP) operations
        fraction the CPU has executed (retired)]
  tma_fp_assists
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Floating Point (FP) Assists]
  tma_info_core_fp_arith_utilization
       [Actual per-core usage of the Floating Point non-X87 execution units
        (regardless of precision or vector-width)]
  tma_info_system_cpu_utilization
       [Average CPU Utilization (percentage)]
  tma_info_system_dram_bw_use
       [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_info_system_gflops
       [Giga Floating Point Operations Per Second]
  tma_shuffles_256b
       [This metric represents fraction of slots where the CPU was retiring
        Shuffle operations of 256-bit vector size (FP or Integer)]

IcMiss: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_info_botlnk_l2_ic_misses
       [Total pipeline cost of Instruction Cache misses - subset of the Big_Code
        Bottleneck]
  tma_info_bottleneck_big_code
       [Total pipeline cost of instruction fetch related bottlenecks by large
        code footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_frontend_icache_miss_latency
       [Average Latency for L1 instruction cache misses]
  tma_info_frontend_l2mpki_code
       [L2 cache true code cacheline misses per kilo instruction]
  tma_info_frontend_l2mpki_code_all
       [L2 cache speculative code cacheline misses per kilo instruction]

Ifetch: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_%_ifetch_miss_bound_cycles
       [Percentage of time that allocation and retirement is stalled by the
        Frontend Cluster due to an Ifetch Miss, either Icache or ITLB Miss]

InsType: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_iparith
       [Instructions per FP Arithmetic instruction (lower number means higher
        occurrence rate)]
  tma_info_inst_mix_iparith_avx128
       [Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number
        means higher occurrence rate)]
  tma_info_inst_mix_iparith_avx256
       [Instructions per FP Arithmetic AVX* 256-bit instruction (lower number
        means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_dp
       [Instructions per FP Arithmetic Scalar Double-Precision instruction
        (lower number means higher occurrence rate)]
  tma_info_inst_mix_iparith_scalar_sp
       [Instructions per FP Arithmetic Scalar Single-Precision instruction
        (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipbranch
       [Instructions per Branch (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipflop
       [Instructions per Floating Point (FP) Operation (lower number means
        higher occurrence rate)]
  tma_info_inst_mix_ipload
       [Instructions per Load (lower number means higher occurrence rate)]
  tma_info_inst_mix_ippause
       [Instructions per PAUSE (lower number means higher occurrence rate)]
  tma_info_inst_mix_ipstore
       [Instructions per Store (lower number means higher occurrence rate)]

IntVector: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]

LSD: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_frontend_lsd_coverage
       [Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop
        Cache)]
  tma_lsd
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to LSD (Loop Stream Detector) unit]

Load_Store_Miss: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_%_load_miss_bound_cycles
       [Percentage of time that retirement is stalled due to an L1 miss]

MachineClears: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_clears_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Machine Clears]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]

Machine_Clears: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_other_nukes
       [This metric represents fraction of slots the CPU has wasted due to Nukes
        (Machine Clears) not related to memory ordering]

Mem:
  tma_info_bottleneck_cache_memory_bandwidth
       [Total pipeline cost of external Memory- or Cache-Bandwidth related
        bottlenecks]
  tma_info_bottleneck_cache_memory_latency
       [Total pipeline cost of external Memory- or Cache-Latency related
        bottlenecks]
  tma_info_bottleneck_memory_data_tlbs
       [Total pipeline cost of Memory Address Translation related bottlenecks
        (data-side TLBs)]
  tma_info_bottleneck_memory_synchronization
       [Total pipeline cost of Memory Synchronization related bottlenecks (data
        transfers and coherency updates across processors)]
  tma_info_memory_core_l1d_cache_fill_bw_2t
       [Average per-core data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_core_l2_cache_fill_bw_2t
       [Average per-core data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_core_l3_cache_access_bw_2t
       [Average per-core data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_core_l3_cache_fill_bw_2t
       [Average per-core data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_fb_hpki
       [Fill Buffer (FB) hits per kilo instructions for retired demand loads
        (L1D misses that merge into ongoing miss-handling entries)]
  tma_info_memory_l1d_cache_fill_bw
       [Average per-thread data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_l1mpki
       [L1 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_l1mpki_load
       [L1 cache true misses per kilo instruction for all demand loads
        (including speculative)]
  tma_info_memory_l2_cache_fill_bw
       [Average per-thread data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_l2hpki_all
       [L2 cache hits per kilo instruction for all request types (including
        speculative)]
  tma_info_memory_l2hpki_load
       [L2 cache hits per kilo instruction for all demand loads (including
        speculative)]
  tma_info_memory_l2mpki
       [L2 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_l2mpki_all
       [L2 cache ([RKL+] true) misses per kilo instruction for all request types
        (including speculative)]
  tma_info_memory_l2mpki_load
       [L2 cache ([RKL+] true) misses per kilo instruction for all demand loads
        (including speculative)]
  tma_info_memory_l3_cache_access_bw
       [Average per-thread data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l3_cache_fill_bw
       [Average per-thread data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l3mpki
       [L3 cache true misses per kilo instruction for retired demand loads]
  tma_info_memory_load_miss_real_latency
       [Actual Average Latency for L1 data-cache miss demand load operations (in
        core cycles)]
  tma_info_memory_mix_bus_lock_pki
       ["Bus lock" per kilo instruction]
  tma_info_memory_mix_uc_load_pki
       [Un-cacheable retired load per kilo instruction]
  tma_info_memory_mlp
       [Memory-Level-Parallelism (average number of L1 miss demand load when
        there is at least one such miss]
  tma_info_memory_tlb_load_stlb_mpki
       [STLB (2nd level TLB) data load speculative misses per kilo instruction
        (misses of any page-size that complete the page walk)]
  tma_info_memory_tlb_page_walks_utilization
       [Utilization of the core's Page Walker(s) serving STLB misses triggered
        by instruction/Load/Store accesses]
  tma_info_memory_tlb_store_stlb_mpki
       [STLB (2nd level TLB) data store speculative misses per kilo instruction
        (misses of any page-size that complete the page walk)]
  tma_info_system_mem_parallel_reads
       [Average number of parallel data read requests to external memory]
  tma_info_system_mem_read_latency
       [Average latency of data read request to external memory (in nanoseconds)]
  tma_info_thread_cpi
       [Cycles Per Instruction (per Logical Processor)]

MemOffcore: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_system_dram_bw_use
       [Average external Memory Bandwidth Use for reads and writes [GB / sec]]

Mem_Exec: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_%_mem_exec_bound_cycles
       [Percentage of time that retirement is stalled by the Memory Cluster due
        to a pipeline stall]

MemoryBW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_info_bottleneck_cache_memory_bandwidth
       [Total pipeline cost of external Memory- or Cache-Bandwidth related
        bottlenecks]
  tma_info_memory_core_l1d_cache_fill_bw_2t
       [Average per-core data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_core_l2_cache_fill_bw_2t
       [Average per-core data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_core_l3_cache_access_bw_2t
       [Average per-core data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_core_l3_cache_fill_bw_2t
       [Average per-core data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l1d_cache_fill_bw
       [Average per-thread data fill bandwidth to the L1 data cache [GB / sec]]
  tma_info_memory_l2_cache_fill_bw
       [Average per-thread data fill bandwidth to the L2 cache [GB / sec]]
  tma_info_memory_l3_cache_access_bw
       [Average per-thread data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l3_cache_fill_bw
       [Average per-thread data fill bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_mlp
       [Memory-Level-Parallelism (average number of L1 miss demand load when
        there is at least one such miss]
  tma_info_system_dram_bw_use
       [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_info_system_mem_parallel_reads
       [Average number of parallel data read requests to external memory]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]
  tma_streaming_stores
       [This metric estimates how often CPU was stalled due to Streaming store
        memory accesses; Streaming stores optimize out a read request required by
        RFO stores]

MemoryBound: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dram_bound
       [This metric estimates how often the CPU was stalled on accesses to
        external memory (DRAM) by loads]
  tma_info_memory_load_miss_real_latency
       [Actual Average Latency for L1 data-cache miss demand load operations (in
        core cycles)]
  tma_info_memory_mlp
       [Memory-Level-Parallelism (average number of L1 miss demand load when
        there is at least one such miss]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_bound
       [This metric estimates how often the CPU was stalled due to load
        accesses to L3 cache or contended with a sibling Core]
  tma_store_bound
       [This metric estimates how often CPU was stalled due to RFO store memory
        accesses; RFO stores issue a read-for-ownership request before the write]

MemoryLat: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_cache_memory_latency
       [Total pipeline cost of external Memory- or Cache-Latency related
        bottlenecks]
  tma_info_memory_load_miss_real_latency
       [Actual Average Latency for L1 data-cache miss demand load operations (in
        core cycles)]
  tma_info_system_mem_read_latency
       [Average latency of data read request to external memory (in nanoseconds)]
  tma_l1_hit_latency
       [This metric roughly estimates fraction of cycles with demand load
        accesses that hit the L1 cache]
  tma_l3_hit_latency
       [This metric estimates fraction of cycles with demand load accesses that
        hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]

MemoryTLB: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dtlb_load
       [This metric roughly estimates the fraction of cycles where the Data TLB
        (DTLB) was missed by load accesses]
  tma_dtlb_store
       [This metric roughly estimates the fraction of cycles spent handling
        first-level data TLB store misses]
  tma_info_bottleneck_big_code
       [Total pipeline cost of instruction fetch related bottlenecks by large
        code footprint programs (i-side cache; TLB and BTB misses)]
  tma_info_bottleneck_memory_data_tlbs
       [Total pipeline cost of Memory Address Translation related bottlenecks
        (data-side TLBs)]
  tma_info_memory_tlb_code_stlb_mpki
       [STLB (2nd level TLB) code speculative misses per kilo instruction
        (misses of any page-size that complete the page walk)]
  tma_info_memory_tlb_load_stlb_mpki
       [STLB (2nd level TLB) data load speculative misses per kilo instruction
        (misses of any page-size that complete the page walk)]
  tma_info_memory_tlb_page_walks_utilization
       [Utilization of the core's Page Walker(s) serving STLB misses triggered
        by instruction/Load/Store accesses]
  tma_info_memory_tlb_store_stlb_mpki
       [STLB (2nd level TLB) data store speculative misses per kilo instruction
        (misses of any page-size that complete the page walk)]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_load_stlb_hit
       [This metric roughly estimates the fraction of cycles where the (first
        level) DTLB was missed by load accesses,that later on hit in
        second-level TLB (STLB)]
  tma_load_stlb_miss
       [This metric estimates the fraction of cycles where the Second-level TLB
        (STLB) was missed by load accesses,performing a hardware page walk]
  tma_store_stlb_hit
       [This metric roughly estimates the fraction of cycles where the TLB was
        missed by store accesses,hitting in the second-level TLB (STLB)]
  tma_store_stlb_miss
       [This metric estimates the fraction of cycles where the STLB was missed
        by store accesses,performing a hardware page walk]

Memory_BW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_memory_latency_data_l2_mlp
       [Average Parallel L2 cache miss data reads]
  tma_info_memory_latency_load_l2_mlp
       [Average Parallel L2 cache miss demand Loads]

Memory_Lat: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_memory_latency_load_l2_miss_latency
       [Average Latency for L2 cache miss demand Loads]
  tma_info_memory_latency_load_l3_miss_latency
       [Average Latency for L3 cache miss demand Loads]

MicroSeq: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_pipeline_ipassist
       [Instructions per a microcode Assist invocation]
  tma_info_pipeline_strings_cycles
       [Estimated fraction of retirement-cycles dealing with repeat instructions]
  tma_microcode_sequencer
       [This metric represents fraction of slots the CPU was retiring uops
        fetched by the Microcode Sequencer (MS) unit]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]

OS: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_system_ipfarbranch
       [Instructions per Far Branch ( Far Branches apply upon transition from
        application to operating system,handling interrupts,exceptions) [lower
        number means higher occurrence rate]]
  tma_info_system_kernel_cpi
       [Cycles Per Instruction for the Operating System (OS) Kernel mode]
  tma_info_system_kernel_utilization
       [Fraction of cycles spent in the Operating System (OS) Kernel mode]

Offcore: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]
  tma_info_bottleneck_cache_memory_bandwidth
       [Total pipeline cost of external Memory- or Cache-Bandwidth related
        bottlenecks]
  tma_info_bottleneck_cache_memory_latency
       [Total pipeline cost of external Memory- or Cache-Latency related
        bottlenecks]
  tma_info_bottleneck_memory_data_tlbs
       [Total pipeline cost of Memory Address Translation related bottlenecks
        (data-side TLBs)]
  tma_info_bottleneck_memory_synchronization
       [Total pipeline cost of Memory Synchronization related bottlenecks (data
        transfers and coherency updates across processors)]
  tma_info_bottleneck_other_bottlenecks
       [Total pipeline cost of remaining bottlenecks in the back-end]
  tma_info_memory_core_l3_cache_access_bw_2t
       [Average per-core data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_l2mpki_all
       [L2 cache ([RKL+] true) misses per kilo instruction for all request types
        (including speculative)]
  tma_info_memory_l2mpki_rfo
       [Offcore requests (L2 cache miss) per kilo instruction for demand RFOs]
  tma_info_memory_l3_cache_access_bw
       [Average per-thread data access bandwidth to the L3 cache [GB / sec]]
  tma_info_memory_latency_data_l2_mlp
       [Average Parallel L2 cache miss data reads]
  tma_info_memory_latency_load_l2_miss_latency
       [Average Latency for L2 cache miss demand Loads]
  tma_info_memory_latency_load_l2_mlp
       [Average Parallel L2 cache miss demand Loads]
  tma_info_memory_latency_load_l3_miss_latency
       [Average Latency for L3 cache miss demand Loads]
  tma_lock_latency
       [This metric represents fraction of cycles the CPU spent handling cache
        misses due to lock operations]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]
  tma_streaming_stores
       [This metric estimates how often CPU was stalled due to Streaming store
        memory accesses; Streaming store optimize out a read request required by
        RFO stores]

PGO: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_info_branches_cond_nt
       [Fraction of branches that are non-taken conditionals]
  tma_info_branches_cond_tk
       [Fraction of branches that are taken conditionals]
  tma_info_inst_mix_bptkbranch
       [Branch instructions per taken branch]
  tma_info_inst_mix_ipcall
       [Instructions per (near) call (lower number means higher occurrence rate)]
  tma_info_inst_mix_iptb
       [Instructions per taken branch]

Pipeline: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_fused_instructions
       [This metric represents fraction of slots where the CPU was retiring
        fused instructions -- where one uop can represent multiple contiguous
        instructions]
  tma_info_core_ilp
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per thread (logical-processor)]
  tma_info_pipeline_execute
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per core]
  tma_info_pipeline_ipassist
       [Instructions per a microcode Assist invocation]
  tma_info_pipeline_retire
       [Average number of Uops retired in cycles where at least one uop has
        retired]
  tma_info_pipeline_strings_cycles
       [Estimated fraction of retirement-cycles dealing with repeat instructions]
  tma_info_thread_clks
       [Per-Logical Processor actual clocks when the Logical Processor is active]
  tma_info_thread_cpi
       [Cycles Per Instruction (per Logical Processor)]
  tma_info_thread_execute_per_issue
       [The ratio of Executed- by Issued-Uops]
  tma_info_thread_uoppi
       [Uops Per Instruction]
  tma_int_operations
       [This metric represents overall Integer (Int) select operations fraction
        the CPU has executed (retired)]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_memory_operations
       [This metric represents fraction of slots where the CPU was retiring
        memory operations -- uops for memory load or store accesses]
  tma_non_fused_branches
       [This metric represents fraction of slots where the CPU was retiring
        branch instructions that were not fused]
  tma_nop_instructions
       [This metric represents fraction of slots where the CPU was retiring NOP
        (no op) instructions]
  tma_other_light_ops
       [This metric represents the remaining light uops fraction the CPU has
        executed - remaining means not covered by other sibling nodes]
  tma_shuffles_256b
       [This metric represents fraction of slots where the CPU was retiring
        Shuffle operations of 256-bit vector size (FP or Integer)]

PortsUtil: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_core_ilp
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per thread (logical-processor)]
  tma_info_pipeline_execute
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per core]
  tma_ports_utilization
       [This metric estimates fraction of cycles the CPU performance was
        potentially limited due to Core computation issues (non divider-related)]
  tma_ports_utilized_0
       [This metric represents fraction of cycles CPU executed no uops on any
        execution port (Logical Processor cycles since ICL,Physical Core cycles
        otherwise)]
  tma_ports_utilized_1
       [This metric represents fraction of cycles where the CPU executed total
        of 1 uop per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_ports_utilized_2
       [This metric represents fraction of cycles CPU executed total of 2 uops
        per cycle on all execution ports (Logical Processor cycles since ICL,
        Physical Core cycles otherwise)]
  tma_ports_utilized_3m
       [This metric represents fraction of cycles CPU executed total of 3 or
        more uops per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_serializing_operation
       [This metric represents fraction of cycles the CPU issue-pipeline was
        stalled due to serializing operations]

Power: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  C10_Pkg_Residency
       [C10 residency percent per package]
  C1_Core_Residency
       [C1 residency percent per core]
  C2_Pkg_Residency
       [C2 residency percent per package]
  C3_Pkg_Residency
       [C3 residency percent per package]
  C6_Core_Residency
       [C6 residency percent per core]
  C6_Pkg_Residency
       [C6 residency percent per package]
  C7_Core_Residency
       [C7 residency percent per core]
  C7_Pkg_Residency
       [C7 residency percent per package]
  C8_Pkg_Residency
       [C8 residency percent per package]
  C9_Pkg_Residency
       [C9 residency percent per package]
  tma_info_core_epc
       [uops Executed per Cycle]
  tma_info_system_core_frequency
       [Measured Average Core Frequency for unhalted processors [GHz]]
  tma_info_system_turbo_utilization
       [Average Frequency Utilization relative nominal frequency]

Prefetches: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_ipswpf
       [Instructions per Software prefetch instruction (of any type:
        NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)]

Ret: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_bottleneck_branching_overhead
       [Total pipeline cost of instructions used for program control-flow - a
        subset of the Retiring category in TMA]
  tma_info_bottleneck_irregular_overhead
       [Total pipeline cost of irregular execution (e.g]
  tma_info_bottleneck_useful_work
       [Total pipeline cost of "useful operations" - the portion of Retiring
        category not covered by Branching_Overhead nor Irregular_Overhead]
  tma_info_core_coreipc
       [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_core_flopc
       [Floating Point Operations Per Cycle]
  tma_info_pipeline_ipassist
       [Instructions per a microcode Assist invocation]
  tma_info_pipeline_retire
       [Average number of Uops retired in cycles where at least one uop has
        retired]
  tma_info_pipeline_strings_cycles
       [Estimated fraction of retirement-cycles dealing with repeat instructions]
  tma_info_thread_ipc
       [Instructions Per Cycle (per Logical Processor)]
  tma_info_thread_uoppi
       [Uops Per Instruction]

Retire: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_heavy_operations
       [This metric represents fraction of slots where the CPU was retiring
        heavy-weight operations -- instructions that require two or more uops or
        micro-coded sequences]
  tma_info_pipeline_ipassist
       [Instructions per a microcode Assist invocation]
  tma_info_thread_uoppi
       [Uops Per Instruction]
  tma_light_operations
       [This metric represents fraction of slots where the CPU was retiring
        light-weight operations -- instructions that require no more than one
        uop (micro-operation)]

SMT: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_botlnk_l0_core_bound_likely
       [Probability of Core Bound bottleneck hidden by SMT-profiling artifacts]
  tma_info_core_core_clks
       [Core actual clocks when any Logical Processor is active on the Physical
        Core]
  tma_info_core_coreipc
       [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_pipeline_execute
       [Instruction-Level-Parallelism (average number of uops executed when
        there is execution) per core]
  tma_info_system_smt_2t_utilization
       [Fraction of cycles where both hardware Logical Processors were active]
  tma_info_thread_slots_utilization
       [Fraction of Physical Core issue-slots utilized by this Logical Processor]

Snoop: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]

SoC: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  UNCORE_FREQ
       [Uncore frequency per die [GHZ]]
  tma_info_system_dram_bw_use
       [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_info_system_mem_parallel_reads
       [Average number of parallel data read requests to external memory]
  tma_info_system_mem_read_latency
       [Average latency of data read request to external memory (in nanoseconds)]
  tma_info_system_socket_clks
       [Socket actual clocks when any core is active on that socket]

Summary: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_inst_mix_instructions
       [Total number of retired Instructions]
  tma_info_system_core_frequency
       [Measured Average Core Frequency for unhalted processors [GHz]]
  tma_info_system_cpu_utilization
       [Average CPU Utilization (percentage)]
  tma_info_system_cpus_utilized
       [Average number of utilized CPUs]
  tma_info_system_kernel_utilization
       [Fraction of cycles spent in Kernel mode]
  tma_info_thread_ipc
       [Instructions Per Cycle (per Logical Processor)]

TmaL1: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_backend_bound
       [This category represents fraction of slots where no uops are being
        delivered due to a lack of required resources for accepting new uops in
        the Backend]
  tma_bad_speculation
       [This category represents fraction of slots wasted due to incorrect
        speculations]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_info_core_coreipc
       [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_inst_mix_instructions
       [Total number of retired Instructions]
  tma_info_thread_slots
       [Total issue-pipeline slots (per-Physical Core till ICL; per-Logical
        Processor ICL onward)]
  tma_info_thread_slots_utilization
       [Fraction of Physical Core issue-slots utilized by this Logical Processor]
  tma_retiring
       [This category represents fraction of slots utilized by useful work i.e.
        issued uops that eventually get retired]

TmaL2: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_core_bound
       [This metric represents fraction of slots where Core non-memory issues
        were of a bottleneck]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_fetch_latency
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend latency issues]
  tma_heavy_operations
       [This metric represents fraction of slots where the CPU was retiring
        heavy-weight operations -- instructions that require two or more uops or
        micro-coded sequences]
  tma_light_operations
       [This metric represents fraction of slots where the CPU was retiring
        light-weight operations -- instructions that require no more than one
        uop (micro-operation)]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]
  tma_memory_bound
       [This metric represents fraction of slots the Memory subsystem within the
        Backend was a bottleneck]

TmaL3mem: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_dram_bound
       [This metric estimates how often the CPU was stalled on accesses to
        external memory (DRAM) by loads]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_bound
       [This metric estimates how often the CPU was stalled due to loads
        accesses to L3 cache or contended with a sibling Core]
  tma_store_bound
       [This metric estimates how often CPU was stalled due to RFO store memory
        accesses; RFO store issue a read-for-ownership request before the write]

TopdownL1: [Metrics for top-down breakdown at level 1]
  tma_backend_bound
       [This category represents fraction of slots where no uops are being
        delivered due to a lack of required resources for accepting new uops in
        the Backend]
  tma_bad_speculation
       [This category represents fraction of slots wasted due to incorrect
        speculations]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_retiring
       [This category represents fraction of slots utilized by useful work i.e.
        issued uops that eventually get retired]

TopdownL2: [Metrics for top-down breakdown at level 2]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_core_bound
       [This metric represents fraction of slots where Core non-memory issues
        were of a bottleneck]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_fetch_latency
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend latency issues]
  tma_heavy_operations
       [This metric represents fraction of slots where the CPU was retiring
        heavy-weight operations -- instructions that require two or more uops or
        micro-coded sequences]
  tma_ifetch_bandwidth
       [Counts the number of issue slots that were not delivered by the frontend
        due to frontend bandwidth restrictions due to decode,predecode,cisc,and
        other limitations]
  tma_ifetch_latency
       [Counts the number of issue slots that were not delivered by the frontend
        due to frontend latency restrictions due to icache misses,itlb misses,
        branch detection,and resteer limitations]
  tma_light_operations
       [This metric represents fraction of slots where the CPU was retiring
        light-weight operations -- instructions that require no more than one
        uop (micro-operation)]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]
  tma_memory_bound
       [This metric represents fraction of slots the Memory subsystem within the
        Backend was a bottleneck]
  tma_resource_bound
       [Counts the number of cycles the core is stalled due to a resource
        limitation]

TopdownL3: [Metrics for top-down breakdown at level 3]
  tma_allocation_restriction
       [Counts the number of issue slots that were not consumed by the backend
        due to certain allocation restrictions]
  tma_branch_detect
       [Counts the number of issue slots that were not delivered by the frontend
        due to BACLEARS,which occurs when the Branch Target Buffer (BTB)
        prediction or lack thereof,was corrected by a later branch predictor in
        the frontend]
  tma_branch_resteer
       [Counts the number of issue slots that were not delivered by the frontend
        due to BTCLEARS,which occurs when the Branch Target Buffer (BTB)
        predicts a taken branch]
  tma_branch_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers]
  tma_cisc
       [Counts the number of issue slots that were not delivered by the frontend
        due to the microcode sequencer (MS)]
  tma_decode
       [Counts the number of issue slots that were not delivered by the frontend
        due to decode stalls]
  tma_divider
       [This metric represents fraction of cycles where the Divider unit was
        active]
  tma_dram_bound
       [This metric estimates how often the CPU was stalled on accesses to
        external memory (DRAM) by loads]
  tma_dsb
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to DSB (decoded uop cache) fetch pipeline]
  tma_dsb_switches
       [This metric represents fraction of cycles the CPU was stalled due to
        switches from DSB to MITE pipelines]
  tma_fast_nuke
       [Counts the number of issue slots that were not consumed by the backend
        due to a machine clear that does not require the use of microcode,
        classified as a fast nuke,due to memory ordering,memory disambiguation
        and memory renaming]
  tma_few_uops_instructions
       [This metric represents fraction of slots where the CPU was retiring
        instructions that that are decoder into two or up to ([SNB+] four;
        [ADL+] five) uops]
  tma_fp_arith
       [This metric represents overall arithmetic floating-point (FP) operations
        fraction the CPU has executed (retired)]
  tma_fused_instructions
       [This metric represents fraction of slots where the CPU was retiring
        fused instructions -- where one uop can represent multiple contiguous
        instructions]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_int_operations
       [This metric represents overall Integer (Int) select operations fraction
        the CPU has executed (retired)]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_bound
       [This metric estimates how often the CPU was stalled due to loads
        accesses to L3 cache or contended with a sibling Core]
  tma_lcp
       [This metric represents fraction of cycles CPU was stalled due to Length
        Changing Prefixes (LCPs)]
  tma_lsd
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to LSD (Loop Stream Detector) unit]
  tma_mem_scheduler
       [Counts the number of issue slots that were not consumed by the backend
        due to memory reservation stalls in which a scheduler is not able to
        accept uops]
  tma_memory_operations
       [This metric represents fraction of slots where the CPU was retiring
        memory operations -- uops for memory load or store accesses]
  tma_microcode_sequencer
       [This metric represents fraction of slots the CPU was retiring uops
        fetched by the Microcode Sequencer (MS) unit]
  tma_mite
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to the MITE pipeline (the legacy decode pipeline)]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]
  tma_non_fused_branches
       [This metric represents fraction of slots where the CPU was retiring
        branch instructions that were not fused]
  tma_non_mem_scheduler
       [Counts the number of issue slots that were not consumed by the backend
        due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation
        stalls in which the integer,floating point or SIMD scheduler is not able
        to accept uops]
  tma_nuke
       [Counts the number of issue slots that were not consumed by the backend
        due to a machine clear that requires the use of microcode (slow nuke)]
  tma_other_fb
       [Counts the number of issue slots that were not delivered by the frontend
        due to other common frontend stalls not categorized]
  tma_other_light_ops
       [This metric represents the remaining light uops fraction the CPU has
        executed - remaining means not covered by other sibling nodes]
  tma_other_mispredicts
       [This metric estimates fraction of slots the CPU was stalled due to other
        cases of misprediction (non-retired x86 branches or other types)]
  tma_other_nukes
       [This metric represents fraction of slots the CPU has wasted due to Nukes
        (Machine Clears) not related to memory ordering]
  tma_ports_utilization
       [This metric estimates fraction of cycles the CPU performance was
        potentially limited due to Core computation issues (non divider-related)]
  tma_predecode
       [Counts the number of issue slots that were not delivered by the frontend
        due to wrong predecodes]
  tma_register
       [Counts the number of issue slots that were not consumed by the backend
        due to the physical register file unable to accept an entry (marble
        stalls)]
  tma_reorder_buffer
       [Counts the number of issue slots that were not consumed by the backend
        due to the reorder buffer being full (ROB stalls)]
  tma_serialization
       [Counts the number of issue slots that were not consumed by the backend
        due to scoreboards from the instruction queue (IQ),jump execution unit
        (JEU),or microcode sequencer (MS)]
  tma_serializing_operation
       [This metric represents fraction of cycles the CPU issue-pipeline was
        stalled due to serializing operations]
  tma_store_bound
       [This metric estimates how often CPU was stalled due to RFO store memory
        accesses; RFO store issue a read-for-ownership request before the write]

TopdownL4: [Metrics for top-down breakdown at level 4]
  tma_assists
       [This metric estimates fraction of slots the CPU retired uops delivered
        by the Microcode_Sequencer as a result of Assists]
  tma_c01_wait
       [This metric represents fraction of cycles the CPU was stalled due
        staying in C0.1 power-performance optimized state (Faster wakeup time;
        Smaller power savings)]
  tma_c02_wait
       [This metric represents fraction of cycles the CPU was stalled due
        staying in C0.2 power-performance optimized state (Slower wakeup time;
        Larger power savings)]
  tma_cisc
       [This metric estimates fraction of cycles the CPU retired uops originated
        from CISC (complex instruction set computer) instruction]
  tma_clears_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Machine Clears]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_decoder0_alone
       [This metric represents fraction of cycles where decoder-0 was the only
        active decoder]
  tma_dtlb_load
       [This metric roughly estimates the fraction of cycles where the Data TLB
        (DTLB) was missed by load accesses]
  tma_dtlb_store
       [This metric roughly estimates the fraction of cycles spent handling
        first-level data TLB store misses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_fp_scalar
       [This metric approximates arithmetic floating-point (FP) scalar uops
        fraction the CPU has retired]
  tma_fp_vector
       [This metric approximates arithmetic floating-point (FP) vector uops
        fraction the CPU has retired aggregated across all vector widths]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_l1_hit_latency
       [This metric roughly estimates fraction of cycles with demand load
        accesses that hit the L1 cache]
  tma_l3_hit_latency
       [This metric estimates fraction of cycles with demand load accesses that
        hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_lock_latency
       [This metric represents fraction of cycles the CPU spent handling cache
        misses due to lock operations]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]
  tma_memory_fence
       [This metric represents fraction of cycles the CPU was stalled due to
        LFENCE Instructions]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]
  tma_nop_instructions
       [This metric represents fraction of slots where the CPU was retiring NOP
        (no op) instructions]
  tma_ports_utilized_0
       [This metric represents fraction of cycles CPU executed no uops on any
        execution port (Logical Processor cycles since ICL,Physical Core cycles
        otherwise)]
  tma_ports_utilized_1
       [This metric represents fraction of cycles where the CPU executed total
        of 1 uop per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_ports_utilized_2
       [This metric represents fraction of cycles CPU executed total of 2 uops
        per cycle on all execution ports (Logical Processor cycles since ICL,
        Physical Core cycles otherwise)]
  tma_ports_utilized_3m
       [This metric represents fraction of cycles CPU executed total of 3 or
        more uops per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_shuffles_256b
       [This metric represents fraction of slots where the CPU was retiring
        Shuffle operations of 256-bit vector size (FP or Integer)]
  tma_slow_pause
       [This metric represents fraction of cycles the CPU was stalled due to
        PAUSE Instructions]
  tma_split_loads
       [This metric estimates fraction of cycles handling memory load split
        accesses - loads that cross a 64-byte cache line boundary]
  tma_split_stores
       [This metric represents rate of split store accesses]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]
  tma_store_fwd_blk
       [This metric roughly estimates fraction of cycles when the memory
        subsystem had loads blocked since they could not forward data from
        earlier (in program order) overlapping stores]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]
  tma_streaming_stores
       [This metric estimates how often CPU was stalled due to Streaming store
        memory accesses; Streaming stores optimize out a read request required by
        RFO stores]
  tma_unknown_branches
       [This metric represents fraction of cycles the CPU was stalled due to new
        branch address clears]
  tma_x87_use
       [This metric serves as an approximation of legacy x87 usage]

TopdownL5: [Metrics for top-down breakdown at level 5]
  tma_alu_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution ports for ALU operations]
  tma_avx_assists
       [This metric estimates fraction of slots the CPU retired uops as a result
        of handling SSE to AVX* or AVX* to SSE transition Assists]
  tma_fp_assists
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Floating Point (FP) Assists]
  tma_fp_vector_128b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 128-bit wide vectors]
  tma_fp_vector_256b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 256-bit wide vectors]
  tma_load_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port for Load operations]
  tma_load_stlb_hit
       [This metric roughly estimates the fraction of cycles where the (first
        level) DTLB was missed by load accesses,that later on hit in
        second-level TLB (STLB)]
  tma_load_stlb_miss
       [This metric estimates the fraction of cycles where the Second-level TLB
        (STLB) was missed by load accesses,performing a hardware page walk]
  tma_mixing_vectors
       [This metric estimates penalty in terms of percentage of([SKL+] injected
        blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)]
  tma_page_faults
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Page Faults]
  tma_store_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port for Store operations]
  tma_store_stlb_hit
       [This metric roughly estimates the fraction of cycles where the TLB was
        missed by store accesses,hitting in the second-level TLB (STLB)]
  tma_store_stlb_miss
       [This metric estimates the fraction of cycles where the STLB was missed
        by store accesses,performing a hardware page walk]

TopdownL6: [Metrics for top-down breakdown at level 6]
  tma_port_0
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)]
  tma_port_1
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 1 (ALU)]
  tma_port_6
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 6 ([HSW+] Primary Branch and simple ALU)]

load_store_bound: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
  tma_info_load_miss_bound_%_loadmissbound_with_l2hit
       [Percentage of memory bound stalls where retirement is stalled due to an
        L1 miss that hit the L2]
  tma_info_load_miss_bound_%_loadmissbound_with_l3hit
       [Percentage of memory bound stalls where retirement is stalled due to an
        L1 miss that hit the L3]
  tma_info_load_miss_bound_%_loadmissbound_with_l3miss
       [Percentage of memory bound stalls where retirement is stalled due to an
        L1 miss that subsequently misses the L3]
  tma_info_load_store_bound_l1_bound
       [Counts the number of cycles that the oldest load of the load buffer is
        stalled at retirement due to a pipeline block]
  tma_info_load_store_bound_load_bound
       [Counts the number of cycles that the oldest load of the load buffer is
        stalled at retirement]
  tma_info_load_store_bound_store_bound
       [Counts the number of cycles the core is stalled due to store buffer full]

smi:
  smi_cycles
       [Percentage of cycles spent in System Management Interrupts]
  smi_num
       [Number of SMI interrupts]

tma_L1_group: [Metrics for top-down breakdown at level 1]
  tma_backend_bound
       [This category represents fraction of slots where no uops are being
        delivered due to a lack of required resources for accepting new uops in
        the Backend]
  tma_bad_speculation
       [This category represents fraction of slots wasted due to incorrect
        speculations]
  tma_frontend_bound
       [This category represents fraction of slots where the processor's
        Frontend undersupplies its Backend]
  tma_info_core_coreipc
       [Instructions Per Cycle across hyper-threads (per physical core)]
  tma_info_inst_mix_instructions
       [Total number of retired Instructions]
  tma_info_thread_slots
       [Total issue-pipeline slots (per-Physical Core till ICL; per-Logical
        Processor ICL onward)]
  tma_info_thread_slots_utilization
       [Fraction of Physical Core issue-slots utilized by this Logical Processor]
  tma_retiring
       [This category represents fraction of slots utilized by useful work i.e.
        issued uops that eventually get retired]

tma_L2_group: [Metrics for top-down breakdown at level 2]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_core_bound
       [This metric represents fraction of slots where Core non-memory issues
        were a bottleneck]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_fetch_latency
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend latency issues]
  tma_heavy_operations
       [This metric represents fraction of slots where the CPU was retiring
        heavy-weight operations -- instructions that require two or more uops or
        micro-coded sequences]
  tma_ifetch_bandwidth
       [Counts the number of issue slots that were not delivered by the frontend
        due to frontend bandwidth restrictions due to decode,predecode,cisc,and
        other limitations]
  tma_ifetch_latency
       [Counts the number of issue slots that were not delivered by the frontend
        due to frontend latency restrictions due to icache misses,itlb misses,
        branch detection,and resteer limitations]
  tma_light_operations
       [This metric represents fraction of slots where the CPU was retiring
        light-weight operations -- instructions that require no more than one
        uop (micro-operation)]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]
  tma_memory_bound
       [This metric represents fraction of slots the Memory subsystem within the
        Backend was a bottleneck]
  tma_resource_bound
       [Counts the number of cycles the core is stalled due to a resource
        limitation]

tma_L3_group: [Metrics for top-down breakdown at level 3]
  tma_allocation_restriction
       [Counts the number of issue slots that were not consumed by the backend
        due to certain allocation restrictions]
  tma_branch_detect
       [Counts the number of issue slots that were not delivered by the frontend
        due to BACLEARS,which occurs when the Branch Target Buffer (BTB)
        prediction or lack thereof,was corrected by a later branch predictor in
        the frontend]
  tma_branch_resteer
       [Counts the number of issue slots that were not delivered by the frontend
        due to BTCLEARS,which occurs when the Branch Target Buffer (BTB)
        predicts a taken branch]
  tma_branch_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers]
  tma_cisc
       [Counts the number of issue slots that were not delivered by the frontend
        due to the microcode sequencer (MS)]
  tma_decode
       [Counts the number of issue slots that were not delivered by the frontend
        due to decode stalls]
  tma_divider
       [This metric represents fraction of cycles where the Divider unit was
        active]
  tma_dram_bound
       [This metric estimates how often the CPU was stalled on accesses to
        external memory (DRAM) by loads]
  tma_dsb
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to DSB (decoded uop cache) fetch pipeline]
  tma_dsb_switches
       [This metric represents fraction of cycles the CPU was stalled due to
        switches from DSB to MITE pipelines]
  tma_fast_nuke
       [Counts the number of issue slots that were not consumed by the backend
        due to a machine clear that does not require the use of microcode,
        classified as a fast nuke,due to memory ordering,memory disambiguation
        and memory renaming]
  tma_few_uops_instructions
       [This metric represents fraction of slots where the CPU was retiring
        instructions that are decoded into two or up to ([SNB+] four;
        [ADL+] five) uops]
  tma_fp_arith
       [This metric represents overall arithmetic floating-point (FP) operations
        fraction the CPU has executed (retired)]
  tma_fused_instructions
       [This metric represents fraction of slots where the CPU was retiring
        fused instructions -- where one uop can represent multiple contiguous
        instructions]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_int_operations
       [This metric represents overall Integer (Int) select operations fraction
        the CPU has executed (retired)]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_bound
       [This metric estimates how often the CPU was stalled due to load
        accesses to L3 cache or contended with a sibling Core]
  tma_lcp
       [This metric represents fraction of cycles CPU was stalled due to Length
        Changing Prefixes (LCPs)]
  tma_lsd
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to LSD (Loop Stream Detector) unit]
  tma_mem_scheduler
       [Counts the number of issue slots that were not consumed by the backend
        due to memory reservation stalls in which a scheduler is not able to
        accept uops]
  tma_memory_operations
       [This metric represents fraction of slots where the CPU was retiring
        memory operations -- uops for memory load or store accesses]
  tma_microcode_sequencer
       [This metric represents fraction of slots the CPU was retiring uops
        fetched by the Microcode Sequencer (MS) unit]
  tma_mite
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to the MITE pipeline (the legacy decode pipeline)]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]
  tma_non_fused_branches
       [This metric represents fraction of slots where the CPU was retiring
        branch instructions that were not fused]
  tma_non_mem_scheduler
       [Counts the number of issue slots that were not consumed by the backend
        due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation
        stalls in which the integer,floating point or SIMD scheduler is not able
        to accept uops]
  tma_nuke
       [Counts the number of issue slots that were not consumed by the backend
        due to a machine clear that requires the use of microcode (slow nuke)]
  tma_other_fb
       [Counts the number of issue slots that were not delivered by the frontend
        due to other common frontend stalls not categorized]
  tma_other_light_ops
       [This metric represents the remaining light uops fraction the CPU has
        executed - remaining means not covered by other sibling nodes]
  tma_other_mispredicts
       [This metric estimates fraction of slots the CPU was stalled due to other
        cases of misprediction (non-retired x86 branches or other types)]
  tma_other_nukes
       [This metric represents fraction of slots the CPU has wasted due to Nukes
        (Machine Clears) not related to memory ordering]
  tma_ports_utilization
       [This metric estimates fraction of cycles the CPU performance was
        potentially limited due to Core computation issues (non divider-related)]
  tma_predecode
       [Counts the number of issue slots that were not delivered by the frontend
        due to wrong predecodes]
  tma_register
       [Counts the number of issue slots that were not consumed by the backend
        due to the physical register file unable to accept an entry (marble
        stalls)]
  tma_reorder_buffer
       [Counts the number of issue slots that were not consumed by the backend
        due to the reorder buffer being full (ROB stalls)]
  tma_serialization
       [Counts the number of issue slots that were not consumed by the backend
        due to scoreboards from the instruction queue (IQ),jump execution unit
        (JEU),or microcode sequencer (MS)]
  tma_serializing_operation
       [This metric represents fraction of cycles the CPU issue-pipeline was
        stalled due to serializing operations]
  tma_store_bound
       [This metric estimates how often CPU was stalled due to RFO store memory
        accesses; RFO stores issue a read-for-ownership request before the write]

tma_L4_group: [Metrics for top-down breakdown at level 4]
  tma_assists
       [This metric estimates fraction of slots the CPU retired uops delivered
        by the Microcode_Sequencer as a result of Assists]
  tma_c01_wait
       [This metric represents fraction of cycles the CPU was stalled due to
        staying in C0.1 power-performance optimized state (Faster wakeup time;
        Smaller power savings)]
  tma_c02_wait
       [This metric represents fraction of cycles the CPU was stalled due to
        staying in C0.2 power-performance optimized state (Slower wakeup time;
        Larger power savings)]
  tma_cisc
       [This metric estimates fraction of cycles the CPU retired uops originated
        from CISC (complex instruction set computer) instruction]
  tma_clears_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Machine Clears]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_decoder0_alone
       [This metric represents fraction of cycles where decoder-0 was the only
        active decoder]
  tma_dtlb_load
       [This metric roughly estimates the fraction of cycles where the Data TLB
        (DTLB) was missed by load accesses]
  tma_dtlb_store
       [This metric roughly estimates the fraction of cycles spent handling
        first-level data TLB store misses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_fp_scalar
       [This metric approximates arithmetic floating-point (FP) scalar uops
        fraction the CPU has retired]
  tma_fp_vector
       [This metric approximates arithmetic floating-point (FP) vector uops
        fraction the CPU has retired aggregated across all vector widths]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_l1_hit_latency
       [This metric roughly estimates fraction of cycles with demand load
        accesses that hit the L1 cache]
  tma_l3_hit_latency
       [This metric estimates fraction of cycles with demand load accesses that
        hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_lock_latency
       [This metric represents fraction of cycles the CPU spent handling cache
        misses due to lock operations]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]
  tma_memory_fence
       [This metric represents fraction of cycles the CPU was stalled due to
        LFENCE Instructions]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]
  tma_nop_instructions
       [This metric represents fraction of slots where the CPU was retiring NOP
        (no op) instructions]
  tma_ports_utilized_0
       [This metric represents fraction of cycles CPU executed no uops on any
        execution port (Logical Processor cycles since ICL,Physical Core cycles
        otherwise)]
  tma_ports_utilized_1
       [This metric represents fraction of cycles where the CPU executed total
        of 1 uop per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_ports_utilized_2
       [This metric represents fraction of cycles CPU executed total of 2 uops
        per cycle on all execution ports (Logical Processor cycles since ICL,
        Physical Core cycles otherwise)]
  tma_ports_utilized_3m
       [This metric represents fraction of cycles CPU executed total of 3 or
        more uops per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_shuffles_256b
       [This metric represents fraction of slots where the CPU was retiring
        Shuffle operations of 256-bit vector size (FP or Integer)]
  tma_slow_pause
       [This metric represents fraction of cycles the CPU was stalled due to
        PAUSE Instructions]
  tma_split_loads
       [This metric estimates fraction of cycles handling memory load split
        accesses - loads that cross a 64-byte cache line boundary]
  tma_split_stores
       [This metric represents rate of split store accesses]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]
  tma_store_fwd_blk
       [This metric roughly estimates fraction of cycles when the memory
        subsystem had loads blocked since they could not forward data from
        earlier (in program order) overlapping stores]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]
  tma_streaming_stores
       [This metric estimates how often CPU was stalled due to Streaming store
        memory accesses; Streaming stores optimize out a read request required by
        RFO stores]
  tma_unknown_branches
       [This metric represents fraction of cycles the CPU was stalled due to new
        branch address clears]
  tma_x87_use
       [This metric serves as an approximation of legacy x87 usage]

tma_L5_group: [Metrics for top-down breakdown at level 5]
  tma_alu_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution ports for ALU operations]
  tma_avx_assists
       [This metric estimates fraction of slots the CPU retired uops as a result
        of handling SSE to AVX* or AVX* to SSE transition Assists]
  tma_fp_assists
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Floating Point (FP) Assists]
  tma_fp_vector_128b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 128-bit wide vectors]
  tma_fp_vector_256b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 256-bit wide vectors]
  tma_load_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port for Load operations]
  tma_load_stlb_hit
       [This metric roughly estimates the fraction of cycles where the (first
        level) DTLB was missed by load accesses,that later on hit in
        second-level TLB (STLB)]
  tma_load_stlb_miss
       [This metric estimates the fraction of cycles where the Second-level TLB
        (STLB) was missed by load accesses,performing a hardware page walk]
  tma_mixing_vectors
       [This metric estimates penalty in terms of percentage of([SKL+] injected
        blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)]
  tma_page_faults
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Page Faults]
  tma_store_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port for Store operations]
  tma_store_stlb_hit
       [This metric roughly estimates the fraction of cycles where the TLB was
        missed by store accesses,hitting in the second-level TLB (STLB)]
  tma_store_stlb_miss
       [This metric estimates the fraction of cycles where the STLB was missed
        by store accesses,performing a hardware page walk]

tma_L6_group: [Metrics for top-down breakdown at level 6]
  tma_port_0
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)]
  tma_port_1
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 1 (ALU)]
  tma_port_6
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 6 ([HSW+] Primary Branch and simple ALU)]

tma_alu_op_utilization_group: [Metrics contributing to tma_alu_op_utilization category]
  tma_port_0
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)]
  tma_port_1
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 1 (ALU)]
  tma_port_6
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 6 ([HSW+] Primary Branch and simple ALU)]

tma_assists_group: [Metrics contributing to tma_assists category]
  tma_avx_assists
       [This metric estimates fraction of slots the CPU retired uops as a result
        of handling SSE to AVX* or AVX* to SSE transition Assists]
  tma_fp_assists
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Floating Point (FP) Assists]
  tma_page_faults
       [This metric roughly estimates fraction of slots the CPU retired uops as
        a result of handling Page Faults]

tma_backend_bound_group: [Metrics contributing to tma_backend_bound category]
  tma_core_bound
       [This metric represents fraction of slots where Core non-memory issues
        were a bottleneck]
  tma_memory_bound
       [This metric represents fraction of slots the Memory subsystem within the
        Backend was a bottleneck]
  tma_resource_bound
       [Counts the number of cycles the core is stalled due to a resource
        limitation]

tma_bad_speculation_group: [Metrics contributing to tma_bad_speculation category]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]

tma_branch_mispredicts_group: [Metrics contributing to tma_branch_mispredicts category]
  tma_other_mispredicts
       [This metric estimates fraction of slots the CPU was stalled due to other
        cases of misprediction (non-retired x86 branches or other types)]

tma_branch_resteers_group: [Metrics contributing to tma_branch_resteers category]
  tma_clears_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Machine Clears]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]
  tma_unknown_branches
       [This metric represents fraction of cycles the CPU was stalled due to new
        branch address clears]

tma_core_bound_group: [Metrics contributing to tma_core_bound category]
  tma_allocation_restriction
       [Counts the number of issue slots that were not consumed by the backend
        due to certain allocation restrictions]
  tma_divider
       [This metric represents fraction of cycles where the Divider unit was
        active]
  tma_ports_utilization
       [This metric estimates fraction of cycles the CPU performance was
        potentially limited due to Core computation issues (non divider-related)]
  tma_serializing_operation
       [This metric represents fraction of cycles the CPU issue-pipeline was
        stalled due to serializing operations]

tma_dram_bound_group: [Metrics contributing to tma_dram_bound category]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]

tma_dtlb_load_group: [Metrics contributing to tma_dtlb_load category]
  tma_load_stlb_hit
       [This metric roughly estimates the fraction of cycles where the (first
        level) DTLB was missed by load accesses,that later on hit in
        second-level TLB (STLB)]
  tma_load_stlb_miss
       [This metric estimates the fraction of cycles where the Second-level TLB
        (STLB) was missed by load accesses,performing a hardware page walk]

tma_dtlb_store_group: [Metrics contributing to tma_dtlb_store category]
  tma_store_stlb_hit
       [This metric roughly estimates the fraction of cycles where the TLB was
        missed by store accesses,hitting in the second-level TLB (STLB)]
  tma_store_stlb_miss
       [This metric estimates the fraction of cycles where the STLB was missed
        by store accesses,performing a hardware page walk]

tma_fetch_bandwidth_group: [Metrics contributing to tma_fetch_bandwidth category]
  tma_dsb
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to DSB (decoded uop cache) fetch pipeline]
  tma_lsd
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to LSD (Loop Stream Detector) unit]
  tma_mite
       [This metric represents Core fraction of cycles in which CPU was likely
        limited due to the MITE pipeline (the legacy decode pipeline)]

tma_fetch_latency_group: [Metrics contributing to tma_fetch_latency category]
  tma_branch_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers]
  tma_dsb_switches
       [This metric represents fraction of cycles the CPU was stalled due to
        switches from DSB to MITE pipelines]
  tma_icache_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        instruction cache misses]
  tma_itlb_misses
       [This metric represents fraction of cycles the CPU was stalled due to
        Instruction TLB (ITLB) misses]
  tma_lcp
       [This metric represents fraction of cycles CPU was stalled due to Length
        Changing Prefixes (LCPs)]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]

tma_fp_arith_group: [Metrics contributing to tma_fp_arith category]
  tma_fp_scalar
       [This metric approximates arithmetic floating-point (FP) scalar uops
        fraction the CPU has retired]
  tma_fp_vector
       [This metric approximates arithmetic floating-point (FP) vector uops
        fraction the CPU has retired aggregated across all vector widths]
  tma_x87_use
       [This metric serves as an approximation of legacy x87 usage]

tma_fp_vector_group: [Metrics contributing to tma_fp_vector category]
  tma_fp_vector_128b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 128-bit wide vectors]
  tma_fp_vector_256b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 256-bit wide vectors]

tma_frontend_bound_group: [Metrics contributing to tma_frontend_bound category]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_fetch_latency
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend latency issues]
  tma_ifetch_bandwidth
       [Counts the number of issue slots that were not delivered by the frontend
        due to frontend bandwidth restrictions due to decode,predecode,cisc,and
        other limitations]
  tma_ifetch_latency
       [Counts the number of issue slots that were not delivered by the frontend
        due to frontend latency restrictions due to icache misses,itlb misses,
        branch detection,and resteer limitations]

tma_heavy_operations_group: [Metrics contributing to tma_heavy_operations category]
  tma_few_uops_instructions
       [This metric represents fraction of slots where the CPU was retiring
        instructions that that are decoder into two or up to ([SNB+] four;
        [ADL+] five) uops]
  tma_microcode_sequencer
       [This metric represents fraction of slots the CPU was retiring uops
        fetched by the Microcode Sequencer (MS) unit]

tma_ifetch_bandwidth_group: [Metrics contributing to tma_ifetch_bandwidth category]
  tma_cisc
       [Counts the number of issue slots that were not delivered by the frontend
        due to the microcode sequencer (MS)]
  tma_decode
       [Counts the number of issue slots that were not delivered by the frontend
        due to decode stalls]
  tma_other_fb
       [Counts the number of issue slots that were not delivered by the frontend
        due to other common frontend stalls not categorized]
  tma_predecode
       [Counts the number of issue slots that were not delivered by the frontend
        due to wrong predecodes]

tma_ifetch_latency_group: [Metrics contributing to tma_ifetch_latency category]
  tma_branch_detect
       [Counts the number of issue slots that were not delivered by the frontend
        due to BACLEARS,which occurs when the Branch Target Buffer (BTB)
        prediction or lack thereof,was corrected by a later branch predictor in
        the frontend]
  tma_branch_resteer
       [Counts the number of issue slots that were not delivered by the frontend
        due to BTCLEARS,which occurs when the Branch Target Buffer (BTB)
        predicts a taken branch]
  tma_icache_misses
       [Counts the number of issue slots that were not delivered by the frontend
        due to instruction cache misses]
  tma_itlb_misses
       [Counts the number of issue slots that were not delivered by the frontend
        due to Instruction Table Lookaside Buffer (ITLB) misses]

tma_info_bottleneck_%_dtlb_miss_bound_cycles:
  tma_info_bottleneck_%_dtlb_miss_bound_cycles
       [Percentage of time that retirement is stalled due to a first level data
        TLB miss]

tma_info_br_inst_mix_ipbranch:
  tma_info_br_inst_mix_ipbranch
       [Instructions per Branch (lower number means higher occurrence rate)]

tma_info_br_inst_mix_ipcall:
  tma_info_br_inst_mix_ipcall
       [Instruction per (near) call (lower number means higher occurrence rate)]

tma_info_br_inst_mix_ipfarbranch:
  tma_info_br_inst_mix_ipfarbranch
       [Instructions per Far Branch ( Far Branches apply upon transition from
        application to operating system,handling interrupts,exceptions) [lower
        number means higher occurrence rate]]

tma_info_br_inst_mix_ipmisp_cond_ntaken:
  tma_info_br_inst_mix_ipmisp_cond_ntaken
       [Instructions per retired conditional Branch Misprediction where the
        branch was not taken]

tma_info_br_inst_mix_ipmisp_cond_taken:
  tma_info_br_inst_mix_ipmisp_cond_taken
       [Instructions per retired conditional Branch Misprediction where the
        branch was taken]

tma_info_br_inst_mix_ipmisp_indirect:
  tma_info_br_inst_mix_ipmisp_indirect
       [Instructions per retired indirect call or jump Branch Misprediction]

tma_info_br_inst_mix_ipmisp_ret:
  tma_info_br_inst_mix_ipmisp_ret
       [Instructions per retired return Branch Misprediction]

tma_info_br_inst_mix_ipmispredict:
  tma_info_br_inst_mix_ipmispredict
       [Instructions per retired Branch Misprediction]

tma_info_br_mispredict_bound_branch_mispredict_ratio:
  tma_info_br_mispredict_bound_branch_mispredict_ratio
       [Ratio of all branches which mispredict]

tma_info_br_mispredict_bound_branch_mispredict_to_unknown_branch_ratio:
  tma_info_br_mispredict_bound_branch_mispredict_to_unknown_branch_ratio
       [Ratio between Mispredicted branches and unknown branches]

tma_info_buffer_stalls_%_load_buffer_stall_cycles:
  tma_info_buffer_stalls_%_load_buffer_stall_cycles
       [Percentage of time that allocation is stalled due to load buffer full]

tma_info_buffer_stalls_%_mem_rsv_stall_cycles:
  tma_info_buffer_stalls_%_mem_rsv_stall_cycles
       [Percentage of time that allocation is stalled due to memory reservation
        stations full]

tma_info_buffer_stalls_%_store_buffer_stall_cycles:
  tma_info_buffer_stalls_%_store_buffer_stall_cycles
       [Percentage of time that allocation is stalled due to store buffer full]

tma_info_core_cpi:
  tma_info_core_cpi
       [Cycles Per Instruction]

tma_info_core_ipc:
  tma_info_core_ipc
       [Instructions Per Cycle]

tma_info_core_upi:
  tma_info_core_upi
       [Uops Per Instruction]

tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l2hit:
  tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l2hit
       [Percentage of ifetch miss bound stalls,where the ifetch miss hits in the
        L2]

tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3hit:
  tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3hit
       [Percentage of ifetch miss bound stalls,where the ifetch miss hits in the
        L3]

tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3miss:
  tma_info_ifetch_miss_bound_%_ifetchmissbound_with_l3miss
       [Percentage of ifetch miss bound stalls,where the ifetch miss
        subsequently misses in the L3]

tma_info_machine_clear_bound_machine_clears_disamb_pki:
  tma_info_machine_clear_bound_machine_clears_disamb_pki
       [Counts the number of machine clears relative to thousands of
        instructions retired,due to memory disambiguation]

tma_info_machine_clear_bound_machine_clears_fp_assist_pki:
  tma_info_machine_clear_bound_machine_clears_fp_assist_pki
       [Counts the number of machine clears relative to thousands of
        instructions retired,due to floating point assists]

tma_info_machine_clear_bound_machine_clears_monuke_pki:
  tma_info_machine_clear_bound_machine_clears_monuke_pki
       [Counts the number of machine clears relative to thousands of
        instructions retired,due to memory ordering]

tma_info_machine_clear_bound_machine_clears_mrn_pki:
  tma_info_machine_clear_bound_machine_clears_mrn_pki
       [Counts the number of machine clears relative to thousands of
        instructions retired,due to memory renaming]

tma_info_machine_clear_bound_machine_clears_page_fault_pki:
  tma_info_machine_clear_bound_machine_clears_page_fault_pki
       [Counts the number of machine clears relative to thousands of
        instructions retired,due to page faults]

tma_info_machine_clear_bound_machine_clears_smc_pki:
  tma_info_machine_clear_bound_machine_clears_smc_pki
       [Counts the number of machine clears relative to thousands of
        instructions retired,due to self-modifying code]

tma_info_mem_exec_blocks_%_loads_with_adressaliasing:
  tma_info_mem_exec_blocks_%_loads_with_adressaliasing
       [Percentage of total non-speculative loads with an address aliasing block]

tma_info_mem_exec_blocks_%_loads_with_storefwdblk:
  tma_info_mem_exec_blocks_%_loads_with_storefwdblk
       [Percentage of total non-speculative loads with a store forward or
        unknown store address block]

tma_info_mem_exec_bound_%_loadhead_with_l1miss:
  tma_info_mem_exec_bound_%_loadhead_with_l1miss
       [Percentage of Memory Execution Bound due to a first level data cache
        miss]

tma_info_mem_exec_bound_%_loadhead_with_otherpipelineblks:
  tma_info_mem_exec_bound_%_loadhead_with_otherpipelineblks
       [Percentage of Memory Execution Bound due to other block cases,such as
        pipeline conflicts,fences,etc]

tma_info_mem_exec_bound_%_loadhead_with_pagewalk:
  tma_info_mem_exec_bound_%_loadhead_with_pagewalk
       [Percentage of Memory Execution Bound due to a pagewalk]

tma_info_mem_exec_bound_%_loadhead_with_stlbhit:
  tma_info_mem_exec_bound_%_loadhead_with_stlbhit
       [Percentage of Memory Execution Bound due to a second level TLB miss]

tma_info_mem_exec_bound_%_loadhead_with_storefwding:
  tma_info_mem_exec_bound_%_loadhead_with_storefwding
       [Percentage of Memory Execution Bound due to a store forward address
        match]

tma_info_mem_mix_ipload:
  tma_info_mem_mix_ipload
       [Instructions per Load]

tma_info_mem_mix_ipstore:
  tma_info_mem_mix_ipstore
       [Instructions per Store]

tma_info_mem_mix_load_locks_ratio:
  tma_info_mem_mix_load_locks_ratio
       [Percentage of total non-speculative loads that perform one or more locks]

tma_info_mem_mix_load_splits_ratio:
  tma_info_mem_mix_load_splits_ratio
       [Percentage of total non-speculative loads that are splits]

tma_info_mem_mix_memload_ratio:
  tma_info_mem_mix_memload_ratio
       [Ratio of mem load uops to all uops]

tma_info_serialization _%_tpause_cycles:
  tma_info_serialization _%_tpause_cycles
       [Percentage of time that the core is stalled due to a TPAUSE or UMWAIT
        instruction]

tma_info_system_cpu_utilization:
  tma_info_system_cpu_utilization
       [Average CPU Utilization]

tma_info_uop_mix_fpdiv_uop_ratio:
  tma_info_uop_mix_fpdiv_uop_ratio
       [Percentage of all uops which are FPDiv uops]

tma_info_uop_mix_idiv_uop_ratio:
  tma_info_uop_mix_idiv_uop_ratio
       [Percentage of all uops which are IDiv uops]

tma_info_uop_mix_microcode_uop_ratio:
  tma_info_uop_mix_microcode_uop_ratio
       [Percentage of all uops which are microcode ops]

tma_info_uop_mix_x87_uop_ratio:
  tma_info_uop_mix_x87_uop_ratio
       [Percentage of all uops which are x87 uops]

tma_int_operations_group: [Metrics contributing to tma_int_operations category]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]

tma_issue2P: [Metrics related by the issue $issue2P]
  tma_fp_scalar
       [This metric approximates arithmetic floating-point (FP) scalar uops
        fraction the CPU has retired]
  tma_fp_vector
       [This metric approximates arithmetic floating-point (FP) vector uops
        fraction the CPU has retired aggregated across all vector widths]
  tma_fp_vector_128b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 128-bit wide vectors]
  tma_fp_vector_256b
       [This metric approximates arithmetic FP vector uops fraction the CPU has
        retired for 256-bit wide vectors]
  tma_int_vector_128b
       [This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_int_vector_256b
       [This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI
        (Vector Neural Network Instructions) uops fraction the CPU has retired]
  tma_port_0
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)]
  tma_port_1
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 1 (ALU)]
  tma_port_6
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port 6 ([HSW+] Primary Branch and simple ALU)]
  tma_ports_utilized_2
       [This metric represents fraction of cycles CPU executed total of 2 uops
        per cycle on all execution ports (Logical Processor cycles since ICL,
        Physical Core cycles otherwise)]

tma_issueBM: [Metrics related by the issue $issueBM]
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to
        Branch Misprediction]
  tma_info_bad_spec_branch_misprediction_cost
       [Branch Misprediction Cost: Fraction of TMA slots wasted per
        non-speculative branch misprediction (retired JEClear)]
  tma_info_bottleneck_mispredictions
       [Total pipeline cost of Branch Misprediction related bottlenecks]
  tma_mispredicts_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Branch Misprediction at execution stage]

tma_issueBW: [Metrics related by the issue $issueBW]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_info_bottleneck_cache_memory_bandwidth
       [Total pipeline cost of external Memory- or Cache-Bandwidth related
        bottlenecks]
  tma_info_system_dram_bw_use
       [Average external Memory Bandwidth Use for reads and writes [GB / sec]]
  tma_mem_bandwidth
       [This metric estimates fraction of cycles where the core's performance
        was likely hurt due to approaching bandwidth limits of external memory -
        DRAM ([SPR-HBM] and/or HBM)]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]

tma_issueComp: [Metrics related by the issue $issueComp]
  tma_info_bottleneck_compute_bound_est
       [Total pipeline cost when the execution is compute-bound - an estimation]

tma_issueD0: [Metrics related by the issue $issueD0]
  tma_decoder0_alone
       [This metric represents fraction of cycles where decoder-0 was the only
        active decoder]
  tma_few_uops_instructions
       [This metric represents fraction of slots where the CPU was retiring
        instructions that that are decoder into two or up to ([SNB+] four;
        [ADL+] five) uops]

tma_issueFB: [Metrics related by the issue $issueFB]
  tma_dsb_switches
       [This metric represents fraction of cycles the CPU was stalled due to
        switches from DSB to MITE pipelines]
  tma_fetch_bandwidth
       [This metric represents fraction of slots the CPU was stalled due to
        Frontend bandwidth issues]
  tma_info_botlnk_l2_dsb_bandwidth
       [Total pipeline cost of DSB (uop cache) hits - subset of the
        Instruction_Fetch_BW Bottleneck]
  tma_info_botlnk_l2_dsb_misses
       [Total pipeline cost of DSB (uop cache) misses - subset of the
        Instruction_Fetch_BW Bottleneck]
  tma_info_frontend_dsb_coverage
       [Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)]
  tma_info_inst_mix_iptb
       [Instructions per taken branch]
  tma_lcp
       [This metric represents fraction of cycles CPU was stalled due to Length
        Changing Prefixes (LCPs)]

tma_issueFL: [Metrics related by the issue $issueFL]
  tma_info_botlnk_l2_ic_misses
       [Total pipeline cost of Instruction Cache misses - subset of the Big_Code
        Bottleneck]

tma_issueL1: [Metrics related by the issue $issueL1]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_ports_utilized_1
       [This metric represents fraction of cycles where the CPU executed total
        of 1 uop per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]

tma_issueLat: [Metrics related by the issue $issueLat]
  tma_info_bottleneck_cache_memory_latency
       [Total pipeline cost of external Memory- or Cache-Latency related
        bottlenecks]
  tma_l3_hit_latency
       [This metric estimates fraction of cycles with demand load accesses that
        hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_mem_latency
       [This metric estimates fraction of cycles where the performance was
        likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or
        HBM)]

tma_issueMC: [Metrics related by the issue $issueMC]
  tma_clears_resteers
       [This metric represents fraction of cycles the CPU was stalled due to
        Branch Resteers as a result of Machine Clears]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]
  tma_microcode_sequencer
       [This metric represents fraction of slots the CPU was retiring uops
        fetched by the Microcode Sequencer (MS) unit]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]

tma_issueMS: [Metrics related by the issue $issueMS]
  tma_info_bottleneck_irregular_overhead
       [Total pipeline cost of irregular execution (e.g]
  tma_microcode_sequencer
       [This metric represents fraction of slots the CPU was retiring uops
        fetched by the Microcode Sequencer (MS) unit]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]

tma_issueMV: [Metrics related by the issue $issueMV]
  tma_mixing_vectors
       [This metric estimates penalty in terms of percentage of([SKL+] injected
        blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]

tma_issueRFO: [Metrics related by the issue $issueRFO]
  tma_lock_latency
       [This metric represents fraction of cycles the CPU spent handling cache
        misses due to lock operations]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]

tma_issueSL: [Metrics related by the issue $issueSL]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]

tma_issueSO: [Metrics related by the issue $issueSO]
  tma_ms_switches
       [This metric estimates the fraction of cycles when the CPU was stalled
        due to switches of uop delivery to the Microcode Sequencer (MS)]
  tma_serializing_operation
       [This metric represents fraction of cycles the CPU issue-pipeline was
        stalled due to serializing operations]

tma_issueSmSt: [Metrics related by the issue $issueSmSt]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_streaming_stores
       [This metric estimates how often CPU was stalled due to Streaming store
        memory accesses; Streaming store optimize out a read request required by
        RFO stores]

tma_issueSpSt: [Metrics related by the issue $issueSpSt]
  tma_split_stores
       [This metric represents rate of split store accesses]

tma_issueSyncxn: [Metrics related by the issue $issueSyncxn]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]
  tma_machine_clears
       [This metric represents fraction of slots the CPU has wasted due to
        Machine Clears]

tma_issueTLB: [Metrics related by the issue $issueTLB]
  tma_dtlb_load
       [This metric roughly estimates the fraction of cycles where the Data TLB
        (DTLB) was missed by load accesses]
  tma_dtlb_store
       [This metric roughly estimates the fraction of cycles spent handling
        first-level data TLB store misses]
  tma_info_bottleneck_memory_data_tlbs
       [Total pipeline cost of Memory Address Translation related bottlenecks
        (data-side TLBs)]
  tma_info_bottleneck_memory_synchronization
       [Total pipeline cost of Memory Synchronization related bottlenecks (data
        transfers and coherency updates across processors)]

tma_l1_bound_group: [Metrics contributing to tma_l1_bound category]
  tma_dtlb_load
       [This metric roughly estimates the fraction of cycles where the Data TLB
        (DTLB) was missed by load accesses]
  tma_fb_full
       [This metric does a *rough estimation* of how often L1D Fill Buffer
        unavailability limited additional L1D miss memory access requests to
        proceed]
  tma_l1_hit_latency
       [This metric roughly estimates fraction of cycles with demand load
        accesses that hit the L1 cache]
  tma_lock_latency
       [This metric represents fraction of cycles the CPU spent handling cache
        misses due to lock operations]
  tma_split_loads
       [This metric estimates fraction of cycles handling memory load split
        accesses - load that cross 64-byte cache line boundary]
  tma_store_fwd_blk
       [This metric roughly estimates fraction of cycles when the memory
        subsystem had loads blocked since they could not forward data from
        earlier (in program order) overlapping stores]

tma_l3_bound_group: [Metrics contributing to tma_l3_bound category]
  tma_contested_accesses
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to contested accesses]
  tma_data_sharing
       [This metric estimates fraction of cycles while the memory subsystem was
        handling synchronizations due to data-sharing accesses]
  tma_l3_hit_latency
       [This metric estimates fraction of cycles with demand load accesses that
        hit the L3 cache under unloaded scenarios (possibly L3 latency limited)]
  tma_sq_full
       [This metric measures fraction of cycles where the Super Queue (SQ) was
        full taking into account all request-types and both hardware SMT threads
        (Logical Processors)]

tma_light_operations_group: [Metrics contributing to tma_light_operations category]
  tma_fp_arith
       [This metric represents overall arithmetic floating-point (FP) operations
        fraction the CPU has executed (retired)]
  tma_fused_instructions
       [This metric represents fraction of slots where the CPU was retiring
        fused instructions -- where one uop can represent multiple contiguous
        instructions]
  tma_int_operations
       [This metric represents overall Integer (Int) select operations fraction
        the CPU has executed (retired)]
  tma_memory_operations
       [This metric represents fraction of slots where the CPU was retiring
        memory operations -- uops for memory load or store accesses]
  tma_non_fused_branches
       [This metric represents fraction of slots where the CPU was retiring
        branch instructions that were not fused]
  tma_other_light_ops
       [This metric represents the remaining light uops fraction the CPU has
        executed - remaining means not covered by other sibling nodes]

tma_machine_clears_group: [Metrics contributing to tma_machine_clears category]
  tma_fast_nuke
       [Counts the number of issue slots that were not consumed by the backend
        due to a machine clear that does not require the use of microcode,
        classified as a fast nuke,due to memory ordering,memory disambiguation
        and memory renaming]
  tma_nuke
       [Counts the number of issue slots that were not consumed by the backend
        due to a machine clear that requires the use of microcode (slow nuke)]
  tma_other_nukes
       [This metric represents fraction of slots the CPU has wasted due to Nukes
        (Machine Clears) not related to memory ordering]

tma_memory_bound_group: [Metrics contributing to tma_memory_bound category]
  tma_dram_bound
       [This metric estimates how often the CPU was stalled on accesses to
        external memory (DRAM) by loads]
  tma_l1_bound
       [This metric estimates how often the CPU was stalled without loads
        missing the L1 data cache]
  tma_l2_bound
       [This metric estimates how often the CPU was stalled due to L2 cache
        accesses by loads]
  tma_l3_bound
       [This metric estimates how often the CPU was stalled due to loads
        accesses to L3 cache or contended with a sibling Core]
  tma_store_bound
       [This metric estimates how often CPU was stalled due to RFO store memory
        accesses; RFO store issue a read-for-ownership request before the write]

tma_microcode_sequencer_group: [Metrics contributing to tma_microcode_sequencer category]
  tma_assists
       [This metric estimates fraction of slots the CPU retired uops delivered
        by the Microcode_Sequencer as a result of Assists]
  tma_cisc
       [This metric estimates fraction of cycles the CPU retired uops originated
        from CISC (complex instruction set computer) instruction]

tma_mite_group: [Metrics contributing to tma_mite category]
  tma_decoder0_alone
       [This metric represents fraction of cycles where decoder-0 was the only
        active decoder]

tma_other_light_ops_group: [Metrics contributing to tma_other_light_ops category]
  tma_nop_instructions
       [This metric represents fraction of slots where the CPU was retiring NOP
        (no op) instructions]
  tma_shuffles_256b
       [This metric represents fraction of slots where the CPU was retiring
        Shuffle operations of 256-bit vector size (FP or Integer)]

tma_ports_utilization_group: [Metrics contributing to tma_ports_utilization category]
  tma_ports_utilized_0
       [This metric represents fraction of cycles CPU executed no uops on any
        execution port (Logical Processor cycles since ICL,Physical Core cycles
        otherwise)]
  tma_ports_utilized_1
       [This metric represents fraction of cycles where the CPU executed total
        of 1 uop per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]
  tma_ports_utilized_2
       [This metric represents fraction of cycles CPU executed total of 2 uops
        per cycle on all execution ports (Logical Processor cycles since ICL,
        Physical Core cycles otherwise)]
  tma_ports_utilized_3m
       [This metric represents fraction of cycles CPU executed total of 3 or
        more uops per cycle on all execution ports (Logical Processor cycles
        since ICL,Physical Core cycles otherwise)]

tma_ports_utilized_0_group: [Metrics contributing to tma_ports_utilized_0 category]
  tma_mixing_vectors
       [This metric estimates penalty in terms of percentage of([SKL+] injected
        blend uops out of all Uops Issued -- the Count Domain; [ADL+] cycles)]

tma_ports_utilized_3m_group: [Metrics contributing to tma_ports_utilized_3m category]
  tma_alu_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution ports for ALU operations]
  tma_load_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port for Load operations]
  tma_store_op_utilization
       [This metric represents Core fraction of cycles CPU dispatched uops on
        execution port for Store operations]

tma_resource_bound_group: [Metrics contributing to tma_resource_bound category]
  tma_mem_scheduler
       [Counts the number of issue slots that were not consumed by the backend
        due to memory reservation stalls in which a scheduler is not able to
        accept uops]
  tma_non_mem_scheduler
       [Counts the number of issue slots that were not consumed by the backend
        due to IEC or FPC RAT stalls,which can be due to FIQ or IEC reservation
        stalls in which the integer,floating point or SIMD scheduler is not able
        to accept uops]
  tma_register
       [Counts the number of issue slots that were not consumed by the backend
        due to the physical register file unable to accept an entry (marble
        stalls)]
  tma_reorder_buffer
       [Counts the number of issue slots that were not consumed by the backend
        due to the reorder buffer being full (ROB stalls)]
  tma_serialization
       [Counts the number of issue slots that were not consumed by the backend
        due to scoreboards from the instruction queue (IQ),jump execution unit
        (JEU),or microcode sequencer (MS)]

tma_retiring_group: [Metrics contributing to tma_retiring category]
  tma_heavy_operations
       [This metric represents fraction of slots where the CPU was retiring
        heavy-weight operations -- instructions that require two or more uops or
        micro-coded sequences]
  tma_light_operations
       [This metric represents fraction of slots where the CPU was retiring
        light-weight operations -- instructions that require no more than one
        uop (micro-operation)]

tma_serializing_operation_group: [Metrics contributing to tma_serializing_operation category]
  tma_c01_wait
       [This metric represents fraction of cycles the CPU was stalled due
        staying in C0.1 power-performance optimized state (Faster wakeup time;
        Smaller power savings)]
  tma_c02_wait
       [This metric represents fraction of cycles the CPU was stalled due
        staying in C0.2 power-performance optimized state (Slower wakeup time;
        Larger power savings)]
  tma_memory_fence
       [This metric represents fraction of cycles the CPU was stalled due to
        LFENCE Instructions]
  tma_slow_pause
       [This metric represents fraction of cycles the CPU was stalled due to
        PAUSE Instructions]

tma_store_bound_group: [Metrics contributing to tma_store_bound category]
  tma_dtlb_store
       [This metric roughly estimates the fraction of cycles spent handling
        first-level data TLB store misses]
  tma_false_sharing
       [This metric roughly estimates how often CPU was handling
        synchronizations due to False Sharing]
  tma_split_stores
       [This metric represents rate of split store accesses]
  tma_store_latency
       [This metric estimates fraction of cycles the CPU spent handling L1D
        store misses]
  tma_streaming_stores
       [This metric estimates how often CPU was stalled due to Streaming store
        memory accesses; Streaming store optimize out a read request required by
        RFO stores]

transaction:
  tsx_aborted_cycles
       [Percentage of cycles in aborted transactions]
  tsx_cycles_per_elision
       [Number of cycles within a transaction divided by the number of elisions]
  tsx_cycles_per_transaction
       [Number of cycles within a transaction divided by the number of
        transactions]
  tsx_transactional_cycles
       [Percentage of cycles within a transaction region]

Examine the output of the following in a terminal:

  • perf top
  • perf top -z
  • perf top -e cache-misses
  • perf top -e cache-misses,cycles
In [3]:
%%writefile tmp/transpose.c

#include <stdio.h>
#include <stdlib.h>
 
int main()
{
    const int m = 1024;
    const int n = 1024;
    int *matrix = malloc(sizeof(int) * m * n);
    int *transpose = malloc(sizeof(int) * m * n);
    
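    /* Fill the m x n input matrix (row-major, so the row stride is n). */
    for (int c = 0; c < m; c++)
       for(int d = 0; d < n; d++)
          matrix[c*n + d] = c+d;

    /* Repeat the transpose 300 times so that it dominates the run time. */
    for (int i = 0; i < 300; ++i)
        for (int c = 0; c < m; c++)
           for(int d = 0 ; d < n ; d++)
              transpose[d*m + c] = matrix[c*n + d];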
 
    printf("Transpose of the matrix:\n");
 
    int sum = 0;
    for (int c = 0; c < n; c++)
       for (int d = 0; d < m; d++)
          sum += transpose[d*n + c];
    printf("sum: %d\n", sum);

    return 0;
}
Overwriting tmp/transpose.c
In [4]:
!(cd tmp; gcc transpose.c -O3 -o transpose)
!bash -c "time ./tmp/transpose"
Transpose of the matrix:
sum: 1072693248

real	0m1,016s
user	0m1,012s
sys	0m0,004s
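
The transpose loop reads one array with consecutive accesses and writes the other with a stride of a full row. A rough sketch of why that matters for the cache counters is below. It assumes 4-byte `int`s, 64-byte cache lines, and arrays that start on a cache-line boundary; none of these is stated in the notebook, and `lines_touched` is a hypothetical helper.

```python
INT_BYTES = 4     # assumption: sizeof(int) == 4
LINE_BYTES = 64   # assumption: 64-byte cache lines
m = n = 1024      # matrix dimensions from transpose.c


def lines_touched(start_index, stride_elems, count):
    """Count the distinct cache lines hit by `count` accesses with a fixed element stride."""
    lines = set()
    for k in range(count):
        byte_offset = (start_index + k * stride_elems) * INT_BYTES
        lines.add(byte_offset // LINE_BYTES)
    return len(lines)


# One pass of the inner loop (1024 iterations) for a fixed outer index:
read_lines = lines_touched(0, 1, n)    # consecutive ints in one row
write_lines = lines_touched(0, m, n)   # one int per row, a full row apart
print(read_lines, write_lines)         # prints: 64 1024
```

Under these assumptions the strided side touches 16 times as many cache lines per inner-loop pass as the contiguous side, which is the kind of difference `perf top -e cache-misses` is meant to expose.
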
In [5]:
!perf record -e cycles,instructions ./tmp/transpose
Transpose of the matrix:
sum: 1072693248
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0,427 MB perf.data (8461 samples) ]
  • Examine perf report in the terminal.
  • Now retry, this time building with -g (debug information) instead of -O3, and examine perf report again.
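
The two events recorded here, cycles and instructions, are commonly combined into instructions per cycle (IPC). As a sketch, the counts can be obtained with `perf stat -x, -e cycles,instructions ./tmp/transpose` and parsed as below. The parser relies only on the first three CSV fields (value, unit, event name); the sample text is made up for illustration and is not measured output. On hybrid CPUs the event names may carry a PMU prefix such as `cpu_core/cycles/`, in which case the dictionary keys change accordingly.

```python
def parse_perf_stat_csv(text):
    """Parse `perf stat -x,` style output into {event_name: count}."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = float(value)
        except ValueError:
            # perf prints a marker such as "<not counted>" instead of a number
            continue
    return counts


# Hypothetical sample lines, NOT measured on the notebook's machine.
SAMPLE = """\
3000000000,,cycles,1000000000,100.00,,
6000000000,,instructions,1000000000,100.00,2.00,insn per cycle
"""

counts = parse_perf_stat_csv(SAMPLE)
ipc = counts["instructions"] / counts["cycles"]
print(f"IPC = {ipc:.2f}")
```
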
In [6]:
%%writefile tmp/matvec.py

import numpy as np

n = 4096
A = np.random.randn(n, n)
b = np.random.randn(n)

for i in range(10):
    A @ b
Writing tmp/matvec.py
In [9]:
!OPENBLAS_NUM_THREADS=1 perf record python tmp/matvec.py
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,112 MB perf.data (1501 samples) ]
In [23]:
%%writefile tmp/matmat.py

import numpy as np

n = 2048
A = np.random.randn(n, n)
B = np.random.randn(n, n)

for i in range(20):
    A @ B
Overwriting tmp/matmat.py
In [24]:
!OPENBLAS_NUM_THREADS=1 perf record python tmp/matmat.py
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 1,377 MB perf.data (29133 samples) ]

Run in shell separately:

perf record \
  -e cycles,L1-dcache-load-misses \
  -e fp_arith_inst_retired.256b_packed_double \
  -c 10 \
  python tmp/matvec.py
  • Also try -c 100
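
The `-c` option sets the sample period: one sample is taken every N occurrences of the event. A back-of-the-envelope estimate of how many samples to expect for the floating-point event is sketched below. It assumes that every floating-point operation in `matvec.py` is executed by 256-bit packed double instructions and that each event count stands for four operations; the actual OpenBLAS kernel may use other instruction widths, and the kernel may throttle very high sample rates, so treat the numbers as an upper bound.

```python
n = 4096          # matrix size from matvec.py
iterations = 10   # loop count from matvec.py

# A matrix-vector product does one multiply and one add per matrix entry.
flops = 2 * n * n * iterations

# Assumption: each count of the 256-bit packed double event represents
# 4 double-precision operations (one per vector lane).
lanes = 4
event_count = flops // lanes


def expected_samples(event_count, period):
    """Samples taken when one sample is recorded every `period` events."""
    return event_count // period


for period in (10, 100):
    print(period, expected_samples(event_count, period))
```

With these assumptions, `-c 10` would yield about 8.4 million samples for this event and `-c 100` about 840 thousand, which explains why the smaller period produces a much larger `perf.data` file.
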

Look at:

  • perf help
  • perf help record

Aspects to mention:

  • Measuring parts of a program?
  • Granularity for ratios?
  • Scope of collection
  • Call graph collection (-g)
  • Precise events

Using pmu-tools / toplev¶

This uses toplev.py from Andi Kleen's pmu-tools.

  • Try the command below for a few different levels.
  • Try the command below for both matvec.py and matmat.py.
In [25]:
%%bash

OPENBLAS_NUM_THREADS=1 python ~/pack/pmu-tools/toplev.py -l4 python tmp/matmat.py
Consider disabling nmi watchdog to minimize multiplexing
(echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog or
 echo kernel.nmi_watchdog=0 >> /etc/sysctl.conf ; sysctl -p as root)
BR_MISP_RETIRED.COND_NTAKEN_COST event not found for cpu_core
BR_MISP_RETIRED.COND_TAKEN_COST event not found for cpu_core
BR_MISP_RETIRED.INDIRECT_CALL_COST event not found for cpu_core
BR_MISP_RETIRED.INDIRECT_COST event not found for cpu_core
BR_MISP_RETIRED.RET_COST event not found for cpu_core
MEM_INST_RETIRED.STLB_HIT_LOADS event not found for cpu_core
MEM_INST_RETIRED.STLB_HIT_STORES event not found for cpu_core
TOPDOWN_FE_BOUND.ALL_P event not found for cpu_atom
TOPDOWN_FE_BOUND.ITLB_MISS event not found for cpu_atom
TOPDOWN_BAD_SPECULATION.ALL_P event not found for cpu_atom
TOPDOWN_BE_BOUND.ALL_P event not found for cpu_atom
TOPDOWN_RETIRING.ALL_P event not found for cpu_atom
14 events not counted
# 5.01-full-perf, 4 on 13th Gen Intel(R) Core(TM) i7-1365U [mtl]
core BE               Backend_Bound                                                 % Slots                       44.9   [ 8.0%]
core BE/Core          Backend_Bound.Core_Bound                                      % Slots                       31.8   [ 8.0%]
core RET              Retiring.Light_Operations.FP_Arith.FP_Scalar                  % Uops                         0.2   [ 4.0%]
core RET              Retiring.Light_Operations.Int_Operations.Int_Vector_128b      % Uops                         0.0   [ 4.0%]
core RET              Retiring.Light_Operations.Int_Operations.Int_Vector_256b      % Uops                         0.0   [ 4.0%]
core BE/Core          Backend_Bound.Core_Bound.Ports_Utilization                    % Clocks                      19.6   [ 4.0%]
core BE/Core          Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_2   % Clocks                      25.1   [ 4.0%]
	This metric represents fraction of cycles CPU executed total
	of 2 uops per cycle on all execution ports (Logical
	Processor cycles since ICL, Physical Core cycles otherwise)...
	Sampling events:  exe_activity.2_ports_util
core BE/Core          Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m  % Clocks                      57.5   [ 4.0%]<==
	This metric represents fraction of cycles CPU executed total
	of 3 or more uops per cycle on all execution ports (Logical
	Processor cycles since ICL, Physical Core cycles otherwise)...
	Sampling events:  uops_executed.cycles_ge_3
No node for atom crossed threshold
Run toplev --describe Ports_Utilized_3m^ to get more information on bottleneck for core
Some events not found. Consider running event_download to update event lists
Mismeasured (out of bound values):FP_Arith FP_Vector
13 nodes had zero counts: Branch_Detect Branch_Resteer Cisc Decode Fast_Nuke Mem_Scheduler Non_Mem_Scheduler Nuke Other_FB Predecode Register Reorder_Buffer Serialization
Add --run-sample to find locations
Add --nodes '!+Ports_Utilized_3m*/5,+MUX' for breakdown.

Using LIKWID¶

Uses pylikwid, a Python wrapper around LIKWID; LIKWID itself offers an analogous C API.
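
The event listing printed by `likwid-perfctr -e` below is long (more than a thousand entries), and each event line follows the shape `tag, id, umask, counters<, options>`. A minimal sketch for parsing and filtering such lines follows. `LikwidEvent` and `parse_event_line` are hypothetical helper names, not part of LIKWID or pylikwid; the sample lines are copied from the listing.

```python
from typing import List, NamedTuple, Optional


class LikwidEvent(NamedTuple):
    tag: str
    event_id: int
    umask: int
    counters: List[str]
    options: List[str]


def parse_event_line(line: str) -> Optional[LikwidEvent]:
    """Parse one 'tag, id, umask, counters<, options>' line; return None for other lines."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) < 4:
        return None  # counter descriptions and blank lines have fewer fields
    try:
        event_id = int(parts[1], 16)
        umask = int(parts[2], 16)
    except ValueError:
        return None  # header lines do not carry hexadecimal fields
    counters = parts[3].split("|")
    options = parts[4].split("|") if len(parts) > 4 else []
    return LikwidEvent(parts[0], event_id, umask, counters, options)


SAMPLE = """\
Event tags (tag, id, umask, counters<, options>):
INSTR_RETIRED_ANY, 0x0, 0x0, FIXC0
FP_ARITH_INST_RETIRED_SCALAR_DOUBLE, 0xC7, 0x1, PMC
FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE, 0xC7, 0x10, PMC
LLC_LOOKUP_NID, 0x34, 0x41, CBOX, NID|STATE
"""

events = [e for e in map(parse_event_line, SAMPLE.splitlines()) if e is not None]
fp_events = [e.tag for e in events if e.tag.startswith("FP_ARITH")]
print(fp_events)
```

In practice the full output of `likwid-perfctr -e` could be fed to `parse_event_line` line by line to find, for example, all floating-point or L2-related events.
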

In [14]:
!likwid-perfctr -e
This architecture has 39 counters.
Counter tags(name, type<, options>):
BBOX0C1, Home Agent box 0, EDGEDETECT|THRESHOLD|INVERT
BBOX0C2, Home Agent box 0, EDGEDETECT|THRESHOLD|INVERT
BBOX0C3, Home Agent box 0, EDGEDETECT|THRESHOLD|INVERT
BBOX1C1, Home Agent box 1, EDGEDETECT|THRESHOLD|INVERT
BBOX1C2, Home Agent box 1, EDGEDETECT|THRESHOLD|INVERT
BBOX1C3, Home Agent box 1, EDGEDETECT|THRESHOLD|INVERT
MBOX2C1, Memory Controller 0 Channel 2, EDGEDETECT|THRESHOLD|INVERT
MBOX2C2, Memory Controller 0 Channel 2, EDGEDETECT|THRESHOLD|INVERT
MBOX2C3, Memory Controller 0 Channel 2, EDGEDETECT|THRESHOLD|INVERT
MBOX2FIX, Memory Controller 0 Channel 2 Fixed Counter, INVERT
MBOX3C1, Memory Controller 0 Channel 3, EDGEDETECT|THRESHOLD|INVERT
MBOX3C2, Memory Controller 0 Channel 3, EDGEDETECT|THRESHOLD|INVERT
MBOX3C3, Memory Controller 0 Channel 3, EDGEDETECT|THRESHOLD|INVERT
MBOX3FIX, Memory Controller 0 Channel 3 Fixed Counter, INVERT
MBOX6C1, Memory Controller 1 Channel 2, EDGEDETECT|THRESHOLD|INVERT
MBOX6C2, Memory Controller 1 Channel 2, EDGEDETECT|THRESHOLD|INVERT
MBOX6C3, Memory Controller 1 Channel 2, EDGEDETECT|THRESHOLD|INVERT
MBOX6FIX, Memory Controller 1 Channel 2 Fixed Counter, INVERT
MBOX7C1, Memory Controller 1 Channel 3, EDGEDETECT|THRESHOLD|INVERT
MBOX7C2, Memory Controller 1 Channel 3, EDGEDETECT|THRESHOLD|INVERT
MBOX7C3, Memory Controller 1 Channel 3, EDGEDETECT|THRESHOLD|INVERT
MBOX7FIX, Memory Controller 1 Channel 3 Fixed Counter, INVERT
PBOX1, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT
PBOX2, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT
PBOX3, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT
RBOX0C1, Routing box 0, EDGEDETECT|THRESHOLD|INVERT
RBOX0C2, Routing box 0, EDGEDETECT|THRESHOLD|INVERT
RBOX1C1, Routing box 1, EDGEDETECT|THRESHOLD|INVERT
RBOX1C2, Routing box 1, EDGEDETECT|THRESHOLD|INVERT
QBOX0C1, QPI Link Layer 0, EDGEDETECT|THRESHOLD|INVERT
QBOX0C2, QPI Link Layer 0, EDGEDETECT|THRESHOLD|INVERT
QBOX0C3, QPI Link Layer 0, EDGEDETECT|THRESHOLD|INVERT
QBOX1C1, QPI Link Layer 1, EDGEDETECT|THRESHOLD|INVERT
QBOX1C2, QPI Link Layer 1, EDGEDETECT|THRESHOLD|INVERT
QBOX1C3, QPI Link Layer 1, EDGEDETECT|THRESHOLD|INVERT
QBOX0FIX1, QPI Link Layer rate status 0
QBOX0FIX2, QPI Link Layer rate status 0
QBOX1FIX1, QPI Link Layer rate status 1
QBOX1FIX2, QPI Link Layer rate status 1



This architecture has 1668 events.
Event tags (tag, id, umask, counters<, options>):
TEMP_CORE, 0x0, 0x0, TMP0
PWR_PKG_ENERGY, 0x2, 0x0, PWR0
PWR_PP0_ENERGY, 0x1, 0x0, PWR1
PWR_PP1_ENERGY, 0x4, 0x0, PWR2
PWR_DRAM_ENERGY, 0x3, 0x0, PWR3
INSTR_RETIRED_ANY, 0x0, 0x0, FIXC0
CPU_CLK_UNHALTED_CORE, 0x0, 0x0, FIXC1
CPU_CLK_UNHALTED_REF, 0x0, 0x0, FIXC2
LD_BLOCKS_STORE_FORWARD, 0x3, 0x2, PMC
LD_BLOCKS_NO_SR, 0x3, 0x8, PMC
MISALIGN_MEM_REF_LOADS, 0x5, 0x1, PMC
MISALIGN_MEM_REF_STORES, 0x5, 0x2, PMC
MISALIGN_MEM_REF_ANY, 0x5, 0x3, PMC
LD_BLOCKS_PARTIAL_ADDRESS_ALIAS, 0x7, 0x1, PMC
DTLB_LOAD_MISSES_CAUSES_A_WALK, 0x8, 0x1, PMC
DTLB_LOAD_MISSES_STLB_HIT, 0x8, 0x60, PMC
DTLB_LOAD_MISSES_WALK_COMPLETED, 0x8, 0xE, PMC
DTLB_LOAD_MISSES_STLB_HIT_4K, 0x8, 0x20, PMC
DTLB_LOAD_MISSES_WALK_COMPLETED_4K, 0x8, 0x2, PMC
DTLB_LOAD_MISSES_WALK_DURATION, 0x8, 0x10, PMC
INT_MISC_RECOVERY_CYCLES, 0xD, 0x3, PMC
INT_MISC_RECOVERY_COUNT, 0xD, 0x3, PMC
INT_MISC_RAT_STALL_CYCLES, 0xD, 0x8, PMC
INT_MISC_RAT_STALL_COUNT, 0xD, 0x8, PMC
UOPS_ISSUED_ANY, 0xE, 0x1, PMC
UOPS_ISSUED_FLAGS_MERGE, 0xE, 0x10, PMC
UOPS_ISSUED_SLOW_LEA, 0xE, 0x20, PMC
UOPS_ISSUED_SINGLE_MUL, 0xE, 0x40, PMC
UOPS_ISSUED_USED_CYCLES, 0xE, 0x1, PMC
UOPS_ISSUED_STALL_CYCLES, 0xE, 0x1, PMC
UOPS_ISSUED_TOTAL_CYCLES, 0xE, 0x1, PMC
UOPS_ISSUED_CORE_USED_CYCLES, 0xE, 0x1, PMC
UOPS_ISSUED_CORE_STALL_CYCLES, 0xE, 0x1, PMC
UOPS_ISSUED_CORE_TOTAL_CYCLES, 0xE, 0x1, PMC
UOPS_ISSUED_CYCLES_GE_1_UOPS_EXEC, 0xE, 0x1, PMC
UOPS_ISSUED_CYCLES_GE_2_UOPS_EXEC, 0xE, 0x1, PMC
UOPS_ISSUED_CYCLES_GE_3_UOPS_EXEC, 0xE, 0x1, PMC
UOPS_ISSUED_CYCLES_GE_4_UOPS_EXEC, 0xE, 0x1, PMC
UOPS_ISSUED_CYCLES_GE_5_UOPS_EXEC, 0xE, 0x1, PMC
UOPS_ISSUED_CYCLES_GE_6_UOPS_EXEC, 0xE, 0x1, PMC
ARITH_FPU_DIV_ACTIVE, 0x14, 0x1, PMC
L2_RQSTS_DEMAND_DATA_RD_MISS, 0x24, 0x21, PMC
L2_RQSTS_DEMAND_DATA_RD_HIT, 0x24, 0x41, PMC
L2_RQSTS_RFO_MISS, 0x24, 0x22, PMC
L2_RQSTS_RFO_HIT, 0x24, 0x42, PMC
L2_RQSTS_CODE_RD_MISS, 0x24, 0x24, PMC
L2_RQSTS_CODE_RD_HIT, 0x24, 0x44, PMC
L2_RQSTS_L2_PF_HIT, 0x24, 0x50, PMC
L2_RQSTS_L2_PF_MISS, 0x24, 0x30, PMC
L2_RQSTS_ALL_DEMAND_DATA_RD, 0x24, 0xE1, PMC
L2_RQSTS_ALL_DEMAND_MISS, 0x24, 0x27, PMC
L2_RQSTS_ALL_RFO, 0x24, 0xE2, PMC
L2_RQSTS_ALL_CODE_RD, 0x24, 0xE4, PMC
L2_RQSTS_ALL_DEMAND_REFERENCES, 0x24, 0xE7, PMC
L2_RQSTS_ALL_PF, 0x24, 0xF8, PMC
L2_RQSTS_MISS, 0x24, 0x3F, PMC
L2_RQSTS_REFERENCES, 0x24, 0xFF, PMC
L2_DEMAND_RQST_WB_HIT, 0x27, 0x50, PMC
LONGEST_LAT_CACHE_REFERENCE, 0x2E, 0x4F, PMC
LONGEST_LAT_CACHE_MISS, 0x2E, 0x41, PMC
CPU_CLOCK_UNHALTED_THREAD_P, 0x3C, 0x0, PMC
CPU_CLOCK_UNHALTED_REF_XCLK, 0x3C, 0x1, PMC
CPU_CLOCK_UNHALTED_ONE_THREAD_ACTIVE, 0x3C, 0x2, PMC
L1D_PEND_MISS_PENDING, 0x48, 0x1, PMC2
L1D_PEND_MISS_PENDING_CYCLES, 0x48, 0x1, PMC2
L1D_PEND_MISS_OCCURRENCES, 0x48, 0x1, PMC2
DTLB_STORE_MISSES_CAUSES_A_WALK, 0x49, 0x1, PMC
DTLB_STORE_MISSES_STLB_HIT, 0x49, 0x60, PMC
DTLB_STORE_MISSES_WALK_COMPLETED, 0x49, 0xE, PMC
DTLB_STORE_MISSES_STLB_HIT_4K, 0x49, 0x20, PMC
DTLB_STORE_MISSES_WALK_COMPLETED_4K, 0x49, 0x2, PMC
DTLB_STORE_MISSES_WALK_DURATION, 0x49, 0x10, PMC
LOAD_HIT_PRE_HW_PF, 0x4C, 0x2, PMC
EPT_WALK_CYCLES, 0x4F, 0x10, PMC
L1D_REPLACEMENT, 0x51, 0x1, PMC
L1D_M_EVICT, 0x51, 0x4, PMC
TX_MEM_ABORT_CONFLICT, 0x54, 0x1, PMC
TX_MEM_ABORT_CAPACITY_WRITE, 0x54, 0x2, PMC
TX_MEM_ABORT_HLE_STORE_TO_ELIDED_LOCK, 0x54, 0x4, PMC
TX_MEM_ABORT_HLE_ELISION_BUFFER_NOT_EMPTY, 0x54, 0x8, PMC
TX_MEM_ABORT_HLE_ELISION_BUFFER_MISMATCH, 0x54, 0x10, PMC
TX_MEM_ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT, 0x54, 0x20, PMC
TX_MEM_HLE_ELISION_BUFFER_FULL, 0x54, 0x40, PMC
MOVE_ELIMINATION_INT_NOT_ELIMINATED, 0x58, 0x4, PMC
MOVE_ELIMINATION_SIMD_NOT_ELIMINATED, 0x58, 0x8, PMC
MOVE_ELIMINATION_INT_ELIMINATED, 0x58, 0x1, PMC
MOVE_ELIMINATION_SIMD_ELIMINATED, 0x58, 0x2, PMC
CPL_CYCLES_RING0, 0x5C, 0x1, PMC
CPL_CYCLES_RING123, 0x5C, 0x2, PMC
CPL_CYCLES_RING0_TRANS, 0x5C, 0x1, PMC
RS_EVENTS_EMPTY_CYCLES, 0x5E, 0x1, PMC
OFFCORE_REQUESTS_OUTSTANDING_DEMAND_DATA_RD, 0x60, 0x1, PMC
OFFCORE_REQUESTS_OUTSTANDING_DEMAND_CODE_RD, 0x60, 0x2, PMC
OFFCORE_REQUESTS_OUTSTANDING_DEMAND_RFO, 0x60, 0x4, PMC
OFFCORE_REQUESTS_OUTSTANDING_ALL_DATA_RD, 0x60, 0x8, PMC
LOCK_CYCLES_SPLIT_LOCK_UC_LOCK_DURATION, 0x63, 0x1, PMC
LOCK_CYCLES_CACHE_LOCK_DURATION, 0x63, 0x2, PMC
IDQ_EMPTY, 0x79, 0x2, PMC
IDQ_MITE_UOPS, 0x79, 0x4, PMC
IDQ_DSB_UOPS, 0x79, 0x8, PMC
IDQ_MS_DSB_UOPS, 0x79, 0x10, PMC
IDQ_MS_MITE_UOPS, 0x79, 0x20, PMC
IDQ_MS_UOPS, 0x79, 0x30, PMC
IDQ_DSB_UOPS, 0x79, 0x18, PMC
IDQ_MITE_ALL_UOPS, 0x79, 0x24, PMC
IDQ_ALL_UOPS, 0x79, 0x3C, PMC
IDQ_MITE_CYCLES, 0x79, 0x4, PMC
IDQ_MITE_CYCLES_1_UOPS, 0x79, 0x4, PMC
IDQ_MITE_CYCLES_2_UOPS, 0x79, 0x4, PMC
IDQ_MITE_CYCLES_3_UOPS, 0x79, 0x4, PMC
IDQ_MITE_CYCLES_4_UOPS, 0x79, 0x4, PMC
IDQ_DSB_CYCLES, 0x79, 0x8, PMC
IDQ_DSB_CYCLES_1_UOPS, 0x79, 0x8, PMC
IDQ_DSB_CYCLES_2_UOPS, 0x79, 0x8, PMC
IDQ_DSB_CYCLES_3_UOPS, 0x79, 0x8, PMC
IDQ_DSB_CYCLES_4_UOPS, 0x79, 0x8, PMC
IDQ_MS_DSB_CYCLES, 0x79, 0x10, PMC
IDQ_MS_DSB_CYCLES_1_UOPS, 0x79, 0x10, PMC
IDQ_MS_DSB_CYCLES_2_UOPS, 0x79, 0x10, PMC
IDQ_MS_DSB_CYCLES_3_UOPS, 0x79, 0x10, PMC
IDQ_MS_DSB_CYCLES_4_UOPS, 0x79, 0x10, PMC
IDQ_MS_DSB_OCCUR, 0x79, 0x10, PMC
IDQ_MS_MITE_CYCLES, 0x79, 0x20, PMC
IDQ_MS_MITE_CYCLES_1_UOPS, 0x79, 0x20, PMC
IDQ_MS_MITE_CYCLES_2_UOPS, 0x79, 0x20, PMC
IDQ_MS_MITE_CYCLES_3_UOPS, 0x79, 0x20, PMC
IDQ_MS_MITE_CYCLES_4_UOPS, 0x79, 0x20, PMC
IDQ_MS_CYCLES, 0x79, 0x30, PMC
IDQ_MS_CYCLES_1_UOPS, 0x79, 0x30, PMC
IDQ_MS_CYCLES_2_UOPS, 0x79, 0x30, PMC
IDQ_MS_CYCLES_3_UOPS, 0x79, 0x30, PMC
IDQ_MS_CYCLES_4_UOPS, 0x79, 0x30, PMC
IDQ_MS_SWITCHES, 0x79, 0x30, PMC
IDQ_ALL_DSB_CYCLES_ANY_UOPS, 0x79, 0x18, PMC
IDQ_ALL_DSB_CYCLES_1_UOPS, 0x79, 0x18, PMC
IDQ_ALL_DSB_CYCLES_2_UOPS, 0x79, 0x18, PMC
IDQ_ALL_DSB_CYCLES_3_UOPS, 0x79, 0x18, PMC
IDQ_ALL_DSB_CYCLES_4_UOPS, 0x79, 0x18, PMC
IDQ_ALL_MITE_CYCLES_ANY_UOPS, 0x79, 0x24, PMC
IDQ_ALL_MITE_CYCLES_1_UOPS, 0x79, 0x24, PMC
IDQ_ALL_MITE_CYCLES_2_UOPS, 0x79, 0x24, PMC
IDQ_ALL_MITE_CYCLES_3_UOPS, 0x79, 0x24, PMC
IDQ_ALL_MITE_CYCLES_4_UOPS, 0x79, 0x24, PMC
IDQ_ALL_CYCLES_ANY_UOPS, 0x79, 0x3C, PMC
IDQ_ALL_CYCLES_1_UOPS, 0x79, 0x3C, PMC
IDQ_ALL_CYCLES_2_UOPS, 0x79, 0x3C, PMC
IDQ_ALL_CYCLES_3_UOPS, 0x79, 0x3C, PMC
IDQ_ALL_CYCLES_4_UOPS, 0x79, 0x3C, PMC
ICACHE_HIT, 0x80, 0x1, PMC
ICACHE_MISSES, 0x80, 0x2, PMC
ICACHE_ACCESSES, 0x80, 0x3, PMC
ITLB_MISSES_CAUSES_A_WALK, 0x85, 0x1, PMC
ITLB_MISSES_STLB_HIT, 0x85, 0x60, PMC
ITLB_MISSES_WALK_COMPLETED, 0x85, 0xE, PMC
ITLB_MISSES_STLB_HIT_4K, 0x85, 0x20, PMC
ITLB_MISSES_WALK_COMPLETED_4K, 0x85, 0x2, PMC
ITLB_MISSES_WALK_DURATION, 0x85, 0x10, PMC
ILD_STALL_LCP, 0x87, 0x1, PMC
BR_INST_EXEC_COND_TAKEN, 0x88, 0x81, PMC
BR_INST_EXEC_COND_NON_TAKEN, 0x88, 0x41, PMC
BR_INST_EXEC_DIRECT_JMP_TAKEN, 0x88, 0x82, PMC
BR_INST_EXEC_INDIRECT_JMP_NON_CALL_RET_TAKEN, 0x88, 0x84, PMC
BR_INST_EXEC_RETURN_NEAR_TAKEN, 0x88, 0x88, PMC
BR_INST_EXEC_DIRECT_NEAR_CALL_TAKEN, 0x88, 0x90, PMC
BR_INST_EXEC_INDIRECT_NEAR_CALL_TAKEN, 0x88, 0xA0, PMC
BR_INST_EXEC_ALL_CONDITIONAL, 0x88, 0xC1, PMC
BR_INST_EXEC_ALL_DIRECT_JMP, 0x88, 0xC2, PMC
BR_INST_EXEC_ALL_DIRECT_NEAR_CALL, 0x88, 0xD0, PMC
BR_INST_EXEC_ALL_INDIRECT_JUMP_NON_CALL_RET, 0x88, 0xC4, PMC
BR_INST_EXEC_ALL_INDIRECT_NEAR_RETURN, 0x88, 0xC8, PMC
BR_INST_EXEC_ALL_BRANCHES, 0x88, 0xFF, PMC
BR_MISP_EXEC_COND_TAKEN, 0x89, 0x81, PMC
BR_MISP_EXEC_COND_NON_TAKEN, 0x89, 0x41, PMC
BR_MISP_EXEC_INDIRECT_JMP_NON_CALL_RET_TAKEN, 0x89, 0x84, PMC
BR_MISP_EXEC_RETURN_NEAR_TAKEN, 0x89, 0x88, PMC
BR_MISP_EXEC_DIRECT_NEAR_CALL_TAKEN, 0x89, 0x90, PMC
BR_MISP_EXEC_INDIRECT_NEAR_CALL_TAKEN, 0x89, 0xA0, PMC
BR_MISP_EXEC_ALL_CONDITIONAL, 0x89, 0xC1, PMC
BR_MISP_EXEC_ALL_INDIRECT_JUMP_NON_CALL_RET, 0x89, 0xC4, PMC
BR_MISP_EXEC_ALL_BRANCHES, 0x89, 0xFF, PMC
IDQ_UOPS_NOT_DELIVERED_CORE, 0x9C, 0x1, PMC
IDQ_UOPS_NOT_DELIVERED_CYCLES_0_UOPS_DELIV_CORE, 0x9C, 0x1, PMC
IDQ_UOPS_NOT_DELIVERED_CYCLES_LE_1_UOP_DELIV_CORE, 0x9C, 0x1, PMC
IDQ_UOPS_NOT_DELIVERED_CYCLES_LE_2_UOP_DELIV_CORE, 0x9C, 0x1, PMC
IDQ_UOPS_NOT_DELIVERED_CYCLES_LE_3_UOP_DELIV_CORE, 0x9C, 0x1, PMC
IDQ_UOPS_NOT_DELIVERED_CYCLES_FE_WAS_OK, 0x9C, 0x1, PMC
UOP_DISPATCHES_CANCELLED_SIMD_PRF, 0xA0, 0x3, PMC
UOPS_EXECUTED_PORT_PORT_0, 0xA1, 0x1, PMC
UOPS_EXECUTED_PORT_PORT_1, 0xA1, 0x2, PMC
UOPS_EXECUTED_PORT_PORT_2, 0xA1, 0x4, PMC
UOPS_EXECUTED_PORT_PORT_3, 0xA1, 0x8, PMC
UOPS_EXECUTED_PORT_PORT_4, 0xA1, 0x10, PMC
UOPS_EXECUTED_PORT_PORT_5, 0xA1, 0x20, PMC
UOPS_EXECUTED_PORT_PORT_6, 0xA1, 0x40, PMC
UOPS_EXECUTED_PORT_PORT_7, 0xA1, 0x80, PMC
UOPS_EXECUTED_PORT_PORT_0_CORE, 0xA1, 0x1, PMC
UOPS_EXECUTED_PORT_PORT_1_CORE, 0xA1, 0x2, PMC
UOPS_EXECUTED_PORT_PORT_2_CORE, 0xA1, 0x4, PMC
UOPS_EXECUTED_PORT_PORT_3_CORE, 0xA1, 0x8, PMC
UOPS_EXECUTED_PORT_PORT_4_CORE, 0xA1, 0x10, PMC
UOPS_EXECUTED_PORT_PORT_5_CORE, 0xA1, 0x20, PMC
UOPS_EXECUTED_PORT_PORT_6_CORE, 0xA1, 0x40, PMC
UOPS_EXECUTED_PORT_PORT_7_CORE, 0xA1, 0x80, PMC
RESOURCE_STALLS_ANY, 0xA2, 0x1, PMC
RESOURCE_STALLS_RS, 0xA2, 0x4, PMC
RESOURCE_STALLS_SB, 0xA2, 0x8, PMC
RESOURCE_STALLS_ROB, 0xA2, 0x10, PMC
CYCLE_ACTIVITY_CYCLES_L1D_MISS, 0xA3, 0x8, PMC2
CYCLE_ACTIVITY_CYCLES_L2_MISS, 0xA3, 0x1, PMC
CYCLE_ACTIVITY_CYCLES_L2_PENDING, 0xA3, 0x1, PMC
CYCLE_ACTIVITY_CYCLES_MEM_ANY, 0xA3, 0x2, PMC
CYCLE_ACTIVITY_CYCLES_LDM_PENDING, 0xA3, 0x2, PMC
CYCLE_ACTIVITY_CYCLES_NO_EXECUTE, 0xA3, 0x4, PMC
CYCLE_ACTIVITY_STALLS_L1D_MISS, 0xA3, 0xC, PMC2
CYCLE_ACTIVITY_STALLS_L2_MISS, 0xA3, 0x5, PMC
CYCLE_ACTIVITY_STALLS_L2_PENDING, 0xA3, 0x5, PMC
CYCLE_ACTIVITY_STALLS_MEM_ANY, 0xA3, 0x6, PMC
CYCLE_ACTIVITY_STALLS_LDM_PENDING, 0xA3, 0x6, PMC
LSD_UOPS, 0xA8, 0x1, PMC
LSD_CYCLES_1_UOPS, 0xA8, 0x1, PMC
LSD_CYCLES_2_UOPS, 0xA8, 0x1, PMC
LSD_CYCLES_3_UOPS, 0xA8, 0x1, PMC
LSD_CYCLES_4_UOPS, 0xA8, 0x1, PMC
LSD_CYCLES_ACTIVE, 0xA8, 0x1, PMC
LSD_CYCLES_INACTIVE, 0xA8, 0x1, PMC
DSB2MITE_SWITCHES_PENALTY_CYCLES, 0xAB, 0x2, PMC
ITLB_ITLB_FLUSH, 0xAE, 0x1, PMC
OFFCORE_REQUESTS_DEMAND_DATA_RD, 0xB0, 0x1, PMC
OFFCORE_REQUESTS_DEMAND_CODE_RD, 0xB0, 0x2, PMC
OFFCORE_REQUESTS_DEMAND_RFO, 0xB0, 0x4, PMC
OFFCORE_REQUESTS_ALL_DATA_RD, 0xB0, 0x8, PMC
UOPS_EXECUTED_THREAD, 0xB1, 0x1, PMC
UOPS_EXECUTED_USED_CYCLES, 0xB1, 0x1, PMC
UOPS_EXECUTED_STALL_CYCLES, 0xB1, 0x1, PMC
UOPS_EXECUTED_TOTAL_CYCLES, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_1_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_2_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_3_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_4_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_5_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_6_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_7_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CYCLES_GE_8_UOPS_EXEC, 0xB1, 0x1, PMC
UOPS_EXECUTED_CORE, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_USED_CYCLES, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_STALL_CYCLES, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_TOTAL_CYCLES, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_1_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_2_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_3_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_4_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_5_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_6_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_7_UOPS_EXEC, 0xB1, 0x2, PMC
UOPS_EXECUTED_CORE_CYCLES_GE_8_UOPS_EXEC, 0xB1, 0x2, PMC
OFFCORE_REQUESTS_BUFFER_SQ_FULL, 0xB2, 0x1, PMC
PAGE_WALKER_LOADS_DTLB_L1, 0xBC, 0x11, PMC
PAGE_WALKER_LOADS_ITLB_L1, 0xBC, 0x21, PMC
PAGE_WALKER_LOADS_DTLB_L2, 0xBC, 0x12, PMC
PAGE_WALKER_LOADS_ITLB_L2, 0xBC, 0x22, PMC
PAGE_WALKER_LOADS_DTLB_L3, 0xBC, 0x14, PMC
PAGE_WALKER_LOADS_ITLB_L3, 0xBC, 0x24, PMC
PAGE_WALKER_LOADS_DTLB_MEMORY, 0xBC, 0x18, PMC
INST_RETIRED_ANY_P, 0xC0, 0x0, PMC
INST_RETIRED_X87, 0xC0, 0x2, PMC
INST_RETIRED_PREC_DIST, 0xC0, 0x1, PMC1
OTHER_ASSISTS_AVX_TO_SSE, 0xC1, 0x8, PMC
OTHER_ASSISTS_SSE_TO_AVX, 0xC1, 0x10, PMC
OTHER_ASSISTS_ANY_WB_ASSIST, 0xC1, 0x40, PMC
UOPS_RETIRED_ALL, 0xC2, 0x1, PMC
UOPS_RETIRED_CORE_ALL, 0xC2, 0x1, PMC
UOPS_RETIRED_RETIRE_SLOTS, 0xC2, 0x2, PMC
UOPS_RETIRED_CORE_RETIRE_SLOTS, 0xC2, 0x2, PMC
UOPS_RETIRED_USED_CYCLES, 0xC2, 0x1, PMC
UOPS_RETIRED_STALL_CYCLES, 0xC2, 0x1, PMC
UOPS_RETIRED_TOTAL_CYCLES, 0xC2, 0x1, PMC
UOPS_RETIRED_CORE_ALL, 0xC2, 0x1, PMC
UOPS_RETIRED_CORE_RETIRE_SLOTS, 0xC2, 0x2, PMC
UOPS_RETIRED_CORE_USED_CYCLES, 0xC2, 0x1, PMC
UOPS_RETIRED_CORE_STALL_CYCLES, 0xC2, 0x1, PMC
UOPS_RETIRED_CORE_TOTAL_CYCLES, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_1_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_2_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_3_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_4_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_5_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_6_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_7_UOPS_EXEC, 0xC2, 0x1, PMC
UOPS_RETIRED_CYCLES_GE_8_UOPS_EXEC, 0xC2, 0x1, PMC
MACHINE_CLEARS_COUNT, 0xC3, 0x1, PMC
MACHINE_CLEARS_CYCLES, 0xC3, 0x1, PMC
MACHINE_CLEARS_MEMORY_ORDERING, 0xC3, 0x2, PMC
MACHINE_CLEARS_SMC, 0xC3, 0x4, PMC
MACHINE_CLEARS_MASKMOV, 0xC3, 0x20, PMC
BR_INST_RETIRED_ALL_BRANCHES, 0xC4, 0x0, PMC
BR_INST_RETIRED_CONDITIONAL, 0xC4, 0x1, PMC
BR_INST_RETIRED_NEAR_CALL, 0xC4, 0x2, PMC
BR_INST_RETIRED_NEAR_RETURN, 0xC4, 0x8, PMC
BR_INST_RETIRED_NOT_TAKEN, 0xC4, 0x10, PMC
BR_INST_RETIRED_NEAR_TAKEN, 0xC4, 0x20, PMC
BR_INST_RETIRED_FAR_BRANCH, 0xC4, 0x40, PMC
BR_MISP_RETIRED_ALL_BRANCHES, 0xC5, 0x0, PMC
BR_MISP_RETIRED_CONDITIONAL, 0xC5, 0x1, PMC
BR_MISP_RETIRED_NEAR_TAKEN, 0xC5, 0x20, PMC
FP_ARITH_INST_RETIRED_SCALAR_DOUBLE, 0xC7, 0x1, PMC
FP_ARITH_INST_RETIRED_SCALAR_SINGLE, 0xC7, 0x2, PMC
FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE, 0xC7, 0x4, PMC
FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE, 0xC7, 0x8, PMC
FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE, 0xC7, 0x10, PMC
FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE, 0xC7, 0x20, PMC
FP_ARITH_INST_RETIRED_SCALAR, 0xC7, 0x3, PMC
FP_ARITH_INST_RETIRED_PACKED, 0xC7, 0x3C, PMC
FP_ARITH_INST_RETIRED_DOUBLE, 0xC7, 0x15, PMC
FP_ARITH_INST_RETIRED_SINGLE, 0xC7, 0x2A, PMC
HLE_RETIRED_START, 0xC8, 0x1, PMC
HLE_RETIRED_COMMIT, 0xC8, 0x2, PMC
HLE_RETIRED_ABORTED, 0xC8, 0x4, PMC
HLE_RETIRED_ABORTED_MISC1, 0xC8, 0x8, PMC
HLE_RETIRED_ABORTED_MISC2, 0xC8, 0x10, PMC
HLE_RETIRED_ABORTED_MISC3, 0xC8, 0x20, PMC
HLE_RETIRED_ABORTED_MISC4, 0xC8, 0x40, PMC
HLE_RETIRED_ABORTED_MISC5, 0xC8, 0x80, PMC
RTM_RETIRED_START, 0xC9, 0x1, PMC
RTM_RETIRED_COMMIT, 0xC9, 0x2, PMC
RTM_RETIRED_ABORTED, 0xC9, 0x4, PMC
RTM_RETIRED_ABORTED_MISC1, 0xC9, 0x8, PMC
RTM_RETIRED_ABORTED_MISC2, 0xC9, 0x10, PMC
RTM_RETIRED_ABORTED_MISC3, 0xC9, 0x20, PMC
RTM_RETIRED_ABORTED_MISC4, 0xC9, 0x40, PMC
RTM_RETIRED_ABORTED_MISC5, 0xC9, 0x80, PMC
FP_ASSIST_X87_OUTPUT, 0xCA, 0x2, PMC
FP_ASSIST_X87_INPUT, 0xCA, 0x4, PMC
FP_ASSIST_SIMD_OUTPUT, 0xCA, 0x8, PMC
FP_ASSIST_SIMD_INPUT, 0xCA, 0x10, PMC
FP_ASSIST_ANY, 0xCA, 0x1E, PMC
ROB_MISC_EVENT_LBR_INSERTS, 0xCC, 0x20, PMC
MEM_UOPS_RETIRED_LOADS_ALL, 0xD0, 0x81, PMC
MEM_UOPS_RETIRED_STORES_ALL, 0xD0, 0x82, PMC
MEM_UOPS_RETIRED_LOADS_LOCK, 0xD0, 0x21, PMC
MEM_UOPS_RETIRED_LOADS_STLB_MISS, 0xD0, 0x11, PMC
MEM_UOPS_RETIRED_STORES_STLB_MISS, 0xD0, 0x12, PMC
MEM_UOPS_RETIRED_LOADS_SPLIT, 0xD0, 0x41, PMC
MEM_UOPS_RETIRED_STORES_SPLIT, 0xD0, 0x42, PMC
MEM_LOAD_UOPS_RETIRED_L1_HIT, 0xD1, 0x1, PMC
MEM_LOAD_UOPS_RETIRED_L1_MISS, 0xD1, 0x8, PMC
MEM_LOAD_UOPS_RETIRED_L1_ALL, 0xD1, 0x9, PMC
MEM_LOAD_UOPS_RETIRED_L2_HIT, 0xD1, 0x2, PMC
MEM_LOAD_UOPS_RETIRED_L2_MISS, 0xD1, 0x10, PMC
MEM_LOAD_UOPS_RETIRED_L2_ALL, 0xD1, 0x12, PMC
MEM_LOAD_UOPS_RETIRED_L3_HIT, 0xD1, 0x4, PMC
MEM_LOAD_UOPS_RETIRED_L3_MISS, 0xD1, 0x20, PMC
MEM_LOAD_UOPS_RETIRED_L3_ALL, 0xD1, 0x24, PMC
MEM_LOAD_UOPS_RETIRED_HIT_LFB, 0xD1, 0x40, PMC
MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_MISS, 0xD2, 0x1, PMC
MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_HIT, 0xD2, 0x2, PMC
MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_HITM, 0xD2, 0x4, PMC
MEM_LOAD_UOPS_L3_HIT_RETIRED_XSNP_NONE, 0xD2, 0x8, PMC
MEM_LOAD_UOPS_L3_MISS_RETIRED_LOCAL_DRAM, 0xD3, 0x1, PMC
MEM_LOAD_UOPS_L3_MISS_RETIRED_REMOTE_DRAM, 0xD3, 0x4, PMC
MEM_LOAD_UOPS_L3_MISS_RETIRED_REMOTE_HITM, 0xD3, 0x10, PMC
MEM_LOAD_UOPS_L3_MISS_RETIRED_REMOTE_FWD, 0xD3, 0x20, PMC
BACLEARS_ANY, 0xE6, 0x1F, PMC
L2_TRANS_DEMAND_DATA_RD, 0xF0, 0x1, PMC
L2_TRANS_RFO, 0xF0, 0x2, PMC
L2_TRANS_CODE_RD, 0xF0, 0x4, PMC
L2_TRANS_ALL_PF, 0xF0, 0x8, PMC
L2_TRANS_L1D_WB, 0xF0, 0x10, PMC
L2_TRANS_L2_FILL, 0xF0, 0x20, PMC
L2_TRANS_L2_WB, 0xF0, 0x40, PMC
L2_TRANS_ALL_REQUESTS, 0xF0, 0x80, PMC
L2_LINES_IN_I, 0xF1, 0x1, PMC
L2_LINES_IN_S, 0xF1, 0x2, PMC
L2_LINES_IN_E, 0xF1, 0x4, PMC
L2_LINES_IN_ALL, 0xF1, 0x7, PMC
L2_LINES_OUT_DEMAND_CLEAN, 0xF2, 0x5, PMC
L2_LINES_OUT_DEMAND_DIRTY, 0xF2, 0x6, PMC
OFFCORE_RESPONSE_0_OPTIONS, 0xB7, 0x1, PMC, MATCH0|MATCH1
OFFCORE_RESPONSE_0_DMND_DATA_RD_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_DMND_RFO_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_DMND_CODE_RD_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_WB_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_PF_L2_DATA_RD_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_PF_L2_RFO_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_PF_L2_CODE_RD_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_PF_L3_DATA_RD_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_PF_L3_RFO_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_PF_L3_CODE_RD_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_SPLIT_LOCK_UC_LOCK_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_STREAMING_STORES_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_0_OTHER_ANY, 0xB7, 0x1, PMC
OFFCORE_RESPONSE_1_OPTIONS, 0xBB, 0x1, PMC, MATCH0|MATCH1
OFFCORE_RESPONSE_1_DMND_DATA_RD_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_DMND_RFO_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_DMND_CODE_RD_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_WB_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_PF_L2_DATA_RD_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_PF_L2_RFO_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_PF_L2_CODE_RD_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_PF_L3_DATA_RD_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_PF_L3_RFO_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_PF_L3_CODE_RD_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_SPLIT_LOCK_UC_LOCK_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_STREAMING_STORES_ANY, 0xBB, 0x1, PMC
OFFCORE_RESPONSE_1_OTHER_ANY, 0xBB, 0x1, PMC
EVENT_MSG_DOORBELL_RCVD, 0x42, 0x8, UBOX
PHOLD_CYCLES_ASSERT_TO_ACK, 0x45, 0x1, UBOX
RACU_REQUESTS, 0x46, 0x0, UBOX
UNCORE_CLOCK, 0x0, 0x0, UBOXFIX
CBOX_CLOCKTICKS, 0x0, 0x0, CBOX
TXR_INSERTS_AD_CACHE, 0x2, 0x1, CBOX
TXR_INSERTS_AK_CACHE, 0x2, 0x2, CBOX
TXR_INSERTS_BL_CACHE, 0x2, 0x4, CBOX
TXR_INSERTS_IV_CACHE, 0x2, 0x8, CBOX
TXR_INSERTS_AD_CORE, 0x2, 0x10, CBOX
TXR_INSERTS_AK_CORE, 0x2, 0x20, CBOX
TXR_INSERTS_BL_CORE, 0x2, 0x40, CBOX
TXR_ADS_USED_AD, 0x4, 0x1, CBOX
TXR_ADS_USED_AK, 0x4, 0x2, CBOX
TXR_ADS_USED_BL, 0x4, 0x4, CBOX
RING_BOUNCES_AD, 0x5, 0x1, CBOX
RING_BOUNCES_AK, 0x5, 0x2, CBOX
RING_BOUNCES_BL, 0x5, 0x4, CBOX
RING_BOUNCES_IV, 0x5, 0x10, CBOX
RING_SRC_THRTL, 0x7, 0x0, CBOX
FAST_ASSERTED, 0x9, 0x0, CBOX0C0|CBOX0C1|CBOX1C0|CBOX1C1|CBOX2C0|CBOX2C1|CBOX3C0|CBOX3C1|CBOX4C0|CBOX4C1|CBOX5C0|CBOX5C1|CBOX6C0|CBOX6C1|CBOX7C0|CBOX7C1|CBOX8C0|CBOX8C1|CBOX9C0|CBOX9C1|CBOX10C0|CBOX10C1|CBOX11C0|CBOX11C1|CBOX12C0|CBOX12C1|CBOX13C0|CBOX13C1|CBOX14C0|CBOX14C1|CBOX15C0|CBOX15C1|CBOX16C0|CBOX16C1|CBOX17C0|CBOX17C1|CBOX18C0|CBOX18C1|CBOX19C0|CBOX19C1|CBOX20C0|CBOX20C1|CBOX21C0|CBOX21C1|CBOX22C0|CBOX22C1|CBOX23C0|CBOX23C1
BOUNCE_CONTROL, 0xA, 0x0, CBOX
RING_AD_USED_UP_EVEN, 0x1B, 0x1, CBOX
RING_AD_USED_UP_ODD, 0x1B, 0x2, CBOX
RING_AD_USED_UP, 0x1B, 0x3, CBOX
RING_AD_USED_DOWN_EVEN, 0x1B, 0x4, CBOX
RING_AD_USED_DOWN_ODD, 0x1B, 0x8, CBOX
RING_AD_USED_DOWN, 0x1B, 0xC, CBOX
RING_AD_USED_ANY, 0x1B, 0xF, CBOX
RING_AK_USED_UP_EVEN, 0x1C, 0x1, CBOX
RING_AK_USED_UP_ODD, 0x1C, 0x2, CBOX
RING_AK_USED_UP, 0x1C, 0x3, CBOX
RING_AK_USED_DOWN_EVEN, 0x1C, 0x4, CBOX
RING_AK_USED_DOWN_ODD, 0x1C, 0x8, CBOX
RING_AK_USED_DOWN, 0x1C, 0xC, CBOX
RING_AK_USED_ANY, 0x1C, 0xF, CBOX
RING_BL_USED_UP_EVEN, 0x1D, 0x1, CBOX
RING_BL_USED_UP_ODD, 0x1D, 0x2, CBOX
RING_BL_USED_UP, 0x1D, 0x3, CBOX
RING_BL_USED_DOWN_EVEN, 0x1D, 0x4, CBOX
RING_BL_USED_DOWN_ODD, 0x1D, 0x8, CBOX
RING_BL_USED_DOWN, 0x1D, 0xC, CBOX
RING_BL_USED_ANY, 0x1D, 0xF, CBOX
RING_IV_USED_UP, 0x1E, 0x3, CBOX
RING_IV_USED_DN, 0x1E, 0xC, CBOX
RING_IV_USED_ANY, 0x1E, 0xF, CBOX
RING_IV_USED_DOWN, 0x1E, 0x33, CBOX
COUNTER0_OCCUPANCY, 0x1F, 0x0, CBOX
COUNTER0_OCCUPANCY_COUNT, 0x1F, 0x0, CBOX
RXR_OCCUPANCY_IRQ, 0x11, 0x1, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
RXR_OCCUPANCY_IRQ_REJ, 0x11, 0x2, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
RXR_OCCUPANCY_IPQ, 0x11, 0x4, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
RXR_OCCUPANCY_PRQ_REJ, 0x11, 0x20, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
RXR_EXT_STARVED_IRQ, 0x12, 0x1, CBOX
RXR_EXT_STARVED_IPQ, 0x12, 0x2, CBOX
RXR_EXT_STARVED_PRQ, 0x12, 0x4, CBOX
RXR_EXT_STARVED_ISMQ_BIDS, 0x12, 0x8, CBOX
RXR_INSERTS_IRQ, 0x13, 0x1, CBOX
RXR_INSERTS_IRQ_REJ, 0x13, 0x2, CBOX
RXR_INSERTS_IPQ, 0x13, 0x4, CBOX
RXR_INSERTS_PRQ, 0x13, 0x10, CBOX
RXR_INSERTS_PRQ_REJ, 0x13, 0x20, CBOX
RXR_IPQ_RETRY_ANY, 0x31, 0x1, CBOX
RXR_IPQ_RETRY_FULL, 0x31, 0x2, CBOX
RXR_IPQ_RETRY_ADDR_CONFLICT, 0x31, 0x4, CBOX
RXR_IPQ_RETRY_QPI_CREDITS, 0x31, 0x10, CBOX
RXR_IPQ_RETRY2_AD_SBO, 0x28, 0x1, CBOX
RXR_IPQ_RETRY2_TARGET, 0x28, 0x40, CBOX, NID
RXR_IRQ_RETRY_ANY, 0x32, 0x1, CBOX
RXR_IRQ_RETRY_FULL, 0x32, 0x2, CBOX
RXR_IRQ_RETRY_ADDR_CONFLICT, 0x32, 0x4, CBOX
RXR_IRQ_RETRY_RTID, 0x32, 0x8, CBOX
RXR_IRQ_RETRY_QPI_CREDITS, 0x32, 0x10, CBOX
RXR_IRQ_RETRY_IIO_CREDITS, 0x32, 0x20, CBOX
RXR_IRQ_RETRY_NID, 0x32, 0x40, CBOX, NID
RXR_IRQ_RETRY2_AD_SBO, 0x29, 0x1, CBOX
RXR_IRQ_RETRY2_BL_SBO, 0x29, 0x2, CBOX
RXR_IRQ_RETRY2_TARGET, 0x29, 0x40, CBOX, NID
RXR_ISMQ_RETRY_ANY, 0x33, 0x1, CBOX
RXR_ISMQ_RETRY_FULL, 0x33, 0x2, CBOX
RXR_ISMQ_RETRY_RTID, 0x33, 0x8, CBOX
RXR_ISMQ_RETRY_QPI_CREDITS, 0x33, 0x10, CBOX
RXR_ISMQ_RETRY_IIO_CREDITS, 0x33, 0x20, CBOX
RXR_ISMQ_RETRY_NID, 0x33, 0x40, CBOX, NID
RXR_ISMQ_RETRY_WB_CREDITS, 0x33, 0x80, CBOX, NID
RXR_ISMQ_RETRY2_AD_SBO, 0x2A, 0x1, CBOX
RXR_ISMQ_RETRY2_BL_SBO, 0x2A, 0x2, CBOX
RXR_ISMQ_RETRY2_TARGET, 0x2A, 0x40, CBOX, NID
LLC_LOOKUP_DATA_READ, 0x34, 0x3, CBOX, STATE
LLC_LOOKUP_WRITE, 0x34, 0x5, CBOX, STATE
LLC_LOOKUP_REMOTE_SNOOP, 0x34, 0x9, CBOX, STATE
LLC_LOOKUP_ANY, 0x34, 0x11, CBOX, STATE
LLC_LOOKUP_READ, 0x34, 0x21, CBOX, STATE
LLC_LOOKUP_NID, 0x34, 0x41, CBOX, NID|STATE
LLC_VICTIMS_M, 0x37, 0x1, CBOX
LLC_VICTIMS_E, 0x37, 0x2, CBOX
LLC_VICTIMS_S, 0x37, 0x4, CBOX
LLC_VICTIMS_F, 0x37, 0x8, CBOX
LLC_VICTIMS_MISS, 0x37, 0x10, CBOX
LLC_VICTIMS_NID, 0x37, 0x40, CBOX, NID|STATE
TOR_INSERTS_OPCODE, 0x35, 0x1, CBOX, OPCODE
TOR_INSERTS_MISS_OPCODE, 0x35, 0x3, CBOX, OPCODE
TOR_INSERTS_EVICTION, 0x35, 0x4, CBOX
TOR_INSERTS_ALL, 0x35, 0x8, CBOX
TOR_INSERTS_WB, 0x35, 0x10, CBOX
TOR_INSERTS_LOCAL_OPCODE, 0x35, 0x21, CBOX, OPCODE
TOR_INSERTS_MISS_LOCAL_OPCODE, 0x35, 0x23, CBOX, OPCODE
TOR_INSERTS_LOCAL, 0x35, 0x28, CBOX
TOR_INSERTS_MISS_LOCAL, 0x35, 0x2A, CBOX
TOR_INSERTS_NID_OPCODE, 0x35, 0x41, CBOX, OPCODE|NID
TOR_INSERTS_NID_MISS_OPCODE, 0x35, 0x43, CBOX, OPCODE|NID
TOR_INSERTS_NID_EVICION, 0x35, 0x44, CBOX, NID
TOR_INSERTS_NID_ALL, 0x35, 0x48, CBOX, NID
TOR_INSERTS_NID_MISS_ALL, 0x35, 0x4A, CBOX, NID
TOR_INSERTS_NID_WB, 0x35, 0x50, CBOX, NID
TOR_INSERTS_REMOTE_OPCODE, 0x35, 0x81, CBOX, OPCODE
TOR_INSERTS_MISS_REMOTE_OPCODE, 0x35, 0x83, CBOX, OPCODE
TOR_INSERTS_REMOTE, 0x35, 0x88, CBOX
TOR_INSERTS_MISS_REMOTE, 0x35, 0x8A, CBOX
TOR_OCCUPANCY_OPCODE, 0x36, 0x1, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE
TOR_OCCUPANCY_MISS_OPCODE, 0x36, 0x3, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE
TOR_OCCUPANCY_EVICTION, 0x36, 0x4, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_ALL, 0x36, 0x8, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_MISS_ALL, 0x36, 0xA, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_WB, 0x36, 0x10, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_LOCAL_OPCODE, 0x36, 0x21, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_MISS_LOCAL_OPCODE, 0x36, 0x23, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_LOCAL, 0x36, 0x28, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_MISS_LOCAL, 0x36, 0x2A, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_NID_OPCODE, 0x36, 0x41, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE|NID
TOR_OCCUPANCY_NID_MISS_OPCODE, 0x36, 0x43, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE|NID
TOR_OCCUPANCY_NID_EVICTION, 0x36, 0x44, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID
TOR_OCCUPANCY_NID_ALL, 0x36, 0x48, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID
TOR_OCCUPANCY_NID_MISS_ALL, 0x36, 0x4A, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID
TOR_OCCUPANCY_NID_WB, 0x36, 0x50, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, NID
TOR_OCCUPANCY_REMOTE_OPCODE, 0x36, 0x81, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE
TOR_OCCUPANCY_MISS_REMOTE_OPCODE, 0x36, 0x83, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0, OPCODE
TOR_OCCUPANCY_REMOTE, 0x36, 0x88, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
TOR_OCCUPANCY_MISS_REMOTE, 0x36, 0x8A, CBOX0C0|CBOX1C0|CBOX2C0|CBOX3C0|CBOX4C0|CBOX5C0|CBOX6C0|CBOX7C0|CBOX8C0|CBOX9C0|CBOX10C0|CBOX11C0|CBOX12C0|CBOX13C0|CBOX14C0|CBOX15C0|CBOX16C0|CBOX17C0|CBOX18C0|CBOX19C0|CBOX20C0|CBOX21C0|CBOX22C0|CBOX23C0
MISC_RSPI_WAS_FSE, 0x39, 0x1, CBOX
MISC_WC_ALIASING, 0x39, 0x2, CBOX
MISC_STARTED, 0x39, 0x4, CBOX
MISC_RFO_HIT_S, 0x39, 0x8, CBOX
MISC_CVZERO_PREFETCH_VICTIM, 0x39, 0x10, CBOX
MISC_CVZERO_PREFETCH_MISS, 0x39, 0x20, CBOX
SBO_CREDITS_ACQUIRED_AD, 0x3D, 0x1, CBOX
SBO_CREDITS_ACQUIRED_BL, 0x3D, 0x2, CBOX
SBO_CREDITS_ACQUIRED_ANY, 0x3D, 0x3, CBOX
SBO_CREDIT_OCCUPANCY_AD, 0x3E, 0x1, CBOX
SBO_CREDIT_OCCUPANCY_BL, 0x3E, 0x2, CBOX
SBO_CREDIT_OCCUPANCY_ANY, 0x3E, 0x3, CBOX
WBOX_CLOCKTICKS, 0x0, 0x0, WBOX
CORE0_TRANSITION_CYCLES, 0x60, 0x0, WBOX
CORE1_TRANSITION_CYCLES, 0x61, 0x0, WBOX
CORE2_TRANSITION_CYCLES, 0x62, 0x0, WBOX
CORE3_TRANSITION_CYCLES, 0x63, 0x0, WBOX
CORE4_TRANSITION_CYCLES, 0x64, 0x0, WBOX
CORE5_TRANSITION_CYCLES, 0x65, 0x0, WBOX
CORE6_TRANSITION_CYCLES, 0x66, 0x0, WBOX
CORE7_TRANSITION_CYCLES, 0x67, 0x0, WBOX
CORE8_TRANSITION_CYCLES, 0x68, 0x0, WBOX
CORE9_TRANSITION_CYCLES, 0x69, 0x0, WBOX
CORE10_TRANSITION_CYCLES, 0x6A, 0x0, WBOX
CORE11_TRANSITION_CYCLES, 0x6B, 0x0, WBOX
CORE12_TRANSITION_CYCLES, 0x6C, 0x0, WBOX
CORE13_TRANSITION_CYCLES, 0x6D, 0x0, WBOX
CORE14_TRANSITION_CYCLES, 0x6E, 0x0, WBOX
CORE15_TRANSITION_CYCLES, 0x6F, 0x0, WBOX
CORE16_TRANSITION_CYCLES, 0x70, 0x0, WBOX
CORE17_TRANSITION_CYCLES, 0x71, 0x0, WBOX
FIVR_PS_PS0_CYCLES, 0x75, 0x0, WBOX
FIVR_PS_PS1_CYCLES, 0x75, 0x0, WBOX
FIVR_PS_PS2_CYCLES, 0x75, 0x0, WBOX
FIVR_PS_PS3_CYCLES, 0x75, 0x0, WBOX
DEMOTIONS_CORE0, 0x30, 0x0, WBOX
DEMOTIONS_CORE1, 0x31, 0x0, WBOX
DEMOTIONS_CORE2, 0x32, 0x0, WBOX
DEMOTIONS_CORE3, 0x33, 0x0, WBOX
DEMOTIONS_CORE4, 0x34, 0x0, WBOX
DEMOTIONS_CORE5, 0x35, 0x0, WBOX
DEMOTIONS_CORE6, 0x36, 0x0, WBOX
DEMOTIONS_CORE7, 0x37, 0x0, WBOX
DEMOTIONS_CORE8, 0x38, 0x0, WBOX
DEMOTIONS_CORE9, 0x39, 0x0, WBOX
DEMOTIONS_CORE10, 0x3A, 0x0, WBOX
DEMOTIONS_CORE11, 0x3B, 0x0, WBOX
DEMOTIONS_CORE12, 0x3C, 0x0, WBOX
DEMOTIONS_CORE13, 0x3D, 0x0, WBOX
DEMOTIONS_CORE14, 0x3E, 0x0, WBOX
DEMOTIONS_CORE15, 0x3F, 0x0, WBOX
DEMOTIONS_CORE16, 0x40, 0x0, WBOX
DEMOTIONS_CORE17, 0x41, 0x0, WBOX
FREQ_BAND0_CYCLES, 0xB, 0x0, WBOX, OCCUPANCY_FILTER
FREQ_BAND1_CYCLES, 0xC, 0x0, WBOX, OCCUPANCY_FILTER
FREQ_BAND2_CYCLES, 0xD, 0x0, WBOX, OCCUPANCY_FILTER
FREQ_BAND3_CYCLES, 0xE, 0x0, WBOX, OCCUPANCY_FILTER
FREQ_MAX_LIMIT_THERMAL_CYCLES, 0x4, 0x0, WBOX
FREQ_MAX_OS_CYCLES, 0x6, 0x0, WBOX
FREQ_MAX_POWER_CYCLES, 0x5, 0x0, WBOX
FREQ_MIN_IO_P_CYCLES, 0x73, 0x0, WBOX
FREQ_TRANS_CYCLES, 0x74, 0x0, WBOX
MEMORY_PHASE_SHEDDING_CYCLES, 0x2F, 0x0, WBOX
POWER_STATE_OCCUPANCY_CORES_C0, 0x80, 0x40, WBOX
POWER_STATE_OCCUPANCY_CORES_C3, 0x80, 0x80, WBOX
POWER_STATE_OCCUPANCY_CORES_C6, 0x80, 0xC0, WBOX
PROCHOT_EXTERNAL_CYCLES, 0xA, 0x0, WBOX
PROCHOT_INTERNAL_CYCLES, 0x9, 0x0, WBOX
TOTAL_TRANSITION_CYCLES, 0x72, 0x0, WBOX
VR_HOT_CYCLES, 0x42, 0x0, WBOX
UFS_BANDWIDTH_MAX_RANGE, 0x7E, 0x0, WBOX
UFS_TRANSITIONS_DOWN, 0x7C, 0x0, WBOX
UFS_TRANSITIONS_IO_P_LIMIT, 0x7D, 0x0, WBOX
UFS_TRANSITIONS_NO_CHANGE, 0x79, 0x0, WBOX
UFS_TRANSITIONS_UP_RING, 0x7A, 0x0, WBOX
UFS_TRANSITIONS_UP_STALL, 0x7B, 0x0, WBOX
CORES_IN_C3, 0x0, 0x0, WBOX0FIX
CORES_IN_C6, 0x0, 0x0, WBOX1FIX
BBOX_CLOCKTICKS, 0x0, 0x0, BBOX
ADDR_OPC_MATCH_ADDR, 0x20, 0x1, BBOX, MATCH0|MATCH1
ADDR_OPC_MATCH_OPC, 0x20, 0x2, BBOX, OPCODE
ADDR_OPC_MATCH_FILT, 0x20, 0x3, BBOX, OPCODE|MATCH0|MATCH1
ADDR_OPC_MATCH_AD, 0x20, 0x4, BBOX, OPCODE
ADDR_OPC_MATCH_BL, 0x20, 0x8, BBOX, OPCODE
ADDR_OPC_MATCH_AK, 0x20, 0x10, BBOX, OPCODE
BT_CYCLES_NE, 0x42, 0x0, BBOX
BT_OCCUPANCY, 0x43, 0x0, BBOX
BYPASS_IMC_TAKEN, 0x14, 0x1, BBOX
BYPASS_IMC_NOT_TAKEN, 0x14, 0x2, BBOX
CONFLICT_CYCLES, 0xB, 0x0, BBOX0C1|BBOX1C1
DIRECT2CORE_COUNT, 0x11, 0x0, BBOX
DIRECT2CORE_CYCLES_DISABLED, 0x12, 0x0, BBOX
DIRECT2CORE_TXN_OVERRIDE, 0x13, 0x0, BBOX
DIRECTORY_LAT_OPT, 0x41, 0x0, BBOX
DIRECTORY_LOOKUP_SNP, 0xC, 0x1, BBOX
DIRECTORY_LOOKUP_NO_SNP, 0xC, 0x2, BBOX
DIRECTORY_UPDATE_SET, 0xD, 0x1, BBOX
DIRECTORY_UPDATE_CLEAR, 0xD, 0x2, BBOX
DIRECTORY_UPDATE_ANY, 0xD, 0x3, BBOX
HITME_LOOKUP_READ_OR_INVITOE, 0x70, 0x1, BBOX
HITME_LOOKUP_WBMTOI, 0x70, 0x2, BBOX
HITME_LOOKUP_ACKCNFLTWBI, 0x70, 0x4, BBOX
HITME_LOOKUP_WBMTOE_OR_S, 0x70, 0x8, BBOX
HITME_LOOKUP_HOM, 0x70, 0xF, BBOX
HITME_LOOKUP_RSPFWDI_REMOTE, 0x70, 0x10, BBOX
HITME_LOOKUP_RSPFWDI_LOCAL, 0x70, 0x20, BBOX
HITME_LOOKUP_INVALS, 0x70, 0x26, BBOX
HITME_LOOKUP_RSPFWDS, 0x70, 0x40, BBOX
HITME_LOOKUP_ALLOCS, 0x70, 0x70, BBOX
HITME_LOOKUP_RSP, 0x70, 0x80, BBOX
HITME_LOOKUP_ALL, 0x70, 0xFF, BBOX
HITME_HIT_READ_OR_INVITOE, 0x71, 0x1, BBOX
HITME_HIT_WBMTOI, 0x71, 0x2, BBOX
HITME_HIT_ACKCNFLTWBI, 0x71, 0x4, BBOX
HITME_HIT_WBMTOE_OR_S, 0x71, 0x8, BBOX
HITME_HIT_HOM, 0x71, 0xF, BBOX
HITME_HIT_RSPFWDI_REMOTE, 0x71, 0x10, BBOX
HITME_HIT_RSPFWDI_LOCAL, 0x71, 0x20, BBOX
HITME_HIT_INVALS, 0x71, 0x26, BBOX
HITME_HIT_RSPFWDS, 0x71, 0x40, BBOX
HITME_HIT_EVICTS, 0x71, 0x42, BBOX
HITME_HIT_ALLOCS, 0x71, 0x70, BBOX
HITME_HIT_RSP, 0x71, 0x80, BBOX
HITME_HIT_ALL, 0x71, 0xFF, BBOX
HITME_HIT_PV_BITS_SET_READ_OR_INVITOE, 0x72, 0x1, BBOX
HITME_HIT_PV_BITS_SET_WBMTOI, 0x72, 0x2, BBOX
HITME_HIT_PV_BITS_SET_ACKCNFLTWBI, 0x72, 0x4, BBOX
HITME_HIT_PV_BITS_SET_WBMTOE_OR_S, 0x72, 0x8, BBOX
HITME_HIT_PV_BITS_SET_HOM, 0x72, 0xF, BBOX
HITME_HIT_PV_BITS_SET_RSPFWDI_REMOTE, 0x72, 0x10, BBOX
HITME_HIT_PV_BITS_SET_RSPFWDI_LOCAL, 0x72, 0x20, BBOX
HITME_HIT_PV_BITS_SET_RSPFWDS, 0x72, 0x40, BBOX
HITME_HIT_PV_BITS_SET_RSP, 0x72, 0x80, BBOX
HITME_HIT_PV_BITS_SET_ALL, 0x72, 0xFF, BBOX
IGR_NO_CREDIT_CYCLES_AD_QPI0, 0x22, 0x1, BBOX
IGR_NO_CREDIT_CYCLES_AD_QPI1, 0x22, 0x2, BBOX
IGR_NO_CREDIT_CYCLES_AD_QPI2, 0x22, 0x10, BBOX
IGR_NO_CREDIT_CYCLES_BL_QPI0, 0x22, 0x4, BBOX
IGR_NO_CREDIT_CYCLES_BL_QPI1, 0x22, 0x8, BBOX
IGR_NO_CREDIT_CYCLES_BL_QPI2, 0x22, 0x20, BBOX
IMC_READS_NORMAL, 0x17, 0x1, BBOX
IMC_RETRY, 0x1E, 0x0, BBOX
IMC_WRITES_FULL, 0x1A, 0x1, BBOX
IMC_WRITES_PARTIAL, 0x1A, 0x2, BBOX
IMC_WRITES_FULL_ISOCH, 0x1A, 0x4, BBOX
IMC_WRITES_PARTIAL_ISOCH, 0x1A, 0x8, BBOX
IMC_WRITES_ALL, 0x1A, 0xF, BBOX
OSB_READS_LOCAL, 0x53, 0x2, BBOX
OSB_INVITOE_LOCAL, 0x53, 0x4, BBOX
OSB_REMOTE, 0x53, 0x8, BBOX
OSB_CANCELLED, 0x53, 0x10, BBOX
OSB_READS_LOCAL_USEFUL, 0x53, 0x20, BBOX
OSB_REMOTE_USEFUL, 0x53, 0x40, BBOX
OSB_EDR_ALL, 0x54, 0x1, BBOX
OSB_EDR_READS_LOCAL_I, 0x54, 0x2, BBOX
OSB_EDR_READS_REMOTE_I, 0x54, 0x4, BBOX
OSB_EDR_READS_LOCAL_S, 0x54, 0x8, BBOX
OSB_EDR_READS_REMOTE_S, 0x54, 0x10, BBOX
REQUESTS_READS_LOCAL, 0x1, 0x1, BBOX
REQUESTS_READS_REMOTE, 0x1, 0x2, BBOX
REQUESTS_READS, 0x1, 0x3, BBOX
REQUESTS_WRITES_LOCAL, 0x1, 0x4, BBOX
REQUESTS_WRITES_REMOTE, 0x1, 0x8, BBOX
REQUESTS_WRITES, 0x1, 0xC, BBOX
REQUESTS_INVITOE_LOCAL, 0x1, 0x10, BBOX
REQUESTS_INVITOE_REMOTE, 0x1, 0x20, BBOX
REQUESTS_ALL_LOCAL, 0x1, 0x15, BBOX
REQUESTS_ALL_REMOTE, 0x1, 0x2A, BBOX
REQUESTS_ALL, 0x1, 0x3F, BBOX
RING_AD_USED_CW_EVEN, 0x3E, 0x1, BBOX
RING_AD_USED_CW_ODD, 0x3E, 0x2, BBOX
RING_AD_USED_CW, 0x3E, 0x3, BBOX
RING_AD_USED_CCW_EVEN, 0x3E, 0x4, BBOX
RING_AD_USED_CCW_ODD, 0x3E, 0x8, BBOX
RING_AD_USED_CCW, 0x3E, 0xC, BBOX
RING_AK_USED_CW_EVEN, 0x3F, 0x1, BBOX
RING_AK_USED_CW_ODD, 0x3F, 0x2, BBOX
RING_AK_USED_CW, 0x3F, 0x3, BBOX
RING_AK_USED_CCW_EVEN, 0x3F, 0x4, BBOX
RING_AK_USED_CCW_ODD, 0x3F, 0x8, BBOX
RING_AK_USED_CCW, 0x3F, 0xC, BBOX
RING_BL_USED_CW_EVEN, 0x40, 0x1, BBOX
RING_BL_USED_CW_ODD, 0x40, 0x2, BBOX
RING_BL_USED_CW, 0x40, 0x3, BBOX
RING_BL_USED_CCW_EVEN, 0x40, 0x4, BBOX
RING_BL_USED_CCW_ODD, 0x40, 0x8, BBOX
RING_BL_USED_CCW, 0x40, 0xC, BBOX
RPQ_CYCLES_NO_REG_CREDITS_CHN0, 0x15, 0x1, BBOX
RPQ_CYCLES_NO_REG_CREDITS_CHN1, 0x15, 0x2, BBOX
RPQ_CYCLES_NO_REG_CREDITS_CHN2, 0x15, 0x4, BBOX
RPQ_CYCLES_NO_REG_CREDITS_CHN3, 0x15, 0x8, BBOX
RPQ_CYCLES_NO_REG_CREDITS_ALL, 0x15, 0xF, BBOX
WPQ_CYCLES_NO_REG_CREDITS_CHN0, 0x18, 0x1, BBOX
WPQ_CYCLES_NO_REG_CREDITS_CHN1, 0x18, 0x2, BBOX
WPQ_CYCLES_NO_REG_CREDITS_CHN2, 0x18, 0x4, BBOX
WPQ_CYCLES_NO_REG_CREDITS_CHN3, 0x18, 0x8, BBOX
WPQ_CYCLES_NO_REG_CREDITS_ALL, 0x18, 0xF, BBOX
SBO0_CREDITS_ACQUIRED_AD, 0x68, 0x1, BBOX
SBO0_CREDITS_ACQUIRED_BL, 0x68, 0x2, BBOX
SBO0_CREDIT_OCCUPANCY_AD, 0x6A, 0x1, BBOX
SBO0_CREDIT_OCCUPANCY_BL, 0x6A, 0x2, BBOX
SBO1_CREDITS_ACQUIRED_AD, 0x69, 0x1, BBOX
SBO1_CREDITS_ACQUIRED_BL, 0x69, 0x2, BBOX
SBO1_CREDIT_OCCUPANCY_AD, 0x6B, 0x1, BBOX
SBO1_CREDIT_OCCUPANCY_BL, 0x6B, 0x2, BBOX
SNOOPS_RSP_AFTER_DATA_LOCAL, 0xA, 0x1, BBOX
SNOOPS_RSP_AFTER_DATA_REMOTE, 0xA, 0x2, BBOX
SNOOP_CYCLES_NE_LOCAL, 0x8, 0x1, BBOX
SNOOP_CYCLES_NE_REMOTE, 0x8, 0x2, BBOX
SNOOP_CYCLES_NE_ALL, 0x8, 0x3, BBOX
SNOOP_OCCUPANCY_LOCAL, 0x9, 0x1, BBOX
SNOOP_OCCUPANCY_REMOTE, 0x9, 0x2, BBOX
SNOOP_RESP_RSPI, 0x21, 0x1, BBOX
SNOOP_RESP_RSPS, 0x21, 0x2, BBOX
SNOOP_RESP_RSPIFWD, 0x21, 0x4, BBOX
SNOOP_RESP_RSPSFWD, 0x21, 0x8, BBOX
SNOOP_RESP_RSP_WB, 0x21, 0x10, BBOX
SNOOP_RESP_RSP_FWD_WB, 0x21, 0x20, BBOX
SNOOP_RESP_RSPCNFLCT, 0x21, 0x40, BBOX
SNP_RESP_RECV_LOCAL_RSPI, 0x60, 0x1, BBOX
SNP_RESP_RECV_LOCAL_RSPS, 0x60, 0x2, BBOX
SNP_RESP_RECV_LOCAL_RSPIFWD, 0x60, 0x4, BBOX
SNP_RESP_RECV_LOCAL_RSPSFWD, 0x60, 0x8, BBOX
SNP_RESP_RECV_LOCAL_RSPXWB, 0x60, 0x10, BBOX
SNP_RESP_RECV_LOCAL_RSPXFWDXWB, 0x60, 0x20, BBOX
SNP_RESP_RECV_LOCAL_RSPCNFLCT, 0x60, 0x40, BBOX
SNP_RESP_RECV_LOCAL_OTHER, 0x60, 0x80, BBOX
STALL_NO_SBO_CREDIT_SBO0_AD, 0x6C, 0x1, BBOX
STALL_NO_SBO_CREDIT_SBO1_AD, 0x6C, 0x2, BBOX
STALL_NO_SBO_CREDIT_SBO0_BL, 0x6C, 0x4, BBOX
STALL_NO_SBO_CREDIT_SBO0_BL, 0x6C, 0x8, BBOX
TAD_REQUESTS_G0_REGION0, 0x1B, 0x1, BBOX
TAD_REQUESTS_G0_REGION1, 0x1B, 0x2, BBOX
TAD_REQUESTS_G0_REGION2, 0x1B, 0x4, BBOX
TAD_REQUESTS_G0_REGION3, 0x1B, 0x8, BBOX
TAD_REQUESTS_G0_REGION4, 0x1B, 0x10, BBOX
TAD_REQUESTS_G0_REGION5, 0x1B, 0x20, BBOX
TAD_REQUESTS_G0_REGION6, 0x1B, 0x40, BBOX
TAD_REQUESTS_G0_REGION7, 0x1B, 0x80, BBOX
TAD_REQUESTS_G1_REGION8, 0x1C, 0x1, BBOX
TAD_REQUESTS_G1_REGION9, 0x1C, 0x2, BBOX
TAD_REQUESTS_G1_REGION10, 0x1C, 0x4, BBOX
TAD_REQUESTS_G1_REGION11, 0x1C, 0x8, BBOX
TRACKER_CYCLES_FULL_GP, 0x2, 0x1, BBOX
TRACKER_CYCLES_FULL_ALL, 0x2, 0x2, BBOX
TRACKER_CYCLES_NE_LOCAL, 0x3, 0x1, BBOX
TRACKER_CYCLES_NE_REMOTE, 0x3, 0x2, BBOX
TRACKER_CYCLES_NE_ALL, 0x3, 0x3, BBOX
TRACKER_OCCUPANCY_READS_LOCAL, 0x4, 0x4, BBOX
TRACKER_OCCUPANCY_READS_REMOTE, 0x4, 0x8, BBOX
TRACKER_OCCUPANCY_WRITES_LOCAL, 0x4, 0x10, BBOX
TRACKER_OCCUPANCY_WRITES_REMOTE, 0x4, 0x20, BBOX
TRACKER_OCCUPANCY_RW_LOCAL, 0x4, 0x14, BBOX
TRACKER_OCCUPANCY_RW_REMOTE, 0x4, 0x28, BBOX
TRACKER_OCCUPANCY_INVITOE_LOCAL, 0x4, 0x40, BBOX
TRACKER_OCCUPANCY_INVITOE_REMOTE, 0x4, 0x80, BBOX
TRACKER_OCCUPANCY_ALL_LOCAL, 0x4, 0x54, BBOX
TRACKER_OCCUPANCY_ALL_REMOTE, 0x4, 0xA8, BBOX
TRACKER_PENDING_OCCUPANCY_LOCAL, 0x5, 0x1, BBOX
TRACKER_PENDING_OCCUPANCY_REMOTE, 0x5, 0x2, BBOX
TRACKER_PENDING_OCCUPANCY_ALL, 0x5, 0x3, BBOX
TXR_AD_CYCLES_FULL_SCHED0, 0x2A, 0x1, BBOX
TXR_AD_CYCLES_FULL_SCHED1, 0x2A, 0x2, BBOX
TXR_AD_CYCLES_FULL_ALL, 0x2A, 0x3, BBOX
TXR_AK, 0xE, 0x0, BBOX
TXR_AK_CYCLES_FULL_SCHED0, 0x32, 0x1, BBOX
TXR_AK_CYCLES_FULL_SCHED1, 0x32, 0x2, BBOX
TXR_AK_CYCLES_FULL_ALL, 0x32, 0x3, BBOX
TXR_BL_DRS_CACHE, 0x10, 0x1, BBOX
TXR_BL_DRS_CORE, 0x10, 0x2, BBOX
TXR_BL_DRS_QPI, 0x10, 0x4, BBOX
TXR_BL_CYCLES_FULL_SCHED0, 0x36, 0x1, BBOX
TXR_BL_CYCLES_FULL_SCHED1, 0x36, 0x2, BBOX
TXR_BL_CYCLES_FULL_ALL, 0x36, 0x3, BBOX
TXR_BL_OCCUPANCY, 0x34, 0x0, BBOX
TXR_STARVED_AK, 0x6D, 0x1, BBOX
TXR_STARVED_BL, 0x6D, 0x2, BBOX
DRAM_CLOCKTICKS, 0x0, 0x0, MBOX
ACT_COUNT_RD, 0x1, 0x1, MBOX
ACT_COUNT_WR, 0x1, 0x2, MBOX
ACT_COUNT_BYP, 0x1, 0x8, MBOX
BYP_CMDS_ACT, 0xA1, 0x1, MBOX
BYP_CMDS_CAS, 0xA1, 0x2, MBOX
BYP_CMDS_PRE, 0xA1, 0x4, MBOX
CAS_COUNT_RD_REG, 0x4, 0x1, MBOX
CAS_COUNT_RD_UNDERFILL, 0x4, 0x2, MBOX
CAS_COUNT_RD, 0x4, 0x3, MBOX
CAS_COUNT_RD_WMM, 0x4, 0x10, MBOX
CAS_COUNT_RD_RMM, 0x4, 0x20, MBOX
CAS_COUNT_WR_WMM, 0x4, 0x4, MBOX
CAS_COUNT_WR_RMM, 0x4, 0x8, MBOX
CAS_COUNT_WR, 0x4, 0xC, MBOX
CAS_COUNT_ALL, 0x4, 0xF, MBOX
DRAM_PRE_ALL, 0x6, 0x0, MBOX
DRAM_REFRESH_PANIC, 0x5, 0x2, MBOX
DRAM_REFRESH_HIGH, 0x5, 0x4, MBOX
ECC_CORRECTABLE_ERRORS, 0x9, 0x0, MBOX
MAJOR_MODES_READ, 0x7, 0x1, MBOX
MAJOR_MODES_WRITE, 0x7, 0x2, MBOX
MAJOR_MODES_PARTIAL, 0x7, 0x3, MBOX
MAJOR_MODES_ISOCH, 0x7, 0x4, MBOX
POWER_CHANNEL_DLLOFF, 0x84, 0x0, MBOX
POWER_CHANNEL_PPD, 0x85, 0x0, MBOX
POWER_CKE_CYCLES_RANK0, 0x83, 0x1, MBOX
POWER_CKE_CYCLES_RANK1, 0x83, 0x2, MBOX
POWER_CKE_CYCLES_RANK2, 0x83, 0x4, MBOX
POWER_CKE_CYCLES_RANK3, 0x83, 0x8, MBOX
POWER_CKE_CYCLES_RANK4, 0x83, 0x10, MBOX
POWER_CKE_CYCLES_RANK5, 0x83, 0x20, MBOX
POWER_CKE_CYCLES_RANK6, 0x83, 0x40, MBOX
POWER_CKE_CYCLES_RANK7, 0x83, 0x80, MBOX
POWER_CRITICAL_THROTTLE_CYCLES, 0x86, 0x0, MBOX
POWER_PCU_THROTTLING, 0x42, 0x0, MBOX
POWER_SELF_REFRESH, 0x43, 0x0, MBOX
POWER_THROTTLE_CYCLES_RANK0, 0x41, 0x1, MBOX
POWER_THROTTLE_CYCLES_RANK1, 0x41, 0x2, MBOX
POWER_THROTTLE_CYCLES_RANK2, 0x41, 0x4, MBOX
POWER_THROTTLE_CYCLES_RANK3, 0x41, 0x8, MBOX
POWER_THROTTLE_CYCLES_RANK4, 0x41, 0x10, MBOX
POWER_THROTTLE_CYCLES_RANK5, 0x41, 0x20, MBOX
POWER_THROTTLE_CYCLES_RANK6, 0x41, 0x40, MBOX
POWER_THROTTLE_CYCLES_RANK7, 0x41, 0x80, MBOX
PREEMPTION_RD_PREEMPT_RD, 0x8, 0x1, MBOX
PREEMPTION_RD_PREEMPT_WR, 0x8, 0x2, MBOX
PRE_COUNT_PAGE_MISS, 0x2, 0x1, MBOX
PRE_COUNT_PAGE_CLOSE, 0x2, 0x2, MBOX
PRE_COUNT_RD, 0x2, 0x4, MBOX
PRE_COUNT_WR, 0x2, 0x8, MBOX
PRE_COUNT_BYP, 0x2, 0x10, MBOX
RD_CAS_PRIO_LOW, 0xA0, 0x1, MBOX
RD_CAS_PRIO_MED, 0xA0, 0x2, MBOX
RD_CAS_PRIO_HIGH, 0xA0, 0x4, MBOX
RD_CAS_PRIO_PANIC, 0xA0, 0x8, MBOX
RD_CAS_RANK0_BANK0, 0xB0, 0x0, MBOX
RD_CAS_RANK0_BANK1, 0xB0, 0x1, MBOX
RD_CAS_RANK0_BANK2, 0xB0, 0x2, MBOX
RD_CAS_RANK0_BANK3, 0xB0, 0x3, MBOX
RD_CAS_RANK0_BANK4, 0xB0, 0x4, MBOX
RD_CAS_RANK0_BANK5, 0xB0, 0x5, MBOX
RD_CAS_RANK0_BANK6, 0xB0, 0x6, MBOX
RD_CAS_RANK0_BANK7, 0xB0, 0x7, MBOX
RD_CAS_RANK0_BANK8, 0xB0, 0x8, MBOX
RD_CAS_RANK0_BANK9, 0xB0, 0x9, MBOX
RD_CAS_RANK0_BANK10, 0xB0, 0xA, MBOX
RD_CAS_RANK0_BANK11, 0xB0, 0xB, MBOX
RD_CAS_RANK0_BANK12, 0xB0, 0xC, MBOX
RD_CAS_RANK0_BANK13, 0xB0, 0xD, MBOX
RD_CAS_RANK0_BANK14, 0xB0, 0xE, MBOX
RD_CAS_RANK0_BANK15, 0xB0, 0xF, MBOX
RD_CAS_RANK0_ALLBANKS, 0xB0, 0x10, MBOX
RD_CAS_RANK0_BANKG0, 0xB0, 0x11, MBOX
RD_CAS_RANK0_BANKG1, 0xB0, 0x12, MBOX
RD_CAS_RANK0_BANKG2, 0xB0, 0x13, MBOX
RD_CAS_RANK0_BANKG3, 0xB0, 0x14, MBOX
RD_CAS_RANK1_BANK0, 0xB1, 0x0, MBOX
RD_CAS_RANK1_BANK1, 0xB1, 0x1, MBOX
RD_CAS_RANK1_BANK2, 0xB1, 0x2, MBOX
RD_CAS_RANK1_BANK3, 0xB1, 0x3, MBOX
RD_CAS_RANK1_BANK4, 0xB1, 0x4, MBOX
RD_CAS_RANK1_BANK5, 0xB1, 0x5, MBOX
RD_CAS_RANK1_BANK6, 0xB1, 0x6, MBOX
RD_CAS_RANK1_BANK7, 0xB1, 0x7, MBOX
RD_CAS_RANK1_BANK8, 0xB1, 0x8, MBOX
RD_CAS_RANK1_BANK9, 0xB1, 0x9, MBOX
RD_CAS_RANK1_BANK10, 0xB1, 0xA, MBOX
RD_CAS_RANK1_BANK11, 0xB1, 0xB, MBOX
RD_CAS_RANK1_BANK12, 0xB1, 0xC, MBOX
RD_CAS_RANK1_BANK13, 0xB1, 0xD, MBOX
RD_CAS_RANK1_BANK14, 0xB1, 0xE, MBOX
RD_CAS_RANK1_BANK15, 0xB1, 0xF, MBOX
RD_CAS_RANK1_ALLBANKS, 0xB1, 0x10, MBOX
RD_CAS_RANK1_BANKG0, 0xB1, 0x11, MBOX
RD_CAS_RANK1_BANKG1, 0xB1, 0x12, MBOX
RD_CAS_RANK1_BANKG2, 0xB1, 0x13, MBOX
RD_CAS_RANK1_BANKG3, 0xB1, 0x14, MBOX
RD_CAS_RANK2_BANK0, 0xB2, 0x0, MBOX
RD_CAS_RANK2_BANK1, 0xB2, 0x1, MBOX
RD_CAS_RANK2_BANK2, 0xB2, 0x2, MBOX
RD_CAS_RANK2_BANK3, 0xB2, 0x3, MBOX
RD_CAS_RANK2_BANK4, 0xB2, 0x4, MBOX
RD_CAS_RANK2_BANK5, 0xB2, 0x5, MBOX
RD_CAS_RANK2_BANK6, 0xB2, 0x6, MBOX
RD_CAS_RANK2_BANK7, 0xB2, 0x7, MBOX
RD_CAS_RANK2_BANK8, 0xB2, 0x8, MBOX
RD_CAS_RANK2_BANK9, 0xB2, 0x9, MBOX
RD_CAS_RANK2_BANK10, 0xB2, 0xA, MBOX
RD_CAS_RANK2_BANK11, 0xB2, 0xB, MBOX
RD_CAS_RANK2_BANK12, 0xB2, 0xC, MBOX
RD_CAS_RANK2_BANK13, 0xB2, 0xD, MBOX
RD_CAS_RANK2_BANK14, 0xB2, 0xE, MBOX
RD_CAS_RANK2_BANK15, 0xB2, 0xF, MBOX
RD_CAS_RANK2_ALLBANKS, 0xB2, 0x10, MBOX
RD_CAS_RANK2_BANKG0, 0xB2, 0x11, MBOX
RD_CAS_RANK2_BANKG1, 0xB2, 0x12, MBOX
RD_CAS_RANK2_BANKG2, 0xB2, 0x13, MBOX
RD_CAS_RANK2_BANKG3, 0xB2, 0x14, MBOX
RD_CAS_RANK3_BANK0, 0xB3, 0x0, MBOX
RD_CAS_RANK3_BANK1, 0xB3, 0x1, MBOX
RD_CAS_RANK3_BANK2, 0xB3, 0x2, MBOX
RD_CAS_RANK3_BANK3, 0xB3, 0x3, MBOX
RD_CAS_RANK3_BANK4, 0xB3, 0x4, MBOX
RD_CAS_RANK3_BANK5, 0xB3, 0x5, MBOX
RD_CAS_RANK3_BANK6, 0xB3, 0x6, MBOX
RD_CAS_RANK3_BANK7, 0xB3, 0x7, MBOX
RD_CAS_RANK3_BANK8, 0xB3, 0x8, MBOX
RD_CAS_RANK3_BANK9, 0xB3, 0x9, MBOX
RD_CAS_RANK3_BANK10, 0xB3, 0xA, MBOX
RD_CAS_RANK3_BANK11, 0xB3, 0xB, MBOX
RD_CAS_RANK3_BANK12, 0xB3, 0xC, MBOX
RD_CAS_RANK3_BANK13, 0xB3, 0xD, MBOX
RD_CAS_RANK3_BANK14, 0xB3, 0xE, MBOX
RD_CAS_RANK3_BANK15, 0xB3, 0xF, MBOX
RD_CAS_RANK3_ALLBANKS, 0xB3, 0x10, MBOX
RD_CAS_RANK3_BANKG0, 0xB3, 0x11, MBOX
RD_CAS_RANK3_BANKG1, 0xB3, 0x12, MBOX
RD_CAS_RANK3_BANKG2, 0xB3, 0x13, MBOX
RD_CAS_RANK3_BANKG3, 0xB3, 0x14, MBOX
RD_CAS_RANK4_BANK0, 0xB4, 0x0, MBOX
RD_CAS_RANK4_BANK1, 0xB4, 0x1, MBOX
RD_CAS_RANK4_BANK2, 0xB4, 0x2, MBOX
RD_CAS_RANK4_BANK3, 0xB4, 0x3, MBOX
RD_CAS_RANK4_BANK4, 0xB4, 0x4, MBOX
RD_CAS_RANK4_BANK5, 0xB4, 0x5, MBOX
RD_CAS_RANK4_BANK6, 0xB4, 0x6, MBOX
RD_CAS_RANK4_BANK7, 0xB4, 0x7, MBOX
RD_CAS_RANK4_BANK8, 0xB4, 0x8, MBOX
RD_CAS_RANK4_BANK9, 0xB4, 0x9, MBOX
RD_CAS_RANK4_BANK10, 0xB4, 0xA, MBOX
RD_CAS_RANK4_BANK11, 0xB4, 0xB, MBOX
RD_CAS_RANK4_BANK12, 0xB4, 0xC, MBOX
RD_CAS_RANK4_BANK13, 0xB4, 0xD, MBOX
RD_CAS_RANK4_BANK14, 0xB4, 0xE, MBOX
RD_CAS_RANK4_BANK15, 0xB4, 0xF, MBOX
RD_CAS_RANK4_ALLBANKS, 0xB4, 0x10, MBOX
RD_CAS_RANK4_BANKG0, 0xB4, 0x11, MBOX
RD_CAS_RANK4_BANKG1, 0xB4, 0x12, MBOX
RD_CAS_RANK4_BANKG2, 0xB4, 0x13, MBOX
RD_CAS_RANK4_BANKG3, 0xB4, 0x14, MBOX
RD_CAS_RANK5_BANK0, 0xB5, 0x0, MBOX
RD_CAS_RANK5_BANK1, 0xB5, 0x1, MBOX
RD_CAS_RANK5_BANK2, 0xB5, 0x2, MBOX
RD_CAS_RANK5_BANK3, 0xB5, 0x3, MBOX
RD_CAS_RANK5_BANK4, 0xB5, 0x4, MBOX
RD_CAS_RANK5_BANK5, 0xB5, 0x5, MBOX
RD_CAS_RANK5_BANK6, 0xB5, 0x6, MBOX
RD_CAS_RANK5_BANK7, 0xB5, 0x7, MBOX
RD_CAS_RANK5_BANK8, 0xB5, 0x8, MBOX
RD_CAS_RANK5_BANK9, 0xB5, 0x9, MBOX
RD_CAS_RANK5_BANK10, 0xB5, 0xA, MBOX
RD_CAS_RANK5_BANK11, 0xB5, 0xB, MBOX
RD_CAS_RANK5_BANK12, 0xB5, 0xC, MBOX
RD_CAS_RANK5_BANK13, 0xB5, 0xD, MBOX
RD_CAS_RANK5_BANK14, 0xB5, 0xE, MBOX
RD_CAS_RANK5_BANK15, 0xB5, 0xF, MBOX
RD_CAS_RANK5_ALLBANKS, 0xB5, 0x10, MBOX
RD_CAS_RANK5_BANKG0, 0xB5, 0x11, MBOX
RD_CAS_RANK5_BANKG1, 0xB5, 0x12, MBOX
RD_CAS_RANK5_BANKG2, 0xB5, 0x13, MBOX
RD_CAS_RANK5_BANKG3, 0xB5, 0x14, MBOX
RD_CAS_RANK6_BANK0, 0xB6, 0x0, MBOX
RD_CAS_RANK6_BANK1, 0xB6, 0x1, MBOX
RD_CAS_RANK6_BANK2, 0xB6, 0x2, MBOX
RD_CAS_RANK6_BANK3, 0xB6, 0x3, MBOX
RD_CAS_RANK6_BANK4, 0xB6, 0x4, MBOX
RD_CAS_RANK6_BANK5, 0xB6, 0x5, MBOX
RD_CAS_RANK6_BANK6, 0xB6, 0x6, MBOX
RD_CAS_RANK6_BANK7, 0xB6, 0x7, MBOX
RD_CAS_RANK6_BANK8, 0xB6, 0x8, MBOX
RD_CAS_RANK6_BANK9, 0xB6, 0x9, MBOX
RD_CAS_RANK6_BANK10, 0xB6, 0xA, MBOX
RD_CAS_RANK6_BANK11, 0xB6, 0xB, MBOX
RD_CAS_RANK6_BANK12, 0xB6, 0xC, MBOX
RD_CAS_RANK6_BANK13, 0xB6, 0xD, MBOX
RD_CAS_RANK6_BANK14, 0xB6, 0xE, MBOX
RD_CAS_RANK6_BANK15, 0xB6, 0xF, MBOX
RD_CAS_RANK6_ALLBANKS, 0xB6, 0x10, MBOX
RD_CAS_RANK6_BANKG0, 0xB6, 0x11, MBOX
RD_CAS_RANK6_BANKG1, 0xB6, 0x12, MBOX
RD_CAS_RANK6_BANKG2, 0xB6, 0x13, MBOX
RD_CAS_RANK6_BANKG3, 0xB6, 0x14, MBOX
RD_CAS_RANK7_BANK0, 0xB7, 0x0, MBOX
RD_CAS_RANK7_BANK1, 0xB7, 0x1, MBOX
RD_CAS_RANK7_BANK2, 0xB7, 0x2, MBOX
RD_CAS_RANK7_BANK3, 0xB7, 0x3, MBOX
RD_CAS_RANK7_BANK4, 0xB7, 0x4, MBOX
RD_CAS_RANK7_BANK5, 0xB7, 0x5, MBOX
RD_CAS_RANK7_BANK6, 0xB7, 0x6, MBOX
RD_CAS_RANK7_BANK7, 0xB7, 0x7, MBOX
RD_CAS_RANK7_BANK8, 0xB7, 0x8, MBOX
RD_CAS_RANK7_BANK9, 0xB7, 0x9, MBOX
RD_CAS_RANK7_BANK10, 0xB7, 0xA, MBOX
RD_CAS_RANK7_BANK11, 0xB7, 0xB, MBOX
RD_CAS_RANK7_BANK12, 0xB7, 0xC, MBOX
RD_CAS_RANK7_BANK13, 0xB7, 0xD, MBOX
RD_CAS_RANK7_BANK14, 0xB7, 0xE, MBOX
RD_CAS_RANK7_BANK15, 0xB7, 0xF, MBOX
RD_CAS_RANK7_ALLBANKS, 0xB7, 0x10, MBOX
RD_CAS_RANK7_BANKG0, 0xB7, 0x11, MBOX
RD_CAS_RANK7_BANKG1, 0xB7, 0x12, MBOX
RD_CAS_RANK7_BANKG2, 0xB7, 0x13, MBOX
RD_CAS_RANK7_BANKG3, 0xB7, 0x14, MBOX
RPQ_CYCLES_NE, 0x11, 0x0, MBOX
RPQ_INSERTS, 0x10, 0x0, MBOX
RPQ_CYCLES_FULL, 0x12, 0x0, MBOX
VMSE_MXB_WR_OCCUPANCY, 0x91, 0x0, MBOX
VMSE_WR_PUSH_WMM, 0x90, 0x1, MBOX
VMSE_WR_PUSH_RMM, 0x90, 0x2, MBOX
WMM_TO_RMM_LOW_THRESH, 0xC0, 0x1, MBOX
WMM_TO_RMM_STARVE, 0xC0, 0x2, MBOX
WMM_TO_RMM_VMSE_RETRY, 0xC0, 0x4, MBOX
WPQ_INSERTS, 0x20, 0x0, MBOX
WPQ_CYCLES_FULL, 0x22, 0x0, MBOX
WPQ_CYCLES_NE, 0x21, 0x0, MBOX
WPQ_READ_HIT, 0x23, 0x0, MBOX
WPQ_WRITE_HIT, 0x24, 0x0, MBOX
WRONG_MM, 0xC1, 0x0, MBOX
WR_CAS_RANK0_BANK0, 0xB8, 0x0, MBOX
WR_CAS_RANK0_BANK1, 0xB8, 0x1, MBOX
WR_CAS_RANK0_BANK2, 0xB8, 0x2, MBOX
WR_CAS_RANK0_BANK3, 0xB8, 0x3, MBOX
WR_CAS_RANK0_BANK4, 0xB8, 0x4, MBOX
WR_CAS_RANK0_BANK5, 0xB8, 0x5, MBOX
WR_CAS_RANK0_BANK6, 0xB8, 0x6, MBOX
WR_CAS_RANK0_BANK7, 0xB8, 0x7, MBOX
WR_CAS_RANK0_BANK8, 0xB8, 0x8, MBOX
WR_CAS_RANK0_BANK9, 0xB8, 0x9, MBOX
WR_CAS_RANK0_BANK10, 0xB8, 0xA, MBOX
WR_CAS_RANK0_BANK11, 0xB8, 0xB, MBOX
WR_CAS_RANK0_BANK12, 0xB8, 0xC, MBOX
WR_CAS_RANK0_BANK13, 0xB8, 0xD, MBOX
WR_CAS_RANK0_BANK14, 0xB8, 0xE, MBOX
WR_CAS_RANK0_BANK15, 0xB8, 0xF, MBOX
WR_CAS_RANK0_ALLBANKS, 0xB8, 0x10, MBOX
WR_CAS_RANK0_BANKG0, 0xB8, 0x11, MBOX
WR_CAS_RANK0_BANKG1, 0xB8, 0x12, MBOX
WR_CAS_RANK0_BANKG2, 0xB8, 0x13, MBOX
WR_CAS_RANK0_BANKG3, 0xB8, 0x14, MBOX
WR_CAS_RANK1_BANK0, 0xB9, 0x0, MBOX
WR_CAS_RANK1_BANK1, 0xB9, 0x1, MBOX
WR_CAS_RANK1_BANK2, 0xB9, 0x2, MBOX
WR_CAS_RANK1_BANK3, 0xB9, 0x3, MBOX
WR_CAS_RANK1_BANK4, 0xB9, 0x4, MBOX
WR_CAS_RANK1_BANK5, 0xB9, 0x5, MBOX
WR_CAS_RANK1_BANK6, 0xB9, 0x6, MBOX
WR_CAS_RANK1_BANK7, 0xB9, 0x7, MBOX
WR_CAS_RANK1_BANK8, 0xB9, 0x8, MBOX
WR_CAS_RANK1_BANK9, 0xB9, 0x9, MBOX
WR_CAS_RANK1_BANK10, 0xB9, 0xA, MBOX
WR_CAS_RANK1_BANK11, 0xB9, 0xB, MBOX
WR_CAS_RANK1_BANK12, 0xB9, 0xC, MBOX
WR_CAS_RANK1_BANK13, 0xB9, 0xD, MBOX
WR_CAS_RANK1_BANK14, 0xB9, 0xE, MBOX
WR_CAS_RANK1_BANK15, 0xB9, 0xF, MBOX
WR_CAS_RANK1_ALLBANKS, 0xB9, 0x10, MBOX
WR_CAS_RANK1_BANKG0, 0xB9, 0x11, MBOX
WR_CAS_RANK1_BANKG1, 0xB9, 0x12, MBOX
WR_CAS_RANK1_BANKG2, 0xB9, 0x13, MBOX
WR_CAS_RANK1_BANKG3, 0xB9, 0x14, MBOX
WR_CAS_RANK2_BANK0, 0xBA, 0x0, MBOX
WR_CAS_RANK2_BANK1, 0xBA, 0x1, MBOX
WR_CAS_RANK2_BANK2, 0xBA, 0x2, MBOX
WR_CAS_RANK2_BANK3, 0xBA, 0x3, MBOX
WR_CAS_RANK2_BANK4, 0xBA, 0x4, MBOX
WR_CAS_RANK2_BANK5, 0xBA, 0x5, MBOX
WR_CAS_RANK2_BANK6, 0xBA, 0x6, MBOX
WR_CAS_RANK2_BANK7, 0xBA, 0x7, MBOX
WR_CAS_RANK2_BANK8, 0xBA, 0x8, MBOX
WR_CAS_RANK2_BANK9, 0xBA, 0x9, MBOX
WR_CAS_RANK2_BANK10, 0xBA, 0xA, MBOX
WR_CAS_RANK2_BANK11, 0xBA, 0xB, MBOX
WR_CAS_RANK2_BANK12, 0xBA, 0xC, MBOX
WR_CAS_RANK2_BANK13, 0xBA, 0xD, MBOX
WR_CAS_RANK2_BANK14, 0xBA, 0xE, MBOX
WR_CAS_RANK2_BANK15, 0xBA, 0xF, MBOX
WR_CAS_RANK2_ALLBANKS, 0xBA, 0x10, MBOX
WR_CAS_RANK2_BANKG0, 0xBA, 0x11, MBOX
WR_CAS_RANK2_BANKG1, 0xBA, 0x12, MBOX
WR_CAS_RANK2_BANKG2, 0xBA, 0x13, MBOX
WR_CAS_RANK2_BANKG3, 0xBA, 0x14, MBOX
WR_CAS_RANK3_BANK0, 0xBB, 0x0, MBOX
WR_CAS_RANK3_BANK1, 0xBB, 0x1, MBOX
WR_CAS_RANK3_BANK2, 0xBB, 0x2, MBOX
WR_CAS_RANK3_BANK3, 0xBB, 0x3, MBOX
WR_CAS_RANK3_BANK4, 0xBB, 0x4, MBOX
WR_CAS_RANK3_BANK5, 0xBB, 0x5, MBOX
WR_CAS_RANK3_BANK6, 0xBB, 0x6, MBOX
WR_CAS_RANK3_BANK7, 0xBB, 0x7, MBOX
WR_CAS_RANK3_BANK8, 0xBB, 0x8, MBOX
WR_CAS_RANK3_BANK9, 0xBB, 0x9, MBOX
WR_CAS_RANK3_BANK10, 0xBB, 0xA, MBOX
WR_CAS_RANK3_BANK11, 0xBB, 0xB, MBOX
WR_CAS_RANK3_BANK12, 0xBB, 0xC, MBOX
WR_CAS_RANK3_BANK13, 0xBB, 0xD, MBOX
WR_CAS_RANK3_BANK14, 0xBB, 0xE, MBOX
WR_CAS_RANK3_BANK15, 0xBB, 0xF, MBOX
WR_CAS_RANK3_ALLBANKS, 0xBB, 0x10, MBOX
WR_CAS_RANK3_BANKG0, 0xBB, 0x11, MBOX
WR_CAS_RANK3_BANKG1, 0xBB, 0x12, MBOX
WR_CAS_RANK3_BANKG2, 0xBB, 0x13, MBOX
WR_CAS_RANK3_BANKG3, 0xBB, 0x14, MBOX
WR_CAS_RANK4_BANK0, 0xBC, 0x0, MBOX
WR_CAS_RANK4_BANK1, 0xBC, 0x1, MBOX
WR_CAS_RANK4_BANK2, 0xBC, 0x2, MBOX
WR_CAS_RANK4_BANK3, 0xBC, 0x3, MBOX
WR_CAS_RANK4_BANK4, 0xBC, 0x4, MBOX
WR_CAS_RANK4_BANK5, 0xBC, 0x5, MBOX
WR_CAS_RANK4_BANK6, 0xBC, 0x6, MBOX
WR_CAS_RANK4_BANK7, 0xBC, 0x7, MBOX
WR_CAS_RANK4_BANK8, 0xBC, 0x8, MBOX
WR_CAS_RANK4_BANK9, 0xBC, 0x9, MBOX
WR_CAS_RANK4_BANK10, 0xBC, 0xA, MBOX
WR_CAS_RANK4_BANK11, 0xBC, 0xB, MBOX
WR_CAS_RANK4_BANK12, 0xBC, 0xC, MBOX
WR_CAS_RANK4_BANK13, 0xBC, 0xD, MBOX
WR_CAS_RANK4_BANK14, 0xBC, 0xE, MBOX
WR_CAS_RANK4_BANK15, 0xBC, 0xF, MBOX
WR_CAS_RANK4_ALLBANKS, 0xBC, 0x10, MBOX
WR_CAS_RANK4_BANKG0, 0xBC, 0x11, MBOX
WR_CAS_RANK4_BANKG1, 0xBC, 0x12, MBOX
WR_CAS_RANK4_BANKG2, 0xBC, 0x13, MBOX
WR_CAS_RANK4_BANKG3, 0xBC, 0x14, MBOX
WR_CAS_RANK5_BANK0, 0xBD, 0x0, MBOX
WR_CAS_RANK5_BANK1, 0xBD, 0x1, MBOX
WR_CAS_RANK5_BANK2, 0xBD, 0x2, MBOX
WR_CAS_RANK5_BANK3, 0xBD, 0x3, MBOX
WR_CAS_RANK5_BANK4, 0xBD, 0x4, MBOX
WR_CAS_RANK5_BANK5, 0xBD, 0x5, MBOX
WR_CAS_RANK5_BANK6, 0xBD, 0x6, MBOX
WR_CAS_RANK5_BANK7, 0xBD, 0x7, MBOX
WR_CAS_RANK5_BANK8, 0xBD, 0x8, MBOX
WR_CAS_RANK5_BANK9, 0xBD, 0x9, MBOX
WR_CAS_RANK5_BANK10, 0xBD, 0xA, MBOX
WR_CAS_RANK5_BANK11, 0xBD, 0xB, MBOX
WR_CAS_RANK5_BANK12, 0xBD, 0xC, MBOX
WR_CAS_RANK5_BANK13, 0xBD, 0xD, MBOX
WR_CAS_RANK5_BANK14, 0xBD, 0xE, MBOX
WR_CAS_RANK5_BANK15, 0xBD, 0xF, MBOX
WR_CAS_RANK5_ALLBANKS, 0xBD, 0x10, MBOX
WR_CAS_RANK5_BANKG0, 0xBD, 0x11, MBOX
WR_CAS_RANK5_BANKG1, 0xBD, 0x12, MBOX
WR_CAS_RANK5_BANKG2, 0xBD, 0x13, MBOX
WR_CAS_RANK5_BANKG3, 0xBD, 0x14, MBOX
WR_CAS_RANK6_BANK0, 0xBE, 0x0, MBOX
WR_CAS_RANK6_BANK1, 0xBE, 0x1, MBOX
WR_CAS_RANK6_BANK2, 0xBE, 0x2, MBOX
WR_CAS_RANK6_BANK3, 0xBE, 0x3, MBOX
WR_CAS_RANK6_BANK4, 0xBE, 0x4, MBOX
WR_CAS_RANK6_BANK5, 0xBE, 0x5, MBOX
WR_CAS_RANK6_BANK6, 0xBE, 0x6, MBOX
WR_CAS_RANK6_BANK7, 0xBE, 0x7, MBOX
WR_CAS_RANK6_BANK8, 0xBE, 0x8, MBOX
WR_CAS_RANK6_BANK9, 0xBE, 0x9, MBOX
WR_CAS_RANK6_BANK10, 0xBE, 0xA, MBOX
WR_CAS_RANK6_BANK11, 0xBE, 0xB, MBOX
WR_CAS_RANK6_BANK12, 0xBE, 0xC, MBOX
WR_CAS_RANK6_BANK13, 0xBE, 0xD, MBOX
WR_CAS_RANK6_BANK14, 0xBE, 0xE, MBOX
WR_CAS_RANK6_BANK15, 0xBE, 0xF, MBOX
WR_CAS_RANK6_ALLBANKS, 0xBE, 0x10, MBOX
WR_CAS_RANK6_BANKG0, 0xBE, 0x11, MBOX
WR_CAS_RANK6_BANKG1, 0xBE, 0x12, MBOX
WR_CAS_RANK6_BANKG2, 0xBE, 0x13, MBOX
WR_CAS_RANK6_BANKG3, 0xBE, 0x14, MBOX
WR_CAS_RANK7_BANK0, 0xBF, 0x0, MBOX
WR_CAS_RANK7_BANK1, 0xBF, 0x1, MBOX
WR_CAS_RANK7_BANK2, 0xBF, 0x2, MBOX
WR_CAS_RANK7_BANK3, 0xBF, 0x3, MBOX
WR_CAS_RANK7_BANK4, 0xBF, 0x4, MBOX
WR_CAS_RANK7_BANK5, 0xBF, 0x5, MBOX
WR_CAS_RANK7_BANK6, 0xBF, 0x6, MBOX
WR_CAS_RANK7_BANK7, 0xBF, 0x7, MBOX
WR_CAS_RANK7_BANK8, 0xBF, 0x8, MBOX
WR_CAS_RANK7_BANK9, 0xBF, 0x9, MBOX
WR_CAS_RANK7_BANK10, 0xBF, 0xA, MBOX
WR_CAS_RANK7_BANK11, 0xBF, 0xB, MBOX
WR_CAS_RANK7_BANK12, 0xBF, 0xC, MBOX
WR_CAS_RANK7_BANK13, 0xBF, 0xD, MBOX
WR_CAS_RANK7_BANK14, 0xBF, 0xE, MBOX
WR_CAS_RANK7_BANK15, 0xBF, 0xF, MBOX
WR_CAS_RANK7_ALLBANKS, 0xBF, 0x10, MBOX
WR_CAS_RANK7_BANKG0, 0xBF, 0x11, MBOX
WR_CAS_RANK7_BANKG1, 0xBF, 0x12, MBOX
WR_CAS_RANK7_BANKG2, 0xBF, 0x13, MBOX
WR_CAS_RANK7_BANKG3, 0xBF, 0x14, MBOX
PBOX_CLOCKTICKS, 0x1, 0x0, PBOX
IIO_CREDIT_PRQ_QPI0, 0x2D, 0x1, PBOX0|PBOX1
IIO_CREDIT_PRQ_QPI1, 0x2D, 0x2, PBOX0|PBOX1
IIO_CREDIT_ISOCH_QPI0, 0x2D, 0x4, PBOX0|PBOX1
IIO_CREDIT_ISOCH_QPI1, 0x2D, 0x8, PBOX0|PBOX1
RING_AD_USED_CW_EVEN, 0x7, 0x1, PBOX
RING_AD_USED_CW_ODD, 0x7, 0x2, PBOX
RING_AD_USED_CW, 0x7, 0x3, PBOX
RING_AD_USED_CCW_EVEN, 0x7, 0x4, PBOX
RING_AD_USED_CCW_ODD, 0x7, 0x8, PBOX
RING_AD_USED_CCW, 0x7, 0xC, PBOX
RING_AK_BOUNCES_UP, 0x12, 0x1, PBOX
RING_AK_BOUNCES_DN, 0x12, 0x2, PBOX
RING_AK_USED_CW_EVEN, 0x8, 0x1, PBOX
RING_AK_USED_CW_ODD, 0x8, 0x2, PBOX
RING_AK_USED_CW, 0x8, 0x3, PBOX
RING_AK_USED_CCW_EVEN, 0x8, 0x4, PBOX
RING_AK_USED_CCW_ODD, 0x8, 0x8, PBOX
RING_AK_USED_CCW, 0x8, 0xC, PBOX
RING_BL_USED_CW_EVEN, 0x9, 0x1, PBOX
RING_BL_USED_CW_ODD, 0x9, 0x2, PBOX
RING_BL_USED_CW, 0x9, 0x3, PBOX
RING_BL_USED_CCW_EVEN, 0x9, 0x4, PBOX
RING_BL_USED_CCW_ODD, 0x9, 0x8, PBOX
RING_BL_USED_CCW, 0x9, 0xC, PBOX
RING_IV_USED_CW, 0xA, 0x3, PBOX
RING_IV_USED_CCW, 0xA, 0xC, PBOX
RING_IV_USED_ANY, 0xA, 0xF, PBOX
RXR_CYCLES_NE_NCB, 0x10, 0x10, PBOX0|PBOX1
RXR_CYCLES_NE_NCS, 0x10, 0x20, PBOX0|PBOX1
RXR_INSERTS_NCB, 0x11, 0x10, PBOX0|PBOX1
RXR_INSERTS_NCS, 0x11, 0x20, PBOX0|PBOX1
RXR_OCCUPANCY_DRS, 0x13, 0x8, PBOX0
TXR_CYCLES_FULL_AD, 0x25, 0x1, PBOX0
TXR_CYCLES_FULL_AK, 0x25, 0x2, PBOX0
TXR_CYCLES_FULL_BL, 0x25, 0x4, PBOX0
TXR_CYCLES_NE_AD, 0x23, 0x1, PBOX0
TXR_CYCLES_NE_AK, 0x23, 0x2, PBOX0
TXR_CYCLES_NE_BL, 0x23, 0x4, PBOX0
TXR_NACK_CW_DN_AD, 0x26, 0x1, PBOX0|PBOX1
TXR_NACK_CW_DN_BL, 0x26, 0x2, PBOX0|PBOX1
TXR_NACK_CW_DN_AK, 0x26, 0x4, PBOX0|PBOX1
TXR_NACK_CW_UP_AD, 0x26, 0x8, PBOX0|PBOX1
TXR_NACK_CW_UP_BL, 0x26, 0x10, PBOX0|PBOX1
TXR_NACK_CW_UP_AK, 0x26, 0x20, PBOX0|PBOX1
SBO0_CREDITS_ACQUIRED_AD, 0x28, 0x1, PBOX0|PBOX1
SBO0_CREDITS_ACQUIRED_BL, 0x28, 0x2, PBOX0|PBOX1
STALL_NO_SBO_CREDIT_SBO0_AD, 0x2C, 0x1, PBOX0|PBOX1
STALL_NO_SBO_CREDIT_SBO1_AD, 0x2C, 0x2, PBOX0|PBOX1
STALL_NO_SBO_CREDIT_SBO0_BL, 0x2C, 0x4, PBOX0|PBOX1
STALL_NO_SBO_CREDIT_SBO0_BL, 0x2C, 0x8, PBOX0|PBOX1
CACHE_TOTAL_OCCUPANCY_ANY, 0x12, 0x1, IBOX
CACHE_TOTAL_OCCUPANCY_SOURCE, 0x12, 0x2, IBOX
COHERENT_OPS_PCIRDCUR, 0x13, 0x1, IBOX
COHERENT_OPS_CRD, 0x13, 0x2, IBOX
COHERENT_OPS_DRD, 0x13, 0x4, IBOX
COHERENT_OPS_RFO, 0x13, 0x8, IBOX
COHERENT_OPS_PCITOM, 0x13, 0x10, IBOX
COHERENT_OPS_PCIDCAHINT, 0x13, 0x20, IBOX
COHERENT_OPS_WBMTOI, 0x13, 0x40, IBOX
COHERENT_OPS_CLFLUSH, 0x13, 0x80, IBOX
MISC0_FAST_REQ, 0x14, 0x1, IBOX
MISC0_FAST_REJ, 0x14, 0x2, IBOX
MISC0_2ND_RD_INSERT, 0x14, 0x4, IBOX
MISC0_2ND_WR_INSERT, 0x14, 0x8, IBOX
MISC0_2ND_ATOMIC_INSERT, 0x14, 0x10, IBOX
MISC0_FAST_XFER, 0x14, 0x20, IBOX
MISC0_PF_ACK_HINT, 0x14, 0x40, IBOX
MISC0_PF_TIMEOUT, 0x14, 0x80, IBOX
MISC1_SLOW_I, 0x15, 0x1, IBOX
MISC1_SLOW_S, 0x15, 0x2, IBOX
MISC1_SLOW_E, 0x15, 0x4, IBOX
MISC1_SLOW_M, 0x15, 0x8, IBOX
MISC1_LOST_FWD, 0x15, 0x10, IBOX
MISC1_SEC_RCVD_INVLD, 0x15, 0x20, IBOX
MISC1_SEC_RCVD_VLD, 0x15, 0x40, IBOX
MISC1_DATA_THROTTLE, 0x15, 0x80, IBOX
SNOOP_RESP_MISS, 0x17, 0x1, IBOX
SNOOP_RESP_HIT_I, 0x17, 0x2, IBOX
SNOOP_RESP_HIT_ES, 0x17, 0x4, IBOX
SNOOP_RESP_HIT_M, 0x17, 0x8, IBOX
SNOOP_RESP_SNPCODE, 0x17, 0x10, IBOX
SNOOP_RESP_SNPDATA, 0x17, 0x20, IBOX
SNOOP_RESP_SNPINV, 0x17, 0x40, IBOX
TRANSACTIONS_READS, 0x16, 0x1, IBOX
TRANSACTIONS_WRITES, 0x16, 0x2, IBOX
TRANSACTIONS_RD_PREF, 0x16, 0x4, IBOX
TRANSACTIONS_WR_PREF, 0x16, 0x8, IBOX
TRANSACTIONS_ALL_READS, 0x16, 0x5, IBOX
TRANSACTIONS_ALL_WRITES, 0x16, 0xA, IBOX
TRANSACTIONS_ATOMIC, 0x16, 0x10, IBOX
TRANSACTIONS_OTHER, 0x16, 0x20, IBOX
TRANSACTIONS_ORDERINGQ, 0x16, 0x40, IBOX
RXR_AK_INSERTS, 0xA, 0x0, IBOX
RXR_BL_DRS_CYCLES_FULL, 0x4, 0x0, IBOX
RXR_BL_DRS_INSERTS, 0x1, 0x0, IBOX
RXR_BL_DRS_OCCUPANCY, 0x7, 0x0, IBOX
RXR_BL_NCB_CYCLES_FULL, 0x5, 0x0, IBOX
RXR_BL_NCB_INSERTS, 0x2, 0x0, IBOX
RXR_BL_NCB_OCCUPANCY, 0x8, 0x0, IBOX
RXR_BL_NCS_CYCLES_FULL, 0x6, 0x0, IBOX
RXR_BL_NCS_INSERTS, 0x3, 0x0, IBOX
RXR_BL_NCS_OCCUPANCY, 0x9, 0x0, IBOX
TXR_AD_STALL_CREDIT_CYCLES, 0x18, 0x0, IBOX
TXR_BL_STALL_CREDIT_CYCLES, 0x19, 0x0, IBOX
TXR_DATA_INSERTS_NCB, 0xE, 0x0, IBOX
TXR_DATA_INSERTS_NCS, 0xF, 0x0, IBOX
TXR_REQUEST_OCCUPANCY, 0xD, 0x0, IBOX
RBOX_CLOCKTICK, 0x1, 0x0, RBOX
C_HI_AD_CREDITS_EMPTY_CBO8, 0x1F, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO9, 0x1F, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO10, 0x1F, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO11, 0x1F, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO12, 0x1F, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO13, 0x1F, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO14_16, 0x1F, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_HI_AD_CREDITS_EMPTY_CBO15_17, 0x1F, 0x80, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO0, 0x22, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO1, 0x22, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO2, 0x22, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO3, 0x22, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO4, 0x22, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO5, 0x22, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO6, 0x22, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
C_LO_AD_CREDITS_EMPTY_CBO7, 0x22, 0x80, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
HA_R2_BL_CREDITS_EMPTY_HA0, 0x2D, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
HA_R2_BL_CREDITS_EMPTY_HA1, 0x2D, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
HA_R2_BL_CREDITS_EMPTY_R2_NCB, 0x2D, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
HA_R2_BL_CREDITS_EMPTY_R2_NCS, 0x2D, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VNA, 0x20, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VN0_HOM, 0x20, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VN0_SNP, 0x20, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VN0_NDR, 0x20, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VN1_HOM, 0x20, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VN1_SNP, 0x20, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_AD_CREDITS_EMPTY_VN1_NDR, 0x20, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_AD_CREDITS_EMPTY_VNA, 0x2E, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_AD_CREDITS_EMPTY_VN1_HOM, 0x2E, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_AD_CREDITS_EMPTY_VN1_SNP, 0x2E, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_AD_CREDITS_EMPTY_VN1_NDR, 0x2E, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_BL_CREDITS_EMPTY_VNA, 0x21, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_BL_CREDITS_EMPTY_VN1_HOM, 0x21, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_BL_CREDITS_EMPTY_VN1_SNP, 0x21, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI0_BL_CREDITS_EMPTY_VN1_NDR, 0x21, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VNA, 0x2F, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VN0_HOM, 0x2F, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VN0_SNP, 0x2F, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VN0_NDR, 0x2F, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VN1_HOM, 0x2F, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VN1_SNP, 0x2F, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
QPI1_BL_CREDITS_EMPTY_VN1_NDR, 0x2F, 0x40, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RING_AD_USED_CW_EVEN, 0x7, 0x1, RBOX
RING_AD_USED_CW_ODD, 0x7, 0x2, RBOX
RING_AD_USED_CW, 0x7, 0x3, RBOX
RING_AD_USED_CCW_EVEN, 0x7, 0x4, RBOX
RING_AD_USED_CCW_ODD, 0x7, 0x8, RBOX
RING_AD_USED_CCW, 0x7, 0xC, RBOX
RING_AK_USED_CW_EVEN, 0x8, 0x1, RBOX
RING_AK_USED_CW_ODD, 0x8, 0x2, RBOX
RING_AK_USED_CW, 0x8, 0x3, RBOX
RING_AK_USED_CCW_EVEN, 0x8, 0x4, RBOX
RING_AK_USED_CCW_ODD, 0x8, 0x8, RBOX
RING_AK_USED_CCW, 0x8, 0xC, RBOX
RING_BL_USED_CW_EVEN, 0x9, 0x1, RBOX
RING_BL_USED_CW_ODD, 0x9, 0x2, RBOX
RING_BL_USED_CW, 0x9, 0x3, RBOX
RING_BL_USED_CCW_EVEN, 0x9, 0x4, RBOX
RING_BL_USED_CCW_ODD, 0x9, 0x8, RBOX
RING_BL_USED_CCW, 0x9, 0xC, RBOX
RING_IV_USED_CW, 0xA, 0x3, RBOX
RING_IV_USED_CCW, 0xA, 0xC, RBOX
RING_IV_USED_ANY, 0xA, 0xF, RBOX
RING_SINK_STARVED_AK, 0xE, 0x2, RBOX
RXR_CYCLES_NE_HOM, 0x10, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_SNP, 0x10, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_NDR, 0x10, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_VN1_HOM, 0x14, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_VN1_SNP, 0x14, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_VN1_NDR, 0x14, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_VN1_DRS, 0x14, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_VN1_NCB, 0x14, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_CYCLES_NE_VN1_NCS, 0x14, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_HOM, 0x11, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_SNP, 0x11, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_NDR, 0x11, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_DRS, 0x11, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_NCB, 0x11, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_NCS, 0x11, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_VN1_HOM, 0x15, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_VN1_SNP, 0x15, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_VN1_NDR, 0x15, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_VN1_DRS, 0x15, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_VN1_NCB, 0x15, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_INSERTS_VN1_NCS, 0x15, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
RXR_OCCUPANCY_VN1_HOM, 0x13, 0x1, RBOX0C0|RBOX1C0
RXR_OCCUPANCY_VN1_SNP, 0x13, 0x2, RBOX0C0|RBOX1C0
RXR_OCCUPANCY_VN1_NDR, 0x13, 0x4, RBOX0C0|RBOX1C0
RXR_OCCUPANCY_VN1_DRS, 0x13, 0x8, RBOX0C0|RBOX1C0
RXR_OCCUPANCY_VN1_NCB, 0x13, 0x10, RBOX0C0|RBOX1C0
RXR_OCCUPANCY_VN1_NCS, 0x13, 0x20, RBOX0C0|RBOX1C0
TXR_CYCLES_FULL, 0x25, 0x0, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_CYCLES_NE, 0x23, 0x0, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_NACK_DN_AD, 0x26, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_NACK_DN_BL, 0x26, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_NACK_DN_AK, 0x26, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_NACK_UP_AD, 0x26, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_NACK_UP_BL, 0x26, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
TXR_NACK_UP_AK, 0x26, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
SBO0_CREDITS_ACQUIRED_AD, 0x28, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
SBO0_CREDITS_ACQUIRED_BL, 0x28, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
SBO1_CREDITS_ACQUIRED_AD, 0x29, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
SBO1_CREDITS_ACQUIRED_BL, 0x29, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
STALL_NO_SBO_CREDIT_SBO0_AD, 0x2C, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
STALL_NO_SBO_CREDIT_SBO1_AD, 0x2C, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
STALL_NO_SBO_CREDIT_SBO0_BL, 0x2C, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
STALL_NO_SBO_CREDIT_SBO1_BL, 0x2C, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_REJECT_HOM, 0x37, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_REJECT_SNP, 0x37, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_REJECT_NDR, 0x37, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_REJECT_DRS, 0x37, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_REJECT_NCB, 0x37, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_REJECT_NCS, 0x37, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_REJECT_HOM, 0x39, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_REJECT_SNP, 0x39, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_REJECT_NDR, 0x39, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_REJECT_DRS, 0x39, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_REJECT_NCB, 0x39, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_REJECT_NCS, 0x39, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VNA_CREDITS_REJECT_HOM, 0x34, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VNA_CREDITS_REJECT_SNP, 0x34, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VNA_CREDITS_REJECT_NDR, 0x34, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VNA_CREDITS_REJECT_DRS, 0x34, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VNA_CREDITS_REJECT_NCB, 0x34, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VNA_CREDITS_REJECT_NCS, 0x34, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_USED_HOM, 0x36, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_USED_SNP, 0x36, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_USED_NDR, 0x36, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_USED_DRS, 0x36, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_USED_NCB, 0x36, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN0_CREDITS_USED_NCS, 0x36, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_USED_HOM, 0x38, 0x1, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_USED_SNP, 0x38, 0x2, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_USED_NDR, 0x38, 0x4, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_USED_DRS, 0x38, 0x8, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_USED_NCB, 0x38, 0x10, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
VN1_CREDITS_USED_NCS, 0x38, 0x20, RBOX0C0|RBOX0C1|RBOX1C0|RBOX1C1
BOUNCE_CONTROL, 0xA, 0x0, SBOX
SBOX_CLOCKTICKS, 0x0, 0x0, SBOX
FAST_ASSERTED, 0x9, 0x0, SBOX
RING_AD_USED_ANY, 0x1B, 0xF, SBOX
RING_AD_USED_UP_EVEN, 0x1B, 0x1, SBOX
RING_AD_USED_UP_ODD, 0x1B, 0x2, SBOX
RING_AD_USED_UP, 0x1B, 0x3, SBOX
RING_AD_USED_DOWN_EVEN, 0x1B, 0x4, SBOX
RING_AD_USED_DOWN_ODD, 0x1B, 0x8, SBOX
RING_AD_USED_DOWN, 0x1B, 0xC, SBOX
RING_AK_USED_ANY, 0x1C, 0xF, SBOX
RING_AK_USED_UP_EVEN, 0x1C, 0x1, SBOX
RING_AK_USED_UP_ODD, 0x1C, 0x2, SBOX
RING_AK_USED_UP, 0x1C, 0x3, SBOX
RING_AK_USED_DOWN_EVEN, 0x1C, 0x4, SBOX
RING_AK_USED_DOWN_ODD, 0x1C, 0x8, SBOX
RING_AK_USED_DOWN, 0x1C, 0xC, SBOX
RING_BL_USED_ANY, 0x1D, 0xF, SBOX
RING_BL_USED_UP_EVEN, 0x1D, 0x1, SBOX
RING_BL_USED_UP_ODD, 0x1D, 0x2, SBOX
RING_BL_USED_UP, 0x1D, 0x3, SBOX
RING_BL_USED_DOWN_EVEN, 0x1D, 0x4, SBOX
RING_BL_USED_DOWN_ODD, 0x1D, 0x8, SBOX
RING_BL_USED_DOWN, 0x1D, 0xC, SBOX
RING_BOUNCES_AD_CACHE, 0x5, 0x1, SBOX
RING_BOUNCES_AK_CORE, 0x5, 0x2, SBOX
RING_BOUNCES_BL_CORE, 0x5, 0x4, SBOX
RING_BOUNCES_IV_CORE, 0x5, 0x8, SBOX
RING_IV_USED_ANY, 0x1E, 0xF, SBOX
RING_IV_USED_UP, 0x1E, 0x3, SBOX
RING_IV_USED_DOWN, 0x1E, 0xC, SBOX
RXR_BYPASS_AD_CRD, 0x12, 0x1, SBOX
RXR_BYPASS_AD_BNC, 0x12, 0x2, SBOX
RXR_BYPASS_BL_CRD, 0x12, 0x4, SBOX
RXR_BYPASS_BL_BNC, 0x12, 0x8, SBOX
RXR_BYPASS_AK, 0x12, 0x10, SBOX
RXR_BYPASS_IV, 0x12, 0x20, SBOX
RXR_INSERTS_AD_CRD, 0x12, 0x1, SBOX
RXR_INSERTS_AD_BNC, 0x12, 0x2, SBOX
RXR_INSERTS_BL_CRD, 0x12, 0x4, SBOX
RXR_INSERTS_BL_BNC, 0x12, 0x8, SBOX
RXR_INSERTS_AK, 0x12, 0x10, SBOX
RXR_INSERTS_IV, 0x12, 0x20, SBOX
RXR_OCCUPANCY_AD_CRD, 0x11, 0x1, SBOX
RXR_OCCUPANCY_AD_BNC, 0x11, 0x2, SBOX
RXR_OCCUPANCY_BL_CRD, 0x11, 0x4, SBOX
RXR_OCCUPANCY_BL_BNC, 0x11, 0x8, SBOX
RXR_OCCUPANCY_AK, 0x11, 0x10, SBOX
RXR_OCCUPANCY_IV, 0x11, 0x20, SBOX
TXR_ADS_USED_AD, 0x4, 0x1, SBOX
TXR_ADS_USED_AK, 0x4, 0x2, SBOX
TXR_ADS_USED_BL, 0x4, 0x4, SBOX
TXR_INSERTS_AD_CRD, 0x2, 0x1, SBOX
TXR_INSERTS_AD_BNC, 0x2, 0x2, SBOX
TXR_INSERTS_BL_CRD, 0x2, 0x4, SBOX
TXR_INSERTS_BL_BNC, 0x2, 0x8, SBOX
TXR_INSERTS_AK, 0x2, 0x10, SBOX
TXR_INSERTS_IV, 0x2, 0x20, SBOX
TXR_OCCUPANCY_AD_CRD, 0x1, 0x1, SBOX
TXR_OCCUPANCY_AD_BNC, 0x1, 0x2, SBOX
TXR_OCCUPANCY_BL_CRD, 0x1, 0x4, SBOX
TXR_OCCUPANCY_BL_BNC, 0x1, 0x8, SBOX
TXR_OCCUPANCY_AK, 0x1, 0x10, SBOX
TXR_OCCUPANCY_IV, 0x1, 0x20, SBOX
TXR_ORDERING_IV_SNOOPGO_UP, 0x7, 0x1, SBOX
TXR_ORDERING_IV_SNOOPGO_DN, 0x7, 0x2, SBOX
TXR_ORDERING_AK_U2C_UP_EVEN, 0x7, 0x4, SBOX
TXR_ORDERING_AK_U2C_UP_ODD, 0x7, 0x8, SBOX
TXR_ORDERING_AK_U2C_DN_EVEN, 0x7, 0x10, SBOX
TXR_ORDERING_AK_U2C_DN_ODD, 0x7, 0x20, SBOX
QBOX_CLOCKTICKS, 0x14, 0x0, QBOX
CTO_COUNT, 0x38, 0x0, QBOX, MATCH0|MATCH1|MATCH2|MATCH3|MASK0|MASK1|MASK2|MASK3
DIRECT2CORE_SUCCESS_RBT_HIT, 0x13, 0x1, QBOX
DIRECT2CORE_FAILURE_CREDITS, 0x13, 0x2, QBOX
DIRECT2CORE_FAILURE_RBT_HIT, 0x13, 0x4, QBOX
DIRECT2CORE_FAILURE_CREDITS_RBT, 0x13, 0x8, QBOX
DIRECT2CORE_FAILURE_MISS, 0x13, 0x10, QBOX
DIRECT2CORE_FAILURE_CREDITS_MISS, 0x13, 0x20, QBOX
DIRECT2CORE_FAILURE_RBT_MISS, 0x13, 0x40, QBOX
DIRECT2CORE_FAILURE_CREDITS_RBT_MISS, 0x13, 0x80, QBOX
L1_POWER_CYCLES, 0x12, 0x0, QBOX
RXL0P_POWER_CYCLES, 0x10, 0x0, QBOX
RXL0_POWER_CYCLES, 0xF, 0x0, QBOX
RXL_BYPASSED, 0x9, 0x0, QBOX
RXL_CREDITS_CONSUMED_VN0_DRS, 0x1E, 0x1, QBOX
RXL_CREDITS_CONSUMED_VN0_NCB, 0x1E, 0x2, QBOX
RXL_CREDITS_CONSUMED_VN0_NCS, 0x1E, 0x4, QBOX
RXL_CREDITS_CONSUMED_VN0_HOM, 0x1E, 0x8, QBOX
RXL_CREDITS_CONSUMED_VN0_SNP, 0x1E, 0x10, QBOX
RXL_CREDITS_CONSUMED_VN0_NDR, 0x1E, 0x20, QBOX
RXL_CREDITS_CONSUMED_VN1_DRS, 0x39, 0x1, QBOX
RXL_CREDITS_CONSUMED_VN1_NCB, 0x39, 0x2, QBOX
RXL_CREDITS_CONSUMED_VN1_NCS, 0x39, 0x4, QBOX
RXL_CREDITS_CONSUMED_VN1_HOM, 0x39, 0x8, QBOX
RXL_CREDITS_CONSUMED_VN1_SNP, 0x39, 0x10, QBOX
RXL_CREDITS_CONSUMED_VN1_NDR, 0x39, 0x20, QBOX
RXL_CREDITS_CONSUMED_VNA, 0x1D, 0x0, QBOX
RXL_CYCLES_NE, 0xA, 0x0, QBOX
RXL_FLITS_G0_IDLE, 0x1, 0x1, QBOX
RXL_FLITS_G0_DATA, 0x1, 0x2, QBOX
RXL_FLITS_G0_NON_DATA, 0x1, 0x4, QBOX
RXL_FLITS_G1_SNP, 0x2, 0x1, QBOX
RXL_FLITS_G1_HOM_REQ, 0x2, 0x2, QBOX
RXL_FLITS_G1_HOM_NONREQ, 0x2, 0x4, QBOX
RXL_FLITS_G1_HOM, 0x2, 0x6, QBOX
RXL_FLITS_G1_DRS_DATA, 0x2, 0x8, QBOX
RXL_FLITS_G1_DRS_NONDATA, 0x2, 0x10, QBOX
RXL_FLITS_G1_DRS, 0x2, 0x18, QBOX
RXL_FLITS_G2_NDR_AD, 0x3, 0x1, QBOX
RXL_FLITS_G2_NDR_AK, 0x3, 0x2, QBOX
RXL_FLITS_G2_NCB_DATA, 0x3, 0x4, QBOX
RXL_FLITS_G2_NCB_NONDATA, 0x3, 0x8, QBOX
RXL_FLITS_G2_NCB, 0x3, 0xC, QBOX
RXL_FLITS_G2_NCS, 0x3, 0x10, QBOX
RXL_INSERTS, 0x8, 0x0, QBOX
RXL_INSERTS_DRS_VN0, 0x9, 0x1, QBOX
RXL_INSERTS_DRS_VN1, 0x9, 0x2, QBOX
RXL_INSERTS_HOM_VN0, 0xC, 0x1, QBOX
RXL_INSERTS_HOM_VN1, 0xC, 0x2, QBOX
RXL_INSERTS_NCB_VN0, 0xA, 0x1, QBOX
RXL_INSERTS_NCB_VN1, 0xA, 0x2, QBOX
RXL_INSERTS_NCS_VN0, 0xB, 0x1, QBOX
RXL_INSERTS_NCS_VN1, 0xB, 0x2, QBOX
RXL_INSERTS_NDR_VN0, 0xE, 0x1, QBOX
RXL_INSERTS_NDR_VN1, 0xE, 0x2, QBOX
RXL_INSERTS_SNP_VN0, 0xD, 0x1, QBOX
RXL_INSERTS_SNP_VN1, 0xD, 0x2, QBOX
RXL_OCCUPANCY, 0xB, 0x0, QBOX
RXL_OCCUPANCY_DRS_VN0, 0x15, 0x1, QBOX
RXL_OCCUPANCY_DRS_VN1, 0x15, 0x2, QBOX
RXL_OCCUPANCY_HOM_VN0, 0x18, 0x1, QBOX
RXL_OCCUPANCY_HOM_VN1, 0x18, 0x2, QBOX
RXL_OCCUPANCY_NCB_VN0, 0x16, 0x1, QBOX
RXL_OCCUPANCY_NCB_VN1, 0x16, 0x2, QBOX
RXL_OCCUPANCY_NCS_VN0, 0x17, 0x1, QBOX
RXL_OCCUPANCY_NCS_VN1, 0x17, 0x2, QBOX
RXL_OCCUPANCY_NDR_VN0, 0x1A, 0x1, QBOX
RXL_OCCUPANCY_NDR_VN1, 0x1A, 0x2, QBOX
RXL_OCCUPANCY_SNP_VN0, 0x19, 0x1, QBOX
RXL_OCCUPANCY_SNP_VN1, 0x19, 0x2, QBOX
TXL0P_POWER_CYCLES, 0xD, 0x0, QBOX
TXL0_POWER_CYCLES, 0xC, 0x0, QBOX
TXL_BYPASSED, 0x5, 0x0, QBOX
TXL_CYCLES_NE, 0x6, 0x0, QBOX
TXL_FLITS_G0_IDLE, 0x0, 0x1, QBOX
TXL_FLITS_G0_DATA, 0x0, 0x2, QBOX
TXL_FLITS_G0_NON_DATA, 0x0, 0x4, QBOX
TXL_FLITS_G1_SNP, 0x0, 0x1, QBOX
TXL_FLITS_G1_HOM_REQ, 0x0, 0x2, QBOX
TXL_FLITS_G1_HOM_NONREQ, 0x0, 0x4, QBOX
TXL_FLITS_G1_HOM, 0x0, 0x6, QBOX
TXL_FLITS_G1_DRS_DATA, 0x0, 0x8, QBOX
TXL_FLITS_G1_DRS_NONDATA, 0x0, 0x10, QBOX
TXL_FLITS_G1_DRS, 0x0, 0x18, QBOX
TXL_FLITS_G2_NDR_AD, 0x1, 0x1, QBOX
TXL_FLITS_G2_NDR_AK, 0x1, 0x2, QBOX
TXL_FLITS_G2_NCB_DATA, 0x1, 0x4, QBOX
TXL_FLITS_G2_NCB_NONDATA, 0x1, 0x8, QBOX
TXL_FLITS_G2_NCB, 0x1, 0xC, QBOX
TXL_FLITS_G2_NCS, 0x1, 0x10, QBOX
TXL_INSERTS, 0x4, 0x0, QBOX
TXL_OCCUPANCY, 0x7, 0x0, QBOX
TXR_AD_HOM_CREDIT_ACQUIRED_VN0, 0x26, 0x1, QBOX
TXR_AD_HOM_CREDIT_ACQUIRED_VN1, 0x26, 0x2, QBOX
TXR_AD_HOM_CREDIT_OCCUPANCY_VN0, 0x22, 0x1, QBOX
TXR_AD_HOM_CREDIT_OCCUPANCY_VN1, 0x22, 0x2, QBOX
TXR_AD_NDR_CREDIT_ACQUIRED_VN0, 0x28, 0x1, QBOX
TXR_AD_NDR_CREDIT_ACQUIRED_VN1, 0x28, 0x2, QBOX
TXR_AD_NDR_CREDIT_OCCUPANCY_VN0, 0x24, 0x1, QBOX
TXR_AD_NDR_CREDIT_OCCUPANCY_VN1, 0x24, 0x2, QBOX
TXR_AD_SNP_CREDIT_ACQUIRED_VN0, 0x27, 0x1, QBOX
TXR_AD_SNP_CREDIT_ACQUIRED_VN1, 0x27, 0x2, QBOX
TXR_AD_SNP_CREDIT_OCCUPANCY_VN0, 0x23, 0x1, QBOX
TXR_AD_SNP_CREDIT_OCCUPANCY_VN1, 0x23, 0x2, QBOX
TXR_AK_NDR_CREDIT_ACQUIRED, 0x29, 0x0, QBOX
TXR_AK_NDR_CREDIT_OCCUPANCY, 0x25, 0x0, QBOX
TXR_BL_DRS_CREDIT_ACQUIRED_VN0, 0x2A, 0x1, QBOX
TXR_BL_DRS_CREDIT_ACQUIRED_VN1, 0x2A, 0x2, QBOX
TXR_BL_DRS_CREDIT_ACQUIRED_VN_SHR, 0x2A, 0x4, QBOX
TXR_BL_DRS_CREDIT_OCCUPANCY_VN0, 0x1F, 0x1, QBOX
TXR_BL_DRS_CREDIT_OCCUPANCY_VN1, 0x1F, 0x2, QBOX
TXR_BL_DRS_CREDIT_OCCUPANCY_VN_SHR, 0x1F, 0x4, QBOX
TXR_BL_NCB_CREDIT_ACQUIRED_VN0, 0x2B, 0x1, QBOX
TXR_BL_NCB_CREDIT_ACQUIRED_VN1, 0x2B, 0x2, QBOX
TXR_BL_NCB_CREDIT_OCCUPANCY_VN0, 0x20, 0x1, QBOX
TXR_BL_NCB_CREDIT_OCCUPANCY_VN1, 0x20, 0x2, QBOX
TXR_BL_NCS_CREDIT_ACQUIRED_VN0, 0x2C, 0x1, QBOX
TXR_BL_NCS_CREDIT_ACQUIRED_VN1, 0x2C, 0x2, QBOX
TXR_BL_NCS_CREDIT_OCCUPANCY_VN0, 0x21, 0x1, QBOX
TXR_BL_NCS_CREDIT_OCCUPANCY_VN1, 0x21, 0x2, QBOX
VNA_CREDIT_RETURNS, 0x1C, 0x0, QBOX
VNA_CREDIT_RETURN_OCCUPANCY, 0x1B, 0x0, QBOX
QPI_RATE, 0x0, 0x0, QBOX0FIX0|QBOX1FIX0|QBOX2FIX0
QPI_RX_IDLE, 0x1, 0x0, QBOX0FIX1|QBOX1FIX1|QBOX2FIX1
QPI_RX_LLR, 0x2, 0x0, QBOX0FIX2|QBOX1FIX2|QBOX2FIX2
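
Each line of the event list above appears to follow the pattern name, event code, umask, counter specification, optionally followed by a list of allowed options (as on the CTO_COUNT line). Below is a minimal sketch for parsing such lines, assuming that field layout. The Event class and the parse_event helper are illustrative names, not part of LIKWID.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    code: int        # event code, parsed from hex
    umask: int       # unit mask, parsed from hex
    counters: list   # counter registers the event may be programmed on
    options: list    # allowed options, empty if none are listed


def parse_event(line):
    # Assumed layout: NAME, 0xCODE, 0xUMASK, COUNTERS[, OPTIONS]
    fields = [f.strip() for f in line.split(",")]
    name, code, umask, counters = fields[:4]
    options = fields[4].split("|") if len(fields) > 4 else []
    return Event(name, int(code, 16), int(umask, 16),
                 counters.split("|"), options)


ev = parse_event("RXR_OCCUPANCY_VN1_HOM, 0x13, 0x1, RBOX0C0|RBOX1C0")
print(ev.name, hex(ev.code), hex(ev.umask), ev.counters)
```

Splitting the counter field on "|" makes it easy to see which events are restricted to specific counters (here RBOX0C0 and RBOX1C0) rather than usable on any counter of a unit.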
In [3]:
!likwid-perfctr -a
 Group name	Description
--------------------------------------------------------------------------------
  FLOPS_AVX	Packed AVX MFLOP/s
  TLB_INSTR	L1 Instruction TLB miss rate/ratio
       NUMA	Local and remote memory accesses
     ENERGY	Power and Energy consumption
   TLB_DATA	L2 data TLB miss rate/ratio
      CLOCK	Power and Energy consumption
 PORT_USAGE	Execution port utilization
CYCLE_ACTIVITY	Cycle Activities
       UOPS	UOPs execution info
        QPI	QPI Link Layer data
         L2	L2 cache bandwidth in MBytes/s
     CACHES	Cache bandwidth in MBytes/s
     BRANCH	Branch prediction miss rate/ratio
       DATA	Load to store ratio
   RECOVERY	Recovery duration
  UOPS_EXEC	UOPs execution
        MEM	Main memory bandwidth in MBytes/s
 UOPS_ISSUE	UOPs issueing
     ICACHE	Instruction cache miss rate/ratio
    L3CACHE	L3 cache miss rate/ratio
    L2CACHE	L2 cache miss rate/ratio
       SBOX	Ring Transfer bandwidth
         HA	Main memory bandwidth in MBytes/s seen from Home agent
FALSE_SHARE	False sharing
UOPS_RETIRE	UOPs retirement
         L3	L3 cache bandwidth in MBytes/s
       CBOX	CBOX related data and metrics
In [28]:
!likwid-perfctr -H -g MEM
Group MEM:
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC1))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MBOXxC1))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0
-
Profiling group to measure memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events it is only possible to measure on a
per socket base. Some of the counters may not be available on your system.
Also outputs total data volume transferred from main memory.
The same metrics are provided by the HA group.
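
The formulas above can be checked by hand: each CAS event moves one 64-byte cache line, so summing the CAS counts over all memory channels and multiplying by 64 gives the data volume in bytes. A small sketch of that arithmetic follows. The helper name mem_metrics is illustrative, and the counts are the CAS_COUNT_RD values for socket 0 (the Core 0 column) from the likwid-perfctr run shown further down in this notebook.

```python
CACHE_LINE_BYTES = 64.0


def mem_metrics(cas_counts, runtime_s):
    """Apply the MEM group formulas to a list of per-channel CAS counts."""
    total = sum(cas_counts)
    volume_gbytes = 1.0e-9 * total * CACHE_LINE_BYTES
    bandwidth_mbytes_s = 1.0e-6 * total * CACHE_LINE_BYTES / runtime_s
    return bandwidth_mbytes_s, volume_gbytes


# CAS_COUNT_RD on MBOX2C0, MBOX3C0, MBOX6C0, MBOX7C0 (Core 0 column)
read_counts = [1841226, 1719502, 1837196, 1715048]
runtime = 1.0019  # Runtime (RDTSC) [s], as rounded in the output table

bw, vol = mem_metrics(read_counts, runtime)
print(f"read bandwidth: {bw:.1f} MBytes/s, read volume: {vol:.4f} GBytes")
```

This gives roughly 454.4 MBytes/s and 0.4552 GBytes, in line with the Core 0 values in the metric table. The small remaining difference in bandwidth comes from using the rounded runtime.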

In [15]:
%%writefile tmp/perfctr.py

import numpy as np
import likwid

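# Per-thread LIKWID setup: main thread first, then the OpenMP threads.
likwid.init_thread()
likwid.init_openmp_threads()

n = 2048

# Marked region 1: generate the random inputs.
with likwid.Region("generation"):
    A = np.random.randn(n, n)
    b = np.random.randn(n)

# Marked region 2: matrix-matrix product (the result is discarded).
with likwid.Region("matmul"):
    A @ A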
Overwriting tmp/perfctr.py

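Also try adding the -m option (Marker API) to the command below, so that counts are reported separately for each marked region.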

  • Advantages?
  • Disadvantages?

Make sure the MSR access daemon (used by the -M 1 access mode in the command below) is SUID root:

chmod u+s /usr/sbin/likwid-accessD
In [18]:
!likwid-perfctr -C S0:0-7@S1:0-7 -M 1 -g MEM python3 ./tmp/perfctr.py
--------------------------------------------------------------------------------
CPU name:	Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
CPU type:	Intel Xeon Broadwell EN/EP/EX processor
CPU clock:	2.19 GHz
--------------------------------------------------------------------------------
Running without Marker API. Activate Marker API with -m on commandline.
--------------------------------------------------------------------------------
Group 1: MEM
+-----------------------+---------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+-----------+
|         Event         | Counter |   Core 0   |   Core 1  |   Core 2  |   Core 3  |   Core 4  |   Core 5  |   Core 6  |   Core 7  |  Core 12  |  Core 13  |  Core 14  |  Core 15  |  Core 16  |  Core 17  |  Core 18 |  Core 19  |
+-----------------------+---------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+-----------+
|   INSTR_RETIRED_ANY   |  FIXC0  | 1949520384 | 356458518 | 364873900 | 362027043 | 352323878 | 340961083 | 339418061 | 340180266 | 329162139 | 438126876 | 323908544 | 320824436 | 315188456 | 309798621 | 25306192 | 339338862 |
| CPU_CLK_UNHALTED_CORE |  FIXC1  | 1116681108 | 256556129 | 272782533 | 268002653 | 250929834 | 233567786 | 231861324 | 227811152 | 216251858 | 294065131 | 206608401 | 200982634 | 191193868 | 182247701 | 40194541 | 230682461 |
|  CPU_CLK_UNHALTED_REF |  FIXC2  | 1133365728 | 336271012 | 303399030 | 298917234 | 291128508 | 278965060 | 277807860 | 284598006 | 275326590 | 392295904 | 271997726 | 270141696 | 264090640 | 259688220 | 54866878 | 326025546 |
|      CAS_COUNT_RD     | MBOX0C0 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_WR     | MBOX0C1 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_RD     | MBOX1C0 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_WR     | MBOX1C1 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_RD     | MBOX2C0 |    1841226 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    297400 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_WR     | MBOX2C1 |    1764762 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    169316 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_RD     | MBOX3C0 |    1719502 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    395703 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_WR     | MBOX3C1 |    1643243 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    268513 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_RD     | MBOX4C0 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_WR     | MBOX4C1 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_RD     | MBOX5C0 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_WR     | MBOX5C1 |      -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -     |     -    |     -     |
|      CAS_COUNT_RD     | MBOX6C0 |    1837196 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    293754 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_WR     | MBOX6C1 |    1762917 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    167864 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_RD     | MBOX7C0 |    1715048 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    392383 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
|      CAS_COUNT_WR     | MBOX7C1 |    1641179 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    266912 |         0 |         0 |         0 |         0 |         0 |        0 |         0 |
+-----------------------+---------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+-----------+

+----------------------------+---------+------------+----------+------------+--------------+
|            Event           | Counter |     Sum    |    Min   |     Max    |      Avg     |
+----------------------------+---------+------------+----------+------------+--------------+
|   INSTR_RETIRED_ANY STAT   |  FIXC0  | 6807417259 | 25306192 | 1949520384 | 4.254636e+08 |
| CPU_CLK_UNHALTED_CORE STAT |  FIXC1  | 4420419114 | 40194541 | 1116681108 | 2.762762e+08 |
|  CPU_CLK_UNHALTED_REF STAT |  FIXC2  | 5318885638 | 54866878 | 1133365728 | 3.324304e+08 |
|      CAS_COUNT_RD STAT     | MBOX0C0 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_WR STAT     | MBOX0C1 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_RD STAT     | MBOX1C0 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_WR STAT     | MBOX1C1 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_RD STAT     | MBOX2C0 |    2138626 |        0 |    1841226 |  133664.1250 |
|      CAS_COUNT_WR STAT     | MBOX2C1 |    1934078 |        0 |    1764762 |  120879.8750 |
|      CAS_COUNT_RD STAT     | MBOX3C0 |    2115205 |        0 |    1719502 |  132200.3125 |
|      CAS_COUNT_WR STAT     | MBOX3C1 |    1911756 |        0 |    1643243 |  119484.7500 |
|      CAS_COUNT_RD STAT     | MBOX4C0 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_WR STAT     | MBOX4C1 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_RD STAT     | MBOX5C0 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_WR STAT     | MBOX5C1 |          0 |        0 |          0 |            0 |
|      CAS_COUNT_RD STAT     | MBOX6C0 |    2130950 |        0 |    1837196 |  133184.3750 |
|      CAS_COUNT_WR STAT     | MBOX6C1 |    1930781 |        0 |    1762917 |  120673.8125 |
|      CAS_COUNT_RD STAT     | MBOX7C0 |    2107431 |        0 |    1715048 |  131714.4375 |
|      CAS_COUNT_WR STAT     | MBOX7C1 |    1908091 |        0 |    1641179 |  119255.6875 |
+----------------------------+---------+------------+----------+------------+--------------+

+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|               Metric              |   Core 0  |   Core 1  |   Core 2  |   Core 3  |   Core 4  |   Core 5  |   Core 6  |   Core 7  |  Core 12  |  Core 13  |  Core 14  |  Core 15  |  Core 16  |  Core 17  |  Core 18  |  Core 19  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|        Runtime (RDTSC) [s]        |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |    1.0019 |
|        Runtime unhalted [s]       |    0.5088 |    0.1169 |    0.1243 |    0.1221 |    0.1143 |    0.1064 |    0.1056 |    0.1038 |    0.0985 |    0.1340 |    0.0941 |    0.0916 |    0.0871 |    0.0830 |    0.0183 |    0.1051 |
|            Clock [MHz]            | 2162.4963 | 1674.5158 | 1973.3251 | 1967.8157 | 1891.7504 | 1837.6356 | 1831.8084 | 1756.8691 | 1723.8837 | 1645.2278 | 1667.1665 | 1632.9135 | 1588.9756 | 1540.3027 | 1607.8780 | 1552.9562 |
|                CPI                |    0.5728 |    0.7197 |    0.7476 |    0.7403 |    0.7122 |    0.6850 |    0.6831 |    0.6697 |    0.6570 |    0.6712 |    0.6379 |    0.6265 |    0.6066 |    0.5883 |    1.5883 |    0.6798 |
|  Memory read bandwidth [MBytes/s] |  454.3715 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |   88.1049 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
|  Memory read data volume [GBytes] |    0.4552 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    0.0883 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
| Memory write bandwidth [MBytes/s] |  435.1521 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |   55.7414 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
| Memory write data volume [GBytes] |    0.4360 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    0.0558 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
|    Memory bandwidth [MBytes/s]    |  889.5237 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |  143.8462 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
|    Memory data volume [GBytes]    |    0.8912 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    0.1441 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+

+----------------------------------------+------------+-----------+-----------+-----------+
|                 Metric                 |     Sum    |    Min    |    Max    |    Avg    |
+----------------------------------------+------------+-----------+-----------+-----------+
|        Runtime (RDTSC) [s] STAT        |    16.0304 |    1.0019 |    1.0019 |    1.0019 |
|        Runtime unhalted [s] STAT       |     2.0139 |    0.0183 |    0.5088 |    0.1259 |
|            Clock [MHz] STAT            | 28055.5204 | 1540.3027 | 2162.4963 | 1753.4700 |
|                CPI STAT                |    11.5860 |    0.5728 |    1.5883 |    0.7241 |
|  Memory read bandwidth [MBytes/s] STAT |   542.4764 |         0 |  454.3715 |   33.9048 |
|  Memory read data volume [GBytes] STAT |     0.5435 |         0 |    0.4552 |    0.0340 |
| Memory write bandwidth [MBytes/s] STAT |   490.8935 |         0 |  435.1521 |   30.6808 |
| Memory write data volume [GBytes] STAT |     0.4918 |         0 |    0.4360 |    0.0307 |
|    Memory bandwidth [MBytes/s] STAT    |  1033.3699 |         0 |  889.5237 |   64.5856 |
|    Memory data volume [GBytes] STAT    |     1.0353 |         0 |    0.8912 |    0.0647 |
+----------------------------------------+------------+-----------+-----------+-----------+
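
The STAT table is a plain reduction over the per-core columns of the metric table above it. As a minimal sketch (the helper name `stat` is hypothetical and not part of LIKWID), the following recomputes the Sum/Min/Max/Avg rows for the two memory bandwidth metrics from the per-core values copied out of this output. Only cores 0 and 12 report non-zero memory traffic, presumably because the memory-controller (MBOX) counters are socket-wide and are attributed to one core per socket. Under that assumption, Sum is the figure to read for memory metrics, while Avg divides by all 16 measured cores.

```python
# Per-core values copied from the metric table above, in column order:
# cores 0-7 followed by cores 12-19.
read_bw = [454.3715, 0, 0, 0, 0, 0, 0, 0, 88.1049, 0, 0, 0, 0, 0, 0, 0]
write_bw = [435.1521, 0, 0, 0, 0, 0, 0, 0, 55.7414, 0, 0, 0, 0, 0, 0, 0]


def stat(values):
    """Hypothetical helper: reduce per-core values the way the STAT table does."""
    total = sum(values)
    return {
        "Sum": total,
        "Min": min(values),
        "Max": max(values),
        "Avg": total / len(values),  # averaged over all measured cores
    }


read_stat = stat(read_bw)
write_stat = stat(write_bw)

# The total memory bandwidth row is the sum of the read and write rows.
total_bw_sum = read_stat["Sum"] + write_stat["Sum"]

for name, s in [("read", read_stat), ("write", write_stat)]:
    print(f"Memory {name} bandwidth [MBytes/s] STAT:",
          ", ".join(f"{k}={v:.4f}" for k, v in s.items()))
print(f"Memory bandwidth [MBytes/s] Sum: {total_bw_sum:.4f}")
```

The same reduction applied to CPI or Clock yields a Sum that has no physical meaning (a CPI of 11.5860 across cores, for instance), so for those rows Min, Max, and Avg are the useful columns.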
