

Review

# A Glimpse of a More Modern Processor




# A Glimpse of a More Modern Processor: Frontend

Sandy Bridge



[David Kanter / Realworldtech.com]

# A Glimpse of a More Modern Processor: Backend



- New concept: Instruction-level parallelism ("ILP", "superscalar")
- Where does the IPC number from earlier come from?

[David Kanter / Realworldtech.com]
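A minimal sketch of why instruction-level parallelism lets IPC exceed 1. It assumes a toy machine (not Sandy Bridge): every instruction takes one cycle, at most `width` instructions issue per cycle, and an instruction may issue only after all of its inputs were produced in an earlier cycle. The function name and the dependency encoding are made up for illustration.

```python
def simulate_issue(deps, width):
    """Return the number of cycles needed to execute all instructions.

    deps[i] lists the instructions whose results instruction i needs.
    The dependency graph is assumed to be acyclic.
    """
    done_cycle = {}  # instruction index -> cycle in which it executed
    cycle = 0
    while len(done_cycle) < len(deps):
        cycle += 1
        issued = 0
        for i, inputs in enumerate(deps):
            if i in done_cycle or issued == width:
                continue
            # Ready only if every input was produced in an earlier cycle.
            if all(j in done_cycle and done_cycle[j] < cycle for j in inputs):
                done_cycle[i] = cycle
                issued += 1
    return cycle


# Eight instructions, each depending on the previous one (a serial chain).
chain = [[]] + [[i - 1] for i in range(1, 8)]
# Eight instructions with no dependencies at all.
independent = [[] for _ in range(8)]

ipc_chain = len(chain) / simulate_issue(chain, width=4)
ipc_independent = len(independent) / simulate_issue(independent, width=4)
```

In this toy model the serial chain is stuck at an IPC of 1 no matter how wide the machine is, while the independent instructions reach the full issue width.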

#### A Glimpse of a More Modern Processor: Golden Cove





Demo: intro/More Pipeline Mysteries

# SMT/"Hyperthreading"










Q: Potential issues?

#### Outline

- Introduction
  - Notes
  - Notes (unfilled, with empty boxes)
  - Notes (source code on Github)
  - About This Class
  - Why Bother with Parallel Computers?
  - Lowest Accessible Abstraction: Assembly
  - Architecture of an Execution Pipeline
  - Architecture of a Memory System
  - Shared-Memory Multiprocessors
- Machine Abstractions
- Performance: Expectation, Experiment, Observation
- Performance-Oriented Languages and Abstractions

# More Bad News from Dennard

| Parameter       | Factor       |
|-----------------|--------------|
| Dimension       | $(1/\kappa)$ |
| Line Resistance | $\kappa$     |
| Voltage drop    | $\kappa$     |
| Response time   | 1            |
| Current density | $\kappa$     |

[Dennard et al. '74, via Bohr '07]

- The above scaling law is for on-chip interconnects.
- Current ~ Power vs. response time
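A rough sketch of where the factors in the table come from. It assumes the usual constant-field scaling assumptions: wire length $L$, width $W$, and thickness $t$ all shrink by $1/\kappa$, as do current $I$, voltage $V$, and wire capacitance $C$. The symbols are introduced here for illustration and do not appear on the slide, and the "Voltage drop" row is read as the drop relative to the supply voltage.

$$
\begin{aligned}
R_L = \rho \frac{L}{W t} &\;\to\; \rho \frac{L/\kappa}{(W/\kappa)(t/\kappa)} = \kappa R_L \\
\frac{I R_L}{V} &\;\to\; \frac{(I/\kappa)(\kappa R_L)}{V/\kappa} = \kappa \frac{I R_L}{V} \\
R_L C &\;\to\; (\kappa R_L)(C/\kappa) = R_L C \\
\frac{I}{W t} &\;\to\; \frac{I/\kappa}{(W/\kappa)(t/\kappa)} = \kappa \frac{I}{W t}
\end{aligned}
$$

Under these assumptions, wires do not get faster as everything else shrinks: the response time $R_L C$ stays constant.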

Getting information from

- processor to memory
- one computer to the next

is

- slow (in *latency*)
- power-hungry



#### Somewhere Behind the Interconnect: Memory

Performance characteristics of memory:

Bandwidth

Latency
 Flops are cheap
 Bandwidth is money
 Latency is physics — M. Hoemmen



Minor addition (but important for us)?

Bandwidth is money and code structure.
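A minimal sketch of how latency and bandwidth combine, assuming the simple model "time = latency + size / bandwidth". The latency and bandwidth figures are placeholders chosen for illustration, not measurements of any machine.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the time to move n_bytes: fixed latency plus streaming time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Placeholder figures, for illustration only.
LATENCY = 100e-9   # 100 ns
BANDWIDTH = 10e9   # 10 GB/s

small = transfer_time(8, LATENCY, BANDWIDTH)           # one double
large = transfer_time(8 * 10**6, LATENCY, BANDWIDTH)   # a million doubles
```

With these numbers, the small transfer is almost pure latency, while the large one is almost pure bandwidth. Many small transfers pay the latency every time, which is one way code structure enters the picture.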

# Latency is Physics: Distance





[Wikipedia]
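A back-of-the-envelope sketch of the distance argument: how far can a signal travel during one clock cycle? The 3 GHz clock is an assumed example value, and `velocity_factor` is a hypothetical knob for signals that travel slower than light in vacuum.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum


def distance_per_cycle_m(clock_hz, velocity_factor=1.0):
    """Upper bound on the distance a signal covers in one clock cycle."""
    return velocity_factor * SPEED_OF_LIGHT / clock_hz


d = distance_per_cycle_m(3e9)  # assumed 3 GHz clock
```

At 3 GHz this comes out to roughly 10 cm per cycle, and a round trip halves it. Real signals in wires are slower still, so anything more than a few centimeters away cannot answer within a cycle.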

# Latency is Physics: Electrical Model



#### Latency is Physics: DRAM



# Latency is Physics: Performance Impact?

What is the performance impact of high memory latency?



Idea:

Put a look-up table of recently-used data onto the chip.
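A minimal sketch of why such a look-up table helps, using the standard "average memory access time" estimate. The cycle counts and the miss rate are assumed example values, not measurements.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cost of a memory access, in the same unit as the inputs."""
    return hit_time + miss_rate * miss_penalty


# Assumed example values: 4-cycle hit, 200-cycle trip to memory, 5% misses.
with_cache = average_access_time(hit_time=4, miss_rate=0.05, miss_penalty=200)
without_cache = 200
```

Under these assumptions, the average access costs about 14 cycles instead of 200. The estimate is only as good as the miss rate, which depends on the access pattern of the code.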



# Memory Hierarchy



#### A Basic Cache

Demands on cache implementation:

- Fast, small, cheap, low power
- Fine-grained
- High "hit"-rate (few "misses")



Design Goals: at odds with each other. Why?

# Caches: Engineering Trade-Offs

Engineering Decisions:

- More data per unit of access matching logic → Larger "Cache Lines"
- Simpler/less access matching logic → Less than full "Associativity"
- Eviction strategy
- Size
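A sketch of how cache lines and sets reduce the matching logic: an address is split into a tag, a set index, and an offset within the line, and only the tag needs to be compared. The default line size and number of sets are assumed example values. In hardware both are powers of two, so the three parts are simply bit fields of the address.

```python
def split_address(addr, line_size=64, n_sets=64):
    """Split a byte address into (tag, set index, offset within the line)."""
    offset = addr % line_size
    line_number = addr // line_size
    set_index = line_number % n_sets
    tag = line_number // n_sets
    return tag, set_index, offset


tag, set_index, offset = split_address(0x12345)
```

Larger lines mean fewer tags to store and compare for the same amount of data, at the cost of coarser granularity.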

#### Associativity

Direct Mapped:



2-way set associative:
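A small simulation sketch contrasting the two organizations. It assumes LRU eviction and a tiny cache of four 64-byte lines. These are illustrative choices; real caches are much larger and often use approximations of LRU.

```python
from collections import OrderedDict


def count_misses(addresses, n_lines, ways, line_size=64):
    """Count misses in a cache of n_lines lines, `ways`-way set associative, LRU."""
    n_sets = n_lines // ways
    # One ordered dict per set: keys are tags, ordered from least to most recent.
    sets = [OrderedDict() for _ in range(n_sets)]
    misses = 0
    for addr in addresses:
        line_number = addr // line_size
        cache_set = sets[line_number % n_sets]
        tag = line_number // n_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)          # mark as most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)   # evict least recently used
            cache_set[tag] = None
    return misses


# Two addresses that land in the same set, accessed alternately.
trace = [0, 256] * 4

direct_mapped = count_misses(trace, n_lines=4, ways=1)
two_way = count_misses(trace, n_lines=4, ways=2)
```

In the direct-mapped cache the two addresses keep evicting each other, so every access misses. With two ways, both lines fit in the same set and only the first access to each one misses.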



Size/Associativity vs Hit Rate



Miss rate versus cache size on the Integer portion of SPEC CPU2000 [Cantin, Hill 2003]