1 of 25

Caches I

CS61C

Great Ideas

in Computer Architecture

(a.k.a. Machine Structures)

UC Berkeley Teaching Professor Dan Garcia

cs61c.org

Garcia

Caches I (1)


Binary Prefix (review)

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


The way to remember #s

  • What is 2^34? How many bits are needed to address 2.5 TiB? (I.e., what is ceil(log2) = lg of each?)
  • Answer! 2^XY means…

X=0  ---            Y=0  1
X=1  kibi ~10^3     Y=1  2
X=2  mebi ~10^6     Y=2  4
X=3  gibi ~10^9     Y=3  8
X=4  tebi ~10^12    Y=4  16
X=5  pebi ~10^15    Y=5  32
X=6  exbi ~10^18    Y=6  64
X=7  zebi ~10^21    Y=7  128
X=8  yobi ~10^24    Y=8  256
                    Y=9  512

MEMORIZE!

[review]
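The table above can be sanity-checked in a few lines of Python: 2^34 splits as 2^4 · 2^30 (i.e., 16 gibi = 16 GiB), and the number of address bits for 2.5 TiB is ceil(lg(2.5 · 2^40)):

```python
import math

# 2^XY decomposition: 2^34 = 2^4 * 2^30 = 16 * gibi, i.e., 16 GiB
assert 2**34 == 16 * 2**30

# Bits needed to address 2.5 TiB of bytes: ceil(lg(2.5 * 2^40))
capacity_bytes = 2.5 * 2**40
address_bits = math.ceil(math.log2(capacity_bytes))
print(address_bits)  # 42
```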


What units do different storage use?

  • Base-10 (SI): Hard Disk manufacturers & Telecomms
    • What is advertised as a 1 TB drive actually holds about 90% of what you expect.
    • A 1 Mbit/s connection transfers 10^6 bps.
  • Base-2 (IEC): RAM and Caches, for the most part
    • We will try to be clear in this course

RAM and Caches use powers of 2.
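A quick check of the base-10 vs. base-2 discrepancy — why an advertised "1 TB" drive shows up as only about 90% of a TiB:

```python
# A "1 TB" (base-10) drive, reported in base-2 TiB:
advertised_bytes = 10**12          # what the marketing label means
tebibyte = 2**40                   # what the OS typically reports in
fraction = advertised_bytes / tebibyte
print(f"{fraction:.3f} TiB")       # 0.909 TiB -- roughly 90% of a TiB
```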


Library Analogy

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


Why are Large Memories Slow? Library Analogy

  • Time to find a book in a large library
    • Search a large card catalog – (mapping title/author to index number)
    • Round-trip time to walk to the stacks and retrieve the desired book
  • Larger libraries worsen both delays
  • Electronic memories have the same issue; in addition, the technologies used to store a bit slow down as density increases (e.g., SRAM vs. DRAM vs. disk)

What we want is a large, yet fast memory!


What to do: Library Analogy

  • Write a report using library books
    • E.g., works of J.D. Salinger
  • Go to library, look up books, fetch from stacks, and place on desk in library
  • If need more, check out, keep on desk
    • But don’t return earlier books; might need them!
  • You hope this collection of ~10 books on your desk is enough to write the report, despite those 10 being only ~0.00001% of the books in UC Berkeley’s libraries


Memory Hierarchy

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


New-School Machine Structures

  • Parallel Requests: assigned to computer, e.g., search “Cats”
  • Parallel Threads: assigned to core, e.g., lookup, ads
  • Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
  • Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
  • Hardware descriptions: all gates work in parallel at the same time

[Figure: levels from warehouse-scale computer and smart phone (software/hardware boundary) down through computer (main memory, memory/cache, input/output), cores, execution units, and functional blocks (A0+B0, A1+B1) to logic gates — harness parallelism & achieve high performance]


Processor-DRAM Gap (Latency)

  • 1980: a microprocessor executed ~one instruction in the same time as a DRAM access
  • 2020: a microprocessor executes ~1000 instructions in the same time as a DRAM access

Slow DRAM access has disastrous impact on CPU performance!


Great Idea #3: Principle of Locality / Memory Hierarchy

  • Processor chip: extremely fast, extremely expensive, tiny capacity
  • Cache (on the processor chip): faster, expensive, small capacity
  • DRAM chip (e.g., DDR3/4/5, HBM/HBM2/3): fast, priced reasonably, medium capacity

A UC Berkeley EECS invention: “Cache” sounds like “Cash”, so use shorthand $


Computer without Cache


Computer with Cache (copies of recent data)

Lines (i.e., blocks) of data are copied from memory to the cache.


Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

(Jim Gray, Turing Award; B.S. Cal 1966, Ph.D. Cal 1969)

Level           Latency [ns]   Analogy      Scaled time
Registers       1              My Head      1 min
On-chip cache   2              This Room    2 min
Memory          100            Sacramento   1.5 hr
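The analogy maps 1 ns of real latency to 1 minute of human time; a small sketch of that scaling (100 ns comes out to ~1.7 hr, which the slide rounds to 1.5 hr):

```python
# Scale: 1 ns of real latency -> 1 minute of human time (the slide's mapping)
def human_time(latency_ns):
    minutes = latency_ns            # 1 ns maps to 1 min
    if minutes < 90:
        return f"{minutes} min"
    return f"{minutes / 60:.1f} hr"

print(human_time(1))    # registers:     1 min  ("my head")
print(human_time(2))    # on-chip cache: 2 min  ("this room")
print(human_time(100))  # memory:        1.7 hr ("Sacramento"; slide rounds to 1.5 hr)
```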


Characteristics of the Memory Hierarchy

  • Levels: Processor → $ (cache) → Main Memory → Secondary Memory
  • Increasing distance from the processor in access time; (relative) size of the memory grows at each level
  • Transfer sizes between levels:
    • Processor ↔ cache: 4-8 bytes (word)
    • Between cache levels: 8-32 bytes (block)
    • Cache ↔ main memory: 1 to 4 blocks
    • Main memory ↔ secondary memory: 1,024+ bytes (disk sector = page)
  • Inclusive: what is in Level 1 cache (L1$) is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory


Typical Memory Hierarchy

  • The Abstraction: present the processor with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.

[Figure: on-chip components (RegFile, Instr Cache, Data Cache, control, datapath), then Main Memory (DRAM), then Secondary Memory (disk or flash)]

Speed (#cycles): ½’s     1’s     10’s    100’s    10,000’s
Size (bytes):    100’s   10K’s   M’s     G’s      T’s
Cost:            highest ···························· lowest


Memory Hierarchy

  • If level closer to Processor, it is:
    • Smaller
    • Faster
    • More expensive
    • Subset of lower levels (contains most recently used data)
  • Lowest Level (usually disk=HDD/SSD) contains all available data (does it go beyond the disk?)
  • Memory Hierarchy presents the processor with the illusion of a very large & fast memory


Main Memory is DRAM

  • Dynamic Random Access Memory (DRAM):
    • Latency to access first word: ~10 ns (~30-40 processor cycles); each successive word: 0.5 ns – 1 ns
    • Each access brings 64 bits, supports ‘bursts’
    • Cost: ~$3/GB
    • Data is impermanent:
      • Dynamic: capacitors store bits, so needs periodic �refresh to maintain charge
      • Volatile: when power is removed, loses data.
  • Contrast with SRAM = Static RAM (for caches):
    • Static (no capacitors) but still volatile
    • Faster (0.5 ns)/more expensive/lower density
    • They now make non-volatile SRAM too (nvSRAM)
      • Useful for medical applications

[Photos: desktop/server DIMM (dual inline memory module); SRAM chip from an NES clone]
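Amortizing DRAM latency over a burst shows why burst transfers matter. Using the slide's rough figures (~10 ns for the first 64-bit word, ~1 ns per successive word) and an assumed 8-word burst — these are approximations, not a spec:

```python
# Amortizing DRAM latency over a burst (slide's rough figures, not a spec):
# ~10 ns for the first 64-bit word, ~1 ns for each successive word.
first_word_ns = 10
successive_word_ns = 1
burst_length_words = 8          # assumed burst length for illustration

total_ns = first_word_ns + (burst_length_words - 1) * successive_word_ns
avg_ns_per_word = total_ns / burst_length_words
print(total_ns)           # 17   (ns for the whole burst)
print(avg_ns_per_word)    # 2.125 (ns per word, vs. 10 ns without bursting)
```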


Storage / “Disk” / Secondary Memory

  • Solid-State Drive (SSD)
    • Access: 40-100μs�(~100k proc. cycles)
    • $0.05-0.5/GB
    • Usually flash memory
  • Hard Disk Drive (HDD)
    • Access: 5-10 ms (10-20M proc. cycles)
    • $0.01-0.1/GB
    • Mechanical

Attached as a peripheral I/O device. Non-volatile


Locality, Design, Management

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


Caching: The basis of the memory hierarchy

  • A cache contains copies of data that are being used.
  • A cache works on the principles of temporal and spatial locality.

  • Temporal Locality
    • Idea: if we use it now, chances are that we’ll want to use it again soon.
    • Library analogy: we keep a book on the desk while we check out another book.
    • Memory: if a memory location is referenced, then it will tend to be referenced again soon. Therefore, keep the most recently accessed data items closer to the processor.
  • Spatial Locality
    • Idea: if we use a piece of memory, chances are we’ll use the neighboring pieces soon.
    • Library analogy: when we check out a book’s vol. 1, we also check out vol. 2 while we’re at it.
    • Memory: if a memory location is referenced, the locations with nearby addresses will tend to be referenced soon. Therefore, move lines consisting of contiguous words closer to the processor.
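Spatial locality is visible in access order. A minimal sketch over a 4×4 matrix stored row by row (as a C array would be) — note that Python lists are not actually laid out contiguously, so this shows the address pattern, not real cache timing:

```python
# Access patterns over a 4x4 matrix stored row by row (as a C array would be):
# row-major order touches consecutive "addresses" (good spatial locality),
# column-major order jumps by a stride of N (poor spatial locality).
N = 4
matrix = [[r * N + c for c in range(N)] for r in range(N)]  # element = its flat address

row_major = [matrix[r][c] for r in range(N) for c in range(N)]
col_major = [matrix[r][c] for c in range(N) for r in range(N)]

print(row_major[:5])  # [0, 1, 2, 3, 4]  -> consecutive addresses
print(col_major[:5])  # [0, 4, 8, 12, 1] -> stride-4 jumps
```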


Cache Design

  • How do we organize cache?
  • Where does each memory address map to?
    • (Remember that cache is subset of memory, so multiple memory addresses map to the same cache location.)
  • How do we know which elements are in cache?
  • How do we quickly locate them?
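As a preview of where these questions lead (the actual mapping schemes come in later lectures), here is one common answer sketched in Python: a hypothetical direct-mapped cache where each address maps to exactly one line. BLOCK_SIZE and NUM_LINES are made-up parameters, not anything from this slide:

```python
# Hypothetical direct-mapped cache: 64-byte lines, 256 lines (16 KiB total).
# BLOCK_SIZE and NUM_LINES are made-up parameters for illustration.
BLOCK_SIZE = 64
NUM_LINES = 256

def cache_index(addr):
    """Cache line a byte address maps to: (block number) mod (number of lines)."""
    return (addr // BLOCK_SIZE) % NUM_LINES

print(cache_index(0x0000))  # 0
print(cache_index(0x0040))  # 1   (next 64-byte block)
print(cache_index(0x4000))  # 0   (16 KiB later: collides with address 0)
```

Because multiple memory addresses share one cache line, the cache must also store a tag to tell them apart — that is the "how do we know which elements are in cache" question above.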


How is the Hierarchy Managed?

  • registers ↔ memory
    • By compiler (or assembly level programmer)
  • cache ↔ main memory
    • By the cache controller hardware
  • main memory ↔ disks (secondary storage)
    • By the operating system (virtual memory — main memory acting as a cache for disk, so also a type of cache)
    • Virtual to physical address mapping assisted by the hardware (‘translation lookaside buffer’ or TLB)
    • By the programmer (files)


“And in Conclusion…”

  • Caches provide an illusion to the processor that the memory is infinitely large and infinitely fast
    • …the power of Abstraction!
