1 of 25

Caches I

CS61C

Great Ideas

in Computer Architecture

(a.k.a. Machine Structures)

UC Berkeley Teaching Professor Dan Garcia

cs61c.org

Garcia

Caches I (1)


Binary Prefix (review)

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


The way to remember #s

  • What is 2^34? How many bits are needed to address 2.5 TiB? (I.e., what is ceil(log2) = lg of each?)
  • Answer! 2^XY means…

X=0  ---            Y=0  1
X=1  kibi ~10^3     Y=1  2
X=2  mebi ~10^6     Y=2  4
X=3  gibi ~10^9     Y=3  8
X=4  tebi ~10^12    Y=4  16
X=5  pebi ~10^15    Y=5  32
X=6  exbi ~10^18    Y=6  64
X=7  zebi ~10^21    Y=7  128
X=8  yobi ~10^24    Y=8  256
                    Y=9  512

MEMORIZE!

[review]
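The table above can be sanity-checked in a few lines of Python: 2^34 splits as 2^4 · 2^30 (i.e., 16 gibi = 16 GiB), and the number of address bits for 2.5 TiB is ceil(lg(2.5 · 2^40)):

```python
import math

# 2^XY decomposition: 2^34 = 2^4 * 2^30 = 16 * gibi, i.e., 16 GiB
assert 2**34 == 16 * 2**30

# Bits needed to address 2.5 TiB of bytes: ceil(lg(2.5 * 2^40))
capacity_bytes = 2.5 * 2**40
address_bits = math.ceil(math.log2(capacity_bytes))
print(address_bits)  # 42
```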


What units do different storage use?

  • Base-10 (SI): Hard Disk manufacturers & Telecomms
    • What is advertised as a 1 TB drive actually holds about 90% of what you expect.
    • A 1 Mbit/s connection transfers 10^6 bps.
  • Base-2 (IEC): RAM and Caches, for the most part
    • We will try to be clear in this course

RAM and Caches use powers of 2.
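A quick check of the base-10 vs. base-2 discrepancy — why an advertised "1 TB" drive shows up as only about 90% of a TiB:

```python
# A "1 TB" (base-10) drive, reported in base-2 TiB:
advertised_bytes = 10**12          # what the marketing label means
tebibyte = 2**40                   # what the OS typically reports in
fraction = advertised_bytes / tebibyte
print(f"{fraction:.3f} TiB")       # 0.909 TiB -- roughly 90% of a TiB
```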


Library Analogy

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


Why are Large Memories Slow? Library Analogy

  • Time to find a book in a large library
    • Search a large card catalog – (mapping title/author to index number)
    • Round-trip time to walk to the stacks and retrieve the desired book
  • Larger libraries worsen both delays
  • Electronic memories have the same issue; in addition, the technologies used to store a bit slow down as density increases (e.g., SRAM vs. DRAM vs. disk)

What we want is a large, yet fast memory!


What to do: Library Analogy

  • Write a report using library books
    • E.g., works of J.D. Salinger
  • Go to library, look up books, fetch from stacks, and place on desk in library
  • If need more, check out, keep on desk
    • But don’t return earlier books; might need them!
  • You hope this collection of ~10 books on your desk is enough to write the report, despite those 10 being only ~0.00001% of the books in UC Berkeley’s libraries


Memory Hierarchy

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


New-School Machine Structures

  • Parallel Requests: assigned to computer, e.g., search “Cats”
  • Parallel Threads: assigned to core, e.g., lookup, ads
  • Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
  • Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
  • Hardware descriptions: all gates work in parallel at the same time

[Figure: levels from warehouse-scale computer and smart phone (software/hardware boundary) down through computer (main memory, memory/cache, input/output), cores, execution units, and functional blocks (A0+B0, A1+B1) to logic gates — harness parallelism & achieve high performance]


Processor-DRAM Gap (Latency)

  • 1980: a microprocessor executed ~one instruction in the same time as a DRAM access
  • 2020: a microprocessor executes ~1000 instructions in the same time as a DRAM access

Slow DRAM access has disastrous impact on CPU performance!


Great Idea #3: Principle of Locality / Memory Hierarchy

  • Processor chip: extremely fast, extremely expensive, tiny capacity
  • Cache (on the processor chip): faster, expensive, small capacity
  • DRAM chip (e.g., DDR3/4/5, HBM/HBM2/3): fast, priced reasonably, medium capacity

A UC Berkeley EECS invention: “Cache” sounds like “Cash”, so use shorthand $


Computer without Cache


Computer with Cache (copies of recent data)

Lines (i.e., blocks) of data are copied from memory to the cache.


Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

(Jim Gray, Turing Award; B.S. Cal 1966, Ph.D. Cal 1969)

Level           Latency [ns]   Analogy      Scaled time
Registers       1              My Head      1 min
On-chip cache   2              This Room    2 min
Memory          100            Sacramento   1.5 hr
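The analogy maps 1 ns of real latency to 1 minute of human time; a small sketch of that scaling (100 ns comes out to ~1.7 hr, which the slide rounds to 1.5 hr):

```python
# Scale: 1 ns of real latency -> 1 minute of human time (the slide's mapping)
def human_time(latency_ns):
    minutes = latency_ns            # 1 ns maps to 1 min
    if minutes < 90:
        return f"{minutes} min"
    return f"{minutes / 60:.1f} hr"

print(human_time(1))    # registers:     1 min  ("my head")
print(human_time(2))    # on-chip cache: 2 min  ("this room")
print(human_time(100))  # memory:        1.7 hr ("Sacramento"; slide rounds to 1.5 hr)
```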


Characteristics of the Memory Hierarchy

  • Levels: Processor → $ (cache) → Main Memory → Secondary Memory
  • Increasing distance from the processor in access time; (relative) size of the memory grows at each level
  • Transfer sizes between levels:
    • Processor ↔ cache: 4-8 bytes (word)
    • Between cache levels: 8-32 bytes (block)
    • Cache ↔ main memory: 1 to 4 blocks
    • Main memory ↔ secondary memory: 1,024+ bytes (disk sector = page)
  • Inclusive: what is in Level 1 cache (L1$) is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory


Typical Memory Hierarchy

  • The Abstraction: present the processor with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.

[Figure: on-chip components (RegFile, Instr Cache, Data Cache, control, datapath), then Main Memory (DRAM), then Secondary Memory (disk or flash)]

Speed (#cycles): ½’s     1’s     10’s    100’s    10,000’s
Size (bytes):    100’s   10K’s   M’s     G’s      T’s
Cost:            highest ···························· lowest


Memory Hierarchy

  • If level closer to Processor, it is:
    • Smaller
    • Faster
    • More expensive
    • Subset of lower levels (contains most recently used data)
  • Lowest Level (usually disk=HDD/SSD) contains all available data (does it go beyond the disk?)
  • Memory Hierarchy presents the processor with the illusion of a very large & fast memory


Main Memory is DRAM

  • Dynamic Random Access Memory (DRAM):
    • Latency to access first word: ~10 ns (~30-40 processor cycles); each successive word: 0.5 ns – 1 ns
    • Each access brings 64 bits, supports ‘bursts’
    • Cost: ~$3/GB
    • Data is impermanent:
      • Dynamic: capacitors store bits, so needs periodic �refresh to maintain charge
      • Volatile: when power is removed, loses data.
  • Contrast with SRAM = Static RAM (for caches):
    • Static (no capacitors) but still volatile
    • Faster (0.5 ns)/more expensive/lower density
    • They now make non-volatile SRAM too (nvSRAM)
      • Useful for medical applications

[Photos: desktop/server DIMM (dual inline memory module); SRAM chip from an NES clone]
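Amortizing DRAM latency over a burst shows why burst transfers matter. Using the slide's rough figures (~10 ns for the first 64-bit word, ~1 ns per successive word) and an assumed 8-word burst — these are approximations, not a spec:

```python
# Amortizing DRAM latency over a burst (slide's rough figures, not a spec):
# ~10 ns for the first 64-bit word, ~1 ns for each successive word.
first_word_ns = 10
successive_word_ns = 1
burst_length_words = 8          # assumed burst length for illustration

total_ns = first_word_ns + (burst_length_words - 1) * successive_word_ns
avg_ns_per_word = total_ns / burst_length_words
print(total_ns)           # 17   (ns for the whole burst)
print(avg_ns_per_word)    # 2.125 (ns per word, vs. 10 ns without bursting)
```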


Storage / “Disk” / Secondary Memory

  • Solid-State Drive (SSD)
    • Access: 40-100μs�(~100k proc. cycles)
    • $0.05-0.5/GB
    • Usually flash memory
  • Hard Disk Drive (HDD)
    • Access: 5-10 ms (10-20M proc. cycles)
    • $0.01-0.1/GB
    • Mechanical

Attached as a peripheral I/O device. Non-volatile


Locality, Design, Management

  • Binary Prefix (review)
  • Library Analogy
  • Memory Hierarchy
  • Locality, Design, Management


Caching: The basis of the memory hierarchy

  • A cache contains copies of data that are being used.
  • A cache works on the principles of temporal and spatial locality.

  • Temporal Locality
    • Idea: if we use it now, chances are that we’ll want to use it again soon.
    • Library analogy: we keep a book on the desk while we check out another book.
    • Memory: if a memory location is referenced, then it will tend to be referenced again soon. Therefore, keep the most recently accessed data items closer to the processor.
  • Spatial Locality
    • Idea: if we use a piece of memory, chances are we’ll use the neighboring pieces soon.
    • Library analogy: when we check out a book’s vol. 1, we also check out vol. 2 while we’re at it.
    • Memory: if a memory location is referenced, the locations with nearby addresses will tend to be referenced soon. Therefore, move lines consisting of contiguous words closer to the processor.
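Spatial locality is visible in access order. A minimal sketch over a 4×4 matrix stored row by row (as a C array would be) — note that Python lists are not actually laid out contiguously, so this shows the address pattern, not real cache timing:

```python
# Access patterns over a 4x4 matrix stored row by row (as a C array would be):
# row-major order touches consecutive "addresses" (good spatial locality),
# column-major order jumps by a stride of N (poor spatial locality).
N = 4
matrix = [[r * N + c for c in range(N)] for r in range(N)]  # element = its flat address

row_major = [matrix[r][c] for r in range(N) for c in range(N)]
col_major = [matrix[r][c] for c in range(N) for r in range(N)]

print(row_major[:5])  # [0, 1, 2, 3, 4]  -> consecutive addresses
print(col_major[:5])  # [0, 4, 8, 12, 1] -> stride-4 jumps
```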


Cache Design

  • How do we organize cache?
  • Where does each memory address map to?
    • (Remember that cache is subset of memory, so multiple memory addresses map to the same cache location.)
  • How do we know which elements are in cache?
  • How do we quickly locate them?
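As a preview of where these questions lead (the actual mapping schemes come in later lectures), here is one common answer sketched in Python: a hypothetical direct-mapped cache where each address maps to exactly one line. BLOCK_SIZE and NUM_LINES are made-up parameters, not anything from this slide:

```python
# Hypothetical direct-mapped cache: 64-byte lines, 256 lines (16 KiB total).
# BLOCK_SIZE and NUM_LINES are made-up parameters for illustration.
BLOCK_SIZE = 64
NUM_LINES = 256

def cache_index(addr):
    """Cache line a byte address maps to: (block number) mod (number of lines)."""
    return (addr // BLOCK_SIZE) % NUM_LINES

print(cache_index(0x0000))  # 0
print(cache_index(0x0040))  # 1   (next 64-byte block)
print(cache_index(0x4000))  # 0   (16 KiB later: collides with address 0)
```

Because multiple memory addresses share one cache line, the cache must also store a tag to tell them apart — that is the "how do we know which elements are in cache" question above.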


How is the Hierarchy Managed?

  • registers ↔ memory
    • By compiler (or assembly level programmer)
  • cache ↔ main memory
    • By the cache controller hardware
  • main memory ↔ disks (secondary storage)
    • By the operating system (virtual memory — main memory acting as a cache for disk, so also a type of cache)
    • Virtual to physical address mapping assisted by the hardware (‘translation lookaside buffer’ or TLB)
    • By the programmer (files)


“And in Conclusion…”

  • Caches provide an illusion to the processor that the memory is infinitely large and infinitely fast
    • …the power of Abstraction!
