1 of 44

�Caches III

CS61C

UC Berkeley�Teaching Professor �Dan Garcia

cs61c.org

Great Ideas

in�Computer Architecture

(a.k.a. Machine Structures)

Garcia

Caches III (1)

Garcia

2 of 44

Direct Mapped Example

Caches III (2)

Garcia

3 of 44

Accessing data in a direct mapped cache

  • Ex.: 16KB of data, direct-mapped, �4 word blocks
    • Can you work out height, width, area?
  • Read 4 addresses
    1. 0x00000014
    2. 0x0000001C
    3. 0x00000034
    4. 0x00008014
  • Memory values here:

Address (hex)

Value of Word

Memory

00000010

00000014

00000018

0000001C

a

b

c

d

...

...

00000030

00000034

00000038

0000003C

e

f

g

h

00008010

00008014

00008018

0000801C

i

j

k

l

...

...

...

...

...

...

Caches III (3)

Garcia

Caches III (3)

Garcia

4 of 44

Accessing data in a direct mapped cache

  • 4 Addresses:
    • 0x00000014, 0x0000001C, �0x00000034, 0x00008014
  • 4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields

000000000000000000 0000000001 0100

000000000000000000 0000000001 1100

000000000000000000 0000000011 0100

000000000000000010 0000000001 0100� Tag Index Offset

Caches III (4)

Garcia

5 of 44

Example: 16 KB Direct-Mapped Cache, 16B blocks

  • Valid bit: determines whether anything is stored in that row (when computer initially powered up, all entries invalid)

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

0

0

0

0

0

0

0

0

0

Caches III (5)

Garcia

6 of 44

1. Read 0x00000014

  • 000000000000000000 0000000001 0100

Tag Index Offset

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

0

0

0

0

0

0

0

0

0

Caches III (6)

Garcia

7 of 44

So we read block 1 (0000000001)

  • 000000000000000000 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

0

0

0

0

0

0

0

0

0

Tag Index Offset

Caches III (7)

Garcia

8 of 44

No valid data

  • 000000000000000000 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

0

0

0

0

0

0

0

0

0

Tag Index Offset

Caches III (8)

Garcia

9 of 44

So load that data into cache, setting tag, valid

  • 000000000000000000 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (9)

Garcia

10 of 44

Read from cache at offset, return word b�

  • 000000000000000000 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (10)

Garcia

11 of 44

2. Read 0x0000001C = 0…00 0..001 1100

  • 000000000000000000 0000000001 1100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (11)

Garcia

12 of 44

Index is Valid

  • 000000000000000000 0000000001 1100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (12)

Garcia

13 of 44

Index is Valid, Tag Matches

  • 000000000000000000 0000000001 1100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (13)

Garcia

14 of 44

Index is Valid, Tag Matches, return d

  • 000000000000000000 0000000001 1100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (14)

Garcia

15 of 44

3. Read 0x00000034 = 0…00 0..011 0100

  • 000000000000000000 0000000011 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (15)

Garcia

16 of 44

So read block 3

  • 000000000000000000 0000000011 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (16)

Garcia

17 of 44

No valid data

  • 000000000000000000 0000000011 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

0

0

0

0

0

0

0

0

d

c

b

a

Tag Index Offset

Caches III (17)

Garcia

18 of 44

Load that cache block, return word f

  • 000000000000000000 0000000011 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

1

0

0

0

0

0

0

0

d

c

b

a

0

h

g

f

e

Tag Index Offset

Caches III (18)

Garcia

19 of 44

4. Read 0x00008014 = 0…10 0..001 0100

  • 000000000000000010 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

1

0

0

0

0

0

0

0

d

c

b

a

0

h

g

f

e

Tag Index Offset

Caches III (19)

Garcia

20 of 44

So read Cache Block 1, Data is Valid

  • 000000000000000010 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

1

0

0

0

0

0

0

0

d

c

b

a

0

h

g

f

e

Tag Index Offset

Caches III (20)

Garcia

21 of 44

Cache Block 1 Tag does not match (0 ≠ 2)

  • 000000000000000010 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

1

0

0

0

0

0

0

0

d

c

b

a

0

h

g

f

e

Tag Index Offset

Caches III (21)

Garcia

22 of 44

Miss, so replace block 1 with new data & tag

  • 000000000000000010 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

1

0

0

0

0

0

0

2

l

k

j

i

0

h

g

f

e

Tag Index Offset

Caches III (22)

Garcia

23 of 44

And return word j

  • 000000000000000010 0000000001 0100

...

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

4

5

6

7

1022

1023

...

Index

0

1

0

1

0

0

0

0

0

0

2

l

k

j

i

0

h

g

f

e

Tag Index Offset

Caches III (23)

Garcia

24 of 44

Do an example yourself. What happens?

  • Chose from: Cache: Hit, Miss, Miss w. replace�Values returned: a ,b, c, d, e, ..., k, l
  • Read address 0x00000030 ?000000000000000000 0000000011 0000

0

1

2

3

4

5

6

7

Index

2

l

k

j

i

0

h

g

f

e

0

1

0

1

0

0

0

0

Caches III (24)

Garcia

25 of 44

Caches III (25)

Garcia

26 of 44

Do an example yourself. What happens?

  • Chose from: Cache: Hit, Miss, Miss w. replace�Values returned: a ,b, c, d, e, ..., k, l

  • Read address 0x0000001c ?� 000000000000000000 0000000001 1100

0

1

2

3

4

5

6

7

Index

2

l

k

j

i

0

h

g

f

e

0

1

0

1

0

0

0

0

Caches III (26)

Garcia

27 of 44

Caches III (27)

Garcia

28 of 44

Answers

  • 0x00000030 a hit

Index = 3, Tag matches, �Offset = 0, value = e

  • 0x0000001c a miss

Index = 1, Tag mismatch, so replace from memory, �Offset = 0xc, value = d

  • Since reads, values �must = memory values �whether or not cached:
    • 0x00000030 = e
    • 0x0000001c = d

Address (hex)

Value of Word

Memory

00000010

00000014

00000018

0000001C

a

b

c

d

...

...

00000030

00000034

00000038

0000003C

e

f

g

h

00008010

00008014

00008018

0000801C

i

j

k

l

...

...

...

...

...

...

🗹

Caches III (28)

Garcia

Caches III (28)

Garcia

29 of 44

Writes, Block Sizes, Misses

Caches III (29)

Garcia

30 of 44

Multiword-Block Direct-Mapped Cache

  • Four words/block, cache size = 4K words�

10

Index

Data

Index

Tag

Valid

0

1

2

.

.

.

1021

1022

1023

31 30 . . . 15 14 13 . . . 4 3 2 1 0

Byte offset

18

18

Tag

Hit

AND

“MUX”�4🡪1 Multiplexor

Data

32

Block offset

2

What kind of locality are we taking advantage of?

Caches III (30)

Garcia

31 of 44

What to do on a write hit?

  • Write-through
    • Update both cache and memory
  • Write-back
    • update word in cache block
    • allow memory word to be “stale”
    • 🡺 add ‘dirty’ bit to block
      • memory & Cache inconsistent
      • needs to be updated when block is replaced
    • …OS flushes cache before I/O…
  • Performance trade-offs?

Caches III (31)

Garcia

32 of 44

Block Size Tradeoff

  • Benefits of Larger Block Size
    • Spatial Locality: if we access a given word, we’re likely to access other nearby words soon
    • Very applicable with Stored-Program Concept
    • Works well for sequential array accesses
  • Drawbacks of Larger Block Size
    • Larger block size means larger miss penalty
      • on a miss, takes longer time to load a new block from next level
    • If block size is too big relative to cache size, then there are too few blocks
      • Result: miss rate goes up

Caches III (32)

Garcia

33 of 44

Extreme Example: One Big Block

  • Cache Size = 4 bytes Block Size = 4 bytes
    • Only ONE entry (row) in the cache!
  • If item accessed, likely accessed again soon
    • But unlikely will be accessed again immediately!
  • The next access will likely to be a miss again
    • Continually loading data into the cache but�discard data (force out) before use it again
    • Nightmare for cache designer: Ping Pong Effect

Cache Data

Valid Bit

B 0

B 1

B 3

Tag

B 2

Caches III (33)

Garcia

34 of 44

Block Size Tradeoff Conclusions

Miss

Penalty

Block Size

Increased Miss Penalty

& Miss Rate

Average

Access

Time

Block Size

Exploits Spatial Locality

Fewer blocks:

compromises

temporal locality

Miss

Rate

Block Size

Caches III (34)

Garcia

35 of 44

Types of Cache Misses (1/2)

  • “Three Cs” Model of Misses
  • 1st C: Compulsory Misses
    • occur when a program is first started
    • cache does not contain any of that program’s data yet, so misses are bound to occur
    • can’t be avoided easily, so won’t focus on these in this course
    • Every block of memory will have one compulsory miss (NOT only every block of the cache)

Caches III (35)

Garcia

36 of 44

Types of Cache Misses (2/2)

  • 2nd C: Conflict Misses
    • miss that occurs because two distinct memory addresses map to the same cache location
    • two blocks (which happen to map to the same location) can keep overwriting each other
    • big problem in direct-mapped caches
    • how do we lessen the effect of these?
  • Dealing with Conflict Misses
    • Solution 1: Make the cache size bigger
      • Fails at some point
    • Solution 2: Multiple distinct blocks can fit in the same cache Index?

🗹

Caches III (36)

Garcia

37 of 44

Fully Associative Caches

Caches III (37)

Garcia

38 of 44

Fully Associative Cache (1/3)

  • Memory address fields:
    • Tag: same as before
    • Offset: same as before
    • Index: non-existant
  • What does this mean?
    • no “rows”: any block can go anywhere in the cache
    • must compare with all tags in entire cache to see if data is there

Caches III (38)

Garcia

39 of 44

Fully Associative Cache (2/3)

  • Fully Associative Cache (e.g., 32 B block)
    • compare tags in parallel

Byte Offset

:

Cache Data

B 0

0

4

31

:

Cache Tag (27 bits long)

Valid

:

B 1

B 31

:

Cache Tag

=

=

=

=

=

:

Caches III (39)

Garcia

40 of 44

Fully Associative Cache (3/3)

  • Benefit of Fully Assoc Cache
    • No Conflict Misses (since data can go anywhere)
  • Drawbacks of Fully Assoc Cache
    • Need hardware comparator for every single entry: if we have a 64KB of data in cache with 4B entries, we need 16K comparators: infeasible

Caches III (40)

Garcia

41 of 44

Final Type of Cache Miss

  • 3rd C: Capacity Misses
    • miss that occurs because the cache has a limited size
    • miss that would not occur if we increase the size of the cache
    • sketchy definition, so just get the general idea
  • This is the primary type of miss for Fully Associative caches.

Caches III (41)

Garcia

42 of 44

How to categorize misses

  • Run an address trace against a set of caches:
    • First, consider an infinite-size, fully-associative cache. For every miss that occurs now, consider it a compulsory miss.
    • Next, consider a finite-sized cache (of the size you want to examine) with full-associativity. Every miss that is not in #1 is a capacity miss.
    • Finally, consider a finite-size cache with finite-associativity. All of the remaining misses that are not #1 or #2 are conflict misses.
    • (Thanks to Prof. Kubiatowicz for the algorithm)

Caches III (42)

Garcia

43 of 44

Caches III (43)

Garcia

44 of 44

And in Conclusion…

  1. Divide into T I O bits, Go to Index = I, check valid
    1. If 0, load block, set valid and tag (COMPULSORY MISS) and use offset to return the right chunk (1,2,4-bytes)
    2. If 1, check tag
      1. If Match (HIT), use offset to return the right chunk
      2. If not (CONFLICT MISS), load block, set valid and tag, use offset to return the right chunk

Valid

Tag

0xc-f

0x8-b

0x4-7

0x0-3

0

1

2

3

...

1

0

d

c

b

a

000000000000000000 0000000001 1100

address: tag index offset

🗹

Caches III (44)

Garcia