1 of 51

Lab 4 Details

1

2 of 51

Administrivia

  • Lab 4 Design Doc due Friday 5/30! (no late days)
  • Lab 4 Code + Questions due 6/9 (No grace period)

2

3 of 51

Agenda

  1. Part C: Crash Safety
  2. Part A clarification & tips
  3. Part B

3

4 of 51

Part C: Crash Safety

4

5 of 51

Guaranteeing Atomic Operations

Now that our file system is writable, we need to ensure that it is crash consistent. But what do we mean by crash consistent? Let’s take a look at an example:

Say we have “file.txt” which is 512 bytes long. We try to append 50 bytes to this file.

Is this operation atomic?

5

6 of 51

Guaranteeing Atomic Operations

No! We need to multiple block writes to accomplish this simple append:

  1. Invoke file_write
  2. Compute new file size
  3. Update size on disk (dinode)
  4. Update file contents in memory
  5. Write the new file contents to disk

That’s three block writes!

6

7 of 51

Guaranteeing Atomic Operations

What happens if we crash before we write the new file contents to disk?

  1. Invoke file_write
  2. Compute new file size
  3. Update size on disk (dinode)
  4. Update file contents in memory
  5. Write the new file contents to disk CRASH

When we reboot the system… We think “file.txt” is 562 bytes long, but the last 50 bytes are garbage, not what we tried to write!

So how do we solve this problem?

7

8 of 51

Journaling

For any operation which must write multiple disk blocks atomically…

  1. Write new blocks into the log, rather than target place. Track what target is.
  2. Once all blocks are in the log, mark the log as “committed”
  3. Copy data from the log to where they should be (apply the log!)
  4. Clear the commit flag

On file system initialization(iinit) check the log for recovery

If not committed, do nothing

If so, apply the log (this is idempotent!)

8

9 of 51

Designing Your Log

  • Specify a log header (metadata for the log)
    • a structure that lives on disk
    • should not exceed a sector
  • Designed by you! Should at least track:
    • transaction status (committed or not)
    • usage status of log region
    • where to apply logged blocks

9

10 of 51

Log API

  • The spec recommends designing an API for yourself for log operations:
    • log_begin_tx: (optional) begin the process of a transaction
    • log_write: wrapper function around normal block writes
    • log_commit_tx: complete a transaction and write out the commit block
    • log_apply: apply the actual content of the log
      • use at commit time and during recovery time

10

11 of 51

More on log_write

  • log_write is intended to be a wrapper function for bwrite() operations
  • Instead of writing the block to its location on disk, we want to:
    • Write the block information to our log region
    • Update the log header with the location of the block

11

12 of 51

More on log_commit_tx

  • Should first write the log header to disk to indicate that txn is committed
  • Then apply the log content (log_apply)
    • Copy blocks from previous log_writes to their actual location on disk
  • Reset commit flag when done

12

13 of 51

Journaling: Quick Recap

For any operation which must write multiple disk blocks atomically…

  1. Write new blocks into the log, rather than target place. Track what target is.
  2. Once all blocks are in the log, mark the log as “committed”
  3. Copy data from the log to where they should be
  4. Clear the commit flag

On system boot, check the log. If not committed, do nothing. If so, redo the copy (copy is idempotent)

13

14 of 51

Log Header Format

  • Log header = metadata for the log
    • a structure that lives on disk
    • should not exceed a sector

This is designed by you!

Should at least track:

    • transaction status (committed or not)
    • where to apply logged blocks

14

15 of 51

How journaling works without crashes

15

16 of 51

Step 1: “log_begin()”

16

The Log

(on disk)

Rest of the Disk

Make sure the log is cleared

Log Header

commit = 0

17 of 51

Step 2: “log_write(data block 1)”

17

The Log

(on disk)

Rest of the Disk

Write into the log, rather than the place in the inode/extents region we want it to go

Also need to track the actual location of the data block so you know where to write logged blocks to on recovery!

Log Header

commit = 0

Data

Block 1

18 of 51

Step 3: “log_write(data block 2)”

18

The Log

(on disk)

Rest of the Disk

Write into the log, rather than the place in the inode/extents region we want it to go

Log Header

commit = 0

..

Data

Block 1

Data

Block 2

19 of 51

Step 4: “log_commit()” [1]

19

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Mark the log as “committed”

Log Header

commit = 1

20 of 51

Step 5: “log_commit()” [2]

20

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

Copy the first block from log onto disk

Log Header

commit = 1

21 of 51

Step 6: “log_commit()” [3]

21

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

Copy the second block from log onto disk

Data

Block 2

Log Header

commit = 1

..

22 of 51

Done!

22

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

We have both data blocks 1 and 2 on disk - everything was successful.

For efficiency, we can zero out the commit flag so the system doesn’t try to redo this

Data

Block 2

Log Header

commit = 0

23 of 51

But what if we crash?

23

24 of 51

Example: before commit--CRASH

24

The Log

(on disk)

Data

Block 1

Rest of the Disk

On reboot (start up)…

There’s no commit in the log, so we should not copy anything to the disk

Log Header

commit = 0

25 of 51

Example: after commit, before clear–CRASH

25

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

On reboot, we see that there is a commit flag

We can then copy block 1 and 2 to disk -- even though DB1 was already copied over, overwriting it with the same data is fine

Data

Block 2

Log Header

commit = 1

26 of 51

Where Do I put the Log?

It’s just blocks on disk, so you can put it anywhere you want (within reason)

  • After-bitmap, before-inodes is a pretty good place

26

27 of 51

Reflect the log on disk

In order to reflect log region in the initial disk image, what do you need to update?

  • Mkfs.c (what needs to be updated here?)
  • Superblock struct
    • to track the location of the log region

27

28 of 51

What should log_write() do differently?

  • log_write() instead of bwrite()
    • Just replace the bwrite calls with log_write!
  • Instead of writing the block to its location on disk, we want to:
    • Write the block information to our log region
    • Update the log header with the location of the block

28

29 of 51

How do we synchronize log access?

We recommend tracking a single transaction in the log

  • How do we ensure that log access remains synchronized?

29

30 of 51

Part A: Clarification & Tips

30

31 of 51

In-memory inode

31

inode cache

inum = 0,

ref = 0, valid = 0,

garbage…

unused cache entry

unused cache entry

NINODE

entries

struct inode

initially, all entries of icache.inodes are unused

*diagram skipped cached root dir inode for simplicity

32 of 51

In-memory inode

32

inode cache

unused cache entry

unused cache entry

NINODE

entries

struct inode

*diagram skipped cached root dir inode for simplicity

inum = 0,

ref = 0, valid = 0,

garbage…

user program:

open(file, O_RDWR)

33 of 51

In-memory inode

33

inode cache

inum = 0,

ref = 0, valid = 0,

garbage…

unused cache entry

unused cache entry

NINODE

entries

struct inode

*diagram skipped cached root dir inode for simplicity

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

34 of 51

In-memory inode

34

inode cache

unused cache entry

unused cache entry

NINODE

entries

struct inode

*diagram skipped cached root dir inode for simplicity

inum = 0,

ref = 0, valid = 0,

garbage…

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

35 of 51

In-memory inode

35

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

*in this example, inode 10 is opened for the first time,

if inode 10 is already cached, iget simply increments existing entry's ref count and returns a pointer to it

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

kernel/fs.c:

iget(10)

*returns a pointer to the cached inode

36 of 51

In-memory inode

36

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

kernel/fs.c:

iget(10)

*returns a pointer to the cached inode

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

inode=namei(file)

kernel/fs.c:

locki(inode)

37 of 51

In-memory inode

37

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

kernel/fs.c:

iget(10)

*returns a pointer to the cached inode

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

freshly allocated (valid ==0) inode returned by iget does not contain accurate disk inode data yet (hence garbage)!

*in this example, inode 10 is opened for the first time,

if inode 10 is already cached, iget simply increments existing entry's ref count and returns a pointer to it

38 of 51

In-memory inode

38

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

kernel/fs.c:

locki(inode)

39 of 51

In-memory inode

39

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

kernel/fs.c:

locki(inode)

if inode is not valid

read in disk inode

kernel/fs.c:

read_dinode(10, …)

40 of 51

In-memory inode

40

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

kernel/fs.c:

locki(inode)

kernel/fs.c:

read_dinode(10, …)

kernel/fs.c:

readi(inodetable, …, INODEOFF(10))

*uses bread to request the disk block containing inode 10

if inode is not valid

read in disk inode

41 of 51

In-memory inode

41

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 1,

dinode data

allocated entry

struct inode

kernel/fs.c:

locki(inode)

kernel/fs.c:

read_dinode(10, …)

kernel/fs.c:

readi(inodetable, …, INODEOFF(10))

*uses bread to request the disk block containing inode 10

use the result of read_dinode to populate the in memory inode!

if inode is not valid

read in disk inode

42 of 51

In-memory inode

42

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 1,

dinode data

allocated entry

struct inode

kernel/fs.c:

locki(inode)

kernel/fs.c:

read_dinode(10, …)

use the result of read_dinode to populate the in memory inode!

kernel/fs.c:

readi(inodetable, …, INODEOFF(10))

*uses bread to request the disk block containing inode 10

And now this inode is ready to be used for fs operations!

if inode is not valid

read in disk inode

43 of 51

More on read_dinode

  • Do you need to call read_dinode anywhere yourself?
    • No, read_dinode is called in locki to cache the on disk inode into in memory inode
    • you only need to modify the in memory inode and write back changes via write_dinode

43

44 of 51

More on write_dinode

  • Should look just like read_dinode but calls writei instead of readi
  • When should write_dinode be called?
    • in writei, when the inode changes!
  • But wait, write_dinode calls writei, wouldn't there be an infinite recursion?
    • writei(file inode) => write_dinode(file inum) => writei(inodetable, INODEOFF(file inum))
    • does the last writei make changes to the inodetable's inode?
      • no! it's an overwrite! the writei to inodetable will not trigger more write_dinode!

44

45 of 51

More on write_dinode

  • write_dinode(inum, struct dinode *)
    • mhmm… what should I pass as arguments to write_dinode?
      • get inum from the cached inode
      • set up a temporary struct dinode and populate it with data from cached inode!

45

46 of 51

Bitmap API

You interact with the block bitmap via fs.c: balloc & bfree

  • balloc and bfree only updates cached bitmap sectors in memory
    • this is done by marking the specified bits in bp->data as free (0) or used (1) in bmark

  • if you want to write the changed bitmap sector back to disk, you must call bwrite yourself!
    • hint: you can update balloc and bfree

46

47 of 51

Block Cache API

You can read and write disk blocks via the Block Cache in bio.c !

    • brings blocks/sectors into memory and manages them (evict, writeback)
    • struct buf
      • metadata for managing buffer
      • buf->data = block data
      • buf->blockno = the cached block #, very helpful for debugging!
    • APIs
      • bread: brings the block into memory, locks (exclusive access) the cached block
      • bwrite: marks the block dirty and issues a write
      • brelse: releases the lock on the cached block

47

48 of 51

Part B: Concurrent FS Ops

48

49 of 51

Concurrent Create

  • When there are multiple create calls to a single file, only 1 process can actually create the file!
    • How can we achieve this?
      • only one process can look up whether the file exists and create it at a time!
      • time of check to time of use! the entire read modify needs to be atomic
        • hint: is there any lock on the path traversal that can be used to prevent concurrent lookups?

49

50 of 51

Concurrent Delete

  • When a file being deleted has open references, it cannot be deleted!
    • how to check if a file has open references?
      • all opened files have their inodes in the inode cache!
      • that's what each file info struct track to request fs operations!
  • When there are multiple delete calls to a single file, only 1 process can successfully delete the file!
    • should only allow one process to check whether the file has open reference and perform the delete atomically (similar to create)!

50

51 of 51

Questions?

51