1 of 51

Lab 4 Details

1

2 of 51

Administrivia

Lab 4 Design Doc due Friday 5/30! (no late days)
Lab 4 Code + Questions due 6/9 (No grace period)

2

3 of 51

Agenda

Part C: Crash Safety
Part A clarification & tips
Part B

3

4 of 51

Part C: Crash Safety

4

5 of 51

Guaranteeing Atomic Operations

Now that our file system is writable, we need to ensure that it is crash consistent. But what do we mean by crash consistent? Let’s take a look at an example:

Say we have “file.txt” which is 512 bytes long. We try to append 50 bytes to this file.

Is this operation atomic?

5

6 of 51

Guaranteeing Atomic Operations

No! We need to multiple block writes to accomplish this simple append:

Invoke file_write
Compute new file size
Update size on disk (dinode)
Update file contents in memory
Write the new file contents to disk

That’s three block writes!

6

7 of 51

Guaranteeing Atomic Operations

What happens if we crash before we write the new file contents to disk?

Invoke file_write
Compute new file size
Update size on disk (dinode)
Update file contents in memory
Write the new file contents to disk CRASH

When we reboot the system… We think “file.txt” is 562 bytes long, but the last 50 bytes are garbage, not what we tried to write!

So how do we solve this problem?

7

8 of 51

Journaling

For any operation which must write multiple disk blocks atomically…

Write new blocks into the log, rather than target place. Track what target is.
Once all blocks are in the log, mark the log as “committed”
Copy data from the log to where they should be (apply the log!)
Clear the commit flag

On file system initialization(iinit) check the log for recovery

If not committed, do nothing

If so, apply the log (this is idempotent!)

8

Why do we log?

Without crash safety, files could end up having inconsistent data. We want to ensure our file system is either entirely the new data, or the old data

Example:

Say we have a txt file with size = 512, and we want to append 50 bytes to it. We will need multiple block writes

Updating dinode size to 562
Adding new sector for the 50 bytes
Updating bitmap to reflect newly allocated sector

Because this is not atomic, we could crash after we update file contents in memory, but before we write to disc.

So, when we reboot the system, we will think the txt file is 562 bytes long, but the last 50 bytes are garbage and not what we tried to write

Idempotent meaning the operation can be repeated multiple times without changing the outcome (essentially we can apply the log as many times as we want to)

9 of 51

Designing Your Log

Specify a log header (metadata for the log)

a structure that lives on disk
should not exceed a sector

Designed by you! Should at least track:

transaction status (committed or not)
usage status of log region
where to apply logged blocks

9

10 of 51

Log API

The spec recommends designing an API for yourself for log operations:

log_begin_tx: (optional) begin the process of a transaction
log_write: wrapper function around normal block writes
log_commit_tx: complete a transaction and write out the commit block
log_apply: apply the actual content of the log

use at commit time and during recovery time

10

11 of 51

12 of 51

13 of 51

Journaling: Quick Recap

For any operation which must write multiple disk blocks atomically…

Write new blocks into the log, rather than target place. Track what target is.
Once all blocks are in the log, mark the log as “committed”
Copy data from the log to where they should be
Clear the commit flag

On system boot, check the log. If not committed, do nothing. If so, redo the copy (copy is idempotent)

13

14 of 51

Log Header Format

Log header = metadata for the log

a structure that lives on disk
should not exceed a sector

This is designed by you!

Should at least track:

transaction status (committed or not)
where to apply logged blocks

14

15 of 51

How journaling works without crashes

15

16 of 51

Step 1: “log_begin()”

16

The Log

(on disk)

Rest of the Disk

Make sure the log is cleared

Log Header

commit = 0

…

17 of 51

Step 2: “log_write(data block 1)”

17

The Log

(on disk)

Rest of the Disk

Write into the log, rather than the place in the inode/extents region we want it to go

Also need to track the actual location of the data block so you know where to write logged blocks to on recovery!

Log Header

commit = 0

…

Data

Block 1

18 of 51

Step 3: “log_write(data block 2)”

18

The Log

(on disk)

Rest of the Disk

Write into the log, rather than the place in the inode/extents region we want it to go

Log Header

commit = 0

..

Data

Block 1

Data

Block 2

19 of 51

Step 4: “log_commit()” [1]

19

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Mark the log as “committed”

Log Header

commit = 1

…

20 of 51

Step 5: “log_commit()” [2]

20

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

Copy the first block from log onto disk

Log Header

commit = 1

…

21 of 51

Step 6: “log_commit()” [3]

21

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

Copy the second block from log onto disk

Data

Block 2

Log Header

commit = 1

..

22 of 51

Done!

22

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

We have both data blocks 1 and 2 on disk - everything was successful.

For efficiency, we can zero out the commit flag so the system doesn’t try to redo this

Data

Block 2

Log Header

commit = 0

…

23 of 51

But what if we crash?

23

24 of 51

Example: before commit--CRASH

24

The Log

(on disk)

Data

Block 1

Rest of the Disk

On reboot (start up)…

There’s no commit in the log, so we should not copy anything to the disk

Log Header

commit = 0

…

25 of 51

Example: after commit, before clear–CRASH

25

The Log

(on disk)

Data

Block 1

Data

Block 2

Rest of the Disk

Data

Block 1

On reboot, we see that there is a commit flag

We can then copy block 1 and 2 to disk -- even though DB1 was already copied over, overwriting it with the same data is fine

Data

Block 2

Log Header

commit = 1

…

26 of 51

Where Do I put the Log?

It’s just blocks on disk, so you can put it anywhere you want (within reason)

After-bitmap, before-inodes is a pretty good place

26

27 of 51

Reflect the log on disk

In order to reflect log region in the initial disk image, what do you need to update?

Mkfs.c (what needs to be updated here?)
Superblock struct

to track the location of the log region

27

28 of 51

What should log_write() do differently?

log_write() instead of bwrite()

Just replace the bwrite calls with log_write!

Instead of writing the block to its location on disk, we want to:

Write the block information to our log region
Update the log header with the location of the block

28

29 of 51

How do we synchronize log access?

We recommend tracking a single transaction in the log

How do we ensure that log access remains synchronized?

29

30 of 51

Part A: Clarification & Tips

30

31 of 51

In-memory inode

31

inode cache

inum = 0,

ref = 0, valid = 0,

garbage…

unused cache entry

NINODE

entries

struct inode

initially, all entries of icache.inodes are unused

*diagram skipped cached root dir inode for simplicity

32 of 51

In-memory inode

32

inode cache

unused cache entry

NINODE

entries

struct inode

*diagram skipped cached root dir inode for simplicity

inum = 0,

ref = 0, valid = 0,

garbage…

user program:

open(file, O_RDWR)

33 of 51

In-memory inode

33

inode cache

inum = 0,

ref = 0, valid = 0,

garbage…

unused cache entry

NINODE

entries

struct inode

*diagram skipped cached root dir inode for simplicity

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

34 of 51

In-memory inode

34

inode cache

unused cache entry

NINODE

entries

struct inode

*diagram skipped cached root dir inode for simplicity

inum = 0,

ref = 0, valid = 0,

garbage…

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

35 of 51

In-memory inode

35

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

*in this example, inode 10 is opened for the first time,

if inode 10 is already cached, iget simply increments existing entry's ref count and returns a pointer to it

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

kernel/fs.c:

iget(10)

*returns a pointer to the cached inode

36 of 51

In-memory inode

36

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

kernel/fs.c:

iget(10)

*returns a pointer to the cached inode

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

inode=namei(file)

kernel/fs.c:

locki(inode)

37 of 51

In-memory inode

37

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

user program:

open(file, O_RDWR)

kernel/fs.c:

iopen(file)

kernel/fs.c:

namei(file)

*path traversal, finds inum 10 for file

kernel/fs.c:

iget(10)

*returns a pointer to the cached inode

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

freshly allocated (valid ==0) inode returned by iget does not contain accurate disk inode data yet (hence garbage)!

*in this example, inode 10 is opened for the first time,

if inode 10 is already cached, iget simply increments existing entry's ref count and returns a pointer to it

38 of 51

In-memory inode

38

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

kernel/fs.c:

locki(inode)

39 of 51

In-memory inode

39

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

kernel/fs.c:

locki(inode)

if inode is not valid

read in disk inode

kernel/fs.c:

read_dinode(10, …)

40 of 51

In-memory inode

40

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 0,

garbage…

allocated entry

struct inode

kernel/fs.c:

locki(inode)

kernel/fs.c:

read_dinode(10, …)

kernel/fs.c:

readi(inodetable, …, INODEOFF(10))

*uses bread to request the disk block containing inode 10

if inode is not valid

read in disk inode

41 of 51

In-memory inode

41

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 1,

dinode data

allocated entry

struct inode

kernel/fs.c:

locki(inode)

kernel/fs.c:

read_dinode(10, …)

kernel/fs.c:

readi(inodetable, …, INODEOFF(10))

*uses bread to request the disk block containing inode 10

use the result of read_dinode to populate the in memory inode!

if inode is not valid

read in disk inode

42 of 51

In-memory inode

42

inode cache

NINODE

entries

*diagram skipped cached root dir inode for simplicity

allocated entry

inum = 10,

ref = 1, valid = 1,

dinode data

allocated entry

struct inode

kernel/fs.c:

locki(inode)

kernel/fs.c:

read_dinode(10, …)

use the result of read_dinode to populate the in memory inode!

kernel/fs.c:

readi(inodetable, …, INODEOFF(10))

*uses bread to request the disk block containing inode 10

And now this inode is ready to be used for fs operations!

if inode is not valid

read in disk inode

43 of 51

44 of 51

45 of 51

46 of 51

Bitmap API

You interact with the block bitmap via fs.c: balloc & bfree

balloc and bfree only updates cached bitmap sectors in memory

this is done by marking the specified bits in bp->data as free (0) or used (1) in bmark

if you want to write the changed bitmap sector back to disk, you must call bwrite yourself!

hint: you can update balloc and bfree

46

47 of 51

Block Cache API

You can read and write disk blocks via the Block Cache in bio.c !

brings blocks/sectors into memory and manages them (evict, writeback)
struct buf

metadata for managing buffer
buf->data = block data
buf->blockno = the cached block #, very helpful for debugging!

APIs

bread: brings the block into memory, locks (exclusive access) the cached block
bwrite: marks the block dirty and issues a write
brelse: releases the lock on the cached block

47

48 of 51

Part B: Concurrent FS Ops

48

49 of 51

Concurrent Create

When there are multiple create calls to a single file, only 1 process can actually create the file!

How can we achieve this?

only one process can look up whether the file exists and create it at a time!
time of check to time of use! the entire read modify needs to be atomic

hint: is there any lock on the path traversal that can be used to prevent concurrent lookups?

49

50 of 51

Concurrent Delete

When a file being deleted has open references, it cannot be deleted!

how to check if a file has open references?

all opened files have their inodes in the inode cache!
that's what each file info struct track to request fs operations!

When there are multiple delete calls to a single file, only 1 process can successfully delete the file!

should only allow one process to check whether the file has open reference and perform the delete atomically (similar to create)!

50

51 of 51

Questions?

51