Lab 4 Details
1
Administrivia
2
Agenda
3
Part C: Crash Safety
4
Guaranteeing Atomic Operations
Now that our file system is writable, we need to ensure that it is crash consistent. But what do we mean by crash consistent? Let’s take a look at an example:
Say we have “file.txt” which is 512 bytes long. We try to append 50 bytes to this file.
Is this operation atomic?
5
Guaranteeing Atomic Operations
No! We need to multiple block writes to accomplish this simple append:
That’s three block writes!
6
Guaranteeing Atomic Operations
What happens if we crash before we write the new file contents to disk?
When we reboot the system… We think “file.txt” is 562 bytes long, but the last 50 bytes are garbage, not what we tried to write!
So how do we solve this problem?
7
Journaling
For any operation which must write multiple disk blocks atomically…
On file system initialization(iinit) check the log for recovery
If not committed, do nothing
If so, apply the log (this is idempotent!)
8
Designing Your Log
9
Log API
10
More on log_write
11
More on log_commit_tx
12
Journaling: Quick Recap
For any operation which must write multiple disk blocks atomically…
On system boot, check the log. If not committed, do nothing. If so, redo the copy (copy is idempotent)
13
Log Header Format
This is designed by you!
Should at least track:
14
How journaling works without crashes
15
Step 1: “log_begin()”
16
The Log
(on disk)
Rest of the Disk
Make sure the log is cleared
Log Header
commit = 0
…
Step 2: “log_write(data block 1)”
17
The Log
(on disk)
Rest of the Disk
Write into the log, rather than the place in the inode/extents region we want it to go
Also need to track the actual location of the data block so you know where to write logged blocks to on recovery!
Log Header
commit = 0
…
Data
Block 1
Step 3: “log_write(data block 2)”
18
The Log
(on disk)
Rest of the Disk
Write into the log, rather than the place in the inode/extents region we want it to go
Log Header
commit = 0
..
Data
Block 1
Data
Block 2
Step 4: “log_commit()” [1]
19
The Log
(on disk)
Data
Block 1
Data
Block 2
Rest of the Disk
Mark the log as “committed”
Log Header
commit = 1
…
Step 5: “log_commit()” [2]
20
The Log
(on disk)
Data
Block 1
Data
Block 2
Rest of the Disk
Data
Block 1
Copy the first block from log onto disk
Log Header
commit = 1
…
Step 6: “log_commit()” [3]
21
The Log
(on disk)
Data
Block 1
Data
Block 2
Rest of the Disk
Data
Block 1
Copy the second block from log onto disk
Data
Block 2
Log Header
commit = 1
..
Done!
22
The Log
(on disk)
Data
Block 1
Data
Block 2
Rest of the Disk
Data
Block 1
We have both data blocks 1 and 2 on disk - everything was successful.
For efficiency, we can zero out the commit flag so the system doesn’t try to redo this
Data
Block 2
Log Header
commit = 0
…
But what if we crash?
23
Example: before commit--CRASH
24
The Log
(on disk)
Data
Block 1
Rest of the Disk
On reboot (start up)…
There’s no commit in the log, so we should not copy anything to the disk
Log Header
commit = 0
…
Example: after commit, before clear–CRASH
25
The Log
(on disk)
Data
Block 1
Data
Block 2
Rest of the Disk
Data
Block 1
On reboot, we see that there is a commit flag
We can then copy block 1 and 2 to disk -- even though DB1 was already copied over, overwriting it with the same data is fine
Data
Block 2
Log Header
commit = 1
…
Where Do I put the Log?
It’s just blocks on disk, so you can put it anywhere you want (within reason)
26
Reflect the log on disk
In order to reflect log region in the initial disk image, what do you need to update?
27
What should log_write() do differently?
28
How do we synchronize log access?
We recommend tracking a single transaction in the log
29
Part A: Clarification & Tips
30
In-memory inode
31
inode cache
inum = 0,
ref = 0, valid = 0,
garbage…
unused cache entry
unused cache entry
NINODE
entries
struct inode
initially, all entries of icache.inodes are unused
*diagram skipped cached root dir inode for simplicity
In-memory inode
32
inode cache
unused cache entry
unused cache entry
NINODE
entries
struct inode
*diagram skipped cached root dir inode for simplicity
inum = 0,
ref = 0, valid = 0,
garbage…
user program:
open(file, O_RDWR)
In-memory inode
33
inode cache
inum = 0,
ref = 0, valid = 0,
garbage…
unused cache entry
unused cache entry
NINODE
entries
struct inode
*diagram skipped cached root dir inode for simplicity
user program:
open(file, O_RDWR)
kernel/fs.c:
iopen(file)
In-memory inode
34
inode cache
unused cache entry
unused cache entry
NINODE
entries
struct inode
*diagram skipped cached root dir inode for simplicity
inum = 0,
ref = 0, valid = 0,
garbage…
user program:
open(file, O_RDWR)
kernel/fs.c:
iopen(file)
kernel/fs.c:
namei(file)
*path traversal, finds inum 10 for file
In-memory inode
35
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
allocated entry
inum = 10,
ref = 1, valid = 0,
garbage…
allocated entry
struct inode
*in this example, inode 10 is opened for the first time,
if inode 10 is already cached, iget simply increments existing entry's ref count and returns a pointer to it
user program:
open(file, O_RDWR)
kernel/fs.c:
iopen(file)
kernel/fs.c:
namei(file)
*path traversal, finds inum 10 for file
kernel/fs.c:
iget(10)
*returns a pointer to the cached inode
In-memory inode
36
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
user program:
open(file, O_RDWR)
kernel/fs.c:
iopen(file)
kernel/fs.c:
namei(file)
*path traversal, finds inum 10 for file
kernel/fs.c:
iget(10)
*returns a pointer to the cached inode
allocated entry
inum = 10,
ref = 1, valid = 0,
garbage…
allocated entry
struct inode
inode=namei(file)
kernel/fs.c:
locki(inode)
In-memory inode
37
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
user program:
open(file, O_RDWR)
kernel/fs.c:
iopen(file)
kernel/fs.c:
namei(file)
*path traversal, finds inum 10 for file
kernel/fs.c:
iget(10)
*returns a pointer to the cached inode
allocated entry
inum = 10,
ref = 1, valid = 0,
garbage…
allocated entry
struct inode
freshly allocated (valid ==0) inode returned by iget does not contain accurate disk inode data yet (hence garbage)!
*in this example, inode 10 is opened for the first time,
if inode 10 is already cached, iget simply increments existing entry's ref count and returns a pointer to it
In-memory inode
38
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
allocated entry
inum = 10,
ref = 1, valid = 0,
garbage…
allocated entry
struct inode
kernel/fs.c:
locki(inode)
In-memory inode
39
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
allocated entry
inum = 10,
ref = 1, valid = 0,
garbage…
allocated entry
struct inode
kernel/fs.c:
locki(inode)
if inode is not valid
read in disk inode
kernel/fs.c:
read_dinode(10, …)
In-memory inode
40
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
allocated entry
inum = 10,
ref = 1, valid = 0,
garbage…
allocated entry
struct inode
kernel/fs.c:
locki(inode)
kernel/fs.c:
read_dinode(10, …)
kernel/fs.c:
readi(inodetable, …, INODEOFF(10))
*uses bread to request the disk block containing inode 10
if inode is not valid
read in disk inode
In-memory inode
41
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
allocated entry
inum = 10,
ref = 1, valid = 1,
dinode data
allocated entry
struct inode
kernel/fs.c:
locki(inode)
kernel/fs.c:
read_dinode(10, …)
kernel/fs.c:
readi(inodetable, …, INODEOFF(10))
*uses bread to request the disk block containing inode 10
use the result of read_dinode to populate the in memory inode!
if inode is not valid
read in disk inode
In-memory inode
42
inode cache
NINODE
entries
*diagram skipped cached root dir inode for simplicity
allocated entry
inum = 10,
ref = 1, valid = 1,
dinode data
allocated entry
struct inode
kernel/fs.c:
locki(inode)
kernel/fs.c:
read_dinode(10, …)
use the result of read_dinode to populate the in memory inode!
kernel/fs.c:
readi(inodetable, …, INODEOFF(10))
*uses bread to request the disk block containing inode 10
And now this inode is ready to be used for fs operations!
if inode is not valid
read in disk inode
More on read_dinode
43
More on write_dinode
44
More on write_dinode
45
Bitmap API
You interact with the block bitmap via fs.c: balloc & bfree
46
Block Cache API
You can read and write disk blocks via the Block Cache in bio.c !
47
Part B: Concurrent FS Ops
48
Concurrent Create
49
Concurrent Delete
50
Questions?
51