1 of 30

Week 4

File Systems

2 of 30

Building the Root File System

  • One of the first tasks of an OS during bootup is to build the root file system
  • Locate all bootable media
    • Internal and external hard disks
    • SSDs
    • Floppy disks, CDs, DVDs, USB sticks
  • Locate all the partitions on each media
    • Read MBR(s), extended partition tables, etc.
  • Mount one or more partitions
    • Makes the file system(s) available for access

2

3 of 30

The Master Boot Record

3

MBR

Partition 1

(ext3)

Partition 2

(swap)

Partition 3

(NTFS)

Partition 4

(FAT32)

Address

Description

Size (Bytes)

Hex

Dec.

0x000

0

Bootstrap code area

446

0x1BE

446

Partition Entry #1

16

0x1CE

462

Partition Entry #2

16

0x1DE

478

Partition Entry #3

16

0x1EE

494

Partition Entry #4

16

0x1FE

510

Magic Number

2

Total:

512

Includes the starting LBA and length of the partition

Disk 1

MBR

Partition 1

(NTFS)

Disk 2

4 of 30

Extended Partitions

  • In some cases, you may want >4 partitions
  • Modern OSes support extended partitions

4

Partition 1

(ext3)

Partition 2

(swap)

Partition 3

(Extended Partition)

Partition 4

(FAT32)

Disk 1

Logical Partition 1

(NTFS)

Logical Partition 2

(NTFS)

  • Extended partitions may use OS-specific partition table formats (meta-data)
    • Thus, other OSes may not be able to read the logical partitions

MBR

Ext. Part.

5 of 30

Types of Root File Systems

  • Windows exposes a multi-rooted system
    • Each device and partition is assigned a letter
    • Internally, a single root is maintained
  • Linux has a single root
    • One partition is mounted as /
    • All other partitions are mounted somewhere under /
  • Typically, the partition containing the kernel is mounted as / or C:

5

[cbw@ativ9 ~] df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda7 39G 14G 23G 38% /

/dev/sda2 296M 48M 249M 16% /boot/efi

/dev/sda5 127G 86G 42G 68% /media/cbw/Data

/dev/sda4 61G 34G 27G 57% /media/cbw/Windows

/dev/sdb1 1.9G 352K 1.9G 1% /media/cbw/NDSS-2013

1 drive, 4 partitions

1drive, 1 partition

6 of 30

Mounting a File System

  1. Read the super block for the target file system
    • Contains meta-data about the file system
    • Version, size, locations of key structures on disk, etc.
  2. Determine the mount point
    • On Windows: pick a drive letter
    • On Linux: mount the new file system under a specific directory

6

Filesystem Size Used Avail Use% Mounted on

/dev/sda5 127G 86G 42G 68% /media/cbw/Data

/dev/sda4 61G 34G 27G 57% /media/cbw/Windows

/dev/sdb1 1.9G 352K 1.9G 1% /media/cbw/NDSS-2013

7 of 30

Virtual File System Interface

  • Problem: the OS may mount several partitions containing different underlying file systems
    • It would be bad if processes had to use different APIs for different file systems
  • Linux uses a Virtual File System interface (VFS)
    • Exposes POSIX APIs to processes
    • Forwards requests to lower-level file system specific drivers
  • Windows uses a similar system

7

8 of 30

VFS Flowchart

8

Kernel

Process 1

Process 2

Process 3

Virtual File System Interface

ext3 Driver

NTFS Driver

FAT32 Driver

ext3 Partition

NTFS Partition

FAT32 Partition

Processes (usually) don’t need to know about low-level file system details

Relatively simple to add additional file system drivers

9 of 30

Mount isn’t Just for Bootup

  • When you plug storage devices into your running system, mount is executed in the background
  • Example: plugging in a USB stick
  • What does it mean to “safely eject” a device?
    • Flush cached writes to that device
    • Cleanly unmount the file system on that device

9

10 of 30

Status Check

  • At this point, the OS can locate and mount partitions
  • Next step: what is the on-disk layout of the file system?
    • We expect certain features from a file system
      • Named files
      • Nested hierarchy of directories
      • Meta-data like creation time, file permissions, etc.
    • How do we design on-disk structures that support these features?

10

11 of 30

The Directory Tree

  • Navigated using a path
    • E.g. /home/amislove/music.mp3

11

home

/ (root)

bin

tmp

python

cbw

amislove

cs5600

12 of 30

Absolute and Relative Paths

  • Two types of file system paths
    • Absolute
      • Full path from the root to the object
      • Example: /home/cbw/cs5600/hw4.pdf
      • Example: C:\Users\cbw\Documents\
    • Relative
      • OS keeps track of the working directory for each process
      • Path relative to the current working directory
      • Examples [working directory = /home/cbw]:
        • syllabus.docx [ 🡪 /home/cbw/syllabus.docx]
        • cs5600/hw4.pdf [ 🡪 /home/cbw/cs5600/hw4.pdf]
        • ./cs5600/hw4.pdf [ 🡪 /home/cbw/cs5600/hw4.pdf]
        • ../amislove/music.mp3 [ 🡪 /home/amislove/music.mp3]

12

13 of 30

Files

  • A file is a composed of two components
    • The file data itself
      • One or more blocks (sectors) of binary data
      • A file can contain anything
    • Meta-data about the file
      • Name, total size
      • What directory is it in?
      • Created time, modified time, access time
      • Hidden or system file?
      • Owner and owner’s group
      • Permissions: read/write/execute

13

14 of 30

File Extensions

  • File name are often written in dotted notation
    • E.g. program.exe, image.jpg, music.mp3
  • A file’s extension does not mean anything
    • Any file (regardless of its contents) can be given any name or extension

14

Rename

  • Graphical shells (like Windows explorer) use extensions to try and match files 🡪 programs
    • This mapping may fail for a variety of reasons

Has the data in the file changed from music to an image?

15 of 30

More File Meta-Data

  • Files have additional meta-data that is not typically shown to users
    • Unique identifier (file names may not be unique)
    • Structure that maps the file to blocks on the disk
  • Managing the mapping from files to blocks is one of the key jobs of the file system

15

Disk

16 of 30

Mapping Files to Blocks

  • Every file is composed of >=1 blocks
  • Key question: how do we map a file to its blocks?

16

0

2

3

4

5

7

9

1

8

6

[1]

[4, 5, 7, 8]

[6]

List of blocks

0

2

3

4

5

6

8

1

7

9

(1, 1)

(4, 4)

(9, 1)

As (start, length) pairs

  • Problem?
    • Really large files
  • Problem?
    • Fragmentation
    • E.g. try to add a new file with 3 blocks

17 of 30

Directories

  • Traditionally, file systems have used a hierarchical, tree-structured namespace
    • Directories are objects that contain other objects
      • i.e. a directory may (or may not) have children
    • Files are leaves in the tree
  • By default, directories contain at least two entries

17

/ (root)

bin

python

.

..

“.” self pointer

“..” points the the parents directory

18 of 30

More on Directories

  • Directories have associated meta-data
    • Name, number of entries
    • Created time, modified time, access time
    • Permissions (read/write), owner, and group
  • The file system must encode directories and store them on the disk
    • Typically, directories are stored as a special type of file
    • File contains a list of entries inside the directory, plus some meta-data for each entry

18

19 of 30

Example Directory File

19

2

3

4

5

6

7

8

9

Disk

C:\

Windows

Users

C:\

Name

Index

Dir?

Perms

.

2

Y

rwx

Windows

3

Y

rwx

Users

4

Y

rwx

pagefile.sys

5

N

r

1

0

pagefile.sys

20 of 30

Directory File Implementation

  • Each directory file stores many entries
  • Key Question: how do you encode the entries?

Name

Index

Dir?

Perms

.

2

Y

rwx

Windows

3

Y

rwx

Users

4

Y

rwx

pagefile.sys

5

N

r

Unordered List of Entries

  • Good: O(1) to add new entries
    • Just append to the file
  • Bad: O(n) to search for an entry

Name

Index

Dir?

Perms

.

2

Y

rwx

pagefile.sys

5

N

r

Users

4

Y

rwx

Windows

3

Y

rwx

Sorted List of Entries

  • Good: O(log n) to search for an entry
  • Bad: O(n) to add new entries
    • Entire file has the be rewritten
  • Other alternatives: hash tables, B-trees
    • More on B-trees later…
  • In practice, implementing directory files is complicated
    • Example: do filenames have a fixed, maximum length or variable length?

21 of 30

File Allocation Tables (FAT)

  • Simple file system popularized by MS-DOS
    • First introduced in 1977
    • Most devices today use the FAT32 spec from 1996
    • FAT12, FAT16, VFAT, FAT32, etc.
  • Still quite popular today
    • Default format for USB sticks and memory cards
    • Used for EFI boot partitions
  • Name comes from the index table used to track directories and files

21

22 of 30

22

Super Block

Disk

  • Stores basic info about the file system
  • FAT version, location of boot files
  • Total number of blocks
  • Index of the root directory in the FAT
  • Store file and directory data
  • Each block is a fixed size (4KB – 64KB)
  • Files may span multiple blocks
  • File allocation table (FAT)
  • Marks which blocks are free or in-use
  • Linked-list structure to manage large files

23 of 30

  • Directories are special files
    • File contains a list of entries inside the directory
  • Possible values for FAT entries:
    • 0 – entry is empty
    • 1 – reserved by the OS
    • 1 < N < 0xFFFF – next block in a chain
    • 0xFFFF – end of a chain

23

2

3

4

5

6

7

8

9

Super Block

Disk

C:\

Windows

Users

2

3

4

5

6

7

8

9

Root directory index = 2

C:\

Name

Index

Dir?

Perms

.

2

Y

rwx

Windows

3

Y

rwx

Users

4

Y

rwx

pagefile.sys

5

N

r

24 of 30

Fat Table Entries

  • len(FAT) == Number of clusters on the disk
    • Max number of files/directories is bounded
    • Decided when you format the partition
  • The FAT version roughly corresponds to the size in bits of each FAT entry
    • E.g. FAT16 🡪 each FAT entry is 16 bits
    • More bits 🡪 larger disks are supported

24

25 of 30

Fragmentation

  • Blocks for a file need not be contiguous

25

68

67

65

64

63

62

61

60

59

58

57

56

0

61

67

0

58

0

0xFFFF

0

0

65

0

0

68

67

65

64

63

62

61

60

59

58

57

56

FAT

Blocks

Start

End

Possible values for FAT entries:

    • 0 – entry is empty
    • 1 < N < 0xFFFF – next block in a chain
    • 0xFFFF – end of a chain

26 of 30

FAT: The Good and the Bad

  • The Good – FAT supports:
    • Hierarchical tree of directories and files
    • Variable length files
    • Basic file and directory meta-data
  • The Bad
    • At most, FAT32 supports 2TB disks
    • Locating free chunks requires scanning the entire FAT
    • Prone to internal and external fragmentation
      • Large blocks 🡪 internal fragmentation
    • Reads require a lot of random seeking

26

27 of 30

File Systems for SSDs

  • SSD hardware constraints
    • To implement wear leveling, writes must be spread across the blocks of flash
    • Periodically, old blocks need to be garbage collected to prevent write-amplification
  • Does this sounds familiar?
  • LFS is the ideal file system for SSDs!
  • Internally, SSDs manage all files in a LFS
    • This is transparent to the OS and end-users
    • Ideal for wear-leveling and avoiding write-amplification

27

28 of 30

Copy-on-write

  • Modern file systems incorporate ideas from LFS
  • Copy-on-write sematics
    • Updated data is written to empty space on disk, rather than overwriting the original data
    • Helps prevent data corruption, improves sequential write performance
  • Pioneered by LFS, now used in ZFS and btrfs
    • btrfs will probably be the next default file system in Linux

28

29 of 30

Versioning File Systems

  • LFS keeps old copies of data by default
  • Old versions of files may be useful!
    • Example: accidental file deletion
    • Example: accidentally doing open(file, ‘w’) on a file full of data
  • Turn LFS flaw into a virtue
  • Many modern file systems are versioned
    • Old copies of data are exposed to the user
    • The user may roll-back a file to recover old versions

29

30 of 30

Thank You

30